This student research team is building Northeastern’s own AI assistant: NUGPT

By Madelaine Millar

With almost two thousand instructors and almost fifty thousand enrolled students spread across thirteen global campuses, Northeastern University is a very big school with many interconnected websites. With so much information and so many services nested under Northeastern’s digital umbrella, it can take a minute for a student to find the answers to their unique questions, like whether a certain class will be offered next semester or which professors have taught the course in the past. 

How lucky, then, that the school is also full of enterprising problem-solvers like Harshika Santoshi and her research teammates Panchami Laxminarayan Baleri, Rohan Benjamin Varghese, Shreevidhya Shambanna, and Yadhukrishnan Pankajakshan.

For the past year, Santoshi and her teammates at the AI and Data Club have been working on a Northeastern-specific LLM chatbot called NUGPT to help students answer questions about their school quickly, easily, and accurately. After receiving an award for their work at the Northeastern in Silicon Valley student research showcase at the end of 2024, they are working to enlarge the model’s database in the hopes that the tool can eventually be integrated into the school’s official website.

Santoshi and her three co-founders – Rohan, Shreevidhya, and Dharun Suryaa Nagarajan – created the AI and Data Club in the spring of 2024, with the goal of helping students bridge the gap between the AI and data skills they were learning in the classroom and the application of those tools in real life. To do that, they needed a large-scale, long-term project where their members could put their new knowledge into practice. When Alan Eng – the club’s faculty advisor, and Northeastern in Silicon Valley’s Director of Strategic Partnerships – brought up the idea of an intelligent chatbot designed for Northeastern, Santoshi knew it was just what they needed.

“Not only Northeastern, but other universities also have sub links and sub links and sub links. A person needs like 10 or 15 minutes to find the right section,” Santoshi said. Not only did she see NUGPT as a way to fix that problem, but the project would also help club members develop industry-relevant skills: “Since ChatGPT has come out, most companies are looking to build something like that for their customers, or for an internal team.”

Under Eng’s supervision, the AI and Data Club self-organized a team of about ten project managers, back-end engineers, LLM coders, and UX/UI designers and got to work. Almost immediately, they ran into the biggest hurdle of the entire project. 

“The first challenge was getting the data,” explained Santoshi, who is currently pursuing her MS in Data Science. “We scraped the data off the websites that are publicly available, but converting it to text and scraping it was not that reliable. When you convert from a website to text, sometimes something is going to go missing, sometimes extra brackets appear. And some websites don’t allow scraping, so the data was a major challenge.”

But the club’s goal was to put classroom skills into practice, so the team applied their data hygiene skills, cleaning and organizing what they’d scraped into a usable database. They encountered other problems too – like needing to change their embedding method to return more accurate results – but each time, the team treated NUGPT as a learning opportunity. By the time the student research showcase rolled around, they had a working model that reached 85% accuracy on basic Northeastern-related questions. For more complex questions like visa eligibility, the model simply directs querents to the right person to ask.

At the end of each semester, Northeastern in Silicon Valley hosts a student research showcase, an application-based event where selected teams present posters of their research to fellow students, faculty, alumni, and industry partners. The other team members were presenting different research, but Santoshi wanted to make sure NUGPT got in front of the community, so she decided to present NUGPT instead of one of her personal projects.

Harshika Santoshi accepting an award at the Student Research Showcase on behalf of the NUGPT team

“I thought it deserves visibility; it’s something that could really help,” Santoshi said. “The people who came around to see my poster thought this was an excellent idea, and very useful for Northeastern. It answers questions like ‘Tell me about [course name]’ or ‘What are the prerequisites for this course.’ Everyone said they would like to see this deployed as a proper chatbot for Northeastern in Silicon Valley.”

To make that happen, the team needs to go another round with their old nemesis, data. This time, they’re working with Northeastern’s IT team to figure out whether it will be possible to use the university’s databases to train their model further without compromising security or privacy. Santoshi’s ultimate goal is to roll out NUGPT on the university’s official website. She wants to see her work benefit both Northeastern students across the global network and her many friends right here in Silicon Valley.

“On the Silicon Valley campus, everyone knows everyone by name. It’s like one big family – faculty, staff, the Student Services team, you bump into people every day,” she said. “The campus is intimate, because everyone knows everyone else. And that’s a big plus.”

While Santoshi is graduating in May of 2025, her three AI and Data Club co-founders all plan to keep developing the chatbot after she leaves. She hopes that NUGPT can be what she and her clubmates leave behind for the community she’s come to love.

“It would be like something out of my dreams,” she said. “If this works out, incoming batches of students will use this for years to come. It will be something I will remember forever.”