Richard Yue’s Quest to Advance Machine Translation
The Align student developed a system that’s 36% better at translating acronyms than Google Translate
By Marcelle Santos | Photo by Adam Glanzman for Northeastern
Richard Yue compares his work as a researcher in the fields of natural language processing (NLP) and machine translation to training an enthusiastic young learner.
“You have a very motivated and smart teenager who wants to learn something. And you want to teach that teenager one little task. You’re not trying to teach them a whole subject, like Algebra. One little task. And you’re trying to teach them to do that very well in a short period of time,” he explains.
The teenager, in this case, is a neural network (a computational model inspired by the wiring in our brains); the task is the translation of acronyms.
“You basically say, listen, I want you to tell me what the acronym is for this term. I give you something like cardiopulmonary resuscitation, and I tell you that it’s CPR. I give you five examples of what an acronym looks like, then a list of 20 terms, and say, give me the acronym.”
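In code, that kind of few-shot setup can be as simple as assembling a prompt from a handful of worked examples. The sketch below is illustrative only; the example pairs and the build_prompt helper are invented here and are not drawn from Richard’s actual training data.

# Illustrative sketch of a few-shot prompt like the one Richard describes:
# a few term-to-acronym examples, then new terms for the model to complete.
# The example pairs are invented for illustration.
EXAMPLES = [
    ("cardiopulmonary resuscitation", "CPR"),
    ("National Aeronautics and Space Administration", "NASA"),
    ("Federal Bureau of Investigation", "FBI"),
    ("magnetic resonance imaging", "MRI"),
    ("World Health Organization", "WHO"),
]

def build_prompt(new_terms):
    """Assemble a plain-text few-shot prompt from example pairs."""
    lines = ["Give the acronym for each term."]
    for term, acronym in EXAMPLES:
        lines.append(f"Term: {term}\nAcronym: {acronym}")
    for term in new_terms:
        lines.append(f"Term: {term}\nAcronym:")
    return "\n\n".join(lines)

print(build_prompt(["European Central Bank", "gross domestic product"]))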
The process, he says, is “not what people might think it is.” A big part of it is developing good materials for training (a neural network learns from examples; the more, the better), which means amassing large amounts of data and then cleaning and formatting it to make it easier to work with.
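In practice, that preparation step often looks like ordinary text wrangling. As a rough sketch only: the file name, the tab-separated layout, and the clean_pairs helper below are assumptions for illustration, not a description of Richard’s pipeline.

# Rough sketch of cleaning term/acronym pairs before training.
# The file name and column order are assumptions for illustration.
import csv

def clean_pairs(path):
    """Load term/acronym rows, dropping blanks and normalizing whitespace."""
    pairs = []
    with open(path, encoding="utf-8") as f:
        for row in csv.reader(f, delimiter="\t"):
            if len(row) != 2:
                continue                      # skip malformed rows
            term, acronym = (cell.strip() for cell in row)
            if not term or not acronym:
                continue                      # skip empty cells
            pairs.append((term, acronym.upper()))
    return pairs

# pairs = clean_pairs("term_acronym_pairs.tsv")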
The other part of the process is giving the model tasks, feedback, and suggestions for improvement. For example, if the model says the acronym for Federal Bureau of Investigation is FBOI instead of FBI, Richard might tell it that, next time, it should skip the first letter of any word that is a preposition.
“And the model says, OK, now I understand, give me another set. You give it 20 different ones. It does it again and this time it gets 100%. It gets all of them right.”
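A scoring pass like the one he describes is easy to picture in code. The sketch below is a generic illustration, assuming a stand-in model function called translate_acronym and a list of held-out term/acronym pairs; neither is part of Richard’s actual project.

# Generic sketch of grading a model on a batch of term/acronym pairs.
# translate_acronym is a stand-in for the model under test, not a real API.
def score(model_fn, test_pairs):
    """Return the fraction of terms for which the model's acronym is correct."""
    correct = sum(1 for term, gold in test_pairs
                  if model_fn(term).strip().upper() == gold.upper())
    return correct / len(test_pairs)

# accuracy = score(translate_acronym, test_pairs)   # e.g. 20 held-out pairs
# print(f"{accuracy:.0%} correct")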
In practice, things are rarely this straightforward. Sometimes, for reasons he can’t immediately figure out, Richard can’t get through to the model or ends up leading it astray.
When that happens, he’s on his own, literally (he’s both the designer and the sole researcher on the project) and figuratively. “You sit and try to get it to work for two hours, and you’re like, why doesn’t it accept [the code]? Why is it giving me this wrong answer? You hit a wall. And then you have to spend another 24 hours training it again.”
He loves what he does.
“It’s so much what I enjoy doing that sometimes when I come home after a long day of studying or class, I think to myself, I could sit and watch TV or I could learn a little bit more about this NLP stuff. Almost every time I ask myself that question, I think, I kinda want to sit and do some NLP,” he says.
A first-year student in Align, Northeastern University’s bridge program for aspiring computer scientists without a background in technology, Richard published a paper on Large Language Models (LLMs), hallucinations, ChatGPT, and responsible AI with his research advisor, Khoury College of Computer Sciences professor Kenneth Church, last September. (It was the journal’s top-read article that month.)
Getting published by Cambridge University Press wasn’t the only milestone in his journey as an NLP researcher and practitioner. He also built a computational model that outperforms Google Translate in acronym translation.
He described the results of a year of research into machine translation to a tech-minded audience at a recent graduate Open House on Northeastern’s Silicon Valley campus: “We ended up with a sub-3% validation loss, and an 85.9% accuracy in test, compared with Google’s 50% error rate. That’s a 36% increase.” (Put another way, his model gets roughly 86 out of every 100 acronyms right, versus about 50 for Google Translate, a gap of about 36 percentage points.)
“Google is a company with a lot of resources,” he told prospective Align students there. “If we’re able to do 36% better than them on this task, we’re very encouraged by those results.”
The problem with acronym translation
Acronyms (and other short forms, including initialisms and abbreviations) are notoriously hard to translate. When used without the terms they refer to, they can be extremely hard to decipher. Some are language-specific and become unclear or unreadable when translated.
Human translators struggle with them, and the task is even harder for machines. “There are a lot of inherent rules to it,” Richard explains.
For example, acronyms are typically formed by the first letter of each word in a phrase, are written in uppercase, and exclude articles, conjunctions, and prepositions. (NASA, which stands for National Aeronautics and Space Administration, is a good example.)
The list of exceptions to those rules is equally long. Take Richard’s example, CPR, an acronym that takes both the “C” and the “P” from the single word “cardiopulmonary,” or ROC, which keeps the preposition “of” in “Republic of Congo.”
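A naive, hand-written version of those rules is easy to code and just as easy to break. The snippet below is purely illustrative (the stop-word list and the naive_acronym helper are assumptions, not part of Richard’s system): it gets NASA and FBI right, but produces “CR” for cardiopulmonary resuscitation and drops the “O” from Republic of Congo.

# Purely illustrative first-letter heuristic; not Richard's system.
# It encodes the textbook rules (first letters, uppercase, skip little words)
# and fails exactly where the rules have exceptions.
STOP_WORDS = {"a", "an", "the", "and", "or", "of", "for", "in", "on"}

def naive_acronym(term):
    """Uppercase first letters of all words that aren't articles/prepositions."""
    return "".join(w[0].upper() for w in term.split() if w.lower() not in STOP_WORDS)

print(naive_acronym("National Aeronautics and Space Administration"))  # NASA
print(naive_acronym("Federal Bureau of Investigation"))                # FBI
print(naive_acronym("cardiopulmonary resuscitation"))                  # CR, not CPR
print(naive_acronym("Republic of Congo"))                              # RC, not ROC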
To further complicate things, new acronyms pop up every day, and the same acronym can mean different things depending on the industry. (Sometimes even within the same field: one researcher found 18 uses for the acronym UA in medicine.)
Yet the proper understanding of short forms and their meanings matters more than we think. Case in point: Misread abbreviations on patient charts lead to medication prescribing errors.
But perhaps the main reason for caring about acronyms is that they are so prevalent — a study showed a steady increase in their use in scientific communication over time, especially in the titles and abstracts of scientific papers.
Their presence alone can hinder the dissemination of knowledge — a problem that, according to Richard, could be made worse by subpar automatic translations.
“Think of somebody in the US who wants to do research in medicine. They’re trying to find some disease cure. If there’s a really good translation algorithm out there that translates things correctly, they can take some text in French, get the abstract, and if the translation is good enough, decide to pay a human translator to get the full translation.”
“But if I read it and it’s horrible and I can’t understand what it’s about, then I’m gonna miss out on all these papers out there that might be part of the solution, part of a cure,” he explains.
Domain knowledge
A professional translator, Richard knew that acronym translation was a “major recurring issue” in translation tools before joining the Align Master of Science in Computer Science program.
When he got the chance to do computer science research, he chose to focus on this important, but often overlooked subtask within machine translation. “I wanted to push the field forward by addressing something that nobody had attempted,” he said at the Open House.
To tackle this challenge, he had to “move into the world of NLP” — a field at the intersection of computer science and linguistics — and “program” his domain knowledge via machine learning.
“It’s better than writing a thousand rules out by hand and trying to program that manually,” he told prospective Align students. “Neural nets really are your best friend.”
Before moving to Silicon Valley to study computer science, Richard, who is originally from Northern Virginia, lived in France. There, he earned a master’s degree in translation and worked professionally as a translator between French and English.
During the pandemic, he saw an opportunity to return to the US and apply his expertise in linguistics and translation toward a career in technology, which he had always been interested in.
“I was thinking, how do I keep my language skills and all this linguistics study that I did in translation, while I move to a computer-y type career? And that was sort of a hint at Align.”
He was drawn to the program’s openness to and appreciation for diverse backgrounds. “The idea is that these diverse degrees make for powerful tech careers. It’s true for me and others around me, too.”
At the Open House, he assured prospective Align students that, far from a disadvantage, their diverse perspectives are an asset to the tech industry. “That’s going to maybe help you make the next big break in the field of computer science,” he said.