Thomas Wolf, Co-Founder and Chief Scientist of Hugging Face, joins us on Engati CX to discuss the challenges of computational linguistics and how to overcome them. Hugging Face is all about democratizing NLP, one commit at a time. Thomas leads the science team, focusing on the hardest technical problems around deploying robust NLP applications.
Computational linguistics: the challenges — an interview with Thomas Wolf
Some highlights from our conversation
Can you tell us something about Hugging Face and NLPL, and their objectives?
Hugging Face is a community-driven initiative that strives to make NLP more accessible to developers and applications in this fast-changing space. NLPL, the Nordic Language Processing Laboratory, is a group of research labs in Northern Europe, across Denmark, Finland, and Norway.
It runs a winter school where grad students, professors, and master’s students can follow a week of intensive classes on NLP and transfer learning, i.e., how to use pre-trained language models.
It’s a great place for students to explore Hugging Face’s models. There are also over a thousand translation models developed by Helsinki-NLP, which is a part of the NLPL group.
These models translate better than Google Translate. So if you’re interested in translating rarer or low-resource languages, we urge you to check out the Helsinki-NLP models on the Hugging Face hub.
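These Helsinki-NLP models follow a consistent naming scheme on the hub, so loading one for a given language pair is straightforward with the standard `transformers` pipeline API. A minimal sketch (the commented part downloads model weights on first use; the `en`/`fi` pair is just an example):

```python
def opus_mt_model_id(src: str, tgt: str) -> str:
    """Build the Hugging Face hub id of a Helsinki-NLP OPUS-MT model,
    e.g. ("en", "fi") -> "Helsinki-NLP/opus-mt-en-fi"."""
    return f"Helsinki-NLP/opus-mt-{src}-{tgt}"

# Running a translation (downloads the model from the hub on first use):
# from transformers import pipeline
# translator = pipeline("translation", model=opus_mt_model_id("en", "fi"))
# translator("Hugging Face makes NLP accessible.")
```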
In March (2020) you gave a talk on computational linguistics. What do you think are the major challenges of computational linguistics and how can we overcome them?
There are a lot of challenges in computational linguistics right now. Current state-of-the-art models are huge and hard to use in applications. And as the number of parameters grows, the cost increases as well.
At Hugging Face, we wanted to tackle the question of efficiency. Developing distilled, smaller models seems like the way to go, but it raises further questions.
One of the first questions that often comes up is robustness. Models are trained on public datasets, yet they often fail in real-life applications, and they fail in unexpected ways, quite unlike the way humans fail.
First, the people deploying these models don’t expect them to fail. Second, when you assess why a model has failed, it can be frustrating. Take typos, for instance: models are extremely sensitive to typos, which is a concern because real-life text contains typos all the time.
Training on unbalanced datasets, or on datasets that contain easy heuristics, leads to what we call spuriousness.
If a dataset isn’t designed carefully enough, the resulting model can’t solve real-life problems. Because models are trained to exploit heuristics and shortcuts, they take the easy way out, simply because it is easier to learn. For example, the thought process of the model will be as follows:
“Oh, there is a majority class, so I should always predict this majority class.”
This is the model taking a lazy shortcut. And it’s something we have to avoid.
The lack of common sense
And there are more general problems, such as the lack of common sense. Take GPT-3 as an example.
GPT-3 can say a lot of silly things, but that’s because it has only been trained on text. There are a lot of things we don’t say in text, so there are many things GPT-3 doesn’t have access to.
Actually, there was an MIT Technology Review article on this that we contributed to. In the beginning of the article, we asked GPT-3, “What colour is a sheep?” And GPT-3 wasn’t sure; it hesitated between black and white because of how often “black sheep” is used as an idiom in English.
We normally see white sheep; black sheep are rare, which is why the phrase has earned this significance in our language. The fact that sheep are usually white is too obvious to state, so we never write it out for GPT-3. But because the idiom is so prominent in our language, GPT-3 is unsure and hesitates between white and black.
The last problem regards continual learning. In academia, these models are usually trained once and for all. BERT, for example, was built and trained once, and so was GPT-3.
Because it was trained only once, GPT-3 knows nothing about COVID-19, which is somewhat tragic given its impact on our world and on businesses as we speak. Adding new pieces of knowledge to models is a challenge called continual learning. It’s an area where researchers are trying to find ways to make these models evolve by adding new knowledge.
While these models work a lot better than the previous generation of text-processing models, there are still a lot of challenges. Hugging Face is tackling some of them. There are many possible approaches, but there’s no clear solution right now. Still, progress is possible if we keep chipping away at the problem.
Do you think it’s possible to do some continual learning at a smaller scale?
It is possible, but it’s still in development at Hugging Face. Our idea is to use models with retrieval components.
These models have two parts: a classical neural network, and a database generated by that network. It’s like having the training dataset inside the model, but smarter.
The model processes the training dataset so that it can be queried. Each time, the model queries the database for the closest pieces of information and uses them as additional context.
In addition, the database can be updated. Take Wikipedia, for example: you can process the newest version of Wikipedia so that the model can handle new information.
There’s no need to retrain the full model; researchers and enterprises can just update the parts they need.
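The idea can be illustrated with a toy sketch: an updatable store of (vector, passage) pairs queried by nearest-neighbour search. This is not Hugging Face’s actual RAG implementation — the `RetrievalStore` name and the hand-written vectors are illustrative; a real system would produce the embeddings with a neural encoder.

```python
import numpy as np

class RetrievalStore:
    """Toy retrieval component: an updatable database of (embedding, passage)
    pairs queried by cosine-similarity nearest-neighbour search."""

    def __init__(self):
        self.vectors = []
        self.passages = []

    def add(self, vector, passage):
        # Updating the model's knowledge = adding entries, no retraining.
        self.vectors.append(np.asarray(vector, dtype=float))
        self.passages.append(passage)

    def query(self, vector, k=1):
        # Return the k passages closest to the query vector.
        v = np.asarray(vector, dtype=float)
        mat = np.stack(self.vectors)
        sims = mat @ v / (np.linalg.norm(mat, axis=1) * np.linalg.norm(v) + 1e-9)
        top = np.argsort(-sims)[:k]
        return [self.passages[i] for i in top]

store = RetrievalStore()
store.add([1.0, 0.0], "Sheep are usually white.")
store.add([0.0, 1.0], "BERT was released in 2018.")
# New knowledge arrives after training — just add it to the database:
store.add([0.9, 0.1], "COVID-19 emerged in late 2019.")
print(store.query([1.0, 0.05], k=1))
```

In a full retrieval-augmented generator, the retrieved passages are fed to the neural network as extra input rather than returned directly.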
We’re not sure if this will be the ultimate solution but we believe this hybrid retrieval will be efficient and an interesting step in the right direction.
One such model in the Hugging Face library is RAG, which stands for Retrieval-Augmented Generation and comes from Facebook. There are a couple of other models out there: REALM by Google and DPR, also by Facebook.
Are there any new developments in NLP? What’s the future of NLP looking like?
There are many new developments, according to Thomas. One that he mentioned earlier is the retrieval model. Research is currently focused on making current models more efficient.
There are a lot of new architectural variants of transformers. Many more efficient transformers are coming out today that process sentences faster and more effectively.
Another interesting area is the emergence of multimodal models. The Hugging Face library has started to include models that process both text and images as a way to add common sense to the model.
Adding images enhances the model’s understanding of the world. You can compare this to how humans work: humans use many modalities to access the world and to understand what’s happening, through images, sound, and touch.
Natural language and text are not the only ways of understanding the world. As soon as we acknowledge this, the field will move forward. But the future’s looking bright.
Currently, NLP is the hottest field in AI, and we hope that all of AI can come together under the NLP umbrella. NLP is a way to store and process knowledge, because language is how humans store and express knowledge. Researchers need to connect that to AI, and then connect it to images and our other senses.
How does this tie into voice assistants? Are these systems still able to recognize that?
Voice is very tricky. But it’s a challenge even for humans. Take sarcasm, for example: how do humans successfully tell whether someone is actually being sarcastic or not? It’s a guess. So there’s still a long way to go.
How can organizations get started on their journey towards adopting AI and NLP?
There are a lot of ways to get started, but organizations should first assess where they want to begin. Ask yourself whether you want to build the technology in house. If not, Hugging Face is a good place to start. The barrier to entry is a little high, but there are efforts to make it more accessible through forums, tutorials, and videos.
Hugging Face was originally built to equip NLP researchers with the tools they needed to succeed. But as it gains popularity, more and more people are becoming interested, so the focus now is on making the technology more accessible.
A simple first step towards adopting NLP is using a BERT model to process text; more advanced business tasks require a bigger step and a deeper understanding of the technology. Once organizations have understood BERT, they can go beyond it with the rapidly evolving technology and resources available.
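That first step can be sketched in a few lines. The commented part uses the standard `transformers` API with `bert-base-uncased` as one common model choice (it downloads weights on first use); the `mean_pool` helper below is an illustrative way to collapse per-token embeddings into a single sentence vector for a downstream classifier.

```python
import numpy as np

def mean_pool(token_vectors: np.ndarray) -> np.ndarray:
    """Average per-token embeddings of shape (seq_len, hidden)
    into a single sentence vector of shape (hidden,)."""
    return token_vectors.mean(axis=0)

# With a real BERT, the token embeddings would come from:
# from transformers import AutoTokenizer, AutoModel
# tok = AutoTokenizer.from_pretrained("bert-base-uncased")
# model = AutoModel.from_pretrained("bert-base-uncased")
# out = model(**tok("We received your support ticket.", return_tensors="pt"))
# tokens = out.last_hidden_state[0].detach().numpy()  # (seq_len, 768)
# sentence_embedding = mean_pool(tokens)
```

The sentence embedding can then be fed into a simple classifier, which is often enough for tasks like routing support tickets or tagging documents.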
Any final thoughts you’d like to leave our audience with?
For a long time, NLP was in the realm of “if, then, else,” much like classical programming and hard-coded engineering. But now we’re seeing that AI can work in NLP, and that the data-driven approach is starting to work. And as these models process text more efficiently, we’ll see more mainstream uses of them.