Tools for production deployment of deep learning NLU models
GPT-3 has been in the news lately. This huge NLU model can write stories, create news, complete sentences, search semantically, summarise text and solve many higher cognitive tasks. Unfortunately, the GPT-3 model is closed source.
Over the last two years, many transformer-based models, such as Google’s BERT, have been released as open source. These large language models can perform tasks such as semantic search, question answering, text classification and summarisation. This opens up tremendous opportunities to automate such tasks and perform them at high speed.
The two main challenges in taking such a language model and solving a language-related problem are:
- Fine tuning the model on an appropriate dataset pertaining to the downstream task
- Deploying and running the model in a production environment.
In this article we will cover ways of overcoming the second challenge.
Production deployment of an AI model is like deployment of any other software. The following are typical tasks that have to be undertaken to deploy AI models in production:
- Automated build of the model and the application that wraps it
- Automated deployment in development, QA, staging and production environments
- Infrastructure setup such as servers, load balancers, health checks and monitoring
- Performance testing
- Scalability testing
- Reliability testing
- High Availability setup
These tasks are part of the CI/CD process that is needed for development and deployment of any software.
It is often wrongly assumed that developing an AI model is the holy grail of a career in this area. Nothing could be further from the truth.
Most organizations look for people who can also perform the tasks listed above. That means core software engineering and AI/ML Ops skills in addition to knowledge of developing AI/ML models.
Let’s take the example of deploying a BERT based model and outline the tools required for the above tasks. Automated builds can be easily set up with a tool such as Jenkins. Triggers can be set up to build the application and/or the NLU model.
A TensorFlow Serving Docker container is a good way to package a BERT-based NLU model and make it available for inference. BERT models come in several sizes, from large to small. Hugging Face provides a wide variety of ready-made NLU models based on BERT.
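As a rough illustration of what "available for inference" looks like from the client side, the sketch below builds a request for TensorFlow Serving's REST predict endpoint (`/v1/models/<name>:predict`, served on port 8501 by default). The host, the model name `bert_nlu`, the feature names and the token IDs are assumptions made for this example; the real input names depend on the signature the model was exported with.

```python
import json
import urllib.request


def build_predict_request(host, model_name, instances, port=8501):
    """Build a POST request for TensorFlow Serving's REST predict endpoint."""
    url = f"http://{host}:{port}/v1/models/{model_name}:predict"
    body = json.dumps({"instances": instances}).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def predict(host, model_name, instances, timeout=5.0):
    """Send the request and return the 'predictions' field of the JSON reply."""
    request = build_predict_request(host, model_name, instances)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read())["predictions"]


# Hypothetical pre-tokenized input; feature names and token IDs are
# illustrative and must match the exported model's signature in practice.
example = [{"input_ids": [101, 7592, 2088, 102], "attention_mask": [1, 1, 1, 1]}]
req = build_predict_request("localhost", "bert_nlu", example)
print(req.full_url)
```

Calling `predict("localhost", "bert_nlu", example)` would only work against a running container that actually serves a model under that name, so the sketch stops at building the request.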
With the model packaged in a Docker container, deployment and scaling become straightforward with Docker Compose. Kubernetes is another great tool for orchestrating containers. These tools also make it possible to set up active/active or active/passive high availability for the NLU model.
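Whichever orchestrator is used, a high availability setup needs a way to tell which replicas can take traffic. One option is TensorFlow Serving's model status endpoint (`GET /v1/models/<name>`), which reports a state such as `AVAILABLE` for each loaded version. The sketch below only parses such replies; the replica names are hypothetical, and fetching the payloads over HTTP is left out.

```python
def available_versions(status_payload):
    """Return the model versions whose state is AVAILABLE."""
    statuses = status_payload.get("model_version_status", [])
    return [s["version"] for s in statuses if s.get("state") == "AVAILABLE"]


def healthy_replicas(replica_status):
    """Map of replica name -> parsed status payload (None if unreachable).

    Returns the sorted names of replicas with at least one available version.
    """
    return sorted(
        name
        for name, payload in replica_status.items()
        if payload is not None and available_versions(payload)
    )


# Sample payload in the shape returned by the model status endpoint.
sample_status = {
    "model_version_status": [
        {
            "version": "1",
            "state": "AVAILABLE",
            "status": {"error_code": "OK", "error_message": ""},
        }
    ]
}

# Hypothetical replicas: one healthy, one unreachable.
replicas = {"nlu-a": sample_status, "nlu-b": None}
print(healthy_replicas(replicas))
```

In an active/passive setup the same check could decide when to promote the standby; in an active/active setup it could decide which replicas stay behind the load balancer.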
Using Docker-based containers also makes it easy to set up development, QA, staging and production environments.
Depending on the performance and load requirements, either CPU or GPU servers can be chosen to host the TensorFlow Serving containers. Explore multi-core CPU machines for inference to keep the cost per inference under control. By using smaller NLU models based on DistilBERT, it is possible to serve 1+ billion daily inference requests on CPUs.
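To put the daily figure in perspective, it helps to convert it to requests per second and then to a core count. The sketch below does that arithmetic. The per-core throughput of 50 requests per second and the peak factor of 3 are placeholder assumptions, not measurements; the real numbers have to come from performance tests on the chosen model and hardware.

```python
import math

SECONDS_PER_DAY = 24 * 60 * 60


def average_rps(requests_per_day):
    """Average requests per second for a given daily volume."""
    return requests_per_day / SECONDS_PER_DAY


def cores_needed(requests_per_day, per_core_rps, peak_factor=1.0):
    """CPU cores needed to absorb the peak rate, rounded up.

    per_core_rps and peak_factor are assumptions to be replaced by
    measured values.
    """
    peak_rps = average_rps(requests_per_day) * peak_factor
    return math.ceil(peak_rps / per_core_rps)


daily_requests = 1_000_000_000
print(round(average_rps(daily_requests)))                     # about 11574 per second
print(cores_needed(daily_requests, per_core_rps=50))          # 232 at the average rate
print(cores_needed(daily_requests, 50, peak_factor=3.0))      # 695 at three times the average
```

One billion requests per day is roughly 11,574 requests per second on average, so the sizing is driven mostly by how far the peak exceeds the average and by the measured throughput of a single core.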
Performance, scalability and reliability (PSR) are of utmost importance when deploying an NLU model in production, especially if the model has to provide inferences in real time, as in a Conversational AI platform like Engati.
Tests for performance based on expected concurrent load, scalability based on load patterns, and reliability based on SLA requirements have to be designed and executed in a staging environment with the same configuration as production, in order to find possible failure points and fix them. JMeter is a widely used tool for performing such tests.
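The idea behind such a test can be sketched in a few lines of Python. This is not JMeter, only a minimal stand-in for the same technique: fire a fixed number of requests at a chosen concurrency, then report latency percentiles and the failure count. The `fake_request` stub is a placeholder for a real call to the staging endpoint.

```python
import math
import time
from concurrent.futures import ThreadPoolExecutor


def timed_call(send_request):
    """Run one request and return (latency in seconds, success flag)."""
    start = time.perf_counter()
    ok = True
    try:
        send_request()
    except Exception:
        ok = False
    return time.perf_counter() - start, ok


def run_load_test(send_request, total_requests, concurrency):
    """Issue total_requests calls using `concurrency` worker threads."""
    with ThreadPoolExecutor(max_workers=concurrency) as pool:
        results = list(
            pool.map(lambda _: timed_call(send_request), range(total_requests))
        )
    latencies = sorted(latency for latency, _ in results)
    failures = sum(1 for _, ok in results if not ok)
    return latencies, failures


def percentile(sorted_values, pct):
    """Nearest-rank percentile of an already sorted list."""
    if not sorted_values:
        raise ValueError("no samples")
    rank = max(1, math.ceil(pct / 100 * len(sorted_values)))
    return sorted_values[rank - 1]


def fake_request():
    """Placeholder for a real inference call against staging."""
    time.sleep(0.001)


latencies, failures = run_load_test(fake_request, total_requests=50, concurrency=5)
print(f"p50={percentile(latencies, 50):.4f}s p95={percentile(latencies, 95):.4f}s failures={failures}")
```

Comparing the p95 latency and failure count against the SLA at each concurrency level is what turns a run like this into a pass or fail result.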
Deploying an NLU model in production is non-trivial and needs expertise in many areas and tools to be successful. Many AI/ML projects fail because they focus purely on developing the model and neglect these tasks.