
Tools for production deployment of deep learning NLU models

Anwesh Roy | 3 min read

GPT-3 has been in the news lately. This huge NLU model can write stories, create news, complete sentences, search semantically, summarize text, and solve many higher cognitive tasks. Unfortunately, the GPT-3 model is closed source.

Over the last two years, many such transformer-based models have been released as open source, such as Google’s BERT. These are large language models that can perform tasks such as semantic search, question answering, text classification, and summarization. This opens up tremendous opportunities to automate such tasks and perform them at high speed.

The two main challenges in taking such a language model and using it to solve a language-related problem are:

a) Fine-tuning the model on an appropriate dataset for the downstream task

b) Deploying and running the model in a production environment.

In this article we will cover ways of overcoming the second challenge.

Production deployment of an AI model is like the deployment of any other software. The following are typical tasks that have to be undertaken to deploy AI models in production:

  1. Automated build – of the model and the application that wraps the model
  2. Packaging
  3. Automated deployment in development, QA, staging, and production environments
  4. Infrastructure setup such as servers, load balancers, health checks, and monitoring
  5. Performance testing
  6. Scalability testing
  7. Reliability testing
  8. High availability setup

These tasks are part of the CI/CD process that is needed for development and deployment of any software.

It is often wrongly assumed that developing the AI model itself is the holy grail of a career in this area. Nothing could be further from the truth.

Most organizations will look for the skills required to perform the above tasks: core software engineering and AI/ML Ops skills, in addition to knowledge of developing AI/ML models.

Let’s take the example of deploying a BERT-based model and outline the tools required for the above tasks. Automated builds can easily be set up with a tool such as Jenkins, with triggers configured to build the application and/or the NLU model.
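As a minimal sketch, such a build can even be triggered from a script through Jenkins’ remote API (the job name `nlu-model-build`, user, and token below are hypothetical placeholders):

```python
import requests

# Hypothetical Jenkins server, job name, and credentials; substitute your own.
JENKINS_URL = "https://jenkins.example.com"
JOB_NAME = "nlu-model-build"
USER = "ci-bot"
API_TOKEN = "secret-api-token"  # keep this in a secret store, not in code

def trigger_build(model_version: str) -> None:
    """Queue a parameterized Jenkins build for the given model version."""
    resp = requests.post(
        f"{JENKINS_URL}/job/{JOB_NAME}/buildWithParameters",
        auth=(USER, API_TOKEN),
        params={"MODEL_VERSION": model_version},
        timeout=10,
    )
    resp.raise_for_status()  # Jenkins answers 201 once the build is queued

trigger_build("v42")  # "v42" is an illustrative version label
```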


A TensorFlow Serving Docker container is a great mechanism to package a BERT-based NLU model and make it available for inference. BERT models come in a range of sizes, from large to small, and Hugging Face provides a wide variety of ready-made NLU models based on BERT.
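As a rough sketch of what inference against such a container looks like (the model name `bert_classifier` and its input signature are assumptions; match them to your own SavedModel export):

```python
import requests

# The serving container is typically started along these lines:
#   docker run -p 8501:8501 \
#     -v /models/bert_classifier:/models/bert_classifier \
#     -e MODEL_NAME=bert_classifier tensorflow/serving
# after which TensorFlow Serving answers REST calls on port 8501.
URL = "http://localhost:8501/v1/models/bert_classifier:predict"

def classify(input_ids, attention_mask):
    """Send one tokenized example to TensorFlow Serving and return its prediction."""
    payload = {
        "instances": [{"input_ids": input_ids, "attention_mask": attention_mask}]
    }
    resp = requests.post(URL, json=payload, timeout=5)
    resp.raise_for_status()
    return resp.json()["predictions"][0]
```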

By packaging the model in a Docker container, deployment and scaling become straightforward with Docker Compose; Kubernetes is another great option. These tools also make it possible to set up active/active or active/passive high availability for the NLU model.
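Whichever orchestrator is chosen, failover relies on a health check that the load balancer can poll. TensorFlow Serving exposes a model status endpoint that suits this; a sketch, reusing the hypothetical `bert_classifier` model name from above:

```python
import requests

def model_is_healthy(host: str, model: str = "bert_classifier") -> bool:
    """Return True if TensorFlow Serving reports an AVAILABLE model version."""
    try:
        resp = requests.get(f"http://{host}:8501/v1/models/{model}", timeout=2)
        resp.raise_for_status()
        statuses = resp.json()["model_version_status"]
        return any(s["state"] == "AVAILABLE" for s in statuses)
    except (requests.RequestException, KeyError):
        return False
```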

Using Docker-based containers also makes it easy to set up development, QA, staging, and production environments.
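To illustrate (a sketch using the Docker SDK for Python, assuming it is installed via `pip install docker`): the same serving image is started in every environment, with only the environment name and mounted model path changing.

```python
import docker

client = docker.from_env()

def start_serving(env: str, model_dir: str):
    """Start a TensorFlow Serving container for one environment (dev, qa, ...)."""
    return client.containers.run(
        "tensorflow/serving",
        detach=True,
        name=f"bert-serving-{env}",
        ports={"8501/tcp": None},  # let Docker pick a free host port
        environment={"MODEL_NAME": "bert_classifier"},  # hypothetical model name
        volumes={model_dir: {"bind": "/models/bert_classifier", "mode": "ro"}},
    )

container = start_serving("qa", "/srv/models/qa/bert_classifier")
```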

Depending on the performance and load requirements, either CPU or GPU servers can be chosen to host the TensorFlow Serving containers. Explore the power of multi-core CPU machines for inference to keep the cost per inference under control. By using smaller NLU models based on DistilBERT, it is possible to serve over a billion daily inference requests on CPUs.
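For example, a small ready-made DistilBERT checkpoint from the Hugging Face hub can be loaded for CPU inference in a few lines (the public sentiment checkpoint below stands in for your own fine-tuned model):

```python
from transformers import pipeline
import torch

# Tune intra-op parallelism to the machine's core count (8 is a placeholder).
torch.set_num_threads(8)

# A public DistilBERT sentiment checkpoint; swap in your fine-tuned model.
classifier = pipeline(
    "sentiment-analysis",
    model="distilbert-base-uncased-finetuned-sst-2-english",
    device=-1,  # -1 = run on CPU
)

print(classifier(["The deployment went smoothly.", "Latency spiked overnight."]))
```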

Performance, scalability, and reliability (PSR) are of utmost importance in the journey to deploy an NLU model in production, especially if the model has to provide inferences in real time, e.g. in a Conversational AI platform like Engati.

Tests for performance (based on expected concurrent load), scalability (based on load patterns), and reliability (based on SLA requirements) have to be designed and executed in a staging environment with the same configuration as production, in order to find possible failure points and fix them. JMeter is a widely used tool for such tests.
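Alongside a full JMeter plan, a quick concurrency smoke test can be scripted in a few lines (the staging URL and payload below are hypothetical and must match your model’s serving signature):

```python
import statistics
import time
from concurrent.futures import ThreadPoolExecutor

import requests

# Hypothetical staging endpoint and a dummy tokenized example.
URL = "http://staging.example.com:8501/v1/models/bert_classifier:predict"
PAYLOAD = {"instances": [{"input_ids": [101, 7592, 102], "attention_mask": [1, 1, 1]}]}

def timed_request(_) -> float:
    """Fire one inference request and return its latency in seconds."""
    start = time.perf_counter()
    requests.post(URL, json=PAYLOAD, timeout=10).raise_for_status()
    return time.perf_counter() - start

with ThreadPoolExecutor(max_workers=50) as pool:  # 50 concurrent clients
    latencies = list(pool.map(timed_request, range(1000)))

q = statistics.quantiles(latencies, n=100)
print(f"p50={q[49]:.3f}s  p95={q[94]:.3f}s  p99={q[98]:.3f}s")
```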

Deploying an NLU model in production is non-trivial and requires expertise across many areas and tools. Many AI/ML projects fail because teams focus purely on developing the model and pay no attention to these tasks.
