

Tools for production deployment of deep learning NLU models

Anwesh Roy
Sep 13


GPT-3 has been in the news lately. This huge NLU model can write stories, generate news articles, complete sentences, search semantically, summarise text, and solve many other higher-level cognitive tasks. Unfortunately, GPT-3 is closed source.

Over the last two years, many transformer-based models such as Google’s BERT have been released as open source. These are large language models that can perform tasks such as semantic search, question answering, text classification, and summarisation. This opens up tremendous opportunities to automate such tasks and perform them at high speed.

The two main challenges in taking such language models and solving a language-related problem are:

  1. Fine-tuning the model on an appropriate dataset pertaining to the downstream task
  2. Deploying and running the model in a production environment

In this article we will cover ways of overcoming the second challenge.

Production deployment of an AI model is like deployment of any other software. The following are typical tasks that have to be undertaken to deploy AI models in production:

  1. Automated build of the model and the application that wraps it
  2. Packaging
  3. Automated deployment in development, QA, staging, and production environments
  4. Infrastructure setup: servers, load balancers, health checks, and monitoring
  5. Performance testing
  6. Scalability testing
  7. Reliability testing
  8. High availability setup

These tasks are part of the CI/CD process that is needed for development and deployment of any software.

It is often wrongly assumed that developing an AI model is all there is to a career in this area. Nothing could be further from the truth.

Most organizations will look for people who can perform the tasks above: core software engineering and AI/MLOps skills in addition to knowledge of developing AI/ML models.

Let’s take the example of deploying a BERT-based model and outline the tools required for the above tasks. Automated builds can easily be set up with a tool such as Jenkins, with triggers configured to build the application and/or the NLU model.

A TensorFlow Serving Docker container is a great mechanism for packaging a BERT-based NLU model and making it available for inference. BERT comes in several size variants, ranging from large to small. Hugging Face provides a wide variety of ready-made NLU models based on BERT.
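As a rough sketch of what this looks like from the client side: TensorFlow Serving exposes a REST API (on port 8501 by default), and predictions can be requested over HTTP. The model name bert_classifier and the input feature names below are assumptions for illustration; they depend on how your model was exported.

```python
import requests

# TensorFlow Serving's REST API listens on port 8501 by default.
# "bert_classifier" is an assumed model name: use whatever name the model
# was loaded under (the --model_name flag of the serving container).
SERVING_URL = "http://localhost:8501/v1/models/bert_classifier:predict"

def predict(input_ids, attention_mask):
    """Send one tokenized example to TensorFlow Serving and return predictions."""
    payload = {
        "instances": [
            {"input_ids": input_ids, "attention_mask": attention_mask}
        ]
    }
    response = requests.post(SERVING_URL, json=payload, timeout=5)
    response.raise_for_status()
    return response.json()["predictions"]

# Dummy token IDs for illustration; real IDs come from the model's tokenizer.
print(predict(input_ids=[101, 2023, 2003, 102], attention_mask=[1, 1, 1, 1]))
```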

With the model in a Docker container, deployment and scaling become straightforward with Docker Compose. Kubernetes is another great tool for orchestrating Docker containers. Both make it possible to set up an active/active or active/passive high availability configuration for the NLU model.
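Whichever orchestrator is used, its health checks need something to poll. TensorFlow Serving exposes a model status endpoint that works well for this; the sketch below assumes the same hypothetical bert_classifier model name as above.

```python
import requests

# TensorFlow Serving reports per-version model state at this endpoint.
# "bert_classifier" is the same assumed model name as in the earlier sketch.
STATUS_URL = "http://localhost:8501/v1/models/bert_classifier"

def model_is_available() -> bool:
    """Return True if at least one model version reports state AVAILABLE."""
    try:
        status = requests.get(STATUS_URL, timeout=2).json()
    except requests.RequestException:
        return False
    return any(
        v.get("state") == "AVAILABLE"
        for v in status.get("model_version_status", [])
    )

if __name__ == "__main__":
    # An orchestrator's liveness/readiness probe can run a script like this,
    # or hit the endpoint directly.
    raise SystemExit(0 if model_is_available() else 1)
```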

Using Docker-based containers also makes it easy to set up identical development, QA, staging, and production environments.

Depending on performance and load requirements, either CPU or GPU servers can be chosen to host the TensorFlow Serving containers. Explore the power of multi-core CPU machines for inference to keep the cost per inference under control. By using smaller NLU models based on DistilBERT, it is possible to serve over a billion daily inference requests on CPUs.
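As a quick illustration of CPU-friendly inference, a DistilBERT classifier can be loaded and run in a few lines with the Hugging Face transformers library. The checkpoint below is a stock public sentiment model, used here purely as an example; in practice you would load the model fine-tuned for your own downstream task.

```python
from transformers import pipeline

# A minimal CPU inference sketch. The checkpoint below is a public
# DistilBERT model fine-tuned for sentiment analysis; substitute the
# checkpoint fine-tuned for your own downstream task.
classifier = pipeline(
    "text-classification",
    model="distilbert-base-uncased-finetuned-sst-2-english",
    device=-1,  # -1 pins the pipeline to CPU
)

print(classifier("Deploying NLU models in production is non-trivial."))
# e.g. [{'label': 'NEGATIVE', 'score': 0.99}] -- exact score will vary
```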

Performance, scalability, and reliability (PSR) are of utmost importance on the journey to deploying an NLU model in production, especially if the model has to provide inferences in real time, as in a Conversational AI platform like Engati.

Tests for performance based on expected concurrent load, scalability based on load patterns, and reliability based on SLA requirements have to be designed and executed in a staging environment with the same configuration as production, in order to find possible failure points and fix them. JMeter is a widely used tool for performing such tests.
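JMeter test plans are typically built in its GUI and run from the command line, but for a quick smoke test of the serving endpoint, a few lines of Python can approximate a concurrent load pattern. This sketch reuses the hypothetical endpoint from the earlier examples and is a rough complement to, not a substitute for, a proper JMeter plan.

```python
import time
from concurrent.futures import ThreadPoolExecutor

import requests

# Hypothetical endpoint, matching the earlier TensorFlow Serving sketches.
SERVING_URL = "http://localhost:8501/v1/models/bert_classifier:predict"
PAYLOAD = {"instances": [{"input_ids": [101, 2023, 102], "attention_mask": [1, 1, 1]}]}

def one_request(_):
    """Fire a single request and return its latency in milliseconds."""
    start = time.perf_counter()
    requests.post(SERVING_URL, json=PAYLOAD, timeout=10).raise_for_status()
    return (time.perf_counter() - start) * 1000

if __name__ == "__main__":
    concurrency, total = 20, 200
    with ThreadPoolExecutor(max_workers=concurrency) as pool:
        latencies = sorted(pool.map(one_request, range(total)))
    # Rough percentile readout; a real PSR run would use JMeter with
    # realistic payloads, ramp-up profiles, and SLA assertions.
    print(f"p50={latencies[total // 2]:.1f} ms  p95={latencies[int(total * 0.95)]:.1f} ms")
```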

Deploying an NLU model in production is non-trivial and needs expertise in many areas and tools in order to be successful. Many AI/ML projects fail because teams focus purely on developing the model and neglect these tasks.

Conversational AI is going to revolutionize the way we do business. Become a part of the action by registering with the best Conversational AI platform.

Get started with Engati today!


