Tech Corner

What's 'not' in GPT-3 memory?

Anwesh Roy · Jul 20 · 2-3 min read


GPT-3 was released for public access almost a year ago. Since then, it has generated a lot of hype and discussion regarding commercial usage of such large language models.

Well, GPT-3 is only as good as the data it was trained on, and that data was fixed before its release a year ago.

So what is in GPT-3's memory? The following table details the training data of GPT-3.

[Table: GPT-3 training datasets. Source: Language Models are Few-Shot Learners]

The challenges of GPT-3

As can be seen from the above table, GPT-3 has 'seen' text from most of what was available on the internet, Wikipedia, and other sources as of when its training data was collected, probably more than a year ago.

This means it does not have the latest data available on the internet as of today.

This happens to be one of the biggest challenges of large language models. It is quite costly to keep updating these large models on a continual basis to keep them up to date with the latest happenings in the world.

It has been estimated that training GPT-3 required 355 GPU-years of compute time and cost about $4.6 million.

There is no incremental training mechanism by which such language models can keep learning from the latest news and events happening around the world. Even fine-tuning a pre-trained language model can trigger a common problem called 'catastrophic forgetting': the fine-tuned model is a modified version of the original, and knowledge gained during pre-training may be lost. This can result in poor model performance on the target tasks.
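Catastrophic forgetting can be seen even in a toy setting. The sketch below (a single-parameter linear model, purely illustrative and unrelated to how GPT-3 is actually trained) pre-trains on one task and then fine-tunes on a conflicting one; performance on the original task collapses.

```python
# Toy sketch of catastrophic forgetting: pre-train a one-parameter
# model on task A, then naively fine-tune on task B, and watch the
# loss on task A explode.

def loss(w, data):
    # mean squared error of the model y = w * x
    return sum((w * x - y) ** 2 for x, y in data) / len(data)

def train(w, data, lr=0.01, steps=500):
    # plain gradient descent on the mean squared error
    for _ in range(steps):
        grad = sum(2 * (w * x - y) * x for x, y in data) / len(data)
        w -= lr * grad
    return w

task_a = [(x, 2 * x) for x in range(1, 6)]    # "pre-training" data: y = 2x
task_b = [(x, -2 * x) for x in range(1, 6)]   # "latest" data: y = -2x

w = train(0.0, task_a)             # pre-train: w converges near 2
loss_a_before = loss(w, task_a)    # tiny
w = train(w, task_b)               # fine-tune only on task B: w drifts to -2
loss_a_after = loss(w, task_a)     # large: task A has been forgotten
```

Fine-tuning only on the new data pulls the single weight away from the value the old task needs, which is the same failure mode, in miniature, that makes naive continual updates to large models risky.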

Google and other search engines, however, can easily crawl the internet and index newly created pages to serve the latest data.

Large language models miss out on such an advantage that search engines have.

In the last year, Covid-19 has taken center stage and generated much of the news and other textual content on the internet. Many applications need to work with data centered on Covid-19 and vaccinations, so it is quite likely that we may want to use GPT-3 to get information about Covid-19.

Demonstrating GPT-3's "lack" of knowledge

1. Covid vaccines

The following text generation prompt was fed to the davinci engine of GPT-3.

import openai

response = openai.Completion.create(
    engine="davinci",
    prompt="The most popular covid-19 vaccines are ",
)

GPT-3’s output was as follows:

<OpenAIObject text_completion id=<****> at 0x7f4f9d40ce30> JSON: {
  "choices": [
    {
      "finish_reason": "stop",
      "index": 0,
      "logprobs": null,
      "text": "erythromycin, tetracycline, and penicillin."
    }
  ],
  "created": <****>,
  "id": "<****>",
  "model": "davinci:<****>",
  "object": "text_completion"
}

As can be seen from the above response, GPT-3 is not aware of the latest Covid-19 vaccines: the names it suggests (erythromycin, tetracycline, and penicillin) are antibiotics, not vaccines at all.

2. Total number of vaccinations per country

Here we want GPT-3 to generate a SQL query that aggregates the total number of Covid-19 vaccinations by country.

import openai

response = openai.Completion.create(
    engine="davinci",
    prompt="Create a SQL request to find total covid-19 vaccination by country:\n\nSELECT",
)

GPT-3’s output was as follows:

<OpenAIObject text_completion id=<****> at 0x7f4f9d397a70> JSON: {
  "choices": [
    {
      "finish_reason": "stop",
      "index": 0,
      "logprobs": null,
      "text": " country, sum(vaccination) AS total FROM `bigquery-public-data.samples.natality` WHERE vaccination = 'Covid-19' GROUP BY country ORDER BY total DESC"
    }
  ],
  "created": 1626617235,
  "id": "<****>",
  "model": "davinci:<****>",
  "object": "text_completion"
}

Although the above SQL query is syntactically valid, it is semantically wrong: GPT-3 assumed there is a column called 'vaccination' in 'bigquery-public-data.samples.natality', a public BigQuery dataset that actually contains birth records, not vaccination data.
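For contrast, here is what a working version of that aggregation looks like, run with Python's sqlite3 against a tiny in-memory table. The table name, columns, and data are assumptions invented for this sketch, not an actual BigQuery schema.

```python
import sqlite3

# Build a tiny in-memory table standing in for a hypothetical
# per-country vaccination dataset (schema is illustrative only).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE vaccinations (country TEXT, doses INTEGER)")
conn.executemany(
    "INSERT INTO vaccinations VALUES (?, ?)",
    [("India", 100), ("India", 150), ("USA", 200), ("UK", 50)],
)

# The aggregation GPT-3 was asked for: total vaccinations per country.
rows = conn.execute(
    "SELECT country, SUM(doses) AS total "
    "FROM vaccinations GROUP BY country ORDER BY total DESC"
).fetchall()
print(rows)  # [('India', 250), ('USA', 200), ('UK', 50)]
```

The point is that the query shape (GROUP BY with a SUM) is right; what GPT-3 got wrong was which table and column actually hold vaccination data.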

When we run the generated query in BigQuery, we get the following error.

[Screenshot: Running the query in BigQuery]

A simple solution

In Engati, customers can update their FAQs and documents in DocuSense at any time, at no additional cost. Their users then get accurate answers drawn from the latest versions of those documents, without retraining a model, without catastrophic forgetting, and without being served stale or incorrect information.
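The idea behind this approach, serving answers from a document store that can be updated at any time rather than from frozen model weights, can be sketched with a toy keyword-overlap retriever. All names and data below are illustrative; this is not the DocuSense implementation.

```python
import re

def tokens(text):
    # lowercase word tokens, keeping hyphenated terms like "covid-19"
    return set(re.findall(r"[a-z0-9-]+", text.lower()))

def answer(query, docs):
    # return the document sharing the most tokens with the query
    return max(docs, key=lambda d: len(tokens(query) & tokens(d)))

# The "knowledge base" can be edited at any time; no retraining needed.
faqs = [
    "Covid-19 vaccines include Pfizer, Moderna and AstraZeneca.",
    "Erythromycin and penicillin are antibiotics, not vaccines.",
]

print(answer("popular covid-19 vaccines", faqs))
# prints the vaccine FAQ, not the antibiotics entry
```

Because knowledge lives in the documents rather than in model parameters, updating the answer to a question is as cheap as editing a document.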

Anwesh Roy

Anwesh is the Senior Vice President of Engati. Driven by a passion to deliver value through AI-driven solutions, Anwesh is on a mission to mainstream Natural Language Processing (NLP), Natural Language Understanding (NLU), Natural Language Generation (NLG) and Data Analytics applications.
