Tech Corner

What's 'not' in GPT-3's memory?

Anwesh Roy · Jul 19 · 2-3 mins


GPT-3 was released for public access almost a year ago. Since then, it has generated a lot of hype and discussion regarding commercial usage of such large language models.

GPT-3 is only as good as the data it was trained on, and that data was frozen before its release a year ago.

So what is in GPT-3's memory? The following table details the training data of GPT-3.

Dataset                  Quantity (tokens)   Weight in training mix
Common Crawl (filtered)  410 billion         60%
WebText2                 19 billion          22%
Books1                   12 billion          8%
Books2                   55 billion          8%
Wikipedia                3 billion           3%

Source: Language Models are Few-Shot Learners (Brown et al., 2020)

The challenges of GPT-3

As can be seen from the above table, GPT-3 has 'seen' text from much of what was available on the internet, Wikipedia and other sources as of when its training data was assembled, probably more than a year ago.

This means it does not have the latest data available on the internet as of today.

This happens to be one of the biggest challenges with large language models: continually retraining them to keep up with the latest happenings in the world is extremely costly.

It has been estimated that training GPT-3 took 355 GPU-years of clock time and cost around $4.6 million.

Such language models cannot simply be trained incrementally to learn from the latest news and other events happening around the world. Even fine-tuning a pre-trained language model can trigger a common problem called 'catastrophic forgetting': the fine-tuned model drifts away from the original, and knowledge gained during pre-training may be lost. This can result in poor model performance on the target tasks.
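Catastrophic forgetting can be illustrated on a toy scale. The sketch below (an illustrative assumption, not GPT-3's actual training) fits a one-parameter linear model on task A, then fine-tunes it on a conflicting task B with no rehearsal of task A's data; the task A error, once near zero, blows up:

```python
import numpy as np

# Toy demonstration of catastrophic forgetting with a single weight.
rng = np.random.default_rng(0)
x = rng.normal(size=100)

def task_a_error(w):
    # Task A is y = 2x; measure mean squared error against it.
    return float(np.mean((w * x - 2 * x) ** 2))

w, lr = 0.0, 0.1

# "Pre-train" on task A (y = 2x).
for _ in range(200):
    w -= lr * np.mean((w * x - 2 * x) * x)
err_a_before = task_a_error(w)  # near zero: task A learned

# "Fine-tune" on task B only (y = -2x), never revisiting task A.
for _ in range(200):
    w -= lr * np.mean((w * x + 2 * x) * x)
err_a_after = task_a_error(w)   # large: task A has been forgotten

print(err_a_before, err_a_after)
```

The weight simply migrates to whatever it saw last; without mixing old data back in, nothing preserves the earlier knowledge.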

However, Google and other search engines can easily crawl the internet and index newly created pages to serve the latest data.

Large language models miss out on such an advantage that search engines have.
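The reason indexing scales where retraining does not is that it is incremental: adding a page touches only that page's terms. A minimal inverted-index sketch (a simplification of what real search engines do):

```python
from collections import defaultdict

# Toy inverted index: term -> set of document ids.
index = defaultdict(set)

def add_document(doc_id, text):
    # Incremental update: only the new document is processed;
    # nothing already indexed needs to be touched or retrained.
    for term in text.lower().split():
        index[term].add(doc_id)

def search(term):
    return index.get(term.lower(), set())

add_document(1, "GPT-3 training data and language models")
add_document(2, "Covid-19 vaccine rollout by country")

print(search("covid-19"))  # {2}
```

A freshly crawled page is searchable the moment it is indexed, whereas a language model would need another training run to absorb it.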

In the last year, Covid-19 has taken center stage, generating much of the news and other textual content on the internet. Many applications need to work with data centered on Covid-19 and vaccinations, so it is quite likely that we may want to use GPT-3 to get some information about Covid-19.

Demonstrating GPT-3's "lack" of knowledge

1. Covid vaccines

The following text generation prompt was fed to the davinci engine of GPT-3, via the openai Python client:

import openai

response = openai.Completion.create(
    engine="davinci",
    prompt="The most popular covid-19 vaccines are ",
)

GPT-3’s output was as follows:

<OpenAIObject text_completion id=<****> at 0x7f4f9d40ce30> JSON: {
  "choices": [
    {
      "finish_reason": "stop",
      "index": 0,
      "logprobs": null,
      "text": "erythromycin, tetracycline, and penicillin."
    }
  ],
  "created": <****>,
  "id": "<****>",
  "model": "davinci:<****>",
  "object": "text_completion"
}

As can be seen from the above response, GPT-3 is not aware of the latest Covid-19 vaccines; the names it suggests are antibiotics, not vaccines.

2. Total number of vaccinations per country

Here we want GPT-3 to generate a SQL query that aggregates the total number of vaccinations by country.

import openai

response = openai.Completion.create(
    engine="davinci",
    prompt="Create a SQL request to find total covid-19 vaccination by country:\n\nSELECT",
)

GPT-3’s output was as follows:

<OpenAIObject text_completion id=<****> at 0x7f4f9d397a70> JSON: {
  "choices": [
    {
      "finish_reason": "stop",
      "index": 0,
      "logprobs": null,
      "text": " country, sum(vaccination) AS total FROM `bigquery-public-data.samples.natality` WHERE vaccination = 'Covid-19' GROUP BY country ORDER BY total DESC"
    }
  ],
  "created": 1626617235,
  "id": "<****>",
  "model": "davinci:<****>",
  "object": "text_completion"
}

Although the above SQL query is syntactically well-formed, it does not actually run: GPT-3 invented a 'vaccination' column in BigQuery's public natality sample dataset, which contains birth records and has no such column.

When we run the generated query in BigQuery, we get the following error.

[Screenshot: running the query in BigQuery]
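For contrast, here is what the intended aggregation looks like against a schema that actually exists. The `vaccinations(country, doses)` table below is hypothetical (it is not a real BigQuery public dataset), sketched with an in-memory SQLite database:

```python
import sqlite3

# Hypothetical table for illustration; BigQuery's natality sample
# has no vaccination data, so we build our own schema here.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE vaccinations (country TEXT, doses INTEGER)")
conn.executemany(
    "INSERT INTO vaccinations VALUES (?, ?)",
    [("India", 100), ("India", 50), ("USA", 120)],
)

# The aggregation GPT-3 was aiming for, run against real columns.
rows = conn.execute(
    "SELECT country, SUM(doses) AS total "
    "FROM vaccinations GROUP BY country ORDER BY total DESC"
).fetchall()
print(rows)  # [('India', 150), ('USA', 120)]
```

The point is that a generated query is only as good as the model's knowledge of the target schema, which must be verified before use.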


A simple solution

With Engati, customers can update their FAQs and documents in DocuSense at any time, at no additional cost, so the latest changes are reflected immediately. Their users get accurate answers drawn from the most recent data, without catastrophic forgetting and without being served incorrect information.


Anwesh Roy

Anwesh is the Senior Vice President of Engati. Driven by a passion to deliver value through AI-driven solutions, Anwesh is on a mission to mainstream Natural Language Processing (NLP), Natural Language Understanding (NLU), Natural Language Generation (NLG) and Data Analytics applications.

