
# Statistical Language Modeling

## What is statistical language modeling?

Statistical language modeling (language modeling, or LM for short) is the development of probabilistic models that predict the next word in a sequence given the words that precede it.

A statistical language model learns the probability of word occurrence based on examples of text. Simpler models may look at a context of a short sequence of words, whereas larger models may work at the level of sentences or paragraphs. Most commonly, language models operate at the level of words.
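The definition above can be written as a worked equation. The notation is an assumption introduced here for illustration (the original text uses no symbols): $w_i$ stands for the $i$-th word of a sequence of length $n$. By the chain rule of probability, a model that predicts each next word from its predecessors also assigns a probability to the whole sequence:

$$
P(w_1, \dots, w_n) = \prod_{i=1}^{n} P(w_i \mid w_1, \dots, w_{i-1})
$$

Each factor on the right is one "next word" prediction.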

## The types of language models

A number of statistical language models are already in use. Let’s take a look at some of the popular ones:

### 1. N-Gram

This is one of the simplest approaches to language modeling. Here, a probability distribution over sequences of ‘n’ words is created, where ‘n’ can be any positive integer and defines the size of the gram (the sequence of words being assigned a probability). If n=4, a gram may look like: “can you help me”. Basically, ‘n’ sets the amount of context that the model is trained to consider: an n-gram model predicts a word from the n−1 words before it. N-gram models are named by their size: unigrams (n=1), bigrams (n=2), trigrams (n=3), and so on.
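As a rough illustration of what a "gram" is, the sketch below slides a window of size n over a tokenized sentence. The function name `ngrams` and the whitespace tokenization are assumptions made for this example, not part of any particular library.

```python
def ngrams(tokens, n):
    """Return every contiguous run of n tokens as a tuple."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


# Naive whitespace tokenization, good enough for a toy sentence.
tokens = "can you help me please".split()

print(ngrams(tokens, 1))  # unigrams
print(ngrams(tokens, 2))  # bigrams
print(ngrams(tokens, 4))  # 4-grams, including ("can", "you", "help", "me")
```

A sentence shorter than n simply yields an empty list.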

### 2. Exponential

This type of statistical model evaluates text by using an equation that combines n-grams and feature functions. Here the features and parameters of the desired results are specified in advance. The model is based on the principle of maximum entropy, which states that, among the probability distributions consistent with the specified constraints, the one with the most entropy is the best choice. Exponential models make fewer statistical assumptions, which can make their results more accurate.
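The "equation" mentioned above usually takes a log-linear form. The symbols here are assumptions chosen for illustration: $h$ is the history (the preceding words), $f_k$ are the feature functions, and $\lambda_k$ are their learned weights.

$$
P(w \mid h) = \frac{\exp\left(\sum_{k} \lambda_k f_k(h, w)\right)}{\sum_{w'} \exp\left(\sum_{k} \lambda_k f_k(h, w')\right)}
$$

An n-gram indicator (for example, a feature that is 1 when the last two words of $h$ followed by $w$ form a particular trigram) is one possible feature function, which is how n-grams and other features are combined in a single model.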

### 3. Continuous Space

In this type of statistical model, words are represented as a non-linear combination of weights in a neural network. The process of assigning a vector of weights to a word is known as word embedding. This type of model proves helpful when the data set keeps growing and includes many unique words.

In cases where the data set is large and contains many rarely used or unique words, linear models such as n-grams do not work well. This is because, as the vocabulary grows, the number of possible word sequences increases, and the patterns that predict the next word become weaker.
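To make the idea of an embedding concrete, here is a minimal sketch of the forward pass of a tiny continuous-space model. Everything in it is an assumption for illustration: the four-word vocabulary, the embedding size, the single tanh layer, and the random, untrained weights. It is not a trained or practical model.

```python
import math
import random

random.seed(0)  # fixed seed so the toy weights are reproducible

vocab = ["can", "you", "help", "me"]
dim = 3  # embedding size (arbitrary for this sketch)

# One dense vector per word: this lookup table is the "word embedding".
embedding = {w: [random.uniform(-1, 1) for _ in range(dim)] for w in vocab}
# Output layer: one weight vector per candidate next word.
output_weights = {w: [random.uniform(-1, 1) for _ in range(dim)] for w in vocab}


def next_word_distribution(previous_word):
    """Untrained forward pass: embed -> tanh -> linear scores -> softmax."""
    hidden = [math.tanh(x) for x in embedding[previous_word]]
    scores = {w: sum(h * v for h, v in zip(hidden, vec))
              for w, vec in output_weights.items()}
    top = max(scores.values())  # subtract the max for numerical stability
    exps = {w: math.exp(s - top) for w, s in scores.items()}
    total = sum(exps.values())
    return {w: e / total for w, e in exps.items()}


probs = next_word_distribution("can")
print(probs)
```

Because the output is a softmax over real-valued scores, every word in the vocabulary receives a non-zero probability, unlike a count-based model that has never seen a given word pair.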

## Building a Simple Statistical Language Model

Language models start with a Markov assumption. This is the simplifying assumption that the (k+1)-th word depends only on the previous k words. A first-order assumption (k = 1) results in a bigram model. The models are trained using maximum likelihood estimation (MLE) on an existing corpus. The MLE estimate is then simply a ratio of word counts: the number of times a word sequence occurs divided by the number of times its context occurs.

• They are easy to train on a large corpus
• They work surprisingly well in most tasks
• However, they have some disadvantages
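The count-ratio estimate described above can be sketched for the bigram case as follows. The function name `train_bigram_mle`, the `<s>` and `</s>` boundary markers, and the three-sentence corpus are assumptions made for this example.

```python
from collections import Counter


def train_bigram_mle(sentences):
    """Count-based bigram model: P(word | prev) = count(prev, word) / count(prev)."""
    context_counts = Counter()
    bigram_counts = Counter()
    for sentence in sentences:
        # Pad with boundary markers so sentence starts and ends are modeled too.
        tokens = ["<s>"] + sentence.lower().split() + ["</s>"]
        for prev, word in zip(tokens, tokens[1:]):
            context_counts[prev] += 1
            bigram_counts[(prev, word)] += 1

    def prob(word, prev):
        if context_counts[prev] == 0:
            return 0.0  # context never seen: plain MLE has nothing to offer
        return bigram_counts[(prev, word)] / context_counts[prev]

    return prob


corpus = ["can you help me", "can you hear me", "you can help"]
prob = train_bigram_mle(corpus)

print(prob("you", "can"))   # "can" occurs 3 times as context, twice before "you"
print(prob("help", "you"))  # "you" occurs 3 times as context, once before "help"
print(prob("me", "can"))    # "can me" never occurs, so the estimate is 0.0
```

Training is a single pass over the corpus, which is why these models are easy to train on large data; the last line previews the disadvantages discussed later.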

## Applications of statistical language modeling

Statistical language models are used to score or generate text in many natural language processing tasks, such as:

1. Speech Recognition

Voice assistants such as Siri and Alexa are examples of how language models help machines process speech audio.

2. Machine Translation

Google Translate and Microsoft Translator are examples of how NLP models can help in translating one language to another.

3. Sentiment Analysis

This helps in analyzing the sentiment behind a phrase. This use case of NLP models appears in products that allow businesses to understand a customer’s intent behind opinions or attitudes expressed in text. HubSpot’s Service Hub is an example of how language models can help in sentiment analysis.

4. Text Suggestions

Google services such as Gmail and Google Docs use language models to offer text suggestions while users compose an email or create long text documents, respectively.

5. Parsing Tools

Parsing involves analyzing whether sentences or words comply with syntax or grammar rules. Spell-checking tools are examples of language modeling and parsing at work.

## The drawbacks of statistical language models

### 1. Zero probabilities

Suppose we have a trigram language model that conditions on two words and has a vocabulary of 10,000 words. Then there are 10¹² possible triplets. If our training data has 10¹⁰ words, many triplets will never be observed in the training data, and the basic MLE will assign zero probability to those events. A zero probability translates to infinite perplexity. To overcome this issue, many techniques have been developed under the family of smoothing techniques. A good overview of these techniques is presented in this paper.
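The effect on perplexity can be seen in a small sketch. Add-one (Laplace) smoothing is used here only because it is among the simplest members of the smoothing family; it is a choice made for this example, not a technique the text above singles out. The toy sentences and function names are likewise assumptions, and bigrams are used instead of trigrams to keep the example short.

```python
import math
from collections import Counter

tokens = "the white horse ran past the white fence".split()
vocab = set(tokens)
bigram_counts = Counter(zip(tokens, tokens[1:]))
context_counts = Counter(tokens[:-1])


def mle(word, prev):
    """Unsmoothed estimate: zero for any bigram absent from training."""
    if context_counts[prev] == 0:
        return 0.0
    return bigram_counts[(prev, word)] / context_counts[prev]


def add_one(word, prev):
    """Laplace smoothing: act as if every possible bigram was seen once more."""
    return (bigram_counts[(prev, word)] + 1) / (context_counts[prev] + len(vocab))


def perplexity(prob, test_tokens):
    """Perplexity over the bigram transitions of a test sequence."""
    pairs = list(zip(test_tokens, test_tokens[1:]))
    log_sum = 0.0
    for prev, word in pairs:
        p = prob(word, prev)
        if p == 0.0:
            return math.inf  # one zero makes the whole sequence impossible
        log_sum += math.log(p)
    return math.exp(-log_sum / len(pairs))


# "the fence" never occurs in training, although both words do.
test = "the white horse ran past the fence".split()
print(perplexity(mle, test))      # infinite under plain MLE
print(perplexity(add_one, test))  # finite once counts are smoothed
```

Smoothing moves a little probability mass from seen bigrams to unseen ones, so the smoothed estimates for each context still sum to one over the vocabulary.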

### 2. Exponential Growth

The second challenge is that the number of possible n-grams grows as the nth power of the vocabulary size. A 10,000-word vocabulary has 10¹² possible trigrams, and a 100,000-word vocabulary has 10¹⁵.
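The arithmetic behind these figures is easy to check. The helper name `possible_ngrams` is an assumption for this sketch; it simply raises the vocabulary size to the power n.

```python
def possible_ngrams(vocab_size, n):
    """Number of distinct n-word sequences over a vocabulary: vocab_size ** n."""
    return vocab_size ** n


for vocab_size in (10_000, 100_000):
    for n in (1, 2, 3):
        count = possible_ngrams(vocab_size, n)
        print(f"V={vocab_size:>7,}  n={n}  possible n-grams = {count:.0e}")
```

A tenfold increase in vocabulary multiplies the number of possible trigrams by a thousand, and each extra word of context multiplies the table size by the full vocabulary size.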

### 3. Generalization

The last issue with MLE techniques is the lack of generalization. If the model sees the term ‘white horse’ in the training data but does not see ‘black horse’, the MLE will assign zero probability to ‘black horse’, even though the two phrases are closely related. (Thankfully, it will assign zero probability to ‘purple horse’ as well.)
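The horse example can be reproduced with a few lines of counting. The training sentence below is invented for illustration; the point is only that an unsmoothed count ratio treats a plausible unseen phrase and an implausible one identically.

```python
from collections import Counter

training = "the white horse and the black dog chased a white cat".split()
bigram_counts = Counter(zip(training, training[1:]))
context_counts = Counter(training[:-1])


def mle(word, prev):
    """Unsmoothed bigram estimate from raw counts."""
    if context_counts[prev] == 0:
        return 0.0
    return bigram_counts[(prev, word)] / context_counts[prev]


print(mle("horse", "white"))   # seen: "white" is followed by "horse" half the time
print(mle("horse", "black"))   # "black" was seen, but never before "horse"
print(mle("horse", "purple"))  # "purple" was never seen at all
```

The model has no notion that "black" and "white" are similar words, so nothing learned about "white horse" carries over to "black horse".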



October 14, 2020
