<script type="application/ld+json">
{
 "@context": "https://schema.org",
 "@type": "FAQPage",
 "mainEntity": [{
   "@type": "Question",
   "name": "What is Zipf’s law?",
   "acceptedAnswer": {
     "@type": "Answer",
     "text": "American linguist George Kingsley Zipf noticed a peculiarity about the way we use words in a language. He found that very few words are used regularly, while most words are very rarely used. He then ranked the words according to their popularity and saw a pattern surfacing. The most popular word was used twice as much as the second most popular word and thrice as much as the third most frequently used word."
   }
 },{
   "@type": "Question",
   "name": "What is the formula for Zipf’s law?",
   "acceptedAnswer": {
     "@type": "Answer",
     "text": "According to Zipf’s law, r x Prob(r) = A"
   }
 },{
   "@type": "Question",
   "name": "How do you verify Zipf’s law?",
   "acceptedAnswer": {
     "@type": "Answer",
     "text": "In order to verify Zipf’s law, we would need to calculate the frequency of every observation in a dataset, rank them all, and calculate r x freq(r), checking whether it is approximately the same for every observation in our dataset. It does not need to be an exact match for every single observation, but it should be a close match for most observations."
   }
 },{
   "@type": "Question",
   "name": "Do all languages follow Zipf’s law?",
   "acceptedAnswer": {
     "@type": "Answer",
     "text": "Not all languages follow Zipf’s law perfectly. But Zipf’s law holds at least approximately in almost all languages, including languages that are extinct and languages that have not been translated as yet."
   }
 }]
}
</script>

Zipf’s law

What is Zipf’s law?

American linguist George Kingsley Zipf noticed a peculiarity about the way we use words in a language. He found that very few words are used regularly, while most words are very rarely used. He then ranked the words according to their popularity and saw a pattern surfacing. The most popular word was used twice as much as the second most popular word and thrice as much as the third most frequently used word.

But he soon realized that this pattern was not limited to words in a language. 

The pattern has been noticed across a wide range of datasets, including neural activity, firm sizes, city sizes, and amino acid sequences, and has been named Zipf’s law.

It establishes a relationship between rank order and frequency of occurrence. According to Zipf’s law, when we rank observations by their frequency, the frequency of a specific observation occurring is inversely proportional to its rank.

What is the formula for Zipf’s law?

Let us say that r is the rank of an observation. 

Prob(r) is the probability of the observation at rank r. 

freq(r) is the number of times the observation at rank r appears in the dataset.

N is the total number of observations in a dataset. It is not the number of unique observations.


We know that Prob(r) = freq(r)/N


According to Zipf’s law,

r x Prob(r) = A


A is a constant that is empirically determined from the data. For word frequencies in English text, A is commonly reported to be about 0.1.


Zipf’s law is a statistical law: it holds true for most observations, but not all.


Since Prob(r) = freq(r)/N, Zipf’s law can be rewritten like this:

r x freq(r) = A x N
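
As a rough illustration of the rewritten form, the sketch below solves r x freq(r) = A x N for freq(r) and prints the count the law would predict at a few ranks. The function name expected_count, the dataset size of one million observations, and the default A = 0.1 are assumptions chosen for illustration, not values measured from a real dataset.

```python
def expected_count(rank: int, total: int, a: float = 0.1) -> float:
    """Count predicted at a given rank by r x freq(r) = A x N.

    `total` is N, the total number of observations (not the number of
    unique observations). `a` is the empirical constant A (assumed 0.1).
    """
    return a * total / rank


N = 1_000_000  # hypothetical dataset size
for rank in (1, 2, 10, 100):
    print(rank, expected_count(rank, N))
```

Under these assumptions the predicted count drops by half from rank 1 to rank 2 and by a factor of ten from rank 1 to rank 10, which is the inverse proportionality described earlier.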


How do you verify Zipf’s law?

In order to verify Zipf’s law, we would need to calculate the frequency of every observation in a dataset, rank them all, and calculate r x freq(r), checking whether it is approximately the same for every observation in our dataset. It does not need to be an exact match for every single observation, but it should be a close match for most observations.

Keep in mind that Zipf’s law tends to fit worst for the most frequent and the least frequent observations, so do not judge the fit from those observations alone.
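
The check described above can be sketched in a few lines of Python. This is a minimal sketch, not a full statistical test: the helper name rank_frequency_products and the toy word list are hypothetical, and the toy data was constructed so that it follows the law exactly.

```python
from collections import Counter


def rank_frequency_products(observations):
    """Return (rank, observation, freq, rank * freq) tuples, most frequent first.

    Observations with equal frequency receive consecutive ranks in the
    order Counter.most_common() returns them.
    """
    ranked = Counter(observations).most_common()  # sorted by descending frequency
    return [
        (rank, item, freq, rank * freq)
        for rank, (item, freq) in enumerate(ranked, start=1)
    ]


# Toy data built to satisfy r x freq(r) = 12 at every rank (an assumption
# for illustration; real text will only match approximately).
sample = ["the"] * 12 + ["of"] * 6 + ["and"] * 4 + ["to"] * 3
for row in rank_frequency_products(sample):
    print(row)
```

On real data the last column will not be constant. What matters is whether it stays roughly stable across the middle ranks.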

The best way to verify Zipf’s law is to plot it on a graph. 

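Plot log(r) on the x-axis and log(freq(r)) on the y-axis of the graph. If the points fall roughly on a straight line with a slope of -1, it means that Zipf’s law holds for this dataset. This follows from taking the logarithm of r x freq(r) = A x N, which gives log(freq(r)) = log(A x N) - log(r). In this situation the line crosses both axes at the same distance, log(A x N), from the origin: if it intersects the x-axis at point P and the y-axis at point Q, and O is the origin, then OP should be equal to OQ.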
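
Instead of judging the slope by eye, one option is to estimate it with an ordinary least-squares fit on the log-log values. The sketch below uses only the standard library. The function name log_log_slope and the synthetic frequency list are assumptions for illustration, and a least-squares fit over all ranks is just one simple way to estimate the slope.

```python
import math


def log_log_slope(frequencies):
    """Least-squares slope of log(freq(r)) against log(r).

    `frequencies` must be sorted in descending order, so that the value
    at index 0 belongs to rank 1. All frequencies must be positive.
    """
    xs = [math.log(rank) for rank in range(1, len(frequencies) + 1)]
    ys = [math.log(freq) for freq in frequencies]
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    covariance = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    variance = sum((x - mean_x) ** 2 for x in xs)
    return covariance / variance


# Synthetic frequencies that follow freq(r) = 1200 / r exactly
# (hypothetical values, not real data).
ideal = [1200 / rank for rank in range(1, 51)]
slope = log_log_slope(ideal)
print(round(slope, 3))
```

A slope close to -1 is consistent with Zipf’s law. Because the fit tends to be worst at the extremes, it can help to drop the first few and last few ranks before fitting real data.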


Do all languages follow Zipf’s law?

Not all languages follow Zipf’s law perfectly. But Zipf’s law holds at least approximately in almost all languages, including languages that are extinct and languages that have not been translated as yet. 



October 14, 2020
