<script type="application/ld+json">
{
 "@context": "https://schema.org",
 "@type": "FAQPage",
 "mainEntity": [{
   "@type": "Question",
   "name": "What is normalization?",
   "acceptedAnswer": {
     "@type": "Answer",
     "text": "Normalization organizes attributes and relations of a database to ensure that database integrity constraints properly enforce their dependencies. It is accomplished by applying some formal rules either by the process of synthesis (creating a new database design) or decomposition (improving an existing database design)."
   }
 },{
   "@type": "Question",
   "name": "Why do we need normalization?",
   "acceptedAnswer": {
     "@type": "Answer",
     "text": "When we normalize text, we attempt to reduce its randomness, bringing it closer to a predefined “standard.” This reduces the amount of distinct information that the computer has to deal with and improves efficiency. Normalization techniques like stemming and lemmatization aim to reduce inflectional forms, and sometimes derivationally related forms, of a word to a common base form."
   }
 }]
}
</script>

Normalization

What is normalization? 

Database normalization is the process of structuring a database, usually a relational database, according to a series of so-called normal forms in order to reduce data redundancy and improve data integrity.

Normalization organizes attributes and relations of a database to ensure that database integrity constraints properly enforce their dependencies. It is accomplished by applying some formal rules either by the process of synthesis (creating a new database design) or decomposition (improving an existing database design).
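As a rough illustration of decomposition, the sketch below splits a redundant table into two relations. The table, column names, and data are hypothetical, and plain Python dictionaries stand in for a real relational database.

```python
# Hypothetical denormalized "orders" table: the customer's email repeats on every row.
orders_flat = [
    {"order_id": 1, "customer_id": 10, "customer_email": "ana@example.com", "item": "modem"},
    {"order_id": 2, "customer_id": 10, "customer_email": "ana@example.com", "item": "router"},
    {"order_id": 3, "customer_id": 11, "customer_email": "raj@example.com", "item": "modem"},
]


def decompose(rows):
    """Split the flat rows into a customers relation and an orders relation."""
    customers = {}  # customer_id -> email, stored once per customer
    orders = []     # each order keeps only a reference (foreign key) to its customer
    for row in rows:
        customers[row["customer_id"]] = row["customer_email"]
        orders.append({
            "order_id": row["order_id"],
            "customer_id": row["customer_id"],
            "item": row["item"],
        })
    return customers, orders


customers, orders = decompose(orders_flat)
```

After the split, each email address is stored once, so an update cannot leave two rows disagreeing about the same customer.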

In NLP, normalization is a process that converts a list of words to a more uniform sequence. This is useful in preparing text for later processing. In addition, by transforming the words to a standard format, other operations can work with the data and not have to deal with issues that might compromise the process. For example, converting all words to lowercase will simplify the searching process.
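A minimal sketch of the lowercase example, assuming the words are already available as a Python list. The function names here are illustrative and do not come from any particular library.

```python
def normalize_case(words):
    """Return the words in a uniform form suitable for caseless matching."""
    # str.casefold() is a more aggressive variant of lower() meant for caseless comparison.
    return [word.casefold() for word in words]


def contains(words, query):
    """Case-insensitive membership test on a list of words."""
    return query.casefold() in normalize_case(words)


words = ["Modem", "ROUTER", "Cable"]
print(normalize_case(words))      # ['modem', 'router', 'cable']
print(contains(words, "router"))  # True
```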

The normalization process can improve text matching. For example, there are several ways that the term "modem router" can be expressed, such as modem and router, modem & router, modem/router, and modem-router. Normalizing these words to the common form makes it easier to supply the correct information to a shopper.
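One way to sketch this for the "modem router" variants listed above is a small regular expression that treats "and", "&", "/", and "-" as interchangeable separators. The function name and the choice of "modem router" as the common form are assumptions made for illustration.

```python
import re

# Separators that may join the two product words: "and", "&", "/", or "-".
_SEPARATORS = re.compile(r"\s*(?:\band\b|&|/|-)\s*")


def normalize_term(term):
    """Map variants such as "modem & router" to the common form "modem router"."""
    text = _SEPARATORS.sub(" ", term.lower())
    return " ".join(text.split())  # collapse any repeated whitespace


for variant in ["modem and router", "modem & router", "modem/router", "modem-router"]:
    print(normalize_term(variant))  # each line prints: modem router
```

A rule this blunt also splits legitimately hyphenated words (for example, "wi-fi" becomes "wi fi"), so in practice it would likely be limited to a known list of product terms.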

Note that normalization can also compromise an NLP task. For example, converting text to lowercase can make searches less reliable when case matters, such as when distinguishing "US" from "us".

Why do we need normalization?

When we normalize text, we attempt to reduce its randomness, bringing it closer to a predefined “standard.” This reduces the amount of distinct information that the computer has to deal with and improves efficiency. Normalization techniques like stemming and lemmatization aim to reduce inflectional forms, and sometimes derivationally related forms, of a word to a common base form.
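To make the idea of a common base form concrete, here is a deliberately crude suffix-stripping sketch. It is a toy written for this page, not the Porter algorithm or any library's stemmer, and the rule list is an assumption; real stemmers and lemmatizers (such as those in NLTK) use far richer rules or dictionaries.

```python
# Suffix rules are tried in order, so "ies" is handled before the plain "s".
_SUFFIX_RULES = [("ies", "y"), ("ing", ""), ("ed", ""), ("s", "")]


def toy_stem(word):
    """Strip one common inflectional suffix; a crude stand-in for a real stemmer."""
    word = word.lower()
    for suffix, replacement in _SUFFIX_RULES:
        # Require at least three remaining letters so short words like "is" or "red" survive.
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: len(word) - len(suffix)] + replacement
    return word


print([toy_stem(w) for w in ["walk", "walks", "walked", "walking"]])
# ['walk', 'walk', 'walk', 'walk']
```

The toy rules fail on many words (for instance, "running" becomes "runn"), which is exactly the kind of case that established stemmers and lemmatizers are designed to handle.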

 

How does normalization affect chatbots?

Conversational normalization is the process by which a chatbot finds and corrects common misspellings and other errors that could change the meaning of a message.

A chatbot goes through five major steps to read, interpret, and understand a message and then formulate and send a response: tokenizing, normalizing, recognizing entities, dependency parsing, and generation. Let’s take a closer look.

  • Tokenizing: The chatbot starts by chopping up text into pieces (also called ‘tokens’) and removing punctuation.
  • Normalizing: Next, the bot finds common misspellings, slang, or typos in the text and converts them to their “normal” forms.
  • Recognizing Entities: Now that the words are all normalized, the chatbot seeks to identify which type of thing is being referred to. For example, it would locate North America as a location, 67% as a percentage, and Google as an organization.
  • Dependency Parsing: For the next step, the bot works out the grammatical structure of the sentence, identifying nouns, verbs, objects, and common phrases and how they relate to one another.
  • Generation: Finally, the chatbot generates a number of responses using the information determined in all the other steps and selects the most appropriate response to send to the user.
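The first three steps above can be sketched in a few lines. Everything here is a simplified assumption: the slang and entity tables are hypothetical, and a real chatbot would rely on trained models rather than lookups. Dependency parsing and generation are left out because they cannot be meaningfully reduced to a short example.

```python
import re

# Hypothetical lookup tables; a production bot would use larger resources or trained models.
SLANG = {"u": "you", "r": "are", "gr8": "great", "thx": "thanks"}
ENTITIES = {"google": "ORGANIZATION", "67%": "PERCENTAGE"}


def tokenize(text):
    """Split text into tokens, dropping punctuation but keeping percentages such as 67%."""
    return re.findall(r"\d+%|\w+", text)


def normalize(tokens):
    """Lowercase each token and replace known slang with its standard form."""
    return [SLANG.get(token.lower(), token.lower()) for token in tokens]


def recognize_entities(tokens):
    """Label the tokens that appear in the toy entity table."""
    return [(token, ENTITIES[token]) for token in tokens if token in ENTITIES]


tokens = normalize(tokenize("Thx! R u sure Google grew 67%?"))
print(tokens)
# ['thanks', 'are', 'you', 'sure', 'google', 'grew', '67%']
print(recognize_entities(tokens))
# [('google', 'ORGANIZATION'), ('67%', 'PERCENTAGE')]
```

Lowercasing before entity recognition keeps the lookup simple, but it discards capitalization, which real entity recognizers often use as a signal.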

An often-overlooked preprocessing step is text normalization. Text normalization is the process of transforming text into a canonical (standard) form. For example, the words “gooood” and “gud” can be transformed to “good,” their canonical form. Another example is the mapping of near-identical terms such as “stopwords,” “stop-words” and “stop words” to just “stopwords.”
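A minimal sketch of these two examples, assuming a small hand-written mapping plus one rule for elongated spellings. The dictionary contents and function names are hypothetical.

```python
import re

# Hypothetical dictionary of known variants and their canonical forms.
CANONICAL = {"gud": "good", "stop-words": "stopwords", "stop words": "stopwords"}


def squeeze_repeats(word):
    """Shorten any run of three or more identical characters to two ("gooood" -> "good")."""
    return re.sub(r"(.)\1{2,}", r"\1\1", word)


def canonicalize(term):
    """Lowercase, squeeze elongated spellings, then apply the dictionary mapping."""
    term = squeeze_repeats(term.lower().strip())
    return CANONICAL.get(term, term)


print([canonicalize(t) for t in ["gooood", "gud", "stop-words", "stop words"]])
# ['good', 'good', 'stopwords', 'stopwords']
```

The squeezing rule is only a heuristic: "sooo" becomes "soo" rather than "so", so it usually needs to be paired with a dictionary or a spelling corrector.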

Text normalization is essential for noisy texts such as social media comments, text messages, and comments to blog posts where abbreviations, misspellings, and use of out-of-vocabulary words are prevalent. 

Normalization has even been effective for analyzing highly unstructured clinical texts where physicians take notes in non-standard ways. It can also be helpful for topic extraction, where near-synonyms and spelling differences are expected. Consider, for example, the variants topic modeling, topic modelling, topic-modeling, and topic-modelling.

Unfortunately, unlike stemming and lemmatization, there isn’t a standard way to normalize texts. It typically depends on the task. For example, normalizing clinical texts would arguably be different from how you normalize SMS text messages.

Some common approaches to text normalization include dictionary mappings (easiest), statistical machine translation (SMT), and spelling-correction-based approaches.
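As a hedged sketch of the spelling-correction-based approach, the example below uses the string similarity from Python's standard `difflib` module as a stand-in for a real spelling corrector. The vocabulary and the cutoff value are assumptions chosen for illustration; statistical machine translation is not shown because it requires a trained model.

```python
import difflib

# Hypothetical vocabulary of known-good words.
VOCABULARY = ["good", "great", "normalization", "router", "modem"]


def correct(word, cutoff=0.75):
    """Return the closest vocabulary word, or the word itself if nothing is similar enough."""
    matches = difflib.get_close_matches(word.lower(), VOCABULARY, n=1, cutoff=cutoff)
    return matches[0] if matches else word


print(correct("normalizaton"))  # normalization
print(correct("routr"))         # router
print(correct("xyz"))           # xyz (no close match, so the input is returned unchanged)
```

Raising the cutoff makes the corrector more conservative, which matters for noisy text where out-of-vocabulary words are often legitimate names rather than typos.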

About Engati

Engati powers 45,000+ chatbot & live chat solutions in 50+ languages across the world.

We aim to empower you to create the best customer experiences you could imagine. 



October 14, 2020


