Knowledge extraction

What is knowledge extraction?

Expressing the knowledge acquired by learning systems in a linguistic representation is important for user understanding. On this assumption, and to make sure that these systems are accepted and used, the artificial intelligence community has developed several techniques under both the symbolic and the connectionist approaches.

Knowledge extraction is the creation of knowledge from structured (relational databases, XML) and unstructured (text, documents, images) sources. The resulting knowledge needs to be in a machine-readable and machine-interpretable format and must represent knowledge in a manner that facilitates inferencing. Although it is methodically similar to information extraction (NLP) and ETL (data warehouse), the main criterion is that the extraction result goes beyond the creation of structured information or the transformation into a relational schema. It requires either the reuse of existing formal knowledge (reusing identifiers or ontologies) or the generation of a schema based on the source data.
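
As a rough illustration of "reusing identifiers" when going from a relational source to machine-interpretable knowledge, here is a minimal sketch that turns one table row into subject-predicate-object triples. The `http://example.org/` namespace, the `PROPERTY_MAP` table and the `row_to_triples` helper are hypothetical names invented for this example; only the FOAF `name` property is an existing vocabulary term.

```python
from typing import Dict, List, Optional, Tuple

Triple = Tuple[str, str, str]

# Hypothetical base namespace for identifiers minted by this sketch.
BASE = "http://example.org/"

# Assumed mapping from column names to properties. "name" reuses an
# existing vocabulary (FOAF); "city" falls back to a made-up property.
PROPERTY_MAP = {
    "name": "http://xmlns.com/foaf/0.1/name",
    "city": BASE + "livesIn",
}


def row_to_triples(table: str, key: str, row: Dict[str, Optional[str]]) -> List[Triple]:
    """Turn one relational row into (subject, predicate, object) triples."""
    # The subject identifier is built from the table name and the primary key.
    subject = f"{BASE}{table}/{row[key]}"
    triples: List[Triple] = []
    for column, value in row.items():
        if column == key or value is None:
            continue  # skip the key column and NULL values
        # Reuse a known property if one is mapped, otherwise generate one
        # from the source schema.
        predicate = PROPERTY_MAP.get(column, f"{BASE}{table}#{column}")
        triples.append((subject, predicate, str(value)))
    return triples


rows = [{"id": "1", "name": "Ada", "city": "London"}]
triples = [t for r in rows for t in row_to_triples("person", "id", r)]
for t in triples:
    print(t)
```

The fallback branch corresponds to the second option in the definition: when no existing vocabulary term is available, a property is generated from the source schema itself.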

[Figure: knowledge extraction. Source: Towards Data Science]

Essentially, knowledge extraction is the process of drawing on several sources of data and information to build a cohesive knowledge bank. The extraction often draws on a wide range of sources, covering both structured and unstructured data. When it is successful, the process yields solid data that a program can easily read and interpret, enabling the end user to apply that formal knowledge for whatever purpose they have in mind.

Various sources of data can be used in the process of knowledge extraction. Among structured data sources, knowledge can be extracted from several kinds of relational databases or from some form of extensible markup language (XML) source. Unstructured data sources such as images, word processing documents, spreadsheets, and text captured in programs like notepads can also be used as part of the knowledge extraction process.

Basically, any source that the program managing the knowledge extraction process can read can be used. Each additional readable source widens the scope of the project and helps make the final knowledge usable.

What are some of the knowledge extraction techniques that you could use?

Knowledge Extraction Techniques

1. Knowledge graph completion: link prediction

Translating Embeddings for Modeling Multi-relational Data by Bordes et al. (2013) is an early attempt at a dedicated method for knowledge graph (KG) completion. It learns embeddings for the entities and the relations in the same low-dimensional vector space. The objective function constrains entity e2 to be close to e1 + r. This is done by assigning a better score to existing triplets than to random triplets obtained with negative sampling. This model is known as TransE, and the work is related to the word2vec work by Mikolov, where relations between concepts naturally take the form of translations in the embedding space.
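
The idea that e2 should be close to e1 + r can be sketched in a few lines. The example below is not the authors' implementation: it only scores triplets and evaluates a margin ranking loss on hand-set toy embeddings, with no training loop. The entity names, the vectors and the margin value are assumptions chosen for illustration.

```python
import math
import random
from typing import List

Vector = List[float]


def distance(e1: Vector, r: Vector, e2: Vector) -> float:
    """L2 distance between e1 + r and e2; smaller means more plausible."""
    return math.sqrt(sum((a + b - c) ** 2 for a, b, c in zip(e1, r, e2)))


def margin_loss(pos: float, neg: float, margin: float = 1.0) -> float:
    """Zero once the true triplet is closer than the corrupted one by `margin`."""
    return max(0.0, margin + pos - neg)


# Toy embeddings set by hand (a real model would learn these).
entities = {
    "Paris": [1.0, 0.0],
    "France": [1.0, 1.0],
    "Berlin": [3.0, 0.0],
}
relations = {"capital_of": [0.0, 1.0]}

head, rel, tail = ("Paris", "capital_of", "France")

# Negative sampling: corrupt the tail with a different entity.
rng = random.Random(0)
candidates = [e for e in entities if e not in (head, tail)]
corrupted_tail = rng.choice(candidates)

pos = distance(entities[head], relations[rel], entities[tail])
neg = distance(entities[head], relations[rel], entities[corrupted_tail])
loss = margin_loss(pos, neg)

print(f"true triplet distance:      {pos:.3f}")
print(f"corrupted triplet distance: {neg:.3f}")
print(f"margin loss:                {loss:.3f}")
```

During training, the embeddings would be updated by gradient descent to push this loss toward zero across many true and corrupted triplets. Link prediction then amounts to ranking candidate tails by their distance to e1 + r.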

2. Triplet extraction from raw text

Triplet extraction can be done in a purely unsupervised way. Usually, the text is first parsed with several tools (such as a TreeBank parser, MiniPar or the OpenNLP parser), then the text between entities (as well as the annotations from the parsers) is clustered and finally simplified. While this is attractive at first glance because no supervision is needed, there are a few drawbacks.

First, it requires a lot of tedious work to hand-craft rules, and those rules depend on the parser used. Moreover, the clusters found contain semantically related relations, but they do not give us fine-grained implications. Typically, a cluster may contain "is-capital-of" and "is-city-of", which are semantically close relations. However, with the unsupervised approach, we will fail to discover that "is-capital-of" implies "is-city-of" and not the opposite.
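
To make the pipeline above concrete, here is a deliberately simplified sketch. It assumes the entity mentions are already known, uses plain string search instead of a real parser, and groups relation phrases by exact match after normalization as a crude stand-in for clustering. The corpus, the stopword list and the helper names are all made up for this example.

```python
import re
from collections import defaultdict
from typing import Dict, List, Tuple

Triplet = Tuple[str, str, str]

# Assumed stopword list used to simplify relation phrases.
STOPWORDS = {"the", "a", "an"}


def between(sentence: str, e1: str, e2: str) -> str:
    """Return the text between the first mention of e1 and the next e2."""
    start = sentence.find(e1)
    if start < 0:
        return ""
    start += len(e1)
    end = sentence.find(e2, start)
    if end < 0:
        return ""
    return sentence[start:end]


def simplify(phrase: str) -> str:
    """Lowercase, drop stopwords and join the remaining words with hyphens."""
    tokens = re.findall(r"[a-z]+", phrase.lower())
    return "-".join(t for t in tokens if t not in STOPWORDS)


# Toy corpus: each sentence comes with its two entity mentions.
corpus = [
    ("Paris is the capital of France.", "Paris", "France"),
    ("Berlin is the capital of Germany.", "Berlin", "Germany"),
    ("Lyon is a city of France.", "Lyon", "France"),
]

groups: Dict[str, List[Triplet]] = defaultdict(list)
for sentence, e1, e2 in corpus:
    relation = simplify(between(sentence, e1, e2))
    if relation:
        groups[relation].append((e1, relation, e2))

for relation, triplets in groups.items():
    print(relation, triplets)
```

The output contains the two relation groups as unrelated labels, which illustrates the drawback described above: nothing in this procedure tells us that "is-capital-of" implies "is-city-of".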

3. Schema-based supervised learning

In this case, the available data is a collection of sentences where each sentence is annotated with the triplet extracted from it. In other words, the raw text is aligned with a KG of the text. Two recent papers (both published in 2016) give cutting-edge solutions to this problem.

The End-to-End Relation Extraction using LSTMs on Sequences and Tree Structures article by Miwa and Bansal presents an approach that uses two stacked networks: a bidirectional LSTM for entity detection (it creates an embedding of the entities) and a tree-based LSTM for detecting the relation that links the entities found. The original paper includes a figure of the architecture.
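
The supervised setting starts from sentences annotated with triplets, so it can help to see what one training example might look like. The sketch below is an assumption about data layout, not the format used in the paper: the `AnnotatedSentence` class, the generic `ENT` label and the BIO tagging scheme are illustrative choices. The tags would serve as targets for the entity detection network, and the relation label as the target for the relation network.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class AnnotatedSentence:
    """One sentence aligned with the triplet extracted from it."""
    tokens: List[str]
    head: Tuple[int, int]  # token span [start, end) of the first entity
    tail: Tuple[int, int]  # token span [start, end) of the second entity
    relation: str


def bio_tags(example: AnnotatedSentence) -> List[str]:
    """Per-token entity tags: B-ENT starts an entity, I-ENT continues it."""
    tags = ["O"] * len(example.tokens)
    for start, end in (example.head, example.tail):
        tags[start] = "B-ENT"
        for i in range(start + 1, end):
            tags[i] = "I-ENT"
    return tags


def triplet(example: AnnotatedSentence) -> Tuple[str, str, str]:
    """Rebuild the (entity, relation, entity) triplet from the annotation."""
    head = " ".join(example.tokens[example.head[0]:example.head[1]])
    tail = " ".join(example.tokens[example.tail[0]:example.tail[1]])
    return (head, example.relation, tail)


example = AnnotatedSentence(
    tokens=["New", "Delhi", "is", "the", "capital", "of", "India"],
    head=(0, 2),
    tail=(6, 7),
    relation="is-capital-of",
)

tags = bio_tags(example)
for token, tag in zip(example.tokens, tags):
    print(f"{token}\t{tag}")
print(triplet(example))
```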

What are the applications and uses of knowledge extraction?

Knowledge extraction has various applications. One of the most common is capturing data from an unstructured source and incorporating it into some type of structured knowledge source.

Another example is using knowledge extraction to pull data from relational databases and use it to create new documents, or to import data from electronic documents into relational databases. Either way, formal knowledge is shared faster because data that is already available elsewhere does not have to be entered manually. This reuse of existing knowledge in a new format is helpful in a wide range of scenarios, since it lets you employ that knowledge in ways the existing source might not have allowed. The user can then create sources suited to a range of applications, not just those relevant to the original home of the formal knowledge.

It is also possible to use data extraction with a large data warehouse, importing and exporting data fairly easily in order to create a new source of data for a particular purpose.
