What is information retrieval?
Information retrieval (IR) is the process of obtaining information system resources that are relevant to an information need from a collection of those resources. Searches can be based on full-text or other content-based indexing. Information retrieval is the science of searching for information in a document, searching for documents themselves, and also searching for the metadata that describes data, and for databases of texts, images or sounds.
Automated information retrieval systems are used to reduce what has been called information overload. An information retrieval system is a software system that provides access to books, journals, and other documents, and that stores and manages those documents. Web search engines are the most visible IR applications.
What is an example of information retrieval?
Librarians, professional searchers, and others have long engaged in information retrieval, but nowadays hundreds of millions of people engage in IR every day when they use web search engines. Information retrieval is believed to be the dominant form of information access. An IR system helps users find the information they require, but it does not explicitly return answers to their questions. Instead, it reports the existence and location of documents that might contain the required information. Information retrieval also supports users in browsing or filtering document collections and in processing a set of retrieved documents. A web search engine searches over billions of documents stored on millions of computers. An email program is another example: it provides a spam filter, along with manual or automatic means of classifying mail so that messages can be placed directly into particular folders.
An IR system can represent, store, organize, and access information items. A set of keywords is required to search. Keywords are the terms people enter into search engines. These keywords summarize the description of the information being sought.
What are the types of information retrieval?
Areas in which information retrieval techniques are employed include:
- Adversarial information retrieval
- Automatic summarization
- Multi-document summarization
- Compound term processing
- Cross-lingual retrieval
- Document classification
- Spam filtering
- Question answering
What is information retrieval used for?
The aim of information retrieval is to provide the user with the “best possible” information from a database. The problem of information retrieval is determining what constitutes the best possible information for a given user query. A common form of interaction is for the user to submit a query, from which keywords are extracted. These are then used by the information retrieval system to identify information that meets the user’s needs. For example, in a bibliographic database, a user might be interested in finding a thesis on some topic. The keywords extracted from the query would be an attempt to delineate that topic, and would then be used to improve precision (ensuring that a significant proportion of the items retrieved are relevant to the user) and recall (ensuring that a significant proportion of the relevant items are retrieved).
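Precision and recall, as defined above, can be computed for a single query once we know which items were retrieved and which are actually relevant. The following is a minimal sketch; the function name `precision_recall` and the thesis ids are hypothetical and chosen only for illustration.

```python
def precision_recall(retrieved, relevant):
    """Return (precision, recall) for one query.

    retrieved: ids the system returned.
    relevant:  ids judged relevant to the user's need.
    """
    retrieved, relevant = set(retrieved), set(relevant)
    hits = len(retrieved & relevant)  # relevant items that were retrieved
    precision = hits / len(retrieved) if retrieved else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    return precision, recall


# Hypothetical thesis ids, for illustration only.
retrieved = ["t1", "t2", "t3", "t4"]
relevant = ["t2", "t4", "t7"]
precision, recall = precision_recall(retrieved, relevant)
print(precision, recall)
```

Here two of the four retrieved items are relevant, so precision is 0.5, and two of the three relevant items were found, so recall is about 0.67.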
Modern IR systems accept free-format natural language queries from users. A query is said to represent the “information need” of the user. Given a large collection of documents, a small subset containing one or more keywords from the query statement is retrieved by the IR system. The IR system usually employs some method to “predict” the relevance of a document. Documents retrieved are ranked in decreasing order of their predicted relevance.
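The retrieve-then-rank behaviour described above can be sketched in a few lines. This is only an illustration: the scoring rule (count of distinct query keywords found in a document) is an assumed stand-in for whatever relevance prediction a real system uses, and `rank_documents` and the sample documents are hypothetical.

```python
def rank_documents(query, documents):
    """Rank documents that share at least one keyword with the query.

    The score is the number of distinct query keywords found in the
    document; this is an assumed, simplistic relevance estimate.
    """
    keywords = set(query.lower().split())
    scored = []
    for doc_id, text in documents.items():
        overlap = keywords & set(text.lower().split())
        if overlap:  # keep only documents with one or more keywords
            scored.append((doc_id, len(overlap)))
    # Decreasing order of predicted relevance; ties broken by doc id.
    return sorted(scored, key=lambda pair: (-pair[1], pair[0]))


documents = {
    "d1": "inverted index for full text search",
    "d2": "cooking pasta at home",
    "d3": "ranking search results with an index",
}
ranking = rank_documents("search index ranking", documents)
print(ranking)
```

Document d2 shares no keyword with the query and is not retrieved; d3 matches three keywords and is ranked ahead of d1, which matches two.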
Given a user query, a good information retrieval system would rank most of the relevant documents ahead of less relevant documents, thereby allowing the user to peruse relevant documents without having to wade through many irrelevant documents.
What are the three classic models in information retrieval systems?
An information retrieval (IR) model can be classified into one of the following three types:
Classical IR Model
It is the simplest IR model and the easiest to implement. It is based on mathematical knowledge that is easily recognized and well understood. Boolean, vector, and probabilistic are the three classical IR models.
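As an illustration of one classical model, the vector model can be sketched by representing each text as a vector of term counts and comparing vectors by the cosine of the angle between them. This is a minimal sketch under stated assumptions: raw term frequencies are used as weights (practical systems typically use weighting schemes such as TF-IDF), and the function name `cosine_similarity` is hypothetical.

```python
import math
from collections import Counter


def cosine_similarity(text_a, text_b):
    """Cosine of the angle between two raw term-frequency vectors."""
    vec_a = Counter(text_a.lower().split())
    vec_b = Counter(text_b.lower().split())
    # Counter returns 0 for missing terms, so unshared terms add nothing.
    dot = sum(vec_a[term] * vec_b[term] for term in vec_a)
    norm_a = math.sqrt(sum(c * c for c in vec_a.values()))
    norm_b = math.sqrt(sum(c * c for c in vec_b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


same = cosine_similarity("vitamin a deficiency", "vitamin a deficiency")
none = cosine_similarity("vitamin a deficiency", "web search engines")
partial = cosine_similarity("vitamin a deficiency", "vitamin c")
print(same, none, partial)
```

Identical texts score approximately 1, texts with no shared terms score 0, and partially overlapping texts fall in between.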
Non-Classical IR Model
It is the complete opposite of the classical IR model. Non-classical IR models are based on principles other than similarity, probability, and Boolean operations. The information logic model, situation theory model, and interaction model are examples of non-classical IR models.
Alternative IR Model
It is an enhancement of the classical IR model that makes use of specific techniques from other fields. The cluster model, fuzzy model, and latent semantic indexing (LSI) model are examples of alternative IR models.
What are the characteristics of information retrieval?
There are 12 characteristics of an Information Retrieval model:
- Search intermediary
- Domain knowledge
- Relevance feedback
- Natural language interface
- Graphical query language
- Conceptual queries
- Full-text IR
- Field searching
- Fuzzy queries
- Hypertext integration
- Machine learning
- Ranked output
What are the components and features of Information retrieval systems?
1. Inverted Index
The primary data structure of most IR systems is the inverted index. An inverted index is a data structure that lists, for every word, all documents that contain it and the frequency of its occurrences in each document. This makes it easy to search for ‘hits’ of a query word.
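The definition above maps naturally onto a dictionary of dictionaries. The following is a minimal sketch, assuming whitespace tokenization and lowercasing; the function name `build_inverted_index` and the two sample documents are hypothetical.

```python
from collections import defaultdict


def build_inverted_index(documents):
    """Map each word to {doc_id: number of occurrences in that document}."""
    index = defaultdict(dict)
    for doc_id, text in documents.items():
        for word in text.lower().split():
            index[word][doc_id] = index[word].get(doc_id, 0) + 1
    return dict(index)


documents = {
    "d1": "the cat sat on the mat",
    "d2": "the dog chased the cat",
}
index = build_inverted_index(documents)
print(index["cat"])           # documents containing "cat", with counts
print(index["the"])
print(index.get("bird", {}))  # a word that occurs in no document
```

Looking up a query word is then a single dictionary access, which is what makes finding hits cheap.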
2. Stop Word Elimination
Stop words are high-frequency words that are deemed unlikely to be useful for searching. They carry little semantic weight. All such words are kept in a list called a stop list. For example, the articles “a”, “an”, and “the” and prepositions such as “in”, “of”, “for”, and “at” are stop words. A stop list can significantly reduce the size of the inverted index. As per Zipf’s law, a stop list covering a few dozen words reduces the size of the inverted index by almost half. On the other hand, eliminating stop words can sometimes eliminate a term that is useful for searching. For example, if we eliminate the letter “A” from “Vitamin A”, the remaining term loses its meaning.
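Stop word elimination can be sketched as a simple filter over tokens. The stop list below contains only the example words mentioned above and is an assumption for illustration; the function name `remove_stop_words` is hypothetical.

```python
# A tiny illustrative stop list; real systems use longer, curated lists.
STOP_WORDS = {"a", "an", "the", "in", "of", "for", "at"}


def remove_stop_words(text):
    """Return the lowercased tokens of text that are not on the stop list."""
    return [tok for tok in text.lower().split() if tok not in STOP_WORDS]


tokens = remove_stop_words("The role of vitamin A in the diet")
print(tokens)
```

The result keeps "role", "vitamin", and "diet" but drops the "A" of "Vitamin A", which is exactly the pitfall described above.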
3. Stemming
Stemming, a simplified form of morphological analysis, is the heuristic process of extracting the base form of words by chopping off their endings. For example, the words laughing, laughs, and laughed would be stemmed to the root word laugh.
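The idea of chopping off word endings can be illustrated with a deliberately naive suffix stripper. This is a sketch only, not an established algorithm such as the Porter stemmer; the suffix list, the minimum stem length of three letters, and the name `naive_stem` are all assumptions made for the example.

```python
SUFFIXES = ("ing", "ed", "s")  # assumed, deliberately tiny suffix list


def naive_stem(word):
    """Strip the first matching suffix, keeping at least three letters."""
    word = word.lower()
    for suffix in SUFFIXES:
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


stems = [naive_stem(w) for w in ["laughing", "laughs", "laughed", "laugh"]]
print(stems)
```

All four forms reduce to "laugh". A heuristic this crude will mis-stem many other words, which is why practical systems rely on more carefully designed rule sets.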