Database

Inverted Index

Senthil Nayagan
Dec 25, 2022 - 1 Mins Read

What is an inverted index?

All modern information retrieval systems (IRS) rely heavily on inverted indexes, a kind of data structure that is essential to effective retrieval. It stores a mapping of words (or any search terms) to their locations in the database table or document. In other words, it maintains a key-value pair that identifies where in the database or document a given search phrase may be found. The purpose of an inverted index is to allow fast full-text searches, but at the expense of additional processing time whenever a new item is added to the database.

Term		Document/Table ID
large		1001, 1003
winter		1002
apple		1002, 1004
hot		1002, 1003, 1005

^{Table 1: Search Terms Mapped to Document/Table IDs.}

Now comes the obvious question: Which data structure is best? Can we use a fixed-size data structure such as an array for this? Well, fixed-size data structures, such as fixed-sized arrays, are impractical. If we consider a dynamic index in which new documents are added or existing ones are modified, it will be challenging to modify the sizes of the fixed-data structures. For these reasons, we need a linked list or equivalent data structure with a dynamically changing size. In standard information retrieval terminology, this list is called a posting list (also referred to as a postings file, or inverted file).

In memory, a posting list can be represented as a data structure such as a linked list or an array with variable length. One occurrence of a word-document pair (large => 1001, 1003) is referred to as a posting, and the sum of all the posting lists is then referred to as the postings.

Comments

comments powered by Disqus

Explore more like this

index inverted-index