This article assumes that you have MongoDB and mongosh installed on your system and have a basic knowledge of both.
MongoDB provides several types of indexes. One of them is the text index, which supports text search queries on string content.
To run a text search with the $text operator, your collection must have a text index.
A collection can have at most one text index, but, like other indexes, a text index can cover multiple fields.
In this article, we will learn how to create a text index and how to use it to perform search queries.
Start the MongoDB server on your system:
In a Linux terminal: $ sudo systemctl start mongod
In the Windows command prompt: C:\mongodb\bin\mongod.exe
Now start mongosh by running:
In a Linux terminal: $ mongosh
In the Windows command prompt: C:\mongodb\bin\mongosh.exe
Now create a new database named movies by running:
use movies
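This switches mongosh to the movies database. MongoDB does not actually create the database until you store data in it, so movies will appear in the list of databases only after we insert the documents below. You can list all databases by running: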
show dbs
We will run our searches on the following movie data:
[
  {
    title: "Bang Bang",
    genres: ["Romance", "Action"],
    runtime: 105,
    rated: "R",
    year: 2018,
    directors: ["Siddhartha Anand"],
    cast: ["Hritik Roshan", "Katrina Kaif"],
    type: "movie"
  },
  {
    title: "Fanaa",
    genres: ["Drama", "Action", "Romance"],
    runtime: 203,
    rated: "R",
    year: 2006,
    directors: ["Kunal Kohli"],
    cast: ["Amir Khan", "Kajol", "Rishi Kapoor"],
    type: "movie"
  },
  {
    title: "Robin Hood",
    genres: ["Action", "Romance"],
    runtime: 143,
    rated: "R",
    year: 1922,
    directors: ["Allan Dwan"],
    cast: ["Wallace Beery", "Sam De Grasse", "Enid Bennett"],
    type: "movie"
  }
]
Let's insert this data into the movies collection of the movies database.
Switch mongosh to editor mode by running .editor, then type or paste the following code and run it by pressing Ctrl + D.
db.movies.insertMany([
  {
    title: "Bang Bang",
    genres: ["Romance", "Action"],
    runtime: 105,
    rated: "R",
    year: 2018,
    directors: ["Siddhartha Anand"],
    cast: ["Hritik Roshan", "Katrina Kaif"],
    type: "movie"
  },
  {
    title: "Fanaa",
    genres: ["Drama", "Action", "Romance"],
    runtime: 203,
    rated: "R",
    year: 2006,
    directors: ["Kunal Kohli"],
    cast: ["Amir Khan", "Kajol", "Rishi Kapoor"],
    type: "movie"
  },
  {
    title: "Robin Hood",
    genres: ["Action", "Romance"],
    runtime: 143,
    rated: "R",
    year: 1922,
    directors: ["Allan Dwan"],
    cast: ["Wallace Beery", "Sam De Grasse", "Enid Bennett"],
    type: "movie"
  }
])
You should receive output similar to the following (your ObjectId values will differ):
{
  acknowledged: true,
  insertedIds: {
    '0': ObjectId("6154697130c458dec8aacdd3"),
    '1': ObjectId("6154697130c458dec8aacdd4"),
    '2': ObjectId("6154697130c458dec8aacdd5")
  }
}
This shows that the data was inserted successfully: acknowledged is true, and insertedIds lists the ObjectId generated for each inserted document.
Now let's create a text index by running the following command in mongosh:
db.movies.createIndex({ title: "text", directors: "text", cast: "text", genres: "text" })
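Here we create a single text index that covers the title, directors, cast, and genres fields, by specifying each of them as field: "text".
createIndex() returns the name of the new index, which MongoDB generates by default from the indexed fields: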
title_text_directors_text_cast_text_genres_text
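MongoDB builds this default name by joining each key of the index specification with its index type, separated by underscores. The rule can be sketched in plain JavaScript (runnable with Node.js, no database needed); `defaultIndexName` is a hypothetical helper written for illustration, not part of the MongoDB API.

```javascript
// Hypothetical helper: mimics MongoDB's default index naming rule,
// "<field>_<type>" for every key, joined with underscores, in key order.
function defaultIndexName(keySpec) {
  return Object.entries(keySpec)
    .map(([field, type]) => `${field}_${type}`)
    .join("_");
}

// The same key specification we passed to db.movies.createIndex().
const spec = { title: "text", directors: "text", cast: "text", genres: "text" };

console.log(defaultIndexName(spec));
// title_text_directors_text_cast_text_genres_text
```

Because the name follows the order of the keys, listing the fields in a different order in createIndex() produces a different default name.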
We will use the $text operator, which performs search queries on collections that have a text index.
$text tokenizes the $search string using whitespace and most punctuation as delimiters, and performs a logical OR of all the resulting terms.
For example, the following query finds all movies that contain any of the terms “Fanaa”, “Hritik”, or “Chris”:
db.movies.find( { $text: { $search: "Fanaa Hritik Chris" } } )
The output is:
[
  {
    _id: ObjectId("6154697130c458dec8aacdd4"),
    title: 'Fanaa',
    genres: ['Drama', 'Action', 'Romance'],
    runtime: 203,
    rated: 'R',
    year: 2006,
    directors: ['Kunal Kohli'],
    cast: ['Amir Khan', 'Kajol', 'Rishi Kapoor'],
    type: 'movie'
  },
  {
    _id: ObjectId("6154697130c458dec8aacdd3"),
    title: 'Bang Bang',
    genres: ['Romance', 'Action'],
    runtime: 105,
    rated: 'R',
    year: 2018,
    directors: ['Siddhartha Anand'],
    cast: ['Hritik Roshan', 'Katrina Kaif'],
    type: 'movie'
  }
]
The first document matches because its title contains “Fanaa”, and the second because its cast contains “Hritik”. The third term, “Chris”, has no effect on the result because it does not appear in any of the documents we inserted.
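To make the tokenize-then-OR behavior concrete, here is a minimal sketch in plain JavaScript (runs with Node.js, no server needed) that models it over the indexed fields of our sample data. It is a simplification, not MongoDB's implementation: it only lowercases tokens and skips the stemming, stop-word removal, and language handling that a real text index performs. The helper names `tokenize`, `documentTerms`, and `textSearch` are hypothetical.

```javascript
// Simplified model of $text matching (NOT MongoDB's implementation):
// tokens are only lowercased; no stemming, stop words, or diacritic folding.
const movies = [
  { title: "Bang Bang", genres: ["Romance", "Action"],
    directors: ["Siddhartha Anand"], cast: ["Hritik Roshan", "Katrina Kaif"] },
  { title: "Fanaa", genres: ["Drama", "Action", "Romance"],
    directors: ["Kunal Kohli"], cast: ["Amir Khan", "Kajol", "Rishi Kapoor"] },
  { title: "Robin Hood", genres: ["Action", "Romance"],
    directors: ["Allan Dwan"], cast: ["Wallace Beery", "Sam De Grasse", "Enid Bennett"] },
];

// The fields covered by our text index.
const indexedFields = ["title", "directors", "cast", "genres"];

// Split on whitespace and punctuation, lowercase, and drop empty tokens.
function tokenize(text) {
  return text.toLowerCase().split(/[\s\p{P}]+/u).filter(Boolean);
}

// Gather the tokens of every indexed field (a string or an array of strings).
function documentTerms(doc) {
  const terms = new Set();
  for (const field of indexedFields) {
    for (const value of [].concat(doc[field] ?? [])) {
      for (const token of tokenize(value)) terms.add(token);
    }
  }
  return terms;
}

// A document matches if it contains at least one search term (logical OR).
function textSearch(docs, search) {
  const searchTerms = tokenize(search);
  return docs.filter((doc) => {
    const terms = documentTerms(doc);
    return searchTerms.some((term) => terms.has(term));
  });
}

console.log(textSearch(movies, "Fanaa Hritik Chris").map((m) => m.title));
// [ 'Bang Bang', 'Fanaa' ]
```

The sketch returns matches in array order. MongoDB makes no ordering promise for unsorted text search results, so the order you see in mongosh may differ.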
We can also modify the search string to perform more restrictive searches.
To search for an exact phrase, enclose the phrase in double quotes inside the search string. Because the search string is itself written as a double-quoted string, the inner quotes must be escaped with a backslash (\") so that they are not interpreted as the end of the string.
Hence our search query looks like this:
db.movies.find( { $text: { $search: "\"Hritik Roshan\"" } } )
This returns:
[
  {
    _id: ObjectId("6154697130c458dec8aacdd3"),
    title: 'Bang Bang',
    genres: ['Romance', 'Action'],
    runtime: 105,
    rated: 'R',
    year: 2018,
    directors: ['Siddhartha Anand'],
    cast: ['Hritik Roshan', 'Katrina Kaif'],
    type: 'movie'
  }
]
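Since mongosh is a JavaScript shell, the escaped string above is exactly the same value as the single-quoted literal '"Hritik Roshan"', so either form can be passed to $search. The sketch below (plain JavaScript for Node.js; `extractPhrases` and `phraseSearch` are hypothetical helpers) checks that equivalence and models phrase matching as a case-insensitive substring test over the indexed fields. It handles phrase-only search strings and is an illustration under that assumption, not how MongoDB implements phrase matching.

```javascript
// In JavaScript (and therefore in mongosh) these two literals are the same string.
const escaped = "\"Hritik Roshan\"";
const singleQuoted = '"Hritik Roshan"';
console.log(escaped === singleQuoted); // true

// Only the indexed fields of the sample data are needed here.
const movies = [
  { title: "Bang Bang", genres: ["Romance", "Action"],
    directors: ["Siddhartha Anand"], cast: ["Hritik Roshan", "Katrina Kaif"] },
  { title: "Fanaa", genres: ["Drama", "Action", "Romance"],
    directors: ["Kunal Kohli"], cast: ["Amir Khan", "Kajol", "Rishi Kapoor"] },
  { title: "Robin Hood", genres: ["Action", "Romance"],
    directors: ["Allan Dwan"], cast: ["Wallace Beery", "Sam De Grasse", "Enid Bennett"] },
];
const indexedFields = ["title", "directors", "cast", "genres"];

// Return every "quoted phrase" found in the search string, lowercased.
function extractPhrases(search) {
  return [...search.matchAll(/"([^"]+)"/g)].map((m) => m[1].toLowerCase());
}

// Simplified phrase match (an assumption, not MongoDB's algorithm): a document
// qualifies if every phrase occurs, case-insensitively, inside one of its
// indexed string values.
function phraseSearch(docs, search) {
  const phrases = extractPhrases(search);
  return docs.filter((doc) => {
    const values = indexedFields
      .flatMap((field) => [].concat(doc[field] ?? []))
      .map((value) => value.toLowerCase());
    return phrases.every((phrase) => values.some((value) => value.includes(phrase)));
  });
}

console.log(extractPhrases(escaped)); // [ 'hritik roshan' ]
console.log(phraseSearch(movies, escaped).map((m) => m.title)); // [ 'Bang Bang' ]
```

Unlike the plain term search, word order matters here: in this sketch the phrase "Roshan Hritik" matches nothing.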
You can exclude a term by prefixing it with a minus sign (-). For example, to find action movies that do not contain the term “Hritik” in any indexed field (in our data, that means without Hritik in the cast), run:
db.movies.find( { $text: { $search: "Action -Hritik" } } )
The output is:
[
  {
    _id: ObjectId("6154697130c458dec8aacdd5"),
    title: 'Robin Hood',
    genres: ['Action', 'Romance'],
    runtime: 143,
    rated: 'R',
    year: 1922,
    directors: ['Allan Dwan'],
    cast: ['Wallace Beery', 'Sam De Grasse', 'Enid Bennett'],
    type: 'movie'
  },
  {
    _id: ObjectId("6154697130c458dec8aacdd4"),
    title: 'Fanaa',
    genres: ['Drama', 'Action', 'Romance'],
    runtime: 203,
    rated: 'R',
    year: 2006,
    directors: ['Kunal Kohli'],
    cast: ['Amir Khan', 'Kajol', 'Rishi Kapoor'],
    type: 'movie'
  }
]
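The negation rule can be modeled in a few lines of plain JavaScript (Node.js, no server). In this simplified sketch, words prefixed with - are removed from the OR list, and any document containing one of them in any indexed field is dropped. As before, tokens are only lowercased (no stemming or stop words), and the helpers `tokenize`, `documentTerms`, `parseSearch`, and `textSearch` are hypothetical names chosen for illustration.

```javascript
// Indexed fields of the sample data only.
const movies = [
  { title: "Bang Bang", genres: ["Romance", "Action"],
    directors: ["Siddhartha Anand"], cast: ["Hritik Roshan", "Katrina Kaif"] },
  { title: "Fanaa", genres: ["Drama", "Action", "Romance"],
    directors: ["Kunal Kohli"], cast: ["Amir Khan", "Kajol", "Rishi Kapoor"] },
  { title: "Robin Hood", genres: ["Action", "Romance"],
    directors: ["Allan Dwan"], cast: ["Wallace Beery", "Sam De Grasse", "Enid Bennett"] },
];
const indexedFields = ["title", "directors", "cast", "genres"];

// Lowercase and split on whitespace and punctuation.
function tokenize(text) {
  return text.toLowerCase().split(/[\s\p{P}]+/u).filter(Boolean);
}

// All tokens found in a document's indexed fields.
function documentTerms(doc) {
  const terms = new Set();
  for (const field of indexedFields) {
    for (const value of [].concat(doc[field] ?? [])) {
      for (const token of tokenize(value)) terms.add(token);
    }
  }
  return terms;
}

// Separate positive terms from negated (-term) terms.
function parseSearch(search) {
  const positive = [];
  const negated = [];
  for (const word of search.trim().split(/\s+/)) {
    if (word.startsWith("-") && word.length > 1) {
      negated.push(...tokenize(word.slice(1)));
    } else {
      positive.push(...tokenize(word));
    }
  }
  return { positive, negated };
}

// Keep documents that match any positive term and none of the negated terms.
function textSearch(docs, search) {
  const { positive, negated } = parseSearch(search);
  return docs.filter((doc) => {
    const terms = documentTerms(doc);
    return positive.some((t) => terms.has(t)) && !negated.some((t) => terms.has(t));
  });
}

console.log(textSearch(movies, "Action -Hritik").map((m) => m.title));
// [ 'Fanaa', 'Robin Hood' ]
```

Bang Bang contains “Action” but is dropped because it also contains the excluded term “Hritik”.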
By default, MongoDB returns text search results in no particular order. To sort them by relevance score, write the query as follows:
db.movies.find(
  { $text: { $search: "Action Amir -Hritik" } },
  { score: { $meta: "textScore" } }
).sort({ score: { $meta: "textScore" } })
Here the projection adds a field named score, computed with the $meta operator and the "textScore" keyword, which reflects how well each document matches the search query. The sort() method then sorts the results by that score, highest first.
Executing this query returns:
[
  {
    _id: ObjectId("6154697130c458dec8aacdd4"),
    title: 'Fanaa',
    genres: ['Drama', 'Action', 'Romance'],
    runtime: 203,
    rated: 'R',
    year: 2006,
    directors: ['Kunal Kohli'],
    cast: ['Amir Khan', 'Kajol', 'Rishi Kapoor'],
    type: 'movie',
    score: 1.85
  },
  {
    _id: ObjectId("6154697130c458dec8aacdd5"),
    title: 'Robin Hood',
    genres: ['Action', 'Romance'],
    runtime: 143,
    rated: 'R',
    year: 1922,
    directors: ['Allan Dwan'],
    cast: ['Wallace Beery', 'Sam De Grasse', 'Enid Bennett'],
    type: 'movie',
    score: 1.1
  }
]
Notice the extra score field in each document: this is its relevance score. The first document contains both “Amir” and “Action”, so its score is higher than that of the second document, which matches only “Action”. The results are sorted by this score in descending order.
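The ranking idea can be sketched in plain JavaScript (Node.js, no server) with a deliberately crude score: the number of distinct positive search terms a document contains. This scoring rule is an assumption made for illustration only. MongoDB's textScore is computed differently (it takes into account factors such as how often terms match in each field and the field weights), so the sketch reproduces the ordering in this example but not the values 1.85 and 1.1. All helper names are hypothetical.

```javascript
// Indexed fields of the sample data only.
const movies = [
  { title: "Bang Bang", genres: ["Romance", "Action"],
    directors: ["Siddhartha Anand"], cast: ["Hritik Roshan", "Katrina Kaif"] },
  { title: "Fanaa", genres: ["Drama", "Action", "Romance"],
    directors: ["Kunal Kohli"], cast: ["Amir Khan", "Kajol", "Rishi Kapoor"] },
  { title: "Robin Hood", genres: ["Action", "Romance"],
    directors: ["Allan Dwan"], cast: ["Wallace Beery", "Sam De Grasse", "Enid Bennett"] },
];
const indexedFields = ["title", "directors", "cast", "genres"];

function tokenize(text) {
  return text.toLowerCase().split(/[\s\p{P}]+/u).filter(Boolean);
}

function documentTerms(doc) {
  const terms = new Set();
  for (const field of indexedFields) {
    for (const value of [].concat(doc[field] ?? [])) {
      for (const token of tokenize(value)) terms.add(token);
    }
  }
  return terms;
}

// Separate positive terms from negated (-term) terms.
function parseSearch(search) {
  const positive = [];
  const negated = [];
  for (const word of search.trim().split(/\s+/)) {
    if (word.startsWith("-") && word.length > 1) negated.push(...tokenize(word.slice(1)));
    else positive.push(...tokenize(word));
  }
  return { positive, negated };
}

// Toy score = count of distinct positive terms present; excluded documents are
// dropped, then results are sorted with the highest score first.
function scoredSearch(docs, search) {
  const { positive, negated } = parseSearch(search);
  const uniqueTerms = [...new Set(positive)];
  return docs
    .map((doc) => {
      const terms = documentTerms(doc);
      return {
        title: doc.title,
        score: uniqueTerms.filter((t) => terms.has(t)).length,
        excluded: negated.some((t) => terms.has(t)),
      };
    })
    .filter((r) => r.score > 0 && !r.excluded)
    .map(({ title, score }) => ({ title, score }))
    .sort((a, b) => b.score - a.score);
}

const ranked = scoredSearch(movies, "Action Amir -Hritik");
console.log(ranked.map((r) => `${r.title}: ${r.score}`));
// [ 'Fanaa: 2', 'Robin Hood: 1' ]
```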
You are now ready to implement text search in your application.