Similar to SQL, MongoDB has aggregation pipelines and single aggregation methods to perform aggregation operations on documents in a collection.
In this article we will discuss how to implement simple aggregation methods as well as how to implement aggregation pipelines.
We will implement the aggregations on MongoDB command line mongosh.
If you don’t have MongoDB and mongosh already installed on your system, see how to install Mongod and mongosh.
There are three methods to perform aggregation in MongoDB collections:-
Single purpose aggregations
Aggregation pipelines
Map-reduce function (deprecated in MongoDB 5.0 so we won’t discuss this one)
We will discuss and implement every method, starting with single-purpose aggregation operations.
These methods provide simple aggregations such as max, count, distinct, etc., but lack the flexibility of a pipeline.
We will use given students data to perform aggregations.
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
[
{
name: "Suraj Trivedi",
roll: 18,
branch: "ECE",
height: 164,
weight: 58,
current_year: 1,
current_semester: 1,
gender: "M"
},
{
name: "Nikhil Singh",
roll: 30,
branch: "CSE",
height: 160,
weight: 65,
current_year: 4,
current_semester: 7,
gender: "O"
},
{
name: "Rahul Pant",
roll: 17,
branch: "CE",
height: 158,
weight: 60,
current_year: 3,
current_semester: 5,
gender: "M"
},
{
name: "Swati Bhist",
roll: 11,
branch: "ECE",
height: 154,
weight: 51,
current_year: 3,
current_semester: 5,
gender: "F"
},
{
name: "Vinay Chatopadhyay",
roll: 4,
branch: "ME",
height: 161,
weight: 70,
current_year: 2,
current_semester: 3,
gender: "M"
},
{
name: "Nikita Sahu",
roll: 24,
branch: "CSE",
height: 157,
weight: 54,
current_year: 4,
current_semester: 7,
gender: "F"
},
{
name: "Ayush Agnihotri",
roll: 13,
branch: "EEE",
height: 164,
weight: 58,
current_year: 1,
current_semester: 2,
gender: "M"
},
{
name: "Manish Sinha",
roll: 9,
branch: "ME",
height: 152,
weight: 57,
current_year: 2,
current_semester: 3,
gender: "M"
},
{
name: "Preet Khurana",
roll: 8,
branch: "EEE",
height: 152,
weight: 50,
current_year: 1,
current_semester: 2,
gender: "F"
},
{
name: "Ubaid Ahmed",
roll: 10,
branch: "CSE",
height: 159,
weight: 62,
current_year: 4,
current_semester: 7,
gender: "M"
},
]
It’s time to write some code
Start mongod server and mongosh shell and create a new database.
use tutorial_db
This will also set the current database as tutorial_db.
Now create and store the above given data in the students collection in the tutorial database by running this in mongosh.
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
db.students.insertMany([
{
name: "Suraj Trivedi",
roll: 18,
branch: "ECE",
height: 164,
weight: 58,
current_year: 1,
current_semester: 1,
gender: "M"
},
{
name: "Nikhil Singh",
roll: 30,
branch: "CSE",
height: 160,
weight: 65,
current_year: 4,
current_semester: 7,
gender: "O"
},
{
name: "Rahul Pant",
roll: 17,
branch: "CE",
height: 158,
weight: 60,
current_year: 3,
current_semester: 5,
gender: "M"
},
{
name: "Swati Bhist",
roll: 11,
branch: "ECE",
height: 154,
weight: 51,
current_year: 3,
current_semester: 5,
gender: "F"
},
{
name: "Vinay Chatopadhyay",
roll: 4,
branch: "ME",
height: 161,
weight: 70,
current_year: 2,
current_semester: 3,
gender: "M"
},
{
name: "Nikita Sahu",
roll: 24,
branch: "CSE",
height: 157,
weight: 54,
current_year: 4,
current_semester: 7,
gender: "F"
},
{
name: "Ayush Agnihotri",
roll: 13,
branch: "EEE",
height: 164,
weight: 58,
current_year: 1,
current_semester: 2,
gender: "M"
},
{
name: "Manish Sinha",
roll: 9,
branch: "ME",
height: 152,
weight: 57,
current_year: 2,
current_semester: 3,
gender: "M"
},
{
name: "Preet Khurana",
roll: 8,
branch: "EEE",
height: 152,
weight: 50,
current_year: 1,
current_semester: 2,
gender: "F"
},
{
name: "Ubaid Ahmed",
roll: 10,
branch: "CSE",
height: 159,
weight: 62,
current_year: 4,
current_semester: 7,
gender: "M"
},
])
You will get output as follows:
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15
{ acknowledged: true, insertedIds: { '0': ObjectId("6166ea11ad02102ea04879b1"), '1': ObjectId("6166ea11ad02102ea04879b2"), '2': ObjectId("6166ea11ad02102ea04879b3"), '3': ObjectId("6166ea11ad02102ea04879b4"), '4': ObjectId("6166ea11ad02102ea04879b5"), '5': ObjectId("6166ea11ad02102ea04879b6"), '6': ObjectId("6166ea11ad02102ea04879b7"), '7': ObjectId("6166ea11ad02102ea04879b8"), '8': ObjectId("6166ea11ad02102ea04879b9"), '9': ObjectId("6166ea11ad02102ea04879ba") } }
This shows that operations were successful and we are ready to implement aggregations on the students data.
db.collection.distinct()
We can select distinct branch names by running:
db.students.distinct("branch")
This will result as below:
[ 'CE', 'CSE', 'ECE', 'EEE', 'ME' ]
db.collection.count()
You can count the male students by running:
db.students.count({gender:"M"})
Result:
6
The aggregation framework of MongoDB is based on the concept of data processing pipelines in which documents enter a series of stages, each stage makes some transformation, and then we get our aggregated result.
In our case, let’s suppose college placement is going on. Various IT companies require data of students who are in their last year and are studying in branch ECE or CSE, sorted according to their names. The aggregation query will retrieve the results as:
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
db.students.aggregate([
{
$match: {
branch: {
$in: ["CSE", "ECE"]
},
current_year: 4
}
},
{
$sort: {
name: 1
}
}
])
This will yield output as:
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
[
{
_id: ObjectId("6166ea11ad02102ea04879b2"),
name: 'Nikhil Singh',
roll: 30,
branch: 'CSE',
height: 160,
weight: 65,
current_year: 4,
current_semester: 7,
gender: 'O'
},
{
_id: ObjectId("6166ea11ad02102ea04879b6"),
name: 'Nikita Sahu',
roll: 24,
branch: 'CSE',
height: 157,
weight: 54,
current_year: 4,
current_semester: 7,
gender: 'F'
},
{
_id: ObjectId("6166ea11ad02102ea04879ba"),
name: 'Ubaid Ahmed',
roll: 10,
branch: 'CSE',
height: 159,
weight: 62,
current_year: 4,
current_semester: 7,
gender: 'M'
}
]
Which is exactly what we needed.
Let’s break down our aggregate query to understand how it worked.
We are passing an array of two objects to aggregate function, meaning we have implemented two stages.
Stage-1 :-
1
2
3
4
5
6
$match: {
branch: {
$in: ["CSE", "ECE"]
},
current_year: 4
}
This stage is a document filter. We are matching the branch which has value in [“CSE”, “ECE”]. This is equivalent to { branch = ”CSE” or branch=”ECE” } and with it we are also matching current_year = 4.
By this filter we will get students who are studying in year four and have branches ECE or CSE. This result will be passed to the second stage in the pipeline.
Stage-2 :-
1
2
3
$sort: {
name: 1
}
This stage receives the output of the filter stage and applies transformations to it. Here I have used $sort aggregation operator which sorts the documents. Its value contains {field_name: value} where field_name is the name of the field based on which we want to sort our documents.
Value against it is either
1 => sorted in ascending order or
-1 => sorted in descending order.
Now let’s try other operators.
In this section we want to find the maximum height among students. This can be achieved by pipeline as:
1 2 3 4 5 6 7 8 9 10 11 12 13
db.students.aggregate([ { $match: {} }, { $group: { _id: null, maxHeight: { $max: "$height" } } } ])
This will output the minimum height.
1 2 3 4 5 6 7 8 9 10 11 12 13
db.students.aggregate([ { $match: {} }, { $group: { _id: null, minHeight: { $min: "$height" } } } ])
This will yield the result of the average height of the students.
1 2 3 4 5 6 7 8 9 10 11 12 13
db.students.aggregate([ { $match: {} }, { $group: { _id: null, averageHeight: { $avg: "$height" } } } ])
You can see in the above examples we have used stage $group in the aggregation pipeline. It requires two fields, one is _id, which is required for distinct grouping for each document. The second is the accumulator field which has accumulator operators such as max, min, avg, count and so on.
As defined in the mongo doc project stage, this passes along the documents with the requested fields to the next stage in the pipeline. The specified fields can be existing fields from the input documents or newly computed fields.
It decides what to project and what to not.
For example, the first query we performed in the aggregation pipeline:
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
db.students.aggregate([
{
$match: {
branch: {
$in: ["CSE", "ECE"]
},
current_year: 4
}
},
{
$sort: {
name: 1
}
}
])
This gives complete details of students but suppose we want only roll number and names, we can do so with project stage as:
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
db.students.aggregate([
{
$match: {
branch: {
$in: ["CSE", "ECE"]
},
current_year: 4
}
},
{
$sort: {
name: 1
}
},
{
$project: {
"_id": 0,
"roll": 1,
"name": 1
}
}
])
And we get output as:
1
2
3
4
5
6
7
8
9
10
11
12
13
14
[
{
name: 'Nikhil Singh',
roll: 30
},
{
name: 'Nikita Sahu',
roll: 24
},
{
name: 'Ubaid Ahmed',
roll: 10
}
]
Which is our expected result.
So that’s how we implement an aggregation pipeline. However, there are a lot of aggregation operators and normal operators which cannot be covered in a single article and can be used in a variety of ways to get desired aggregations.