December 16, 2021

Aggregations in MongoDB

Kamlendra Kumar Rathour421kamlendra

DURATION

13min

Single Purpose Aggregations:

These methods provide simple aggregations such as max, count, distinct, etc., but lack the flexibility of a pipeline.

We will use given students data to perform aggregations.

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
[
  {
    name: "Suraj Trivedi",
    roll: 18,
    branch: "ECE",
    height: 164,
    weight: 58,
    current_year: 1,
    current_semester: 1,
    gender: "M"
 },
  {
    name: "Nikhil Singh",
    roll: 30,
    branch: "CSE",
    height: 160,
    weight: 65,
    current_year: 4,
    current_semester: 7,
    gender: "O"
 },
  {
    name: "Rahul Pant",
    roll: 17,
    branch: "CE",
    height: 158,
    weight: 60,
    current_year: 3,
    current_semester: 5,
    gender: "M"
 },
  {
    name: "Swati Bhist",
    roll: 11,
    branch: "ECE",
    height: 154,
    weight: 51,
    current_year: 3,
    current_semester: 5,
    gender: "F"
 },
  {
    name: "Vinay Chatopadhyay",
    roll: 4,
    branch: "ME",
    height: 161,
    weight: 70,
    current_year: 2,
    current_semester: 3,
    gender: "M"
 },
  {
    name: "Nikita Sahu",
    roll: 24,
    branch: "CSE",
    height: 157,
    weight: 54,
    current_year: 4,
    current_semester: 7,
    gender: "F"
 },
  {
    name: "Ayush Agnihotri",
    roll: 13,
    branch: "EEE",
    height: 164,
    weight: 58,
    current_year: 1,
    current_semester: 2,
    gender: "M"
 },
  {
    name: "Manish Sinha",
    roll: 9,
    branch: "ME",
    height: 152,
    weight: 57,
    current_year: 2,
    current_semester: 3,
    gender: "M"
 },
  {
    name: "Preet Khurana",
    roll: 8,
    branch: "EEE",
    height: 152,
    weight: 50,
    current_year: 1,
    current_semester: 2,
    gender: "F"
 },
  {
    name: "Ubaid Ahmed",
    roll: 10,
    branch: "CSE",
    height: 159,
    weight: 62,
    current_year: 4,
    current_semester: 7,
    gender: "M"
 },

]

It’s time to write some code
Start mongod server and mongosh shell and create a new database.

use tutorial_db

This will also set the current database as tutorial_db.

Now create and store the above given data in the students collection in the tutorial database by running this in mongosh.

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
db.students.insertMany([
  {
    name: "Suraj Trivedi",
    roll: 18,
    branch: "ECE",
    height: 164,
    weight: 58,
    current_year: 1,
    current_semester: 1,
    gender: "M"
 },
  {
    name: "Nikhil Singh",
    roll: 30,
    branch: "CSE",
    height: 160,
    weight: 65,
    current_year: 4,
    current_semester: 7,
    gender: "O"
 },
  {
    name: "Rahul Pant",
    roll: 17,
    branch: "CE",
    height: 158,
    weight: 60,
    current_year: 3,
    current_semester: 5,
    gender: "M"
 },
  {
    name: "Swati Bhist",
    roll: 11,
    branch: "ECE",
    height: 154,
    weight: 51,
    current_year: 3,
    current_semester: 5,
    gender: "F"
 },
  {
    name: "Vinay Chatopadhyay",
    roll: 4,
    branch: "ME",
    height: 161,
    weight: 70,
    current_year: 2,
    current_semester: 3,
    gender: "M"
 },
  {
    name: "Nikita Sahu",
    roll: 24,
    branch: "CSE",
    height: 157,
    weight: 54,
    current_year: 4,
    current_semester: 7,
    gender: "F"
 },
  {
    name: "Ayush Agnihotri",
    roll: 13,
    branch: "EEE",
    height: 164,
    weight: 58,
    current_year: 1,
    current_semester: 2,
    gender: "M"
 },
  {
    name: "Manish Sinha",
    roll: 9,
    branch: "ME",
    height: 152,
    weight: 57,
    current_year: 2,
    current_semester: 3,
    gender: "M"
 },
  {
    name: "Preet Khurana",
    roll: 8,
    branch: "EEE",
    height: 152,
    weight: 50,
    current_year: 1,
    current_semester: 2,
    gender: "F"
 },
  {
    name: "Ubaid Ahmed",
    roll: 10,
    branch: "CSE",
    height: 159,
    weight: 62,
    current_year: 4,
    current_semester: 7,
    gender: "M"
 },

])

You will get output as follows:

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
{
  acknowledged: true,
  insertedIds: {
    '0': ObjectId("6166ea11ad02102ea04879b1"),
    '1': ObjectId("6166ea11ad02102ea04879b2"),
    '2': ObjectId("6166ea11ad02102ea04879b3"),
    '3': ObjectId("6166ea11ad02102ea04879b4"),
    '4': ObjectId("6166ea11ad02102ea04879b5"),
    '5': ObjectId("6166ea11ad02102ea04879b6"),
    '6': ObjectId("6166ea11ad02102ea04879b7"),
    '7': ObjectId("6166ea11ad02102ea04879b8"),
    '8': ObjectId("6166ea11ad02102ea04879b9"),
    '9': ObjectId("6166ea11ad02102ea04879ba")
  }
}

This shows that operations were successful and we are ready to implement aggregations on the students data.

db.collection.distinct()

We can select distinct branch names by running:

db.students.distinct("branch")

This will result as below:

[ 'CE', 'CSE', 'ECE', 'EEE', 'ME' ]

db.collection.count()

You can count the male students by running:

db.students.count({gender:"M"})

Result:
6

Aggregation Pipelines:

The aggregation framework of MongoDB is based on the concept of data processing pipelines in which documents enter a series of stages, each stage makes some transformation, and then we get our aggregated result.

In our case, let’s suppose college placement is going on. Various IT companies require data of students who are in their last year and are studying in branch ECE or CSE, sorted according to their names. The aggregation query will retrieve the results as:

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
db.students.aggregate([
  {
    $match: {
      branch: {
        $in: ["CSE", "ECE"]
      },
      current_year: 4
    }
 },
  {
    $sort: {
      name: 1
    }
 }
])

This will yield output as:

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
[
  {
    _id: ObjectId("6166ea11ad02102ea04879b2"),
    name: 'Nikhil Singh',
    roll: 30,
    branch: 'CSE',
    height: 160,
    weight: 65,
    current_year: 4,
    current_semester: 7,
    gender: 'O'
 },
  {
    _id: ObjectId("6166ea11ad02102ea04879b6"),
    name: 'Nikita Sahu',
    roll: 24,
    branch: 'CSE',
    height: 157,
    weight: 54,
    current_year: 4,
    current_semester: 7,
    gender: 'F'
 },
  {
    _id: ObjectId("6166ea11ad02102ea04879ba"),
    name: 'Ubaid Ahmed',
    roll: 10,
    branch: 'CSE',
    height: 159,
    weight: 62,
    current_year: 4,
    current_semester: 7,
    gender: 'M'
 }
]

Which is exactly what we needed.
Let’s break down our aggregate query to understand how it worked.
We are passing an array of two objects to aggregate function, meaning we have implemented two stages.

Stage-1 :-

1
2
3
4
5
6
$match: {
  branch: {
    $in: ["CSE", "ECE"]
  },
  current_year: 4
}

This stage is a document filter. We are matching the branch which has value in [“CSE”, “ECE”]. This is equivalent to { branch = ”CSE” or branch=”ECE” } and with it we are also matching current_year = 4.
By this filter we will get students who are studying in year four and have branches ECE or CSE. This result will be passed to the second stage in the pipeline.

Stage-2 :-

1
2
3
$sort: {
  name: 1
}

This stage receives the output of the filter stage and applies transformations to it. Here I have used $sort aggregation operator which sorts the documents. Its value contains {field_name: value} where field_name is the name of the field based on which we want to sort our documents.
Value against it is either
1 => sorted in ascending order or
-1 => sorted in descending order.

Now let’s try other operators.

1. MAX

In this section we want to find the maximum height among students. This can be achieved by pipeline as:

1
2
3
4
5
6
7
8
9
10
11
12
13
db.students.aggregate([
  {
    $match: {}
 },
  {
    $group: {
      _id: null,
      maxHeight: {
        $max: "$height"
      }
    }
 }
])

2. MIN

This will output the minimum height.

1
2
3
4
5
6
7
8
9
10
11
12
13
db.students.aggregate([
  {
    $match: {}
 },
  {
    $group: {
      _id: null,
      minHeight: {
        $min: "$height"
      }
    }
 }
])

3. AVG

This will yield the result of the average height of the students.

1
2
3
4
5
6
7
8
9
10
11
12
13
db.students.aggregate([
  {
    $match: {}
 },
  {
    $group: {
      _id: null,
      averageHeight: {
        $avg: "$height"
      }
    }
 }
])

$group Stage

You can see in the above examples we have used stage $group in the aggregation pipeline. It requires two fields, one is _id, which is required for distinct grouping for each document. The second is the accumulator field which has accumulator operators such as max, min, avg, count and so on.

$project stage

As defined in the mongo doc project stage, this passes along the documents with the requested fields to the next stage in the pipeline. The specified fields can be existing fields from the input documents or newly computed fields.
It decides what to project and what to not.
For example, the first query we performed in the aggregation pipeline:

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
db.students.aggregate([
  {
    $match: {
      branch: {
        $in: ["CSE", "ECE"]
      },
      current_year: 4
    }
 },
  {
    $sort: {
      name: 1
    }
 }
])

This gives complete details of students but suppose we want only roll number and names, we can do so with project stage as:

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
db.students.aggregate([
  {
    $match: {
      branch: {
        $in: ["CSE", "ECE"]
      },
      current_year: 4
    }
 },
  {
    $sort: {
      name: 1
    }
 },
  {
    $project: {
      "_id": 0,
      "roll": 1,
      "name": 1
    }
 }
])

And we get output as:

1
2
3
4
5
6
7
8
9
10
11
12
13
14
[
  {
    name: 'Nikhil Singh',
    roll: 30
  },
  {
    name: 'Nikita Sahu',
    roll: 24
  },
  {
    name: 'Ubaid Ahmed',
    roll: 10
  }
]

Which is our expected result.

So that’s how we implement an aggregation pipeline. However, there are a lot of aggregation operators and normal operators which cannot be covered in a single article and can be used in a variety of ways to get desired aggregations.

Chat on Discord

December 16, 2021

Aggregations in MongoDB

DURATION

categories

Tags

share

COMPETITIVE PROGRAMMING AT TOPCODER

Single Purpose Aggregations:

Aggregation Pipelines:

1. MAX

2. MIN

3. AVG

$group Stage

$project stage

Recommended for you

Schema Validation in MongoDB Atlas

Storing Large Files in MongoDB Using GridFS

Implementing Text Search in MongoDB