Mongo Cheatsheet: Important Stuff

Introduction

MongoDB is a source available NoSQL DB from MongoDB corp and was first released in 2009. NoSQL rose to popularity due to its schema flexibility, and ability to scale horizontally as NoSQL uses the Base approach.

Some salient points on NoSQL

NoSQL stands for Not Only SQL
No predefined schema
supports evntual consistency
high performance, high availablity, and highly scalable
NoSQL uses the BASE approach

Mongo uses a slightly different naming convention than typical SQL DB’s
Table -> Collection
Row -> Document
Column -> Field
Join -> Embedded Document

Table of Contents

Introduction
BASE VS ACID
CAP Theorem
Different types of NoSQL DB’s
Mongo DB Features
Scalability in Mongo VS RDBMS
Data Modeling
- Embedded Data Model
- Normalized Data Model
  - Aggregation Pipelie
Mongo DB Commands

BASE VS ACID

BASE: Base provides lesser assurance than ACID, however, scales well and is also able to adjust to changes easily.
- Basically Available i.e Guaranteed availability in case of failure,
- Soft state i.e State of DB can change without application interaction due to eventual consistency,
- eventual consistency i.e will be eventually consistent after the application input. The data will be replicated to different nodes and will eventually reach a consistent state.

ACID: Atomicity, Consistency, Idempotency, and Durability
- Atomicity means that transactions must complete or fail as a whole, so a lot of bookkeeping must be done behind the scenes to guarantee this.
- Consistency ensures that a transaction can only bring the database from one valid state to another, any data written to the database must be valid according to all defined rules, including constraints, cascades, triggers, and any combination. However this is different to Consistency in CAP theorem.
- Isolation ensures that concurrent execution of transactions leaves the database in the same state that would have been obtained if the transactions were executed sequentially
- Durability guarantees that once a transaction has been committed, it will remain committed even in the case of a system failure.

The SQL API lacks a mechanism to describe queries where ACID’s requirements are relaxed. This is why the BASE databases are all NoSQL.

CAP Theorem

Also known as Brewer’s theorem states that it is impossible for a Distributed system to provide more than two of the following three properties.

Consistency : Consistency means that data is the same across the cluster, so you can read or write from/to any node and get the same data
Availablity : Availability means the ability to access the cluster even if a node in the cluster goes down.
Partition : Partition tolerance means that the cluster continues to function even if there is a “partition” (communication break) between two nodes (both nodes are up, but can’t communicate).

However, in reality, you can’t just say that MongoDB is CP/AP/CA, because it actually is a trade-off between C, A, and P, depending on both database/driver configurations.

MongoDB is strongly consistent by default – if you do a write and then do a read, assuming the write was successful you will always be able to read the result of the write you just read. This is because MongoDB is a single-master system and all reads go to the primary by default.

MongoDB never allows writing to secondary and you lose on high availability in CAP theorem. If you optionally enable reading from the secondaries then MongoDB becomes eventually consistent where it’s possible to read out-of-date results.

MongoDB also gets high availability through automatic failover in replica sets:
http://www.mongodb.org/display/DOCS/Replica+Sets

Different types of NoSQL DB’s

Key Value Db : DynamoDB, Redis
- Data in Key value pair
Document DB : Mongo
- Rows will be documents
Column DB : Cassandra, Redshift
- Data is stored in columns
- Analytics and warehousing.
Graph DB : Neo4j
- Data stores with Node relations

Mongo DB Features

whereas ACID is an acronym for Atomicity, Consistency, idempotency, and Durability.

Publicly available on 2009
Stores JSON to BSON when storing to DB
max document size is 16MB
indexes are supported for faster querying.
Aggregation of processing into differnt pipelines.
Supports TTL collections useful for caching, logging etc.
Supports large file storage.

Scalability in Mongo VS RDBMS

Scalable has to be broken down into its constituents:Read scaling & Write scaling

ACID-compliant databases (like traditional RDBMS’s) can scale reads. They are not inherently less efficient than NoSQL databases because the (possible) performance bottlenecks are introduced by things NoSQL (sometimes) lacks (like joins and where restrictions) which you can opt not to use. Clustered SQL RDBMS’s can scale reads by introducing additional nodes in the cluster. There are constraints to how far read operations can be scaled, but these are imposed by the difficulty of scaling up writes as you introduce more nodes into the cluster.

There are various constraints imposed by the ACID principle, To scale up write operations or the number of nodes in a cluster beyond a certain point you have to be able to relax some of the ACID requirements:

Dropping Atomicity lets yo shorten the duration the DB is locked. Ex: Mongo, Couch DB. Similarly Dropping Consisitency lets you scale write operation. Ex: Cassandra. Dropping durability lets you to write command without flushing to disk. Example memcache, redis.

NoSQL typically follows Base instead of ACID model, they give up the A, C, or D requirement and improve scalability.

Joeri Sebrechts

Data Modeling

Embedded Data Model

Suitable for smaller dataset with one-to-one or one-to-many relationships.
Faster read operation since data Fetch in single db operation
Unsuitable if data is huge due to 16MB document limit.

Normalized Data Model

Use a normalized data model to represent more complex many to many relations, and prevent data duplication similar to how we use in the traditional SQL databases.

Aggregation Pipelie

Data is fed a multi-stage pipeline that transforms the documents and passes them to the next stage and the aggregated result is returned.

Mongo DB Commands

Database, Collection related operations

show dbs
use mydb
db.stats()
db
show collections
db.createCollection(‘product’)
db.product.drop()

document related operations

db.product.insert({title: “product1”, price: 2000, description: “test product 1”})
db.product.insertMany([ {title: “product 2”, price: 3000, description: “test product”}, {title: “product 3”, price: 4000, description: “test product “}, {title: “product 4”, price: 4000, description: “test product”}, {title: “product 5”, price: 4000, description: “test product”}, {title: “product 6”, price: 4000, description: “test product”}, {title: “product 7”, price: 4000, description: “test product”}, {title: “product 8”, price: 4000, description: “test product”}, {title: “product 9”, price: 4000, description: “test product”}, {title: “product 10”, price: 4000, description: “test product”},])
db.product.find()
db.product.find().count()
db.product.find().hasNext()
db.product.find().limit(5)
db.product.find().skip(5)
db.product.find().skip(count * 0).limit(count)
db.product.find().pretty()
db.product.find({title: “product 7”})
db.product.find({price: { $gt: 3000 }})
db.product.find({price: { $lte: 3000 }})
db.product.find({price: { $in: [2000, 3000] }})
db.product.find( { $or: [ {title: “product 6”}, {price: 2000} ] } )
db.product.find( { $and: [ {title: “product 6”}, {price: 2000} ] } )

query projection, update, remove

db.product.find({}, {title: 1})
db.product.find({}, {price: 0, description: 0})
db.product.update( {_id: ObjectId(’61d14117b4044b4a7e194703′)}, { title: ‘product 1 updated’, price: 2000, description: ‘test product’ } )
db.product.update( {_id: ObjectId(’61d14117b4044b4a7e194703′)}, { $set: { price: 4000 } } )
db.product.updateMany({price: 4000}, {$set: {price: 6000}})
db.product.remove({“_id” : ObjectId(“61d14117b4044b4a7e194703”)})
db.product.deleteOne({“_id” : ObjectId(“61d14117b4044b4a7e194703”)})
db.product.deleteMany({price: 6000})

Create a New Collection

use classwork;
show collections;
db.dept.insert({_id:10,dname:"ACCOUNTING",loc:"NEW YORK"});
db.dept.insert({_id:20,dname:"RESEARCH",loc:"DALLAS"});
db.dept.insert({_id:30,dname:"SALES",loc:"CHICAGO"});
db.dept.insert({_id:40,dname:"OPERATIONS",loc:"BOSTON"});

db.emp.insert({_id:7369,ename:"SMITH",job:"CLERK",mgr:7902,sal:800.00,deptno:20, comm:null});
db.emp.insert({_id:7499,ename:"ALLEN",job:"SALESMAN",mgr:7698,sal:1600.00,comm:300.00,deptno:30});
db.emp.insert({_id:7521,ename:"WARD",job:"SALESMAN",mgr:7698,sal:1250.00,comm:500.00,deptno:30});
db.emp.insert({_id:7566,ename:"JONES",job:"MANAGER",mgr:7839,sal:2975.00,deptno:20, comm:null});
db.emp.insert({_id:7654,ename:"MARTIN",job:"SALESMAN",mgr:7698,sal:1250.00,comm:1400.00,deptno:30});
db.emp.insert({_id:7698,ename:"BLAKE",job:"MANAGER",mgr:7839,sal:2850.00,deptno:30});
db.emp.insert({_id:7782,ename:"CLARK",job:"MANAGER",mgr:7839,sal:2450.00,deptno:10});

db.emp.insert({_id:7788,ename:"SCOTT",job:"ANALYST",mgr:7566,sal:3000.00,deptno:20});
db.emp.insert({_id:7839,ename:"KING",job:"PRESIDENT",sal:5000.00,deptno:10});
db.emp.insert({_id:7844,ename:"TURNER",job:"SALESMAN",mgr:7698,sal:1500.00,comm:0.00,deptno:30});
db.emp.insert({_id:7876,ename:"ADAMS",job:"CLERK",mgr:7788,sal:1100.00,deptno:20, comm:null});
db.emp.insert({_id:7900,ename:"JAMES",job:"CLERK",mgr:7698,sal:950.00,deptno:30, comm:null});
db.emp.insert({_id:7902,ename:"FORD",job:"ANALYST",mgr:7566,sal:3000.00,deptno:20});
db.emp.insert({_id:7934,ename:"MILLER",job:"CLERK",mgr:7782,sal:1300.00,deptno:10});

Logical operators

db.emp.find({$or: [ {sal: {$gt: 4000}}, {job: ‘ANALYST’} ]})
db.emp.find({$and: [ {deptno: {$ne: 20}}, {job: {$ne: ‘SALESMAN’}} ]})
db.emp.find({ deptno: {$ne: 20}, job: {$ne: ‘SALESMAN’} })
db.emp.find(
- { $or: [ { $and: [ { job: ‘MANAGER’ }, { deptno: 30 } ] },
- { $and: [ { job: ‘SALESMAN’ }, { deptno: 30 }, { sal: { $lte: 1500 } } ] } ]})
db.emp.find(
- { $or: [ { job: ‘MANAGER’, deptno: 30 },
- { job: ‘SALESMAN’, deptno: 30, sal: { $lte: 1500 } } ]})

Aggregation Pipeline

Operators : $project(select), $match(where), $group(group by), $unwind(extray array), $lookup(left join), $out(into collection)

Sum of sal per job

db.emp.aggregate([
    // stage1: get the data grouped by job
    {
      $group: {
        _id: '$job',
        sum: {$sum: '$sal'}
      }
    }
  ])

Avg of sal per dept

db.emp.aggregate([
    // stage1: get the data grouped by deptno
    {
      $group: {
        _id: '$deptno',
        averageSalary: {$avg: '$sal'}
      }
    }
  ])

Print total sal, avg sal, max sal, min sal per job

db.emp.aggregate([
    // stage1: get the data grouped by job
    {
      $group: {
        _id: '$job',
        totalSalary: { $sum: '$sal' },
        averageSalary: { $avg: '$sal' },
        minSalary: { $min: '$sal' },
        maxSalary: { $max: '$sal' },
      }
    }
  ])

Print all jobs for which total sal is more than 5700

db.emp.aggregate([
    // stage1: get the data grouped by job
    {
      $group: {
        _id: '$job',
        totalSalary: {$sum: '$sal'}
      }
    },

    // stage2: match the condition [totalSalary > 5700]
    {
      $match: {
        totalSalary: { $gt: 5700 }
      }
    }
  ])

Display depts total sal in desc order

db.emp.aggregate([
    // stage1: get the data grouped by deptno
    {
      $group: {
        _id: '$deptno',
        totalSalary: {$sum: '$sal'}
      }
    },

    // stage2: get the records sorted in desc order
    {
      $sort: {
        totalSalary: -1
      }
    }
  ])

Find the dept that spends max on sal

db.emp.aggregate([
    // stage1: get the data grouped by deptno
    {
      $group: {
        _id: '$deptno',
        totalSalary: {$sum: '$sal'}
      }
    },

    // stage2: get the records sorted in desc order
    {
      $sort: {
        totalSalary: -1
      }
    },

    // stage3: limit the number of records to 1
    {
      $limit: 1
    }
  ])

Find the dept that spends minimum on sal

db.emp.aggregate([
    // stage1: get the data grouped by deptno
    {
      $group: {
        _id: '$deptno',
        totalSalary: {$sum: '$sal'}
      }
    },

    // stage2: get the records sorted in desc order
    {
      $sort: {
        totalSalary: 1
      }
    },

    // stage3: limit the number of records to 1
    {
      $limit: 1
    }
  ])

Display ename, deptno & sal of all emps whose sal >= 2500

db.emp.find(
    { sal: { $gte: 2500 } },
    { ename:1, deptno: 1, _id: 0, sal: 1 })
```

```
db.emp.aggregate([
    // stage1: find the emp having sal >= 2500
    {
      $match: {
        sal: { $gte: 2500 }
      }
    },

    // stage2: select the required columns
    {
      $project: {
        ename:1, deptno: 1, _id: 0, sal
      }
    }
  ])

Display ename, deptno & sal of all emps whose sal >= 2500 in the DESC order of sal

db.emp.aggregate([
    // stage1: find the emp having sal >= 2500
    {
      $match: {
        sal: { $gte: 2500 }
      }
    },

    // stage2: sort the records by sal in desc order
    {
      $sort: {
        sal: -1
      }
    },

    // stage3: select the required columns
    {
      $project: {
        ename:1, deptno: 1, _id: 0, sal: 1
      }
    }
  ])

Analyse data per dept per job [find the count of emps per deptno per job]

db.emp.aggregate([
    // stage1: find the emp per dept per job
    {
      $group: {
        _id: {
          deptno: '$deptno',
          job: '$job'
        },
        totalEmployees: {
          $sum: 1
        }
      }
    }
  ])

find the job with max AVG sal

db.emp.aggregate([
    // stage1: find the emp per dept per job
    {
      $group: {
        _id: '$job',
        averageSalary: {
          $avg: '$sal'
        }
      }
    },
    // stage2: sort the records with desc order
    {
      $sort: {
        averageSalary: -1
      }
    },
    // stage3: limit the result to 1 record
    {
      $limit: 1
    }
  ])

find number of managers, analysts and clerks in company

db.emp.aggregate([
    // stage1: match the job for getting managers, analysts and clerks
    {
      $match: {
        job: { $in: ['MANAGER', 'ANALYST', 'CLERK'] }
      }
    },

    // stage2: find the emp per dept per job
    {
      $group: {
        _id: '$job',
        count: {
          $sum: 1
        }
      }
    }
  ])

Print ename and dept name and dept location

db.emp.aggregate([
    // stage1: lookup the information from dept collection
    {
      $lookup: {
        from: 'dept',           // get the related dept record
        localField: 'deptno',   // the field which is used for finding the dept from dept collection
        foreignField: '_id',    // the dept field which matches with deptno of emp collection
        as: 'department'
      }
    },

    // stage2: unwind the array into object
    {
      $unwind: '$department'
    },

    // stage3: add new fields for dept name and location
    {
      $addFields: {
        deptName: '$department.dname',
        deptLocation: '$department.loc'
      }
    },

    // stage4: select required fields
    {
      $project: {
        deptName: 1, deptLocation: 1, ename: 1, _id: 0
      }
    }
  ])

print depts and emps in that dept

db.dept.aggregate([
    // stage1: find all employees belong to the dept
    {
      $lookup: {
        from: 'emp',           // get the related dept record
        localField: '_id',   // the field which is used for finding the dept from dept collection
        foreignField: 'deptno',    // the dept field which matches with deptno of emp collection
        as: 'employees'
      }
    },

    // stage2: get only dept name and employee name
    {
      $project: {
        dname: 1, 'employees.ename': 1, _id: 0
      }
    }
  ])

Print emp name and his manager details

db.emp.aggregate([
    // stage1: lookup the information from dept collection
    {
      $lookup: {
        from: 'emp',           // get the related dept record
        localField: 'mgr',   // the field which is used for finding the dept from dept collection
        foreignField: '_id',    // the dept field which matches with deptno of emp collection
        as: 'manager'
      }
    },

    // stage2: unwind the array into object
    {
      $unwind: '$manager'
    },

    // stage3: add new fields for dept name and location
    {
      $addFields: {
        managerName: '$manager.ename',
      }
    },

    // stage4: select required fields
    {
      $project: {
        ename: 1, managerName: 1, _id: 0
      }
    }
  ])