Studying Data Sharding using MongoDB

Technical requirements

  • MongoDB 2.6
  • The file cities.txt (already provided in the hands-on environment)

1. Context

NoSQL databases gained popularity in the 2000s, when companies began investing more heavily in research on distributed databases. An important aspect of many NoSQL databases is that they require no predefined schema: records in the same collection can have different fields as necessary. Besides offering an Application Programming Interface (API) or a query language to access and modify data, some NoSQL databases also support MapReduce, a processing model that applies a function across an entire dataset and returns only the result.
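To make the two ideas above concrete, here is a minimal pure-Python sketch of schema-free records processed in a MapReduce style. It does not use MongoDB; the sample records and the names map_record, reduce_values and map_reduce are hypothetical and chosen only for illustration.

```python
from collections import defaultdict

# Hypothetical records: they deliberately do not share the same fields.
records = [
    {"name": "Paris", "country": "FR", "population": 2100000},
    {"name": "Lyon", "country": "FR", "population": 500000, "timezone": "Europe/Paris"},
    {"name": "Puebla", "country": "MX"},  # no population field
]

def map_record(record):
    """Map step: emit (key, value) pairs for one record."""
    if "population" in record:
        yield record["country"], record["population"]

def reduce_values(key, values):
    """Reduce step: combine all values emitted for one key."""
    return sum(values)

def map_reduce(records, mapper, reducer):
    """Group the mapper output by key, then reduce each group."""
    groups = defaultdict(list)
    for record in records:
        for key, value in mapper(record):
            groups[key].append(value)
    return {key: reducer(key, values) for key, values in groups.items()}

# Only the aggregated result is returned, not the individual records.
totals = map_reduce(records, map_record, reduce_values)
print(totals)
```

The record without a population field is simply skipped by the map step, which is how a schema-free collection is typically handled: the function decides what to do with missing fields.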

2. Objective

The objective of this exercise is to illustrate the concept of sharding, a database partitioning technique for storing large data collections across multiple database servers. For this purpose, you will work with MongoDB, a document-oriented database management system that supports different sharding strategies.
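As a rough illustration of what "partitioning across servers" means, the following Python sketch assigns keys to shards in two common ways: by key range and by key hash. It is a toy model under stated assumptions, not MongoDB's implementation; the shard names, the range boundaries and the function names are all hypothetical.

```python
import hashlib
from bisect import bisect_right

SHARDS = ["shard0", "shard1", "shard2"]  # hypothetical shard names

# Hypothetical range boundaries:
# key < "h" -> shard0, "h" <= key < "p" -> shard1, key >= "p" -> shard2
RANGE_BOUNDS = ["h", "p"]

def range_shard(key, bounds=RANGE_BOUNDS, shards=SHARDS):
    """Range-based placement: neighbouring keys land on the same shard."""
    return shards[bisect_right(bounds, key.lower())]

def hash_shard(key, shards=SHARDS):
    """Hash-based placement: keys are spread by a stable hash of the key."""
    digest = hashlib.md5(key.encode("utf-8")).hexdigest()
    return shards[int(digest, 16) % len(shards)]

cities = ["Amsterdam", "Grenoble", "Lyon", "Mexico", "Puebla", "Zurich"]
range_placement = {city: range_shard(city) for city in cities}
hash_placement = {city: hash_shard(city) for city in cities}

for city in cities:
    print(city, range_placement[city], hash_placement[city])
```

Range-based placement keeps nearby keys together, which favours range queries but can overload one shard if the keys are skewed. Hash-based placement tends to spread keys more evenly, at the cost of scattering neighbouring keys across shards.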

3. To-Do List (full description available at https://github.com/javieraespinosa/dxlab-sharding)