NoSQL, unlike SQL which has ER and class diagrams, has neither names nor constraints for data modeling diagrams. The obvious reason is the relaxed rules of NoSQL about relationships, which aim to get a developer started with minimum requirements.
Since data modeling diagram is the blueprint of any application we should always draw one. For tool I prefer diagrams.net (previously draw.io), where you simply need to drag and drop symbols into the canvas and align them. Its “Entity Relation” section on the left menu is most suitable for our modeling.
Example Problem Statement
Let us take a problem statement and draw our data model diagram around it in steps:
There’s a gift shop full of gift items. Multiple cashiers perform duty at different times of day and sell these items to customers. Customers are unregistered in the system but their name is inquired and put on the receipt.
In other words, we need to build a basic gift inventory and checkout system in light of the above information. So, we start by identifying obvious and implied entities.
The Entities
- Item
- Product - not mentioned but required to distinguish between say Hugo Boss perfume (product) and 10 bottles of it (items)
- Cashier
- Transaction - a more informative word than “sell” or “purchase”
And that’s all we need. Customer is not required because we just need its name and have no other information. A receipt is unnecessary as all data to print on the receipt is available in transactions.
Brand could be an entity of its own if we want to store anything more than its name. If it’s just a name, like ‘Nike’, we can store it as an indexed property of the Product, and group products by it for brand-related queries.
Listing Down Possible Queries
During the modeling process, it’s always helpful to plan ahead of the read and write operations, to be able to keep them efficient and retrieve meaningful information from our data.
1. Reading Data
Requirements change with time, and NoSQL is meant to accommodate that, but you must put in some thought about the initial minimum you can specify at the outset. In our case, it will be:
- Rank the best performing brands and products within a given time frame
- Identify the items that need restocking
- Transaction breakdown by any period of time: hourly, daily, weekly, monthly, yearly (we might need to draw and compare charts for insights)
- List down cashiers based on seniority
- A customer has lost the receipt, but they know what products they purchased and when. By querying the transactions, print them a duplicate receipt
- Get all products that are less than $50 (a customer may inquire the cashier)
Obviously, this is not an exhaustive list, but we get the basic idea.
2. Writing Data
For this part, we should mostly be concerned with how references are stored given the amount of expanding data and frequency of updates. In another detailed post, I’ve explained ways to model NoSQL 1 to 1, 1 to many, and many to many relationships along with their pros and cons. Briefly:
Ever Expanding Data
If the number of documents is ever-increasing, never put them or their references in another document. For example, a number of items will keep on increasing, so we can’t embed or add their ids as a reference inside product, as eventually it will become too big and run out of the maximum allocated space per document, not to mention slow to fetch. Instead, we should keep the product reference in each item.
Frequent Additions And Updates
If some document(s) require a lot of adds, updates, or deletes, don’t embed them or put their reference in a single place, because it could result in:
- Slowing down the database operations
- Possible Inconsistent data, if operations were performed not atomically (i.e. a document was fetched, modified on the client-side, and then updated. Between your fetch and update, somebody else might have updated the document, and now you have the outdated data that your update call will persist).
For example: You embed comments and like of users inside a post. The post gets thousands of comments and likes in a short time. Many would update their comments too. This means that the post will need to be updated thousands of times in a short interval. But if comments and likes had their own collections, every comment would have been added and updated separately and much more efficiently.
Now let’s move to the breakdown of all entities with diagrams, their relation with others, and example documents. Due to its familiarity, I’m using MongoDB’s collections as entities and documents as examples.
Product
{
id: 1,
name: "Dark Blue Cologne",
brand: "Hugo Boss",
price: 22.46
}
Cashier
{
id: 1,
name: "John Smith",
cellNum: "4565456789",
address: "Broadway Street house no. 12, Brooklyn, New York, US",
ssn: "011-72-XXXX"
.
.
.
}
Product Has Many Items
//products
{
id: 1,
name: "Dark Blue Cologne",
brand: "Hugo Boss",
price: 22.46
}
//items
{
id: 1,
productId: 1,
barcode: "0705632441947"
}
{
id: 2,
productId: 1,
barcode: "0705632441948"
}
{
id: 3,
productId: 1,
barcode: "0705632441949"
}
Note that in light of the above reasoning, we haven’t put embedded items or their references in an array inside the product document. We keep productId
as a reference in items instead.
Transaction
Transaction is a required associative collection to solve the following many-to-many problem:
- A cashier can sell an unlimited number of product items (Cashier John sells 15 sets of Hugo’s Dark Blue Cologne)
- A product item can be sold by many cashiers (Hugo’s Dark Blue Cologne is sold by cashiers John, Tony, Malinda, George, and 2 others)
salePrice
is necessary to know at what price the product item was sold (in case of discount it’ll be less)
It might be tempting to model transaction in a real-world scenario where a customer buys multiple items in one go, which are shown in the receipt as such. However, the transaction should represent only one sale. For buying multiple items, say 10 at a time, 10 transaction documents should be created. You can create another collection to group multiple transactions or simply place some unique identifier field (ex: transactionId
) in all transactions made collectively at one time.
Transaction Belongs To Product, Cashier And Item
This relationship can also be described as Product has many Transactions, Cashier has many Transactions, Item has many Transactions.
“Product has many Transactions” is completely optional and depends on requirements. For instance, when you need to generate yearly reports of sales and therefore need to know products in aggregated data, it’s easier to have a product reference in the transaction document than getting it through the item document (which will either require join operation if available in the database or multiple queries).
//products
{
id: 1,
name: "Dark Blue Cologne",
brand: "Hugo Boss",
price: 22.46
}
//items
{
id: 1,
productId: 1,
barcode: "0705632441947"
}
{
id: 2,
productId: 1,
barcode: "0705632441948"
}
{
id: 3,
productId: 1,
barcode: "0705632441949"
}
// cashiers
{
id: 1,
name: "John Smith",
cellNum: "4565456789",
address: "Broadway Street house no. 12, Brooklyn, New York, US",
ssn: "011-72-XXXX"
.
.
.
}
{
id: 2,
name: "Malinda Johnson",
cellNum: "4565456689",
address: "Harold Street house no. 133, Brooklyn, New York, US",
ssn: "011-73-XXXX"
.
.
.
}
//transactions
{
id: 1,
productId: 1,
itemId: 1,
customerName: "Daniel Witz",
cashierId: 1,
salePrice: 22 //salePrice to provide discount option
}
{
id: 1,
productId: 1,
itemId: 2,
customerName: "Glenn McDowell",
cashierId: 1,
salePrice: 22
}
{
id: 1,
productId: 1,
itemId: 3,
customerName: "Robert Williams",
cashierId: 2,
salePrice: 22
}
Finalizing The Diagram
We’ve finished our first design iteration and drawing! It efficiently takes care of the read and right considerations we listed down earlier. This basic diagram can be improved as the application development proceeds and the requirements are modified.
As noted earlier, NoSQL data modeling lacks conventional names and design principles similar to that of SQL. For this, it’s always better to include the symbols used in the diagram itself for ease of reading for everyone. Let’s add the entity and one-to-many symbols, the only two used.
Example Of Olympic Games
Let’s take another quick example and repeat the above steps to reach a final diagram.
Problem Statement
Create an application for Olympic Games
Entities
- OlympicEvent - to store data for Olympic events, such as the country it’s taking place in, year, etc.
- Stadium - where the games are played
- Athlete
- Official - the match officials
- Competition - to store the details of the actual competition
- Game - which is played by athletes and of which competition is conducted
Possible Queries
- Find all archery participants from London Olympics 2012
- Find countries that qualified for Table Tennis in Tokyo Olympics 2020
- Get the country name with the highest number of games in the Beijing Olympics 2008
- For the last 4 Olympic games, get all the gold and silver medalists medalists
Data Model Diagram
Relationships Explained
- An OlympicEvent conducts many Games (one-to-many)
- Many Games are played in a Stadium (one-to-many)
- A Game is played by many Athletes and an Athlete can play multiple Games (many-to-many)
- Many Athletes and Officials participate in multiple Competitions; And a Competition has many Athletes and Officials (Both many-to-many)
- A Competition can only be of one Game, but a Game is played by numerous Competition (one-to-many)
Note that for the many-to-many relationship between Athlete/Competition and Official/Competition, we simply keep an array of references in the Competition document, instead of keeping associate collections in between, because the competition won’t have an unlimited number of athletes or officials participating, nor will it be updated that frequently.