In contrast to SQL, NoSQL data modelling allows multiple ways to model 1 to 1, 1 to many and many to many relationships. It neither enforces rules nor favors a particular design. The only constraint is the application’s requirements.
Coming from a SQL background, I struggled with modeling Couchbase and MongoDB applications. Though badly put together at the time, the Couchbase documentation was of great help. In this post I’m sharing my understanding of the problem and my way of approaching it, with some examples.
Databases group documents in different ways and under different names. I’ll use MongoDB’s term, “collection”, because it’s familiar.
I’ve addressed How to create a NoSQL data model diagram separately.
1 to 1
Let’s take a Client collection and list all the possible ways to model a one to one relationship with Address, i.e. Client has Address or Address belongs to Client:
Primitive Values (Trivial Case)
Primitive values are 1 to 1 by default. They make up the basic document. Client has id, firstName, lastName, registered and address (or, put the other way, id belongs to Client, address belongs to Client and so on).
{
id: 1,
firstName: "John",
lastName: "Smith",
registered: true,
address: "Broadway Street house no. 12, Brooklyn, New York, US"
}
Pros
- The complete information is contained within the document and is easily retrievable without looking into other documents
Cons
- Duplication will occur if a field value is shared by other documents, for instance when the same value
address: "Broadway Street house no. 12, Brooklyn, New York, US"
appears in multiple clients
Embedded/Sub Document
Preferred when the embedded document is short and unique or mostly unique, i.e. each client has a unique address except for the few who share a residence. In that case the address will be duplicated, but this should be rare enough to ignore.
{
id: 1,
firstName: "John",
lastName: "Smith",
registered: true,
address: {
houseNum: 12,
street: "Broadway Street",
state: "New York",
city: "Brooklyn",
country: "US"
}
}
Pros
- The complete information is contained within the document and is easily retrievable without looking into other documents
Cons
- Duplication when exactly the same embedded document appears in other documents
Referenced Collection
At times an embedded document, though unique, gets very large. It can still be kept embedded, but there’s also the choice of moving it to a separate collection and referencing it. This way Address is not part of the Client document but only linked to it via a reference field (address here).
{
id: 1,
firstName: "John",
lastName: "Smith",
registered: true,
address: "address1"
}
{
id: "address1",
houseNum: 12,
street: "Broadway Street",
state: "New York",
city: "Brooklyn",
country: "US"
.
.
.
}
Usually the more important side keeps the reference, which in this case is Client. But we can also link the same collections the other way around by placing a reference to Client in Address.
{
id: "client1",
firstName: "John",
lastName: "Smith",
registered: true
}
{
id: "address1",
houseNum: 12,
street: "Broadway Street",
state: "New York",
city: "Brooklyn",
country: "US",
client: "client1"
.
.
.
}
Pros
- Document size is kept in check by placing some piece of information in a separate document and only keeping its reference
Cons
- Getting the full information requires fetching two documents instead of one
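The two-fetch cost can be sketched with plain Python dictionaries standing in for the Client and Address collections. This is only an in-memory illustration, not a database driver call; the names `clients`, `addresses` and `get_client_with_address` are assumptions made for the example.

```python
# In-memory stand-ins for two collections, keyed by document id.
clients = {
    "client1": {
        "id": "client1",
        "firstName": "John",
        "lastName": "Smith",
        "registered": True,
        "address": "address1",  # reference field, not the address itself
    }
}
addresses = {
    "address1": {
        "id": "address1",
        "houseNum": 12,
        "street": "Broadway Street",
        "state": "New York",
        "city": "Brooklyn",
        "country": "US",
    }
}


def get_client_with_address(client_id):
    """Resolve the reference: one lookup per collection, two in total."""
    client = dict(clients[client_id])                  # fetch 1: Client
    client["address"] = addresses[client["address"]]   # fetch 2: Address
    return client


full = get_client_with_address("client1")
print(full["address"]["street"])  # Broadway Street
```

The stored Client document still holds only the reference; the merged view exists just in the copy returned to the caller.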
1 to Many
Continuing with the Client and Address example, let’s extend it so that a client can have one or more addresses, i.e. Client has many Addresses or many Addresses belong to Client.
Array Of Primitive Values
{
id: 1,
firstName: "John",
lastName: "Smith",
registered: true,
address: ["Broadway Street house no. 12, Brooklyn, New York, US", "Harold Street house no. 77, Brooklyn, New York, US"]
}
Pros
- The complete information is contained within the document
Cons
- Duplication will occur if a field value is shared by other documents
Array Of Embedded/Sub Documents
{
id: 1,
firstName: "John",
lastName: "Smith",
registered: true,
address: [{
houseNum: 12,
street: "Broadway Street",
state: "New York",
city: "Brooklyn",
country: "US"
},
{
houseNum: 77,
street: "Harold Street",
state: "New York",
city: "Brooklyn",
country: "US"
}]
}
Pros
- The complete information is contained within the document
Cons
- Duplication when exactly the same embedded document appears in other documents
- The document size becomes unmanageable when the embedded array grows too large, e.g. an Airplane with millions of embedded Part documents
- Every addition, update or deletion in an ever-growing set of sub documents is a write to the same parent document
- Many simultaneous update operations will hit the same document, which slows them down and, at worst, makes the data inconsistent (if the operations are not handled atomically, i.e. the document changes in the gap between the find and the update)
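The inconsistency described in the last point can be reproduced without a database. The sketch below simulates two writers that each read the Client document, append an address to their own copy, and write the whole document back. The helpers `store`, `read` and `write` are hypothetical, and "Main Street" is made up for the example.

```python
import copy

# A single "collection" holding one Client document with embedded addresses.
store = {
    1: {"id": 1, "firstName": "John", "address": [{"street": "Broadway Street"}]}
}


def read(doc_id):
    """Return a private copy, as an application would receive from a find."""
    return copy.deepcopy(store[doc_id])


def write(doc):
    """Overwrite the stored document with the caller's copy."""
    store[doc["id"]] = copy.deepcopy(doc)


# Both writers read before either one writes (the find/update gap).
writer_a = read(1)
writer_b = read(1)

writer_a["address"].append({"street": "Harold Street"})
writer_b["address"].append({"street": "Main Street"})

write(writer_a)
write(writer_b)  # replaces the whole document, discarding writer A's change

streets = [a["street"] for a in store[1]["address"]]
print(streets)  # ['Broadway Street', 'Main Street']
```

Harold Street is lost because each writer replaced the whole document from a stale copy. An atomic in-place append, such as MongoDB’s `$push` update operator, avoids this gap.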
Array Of References
{
id: 1,
firstName: "John",
lastName: "Smith",
registered: true,
addresses: ["address1", "address2"]
}
{
id: "address1",
houseNum: 12,
street: "Broadway Street",
state: "New York",
city: "Brooklyn",
country: "US"
.
.
.
}
{
id: "address2",
houseNum: 77,
street: "Harold Street",
state: "New York",
city: "Brooklyn",
country: "US"
.
.
.
}
Pros
- Document size kept in check by only keeping references
- One document is enough to reach all the related information. Example: once the Client document is retrieved, its addresses can be fetched directly by the references it holds, instead of searching the whole Address collection for the documents that belong to the Client
Cons
- Even with only references in the array, the document size can get out of control if not planned properly. For example, a Video has over a million Likes (and counting) and all of their references are kept in the Video document as an array
- The document holding the references needs frequent updates because references are constantly added and changed
- Although better than embedded documents, it can still run into trouble when many simultaneous operations update the references, especially when the operations are not performed atomically
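As a rough illustration of both the read benefit and the write cost, here is a minimal in-memory sketch. The dictionaries and the helpers `addresses_of` and `add_address` are assumptions for this example, and the address documents are shortened to a few fields.

```python
# In-memory stand-ins for the Client and Address collections, keyed by id.
clients = {
    1: {"id": 1, "firstName": "John", "lastName": "Smith",
        "registered": True, "addresses": ["address1", "address2"]},
}
addresses = {
    "address1": {"id": "address1", "houseNum": 12, "street": "Broadway Street"},
    "address2": {"id": "address2", "houseNum": 77, "street": "Harold Street"},
}


def addresses_of(client_id):
    """Fetch each address directly by id; no scan of the Address collection."""
    return [addresses[ref] for ref in clients[client_id]["addresses"]]


def add_address(client_id, address):
    """Adding an address costs two writes: the new document and the parent's array."""
    addresses[address["id"]] = address
    clients[client_id]["addresses"].append(address["id"])


streets = [a["street"] for a in addresses_of(1)]
print(streets)  # ['Broadway Street', 'Harold Street']
```

Every call to `add_address` touches the Client document, which is the update pressure listed in the cons above.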
Reference In “Belongs To” Side
{
id: "client1",
firstName: "John",
lastName: "Smith",
registered: true,
}
{
id: "address1",
houseNum: 12,
street: "Broadway Street",
state: "New York",
city: "Brooklyn",
country: "US",
client: "client1"
.
.
.
}
{
id: "address2",
houseNum: 77,
street: "Harold Street",
state: "New York",
city: "Brooklyn",
country: "US",
client: "client1"
.
.
.
}
Pros
- The document on the “one” side doesn’t need to know its belongings and therefore needs neither frequent updates nor a large size
Cons
- Since the document holds no references to its belongings, a search of the whole collection is required to fetch them when needed (an index on the reference field solves this problem)
- If the document is removed, all its belongings must be removed or updated too, or they become stale data of no use, e.g. a deleted Post with thousands of Comments where each Comment holds the Post id
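Both cons can be sketched in memory: a full scan versus an index lookup on the reference field, and a delete that also cleans up the belongings. All names here (`find_addresses_by_scan`, `build_client_index`, `delete_client`) are hypothetical helpers, and the index is a plain dictionary standing in for a database index.

```python
from collections import defaultdict

clients = {"client1": {"id": "client1", "firstName": "John", "lastName": "Smith"}}
addresses = {
    "address1": {"id": "address1", "street": "Broadway Street", "client": "client1"},
    "address2": {"id": "address2", "street": "Harold Street", "client": "client1"},
}


def find_addresses_by_scan(client_id):
    """Without an index: inspect every Address document."""
    return [a for a in addresses.values() if a["client"] == client_id]


def build_client_index():
    """A secondary index on the reference field: client id -> address ids."""
    index = defaultdict(list)
    for address in addresses.values():
        index[address["client"]].append(address["id"])
    return index


def delete_client(client_id, index):
    """Remove the client together with its addresses so no orphans remain."""
    for address_id in index.pop(client_id, []):
        del addresses[address_id]
    del clients[client_id]


index = build_client_index()
scan_ids = sorted(a["id"] for a in find_addresses_by_scan("client1"))
indexed_ids = sorted(index["client1"])
delete_client("client1", index)
```

The scan and the index return the same ids; the index just avoids touching unrelated documents. After `delete_client` runs, both collections are empty, so no address is left pointing at a missing client.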
Many to Many
Array Of Primitive Values
{
id: 1,
firstName: "John",
lastName: "Smith",
registered: true,
address: ["Broadway Street house no. 12, Brooklyn, New York, US", "Harold Street house no. 77, Brooklyn, New York, US"]
}
{
id: 2,
firstName: "George",
lastName: "Smith",
registered: true,
address: ["Broadway Street house no. 12, Brooklyn, New York, US"]
}
Pros
- Potentially saves two extra collections, along with lots of references and their management
Cons
- Guaranteed duplication, which means multiple add/update/delete operations whenever a shared array element changes
Array Of Embedded Documents
{
id: 1,
firstName: "John",
lastName: "Smith",
registered: true,
address: [{
houseNum: 12,
street: "Broadway Street",
state: "New York",
city: "Brooklyn",
country: "US"
},
{
houseNum: 77,
street: "Harold Street",
state: "New York",
city: "Brooklyn",
country: "US"
}]
}
{
id: 2,
firstName: "William",
lastName: "Smith",
registered: true,
address: [{
houseNum: 12,
street: "Broadway Street",
state: "New York",
city: "Brooklyn",
country: "US"
}]
}
Pros
- Saves extra collections and reference management
Cons
- Multiple add/update/delete operations whenever a shared embedded element changes
- Document duplication
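To make the cost of duplication concrete, the following in-memory sketch applies one logical change (renaming a street) and counts how many Client documents it has to rewrite. The helper `rename_street` and the new name "Broadway Avenue" are made up for the illustration, and the address documents are shortened.

```python
clients = [
    {"id": 1, "firstName": "John", "address": [
        {"houseNum": 12, "street": "Broadway Street", "city": "Brooklyn"},
        {"houseNum": 77, "street": "Harold Street", "city": "Brooklyn"},
    ]},
    {"id": 2, "firstName": "William", "address": [
        {"houseNum": 12, "street": "Broadway Street", "city": "Brooklyn"},
    ]},
]


def rename_street(old, new):
    """Apply one logical change to every embedded copy; return documents touched."""
    touched = 0
    for client in clients:
        changed = False
        for address in client["address"]:
            if address["street"] == old:
                address["street"] = new
                changed = True
        if changed:
            touched += 1
    return touched


touched = rename_street("Broadway Street", "Broadway Avenue")
print(touched)  # 2
```

With the address stored once and referenced, the same change would be a single write.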
Array Of References
{
id: 1,
firstName: "John",
lastName: "Smith",
registered: true,
addresses: ["address1", "address2"]
}
{
id: 2,
firstName: "George",
lastName: "Smith",
registered: true,
addresses: ["address1"]
}
{
id: "address1",
houseNum: 12,
street: "Broadway Street",
state: "New York",
city: "Brooklyn",
country: "US"
.
.
.
}
{
id: "address2",
houseNum: 77,
street: "Harold Street",
state: "New York",
city: "Brooklyn",
country: "US"
.
.
.
}
Cons
- Not scalable. When either side of the many to many relationship grows too large, this approach becomes hard to manage. For example, a Post can be liked by potentially millions of Users and a User can like an unlimited number of Posts, so keeping an array of references to the liking users in Post, or to the liked posts in User, is infeasible
Connective/Associative Collection In Between
//Client
{
id: "client1",
firstName: "John",
lastName: "Smith",
registered: true
}
{
id: "client2",
firstName: "George",
lastName: "Smith",
registered: true
}
//ClientAddress
{
id: "clientAddress1",
client: "client1",
address: "address1"
}
{
id: "clientAddress2",
client: "client1",
address: "address2"
}
{
id: "clientAddress3",
client: "client2",
address: "address1"
}
//Address
{
id: "address1",
houseNum: 12,
street: "Broadway Street",
state: "New York",
city: "Brooklyn",
country: "US"
.
.
.
}
{
id: "address2",
houseNum: 77,
street: "Harold Street",
state: "New York",
city: "Brooklyn",
country: "US"
.
.
.
}
Pros
- The only scalable design for an unbounded many to many relationship, e.g. a User can like an unlimited number of Posts and a Post can be liked by an unlimited number of Users
Cons
- An extra collection and therefore extra get/add/update/delete operations
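A minimal in-memory sketch of how the ClientAddress collection is traversed in both directions. The helper names are assumptions, the link ids are written as unique values, and the list comprehensions stand in for what would be indexed queries on `client` and `address` in a real database.

```python
clients = {
    "client1": {"id": "client1", "firstName": "John", "lastName": "Smith"},
    "client2": {"id": "client2", "firstName": "George", "lastName": "Smith"},
}
addresses = {
    "address1": {"id": "address1", "houseNum": 12, "street": "Broadway Street"},
    "address2": {"id": "address2", "houseNum": 77, "street": "Harold Street"},
}
# One small document per (client, address) pair; neither side holds an array.
client_addresses = [
    {"id": "clientAddress1", "client": "client1", "address": "address1"},
    {"id": "clientAddress2", "client": "client1", "address": "address2"},
    {"id": "clientAddress3", "client": "client2", "address": "address1"},
]


def addresses_of(client_id):
    """Client -> Addresses: follow the links that name this client."""
    return [addresses[link["address"]]
            for link in client_addresses if link["client"] == client_id]


def clients_at(address_id):
    """Address -> Clients: follow the links that name this address."""
    return [clients[link["client"]]
            for link in client_addresses if link["address"] == address_id]


john_streets = [a["street"] for a in addresses_of("client1")]
residents = [c["firstName"] for c in clients_at("address1")]
print(john_streets)  # ['Broadway Street', 'Harold Street']
print(residents)     # ['John', 'George']
```

Adding or removing a relationship inserts or deletes one link document, so neither the Client nor the Address document grows with the number of relationships.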