open-source,high-performance, document-orienteddatabase
Schema Design Basics Alvin Richards
[email protected]
This talk Part One
Part Two
‣ Intro
‣ Data modeling
‣ Terms / Definitions
‣ Getting a flavor ‣ Creating a Schema
‣ Indexes ‣ Evolving the Schema
‣ DBRef
‣ Single Table Inheritance ‣ Many – Many ‣ Trees ‣ Lists / Queues / Stacks
So why model data?
A brief history of normalization • 1970 E.F.Codd introduces 1st Normal Form (1NF) • 1971 E.F.Codd introduces 2nd and 3rd Normal Form (2NF, 3NF) • 1974 Codd & Boyce define Boyce/Codd Normal Form (BCNF) • 2002 Date, Darween, Lorentzos define 6th Normal Form (6NF) Goals: • Avoid anomalies when inserting, updating or deleting • Minimize redesign when extending the schema • Make the model informative to users • Avoid bias towards a particular style of query * source : wikipedia
Relational made normalized data look like this
Document databases make normalized data look like this
Some terms before we proceed RDBMS
Document DBs
Table
Collection
Row(s)
SON Document
Index
Index
oin Partition
Embedding & Linking across documents Shard
Partition Part ition Key Key
Shard Key
DB Considerations Considerations How can we manipulate this data ?
• • • •
Dynamic Queries Secondary Indexes Atomic Updates Map Reduce
Access Patterns ?
• • • •
Read / Write Ratio Types of updates Types of queries Data life-cycle
Considerations No Joins Document writes are atomic •
•
Design Session Design documents that simply map to your application post={author post={author:“kyle”, :“kyle”, date date:newDate(), :newDate(), text text:“myblogpost...”, :“myblogpost...”, tags tags:[“mongodb”,“intro”]} :[“mongodb”,“intro”]} >db.post.save(post)
Find the document >db.posts.find() { _id : ObjectId("4c4ba5c0672c685e5e8aabf3"), author : "kyle", date : "Sat Jul 24 2010 19:47:11 GMT-0700 (PDT)", text : "My first blog", tags : [ "mongodb", "intro" ] } Notes: • ID must be unique, but can be anything you’d like • MongoDB will generate a default ID if one is not supplied
Add and index, find via Index Secondary index for “author” // 1 means ascending, -1 means descending >db.posts.ensureIndex({author >db.posts.ensureIndex({ author:: 1}) >db.posts.find({author >db.posts.find({author:: 'kyle'}) { _id : ObjectId("4c4ba5c0672c685e5e8aabf3"), author : "kyle", ... }
Verifying indexes exist >db.system.indexes.find() // Index on ID { name : "_id_", ns : "test.posts", key : { "_id" : 1 } }
// Index on author { _id : ObjectId("4c4ba6c5672c685e5e8aabf4"), ns : "test.posts", key : { "author" : 1 }, name : "author_1" }
Query operators Conditional operators: $ne, $in, $nin, $mod, $all, $size, $exists, $type, .. $lt, $lte, $gt, $gte, $ne, // find posts with any tags >db.posts.find({tags >db.posts.find({tags:: {$exists: true}})
Query operators Conditional operators: $ne, $in, $nin, $mod, $all, $size, $exists, $type, .. $lt, $lte, $gt, $gte, $ne, // find posts with any tags >db.posts.find({tags >db.posts.find({tags:: {$exists: true}}) Regular expressions: expressions: // posts where author starts with k >db.posts.find({author >db.posts.find({author:: /^k*/i })
Query operators Conditional operators: $ne, $in, $nin, $mod, $all, $size, $exists, $type, .. $lt, $lte, $gt, $gte, $ne, // find posts with any tags >db.posts.find({tags >db.posts.find({tags:: {$exists: true}}) Regular expressions: expressions: // posts where author starts with k >db.posts.find({author >db.posts.find({author:: /^k*/i }) Counting: // posts written by mike >db.posts.find({author >db.posts.find({ author:“mike”}).count() :“mike”}).count()
Extending the Schema new_comment = {author {author:: “fred”, date:: new Date(), date text:: “super duper”} text new_info = { ‘$push’: {comments { comments:: new_comment}, ‘$inc’: {comments_count {comments_count:: 1}} >db.posts.update({_id >db.posts.update({ _id:“...”},new_info) :“...”},new_info)
Extending the Schema { _id : ObjectId("4c4ba5c0672c685e5e8aab ObjectId("4c4ba5c0672c685e5e8aabf3"), f3"), author : "kyle", date : "Sat Jul 24 2010 19:47:11 GMT-0700 (PDT)", text : "My first blog", tags : [ "mongodb", "intro" ], comments_count:: 1, comments_count comments : [ { author : "Fred", date : "Sat Jul 24 2010 20:51:03 GMT-0700 (PDT)", text : "Super Duper" } ]}
Extending the Schema // create index on nested documents: >db.posts.ensureIndex({"comments.author": >db.posts.ensureIndex({"comments.auth or": 1}) >db.posts.find({comments.author:”kyle”})
Extending the Schema // create index on nested documents: >db.posts.ensureIndex({"comments.author": >db.posts.ensureIndex({"comments.auth or": 1}) >db.posts.find({comments.author:”kyle”}) // find last 5 posts: >db.posts.find().sort({date >db.posts.find().sort({ date:-1}).limit(5) :-1}).limit(5)
Extending the Schema // create index on nested documents: >db.posts.ensureIndex({"comments.author": >db.posts.ensureIndex({"comments.auth or": 1}) >db.posts.find({comments.author:”kyle”}) // find last 5 posts: >db.posts.find().sort({date >db.posts.find().sort({ date:-1}).limit(5) :-1}).limit(5) // most commented post: >db.posts.find().sort({comments_count >db.posts.find().sort({ comments_count:-1}).limit(1) :-1}).limit(1) When sorting, check if you need an index
Map Reduce Aggregation and batch manipulation Collection in, Collection out Parallel in sharded environments
Map reduce mapFunc = function () { this.tags.forEach(function this.tags.forEach(function (z) {emit(z, {count:1});}); } reduceFunc = function (k, v) { var total = 0; for (var i = 0; i < v.length; i++) { total += v[i].count; v[i].count; } return {count:total}; } res = db.posts.mapReduce(mapFunc, db.posts.mapReduce(mapFunc, reduceFunc)
>db[res.result].find() { _id : "intro", value : { count : 1 } } { _id : "mongodb", value : { count : 1 } }
Review So Far: - Started out with a simple schema - Queried Data - Evolved the schema - Queried / Updated the data some more
Wordnik 9B records, 100M queries / week, 1.2TB { entry : { header:: { id: 0, header headword:: "m", headword sourceDictionary:: "GCide", sourceDictionary textProns : [ {text: "(em)", seq:0} ], syllables:: [ syllables {id: 0, text: "m"} ], sourceDictionary: "1913 Webster", sourceDictionary: headWord:: "m", headWord id:: 1, id definitions:: : [ definitions {text: "M, the thirteenth letter..."}, {text: "As a numeral, M stands for 1000"}] } } }
Review So Far: - Started out with a simple schema - Queried Data - Evolved the schema - Queried / Updated the data some more Observations: - Using Rich Documents works well - Simplify relations by embedding them - Iterative development is easy with MongoDB
Single Table Inheritance >db.shapes.find() { _id _id:: ObjectId("..."), type: type: "circle", area area:: 3.14, radius radius:: 1} { _id _id:: ObjectId("..."), type type:: "square", area area:: 4, d: 2} { _id _id:: ObjectId("..."), type type:: "rect", area area:: 10, length length:: 5, width width:: 2}
// find shapes where radius > 0 >db.shapes.find({radius >db.shapes.find({ radius:: {$gt {$gt:: 0}})
// create index >db.shapes.ensureIndex({ radius radius:: 1})
One to Many - Embedded Array / Array Keys - slice operator to return subset of array - hard to find latest comments across all documents
One to Many - Embedded Array / Array Keys - slice operator to return subset of array - hard to find latest comments across all documents - Embedded tree - Single document - Natural - Hard to query
One to Many - Embedded Array / Array Keys - slice operator to return subset of array - hard to find latest comments across all documents - Embedded tree - Single document - Natural - Hard to query - Normalized (2 collections) - most flexible - more queries
Many - Many Example: - Product can be in many categories - Category can have many products
Products - product_id Prod_Categories - id - product_id - category_id
Category - category_id
Many - Many products: { _id: _id: ObjectId("4c4ca23933fb5941681b912e"), name:: "Sumatra Dark Roast", name category_ids:: [ ObjectId("4c4ca25433fb5941681b912f"), category_ids ObjectId("4c4ca25433fb5941681b92af”]}
Many - Many products: { _id: _id: ObjectId("4c4ca23933fb5941681b912e"), name:: "Sumatra Dark Roast", name category_ids:: [ ObjectId("4c4ca25433fb5941681b912f"), category_ids ObjectId("4c4ca25433fb5941681b92af”]}
categories: { _id: _id: ObjectId("4c4ca25433fb5941681b912f"), name:: "Indonesia", name product_ids:: [ ObjectId("4c4ca23933fb5941681b912e"), product_ids ObjectId("4c4ca30433fb5941681b9130"), ObjectId("4c4ca30433fb5941681b913a"]}
Many - Many products: { _id: _id: ObjectId("4c4ca23933fb5941681b912e"), name:: "Sumatra Dark Roast", name category_ids:: [ ObjectId("4c4ca25433fb5941681b912f"), category_ids ObjectId("4c4ca25433fb5941681b92af”]}
categories: { _id: _id: ObjectId("4c4ca25433fb5941681b912f"), name:: "Indonesia", name product_ids:: [ ObjectId("4c4ca23933fb5941681b912e"), product_ids ObjectId("4c4ca30433fb5941681b9130"), ObjectId("4c4ca30433fb5941681b913a"]}
//All categories for a given product >db.categories.find({product_ids >db.categories.find({ product_ids:: ObjectId("4c4ca23933fb5941681b9 ObjectId("4c4ca23933fb5941681b912e")}) 12e")})
Many - Many products: { _id: _id: ObjectId("4c4ca23933fb5941681b912e"), name:: "Sumatra Dark Roast", name category_ids:: [ ObjectId("4c4ca25433fb5941681b912f"), category_ids ObjectId("4c4ca25433fb5941681b92af”]}
categories: { _id: _id: ObjectId("4c4ca25433fb5941681b912f"), name:: "Indonesia", name product_ids:: [ ObjectId("4c4ca23933fb5941681b912e"), product_ids ObjectId("4c4ca30433fb5941681b9130"), ObjectId("4c4ca30433fb5941681b913a"]}
//All categories for a given product >db.categories.find({product_ids >db.categories.find({ product_ids:: ObjectId("4c4ca23933fb5941681b9 ObjectId("4c4ca23933fb5941681b912e")}) 12e")})
//All products for a given category >db.products.find({category_ids >db.products.find({category_ids:: ObjectId("4c4ca25433fb5941681b912f")})
Alternative products: { _id _id:: ObjectId("4c4ca23933fb5941681b912e"), name:: "Sumatra Dark Roast", name category_ids:: [ ObjectId("4c4ca25433fb5941681b912f"), category_ids ObjectId("4c4ca25433fb5941681b92af”]} categories: { _id _id:: ObjectId("4c4ca25433fb5941681b912f"), name:: "Indonesia"} name
Alternative products: { _id _id:: ObjectId("4c4ca23933fb5941681b912e"), name:: "Sumatra Dark Roast", name category_ids:: [ ObjectId("4c4ca25433fb5941681b912f"), category_ids ObjectId("4c4ca25433fb5941681b92af”]} categories: { _id _id:: ObjectId("4c4ca25433fb5941681b912f"), name:: "Indonesia"} name
// All products for a given category >db.products.find({category_ids >db.products.find({category_ids:: ObjectId("4c4ca25433fb5941681b912f")})
Alternative products: { _id _id:: ObjectId("4c4ca23933fb5941681b912e"), name:: "Sumatra Dark Roast", name category_ids:: [ ObjectId("4c4ca25433fb5941681b912f"), category_ids ObjectId("4c4ca25433fb5941681b92af”]} categories: { _id _id:: ObjectId("4c4ca25433fb5941681b912f"), name:: "Indonesia"} name
// All products for a given category >db.products.find({category_ids >db.products.find({category_ids:: ObjectId("4c4ca25433fb5941681b912f")})
// All categories for a given product product = db.products.find(_id db.products.find(_id : some_id) >db.categories.find({_id >db.categories.find({ _id : {$in : product.category_ids}})
Trees Full Tree in Document { comments: comments: [ { author author:: “rpb”, text text:: “...”, replies:: [ replies {author author:: “Fred”, text text:: “...”, replies:: []} replies ]} ]}
Pros: Single Document, Performance, Intuitive Cons: Hard to search, Partial Results, 4MB limit
Trees Parent Links - Each node is stored as a document - Contains the id of the parent Child Links - Each node contains the id’s of the children - Can support graphs (multiple parents / child)
Array of Ancestors - Store Ancestors of a node { { { { { { {
_id: "a" } _id: _id:: "b", ancestors _id ancestors:: [ "a" ], parent parent:: "a" } _id:: "c", ancestors _id ancestors:: [ "a", "b" ], parent parent:: "b" } _id:: "d", ancestors _id ancestors:: [ "a", "b" ], parent parent:: "b" } _id:: "e", ancestors _id ancestors:: [ "a" ], parent parent:: "a" } _id:: "f", ancestors _id ancestors:: [ "a", "e" ], parent parent:: "e" } _id:: "g", ancestors _id ancestors:: [ "a", "b", "d" ], parent parent:: "d" }
Array of Ancestors - Store Ancestors of a node { { { { { { {
_id: "a" } _id: _id:: "b", ancestors _id ancestors:: [ "a" ], parent parent:: "a" } _id:: "c", ancestors _id ancestors:: [ "a", "b" ], parent parent:: "b" } _id:: "d", ancestors _id ancestors:: [ "a", "b" ], parent parent:: "b" } _id:: "e", ancestors _id ancestors:: [ "a" ], parent parent:: "a" } _id:: "f", ancestors _id ancestors:: [ "a", "e" ], parent parent:: "e" } _id:: "g", ancestors _id ancestors:: [ "a", "b", "d" ], parent parent:: "d" }
//find all descendants of b: >db.tree2.find({ancestors >db.tree2.find({ancestors:: ‘b’})
Array of Ancestors - Store Ancestors of a node { { { { { { {
_id: "a" } _id: _id:: "b", ancestors _id ancestors:: [ "a" ], parent parent:: "a" } _id:: "c", ancestors _id ancestors:: [ "a", "b" ], parent parent:: "b" } _id:: "d", ancestors _id ancestors:: [ "a", "b" ], parent parent:: "b" } _id:: "e", ancestors _id ancestors:: [ "a" ], parent parent:: "a" } _id:: "f", ancestors _id ancestors:: [ "a", "e" ], parent parent:: "e" } _id:: "g", ancestors _id ancestors:: [ "a", "b", "d" ], parent parent:: "d" }
//find all descendants of b: >db.tree2.find({ancestors >db.tree2.find({ancestors:: ‘b’})
//find all ancestors of f: >ancestors = db.tree2.findOne({_id db.tree2.findOne({ _id:’f’}).ancestors :’f’}).ancestors >db.tree2.find({_id >db.tree2.find({_id:: { $in : ancestors})
findAndModify Queue example //Example: find highest priority job and mark job = db.jobs.findAndModify({ query:: {inprogress: false}, query sort:: sort {priority: -1), update:: {$set: {inprogress: true, update started: new Date()}}, new:: true}) new
Cool Stuf - Aggregation - Capped collections - GridFS - Geo
Learn More • Kyle’s presentation + video: http://www.slideshare.net/kbanker/mongodb-schema-design http://www.blip.tv/file/3704083
• Dwight’s presentation http://www.slideshare.net/mongosf/schema-design-with-mongodb-dwightmerriman
• Documentation Trees: http://www.mongodb.org/display/DOCS/Trees+in+MongoDB Queues: http://www.mongodb.org/display/DOCS/findandmodify+Command Aggregration: http://www.mongodb.org/display/DOCS/Aggregatio http://www. mongodb.org/display/DOCS/Aggregation n Capped Col. : http://www.mongodb.org/display/DOCS/Capped+Collections Geo: http://www.mongodb.org/display/DOCS/Geospatial+In http://www.mongodb.org/display/DOCS/Geospatial+Indexing dexing GridFS: http://www.mongodb.org/display/DOCS/GridFS+Specification
Thank You :-)
Download MongoDB andletusknowwhatyouthink @mongodb http://www.mongodb.org
DBRef DBRef {$ref : collection, $id $id:: id_value} - Think URL - YDSMV: your driver support may vary
Sample Schema: nr = {note_refs note_refs:: [{"$ref" : "notes", "$id" : 5}, ... ]} Dereferencing: nr.forEach(function(r) { nr.forEach(function(r) printjson(db[r.$ref].findOne({_id printjson(db[r.$ref].findOne({ _id:: r.$id})); }
BSON Mongodb stores data in BSON internally Lightweight, Traversable, Ecient encoding Typed boolean, integer, float, date, string, binary, array...