NoSQL, BigData: MongoDB vs Hadoop

MongoDB vs Hadoop
Big Solutions for Big Problems

Executive Summary

• MongoDB and Hadoop can replace relational database management systems (RDBMS) such as Oracle, SQL Server, DB2 and MySQL. These systems have been branded “NoSQL” because they do not generally use SQL.
• Relational database systems were designed when disk space and CPU were more precious commodities than they are today. MongoDB and Hadoop exploit these now highly available commodities and scale with less effort.
• MongoDB and Hadoop distribute the load across multiple machines more efficiently.
• NoSQL/BigData systems require less work to scale to large concurrency or datasets.
• While NoSQL solutions do not typically support RDBMS-style transactions, they support other methods of data integrity.
• Hadoop has been installed in larger systems than MongoDB. MongoDB is more efficient than Hadoop.
• Generally, MongoDB is optimal for typical websites and CRUD (Create, Retrieve, Update, Delete) or “transactional” applications.
• Generally, Hadoop replaces OLAP or reporting systems.
• Generally, Hadoop does not provide high availability, but other vendors have derived solutions which do.
• MongoDB provides high availability; this may make expensive solutions such as Oracle RAC or SQL Server clustering unnecessary.
• In some cases Hadoop and MongoDB are used together in the same system.

Technical

                    Hadoop                                                MongoDB
Technology          Java                                                  C++ native code
Largest System      4000 nodes, 100 TB                                    100 nodes, 5 TB
Typical Usage       Reporting analytics, “map-reduce” on large datasets   High volume systems, moderately sized data sets
Business Structure  Non-profit (Apache) backed by multiple vendors        For-profit (10gen)
Best Feature        Flexibility, map-reduce implementation                Lightweight, better performance
Biggest Limitation  Single point of failure                               Each node is single-threaded
Maturity            Widely deployed, but less productized                 Newer, less widely deployed, but more productized
Complexity          High                                                  Moderate

Organizational

MongoDB

Key Backers
• 10gen
• SAP
• Red Hat
• Flybridge Capital Partners
• Sequoia Capital
• Union Square Ventures

Key Installations
• Craigslist
• UK Government (National Archives, UK.gov)
• Shutterfly
• Forbes
• The New York Times
• Intuit
• FourSquare
• Lexis Nexis

Hadoop

Key Backers
• Cloudera (vendor)
• HortonWorks (vendor)
• IBM
• Yahoo!

Key Installations
• Adobe
• eBay
• Facebook
• FOX Audience Network
• Hulu
• LinkedIn
• NAVTEQ
• The New York Times

Bottom Line

• New systems which require higher-end scalability can use MongoDB and/or Hadoop to scale with less effort.
• MongoDB can provide high availability at a fraction of the cost of Oracle RAC and similar solutions.
• Because this area lacks standardization, applications must be written specifically to each product's interface.
• The technology is not as mature as RDBMS software, but it is already widely deployed and ready for use by mainstream businesses.
• The different capabilities of each implementation make them suited to replacing different types of systems (OLAP vs OLTP).
• The wider ecosystem around Apache Hadoop is a positive, but single-vendor support of MongoDB may make a business/support arrangement more straightforward.
• Both onsite installations and Cloud Platform as a Service (PaaS) uses are possible with both MongoDB and Hadoop.

See our detailed report at http://osintegrators.com/MongoAndHadoop

Contact

www.osintegrators.com
345 W. Main St. Suite 201
Durham, NC 27701
(919) 321-0119