4. • 1970s: Relational Databases Invented
– Storage is expensive
– Data is normalized
– Data storage is abstracted away from the app
• 1980s: RDBMS commercialized
– Client/Server model
– SQL becomes the standard
• 1990s: Things begin to change
– Client/Server => 3-tier architecture
– Rise of the Internet and the Web
6. • 2000s: Web 2.0
– Rise of "Social Media"
– Acceptance of E-Commerce
– Constant decrease of HW prices
– Massive increase of collected data
• Result
– Constant need to scale dramatically
– How can we scale?
11. [Diagram: strengths and weaknesses of the two classic RDBMS workloads]

OLTP / operational
+ complex transactions
+ tabular data
+ ad hoc queries
- O<->R mapping hard
- speed/scale problems
- not super agile
(a lot more issues here; typical workarounds: caching, app-layer partitioning, flat files, map/reduce – a sketch of partitioning follows below)

BI / reporting
+ ad hoc queries
+ SQL standard protocol between clients and servers
+ scales horizontally better than operational dbs
- some scale limits at massive scale
- schemas are rigid
- no real time; great at bulk nightly data loads
(fewer issues here)
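To make the last workaround concrete: a minimal sketch of app-layer partitioning, in which the application itself, not the database, hashes a key to pick one of several independent database servers. Everything here (the SHARDS list, shard_for) is an invented illustration, not something from the deck.

```python
# Sketch of app-layer partitioning (one of the workarounds above):
# the application decides which of N independent database servers
# owns a given key. All names here are invented.
import hashlib

SHARDS = ["db-server-0", "db-server-1", "db-server-2"]  # stand-ins for real connections

def shard_for(user_id: str) -> str:
    """Hash the key and map it onto one of the servers deterministically."""
    digest = hashlib.md5(user_id.encode()).hexdigest()
    return SHARDS[int(digest, 16) % len(SHARDS)]

print(shard_for("alice"))  # every lookup for "alice" lands on the same server
```

The pain point is rebalancing: adding a fourth server changes almost every key's mapping, which is exactly the operational burden the auto-sharding discussed later in the deck sets out to remove.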
13. • Agile Development Methodology
– Shorter development cycles
– Constant evolution of requirements
– Flexibility at design time
• Relational Schema
– Hard to evolve: long, painful migrations (sketched below)
– Must stay in sync with the application
– Few developers interact with it directly
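A minimal sketch of the "long, painful migrations" point, using SQLite purely as a stand-in RDBMS; the users table and twitter_handle column are invented names.

```python
# Sketch of the rigid-schema pain: adding a field to a relational table
# is a migration that must ship in lockstep with the application code.
# SQLite stands in for the RDBMS; table/column names are invented.
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT)")

# New requirement arrives: store a twitter handle. Schema changes first...
conn.execute("ALTER TABLE users ADD COLUMN twitter_handle TEXT")
# ...and only then can the code that uses the field be deployed.
conn.execute("INSERT INTO users (name, twitter_handle) VALUES (?, ?)",
             ("alice", "@alice"))
```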
18. [Diagram: nonrelational ("nosql") stores added to the picture as the scalable + agile quadrant, alongside BI / reporting and OLTP / operational]
+ speed and scale
+ fits OO well
- ad hoc query limited
- not very transactional
- no sql / no standard
19. Non-relational, next-generation operational data stores and databases
A collection of very different products
• Different data models (not relational)
• Most do not use SQL for queries
• No predefined schema (example below)
• Some allow flexible data structures
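For contrast with the migration sketch above, a minimal sketch of "no predefined schema" as seen from the pymongo driver, assuming a local mongod on the default port; the demo database and field names are invented.

```python
# Sketch of "no predefined schema": documents of different shapes coexist
# in one collection with no migration step. Assumes a local mongod on the
# default port and the pymongo driver; database/field names are invented.
from pymongo import MongoClient

users = MongoClient()["demo"]["users"]
users.insert_one({"name": "alice"})
users.insert_one({"name": "bob",
                  "twitter_handle": "@bob",      # extra fields just appear;
                  "tags": ["early-adopter"]})    # no ALTER TABLE required
```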
27. • Designed and developed by founders of Doubleclick, ShopWiki, GILT groupe, etc.
• Coding started fall 2007
• First production site March 2008 – businessinsider.com
• Open Source – AGPL, written in C++
• Version 0.8 – first official release, February 2009
• Version 1.0 – August 2009
• Version 2.0 – September 2011
41. • Scale linearly
• High Availability
• Increase capacity with no downtime
• Transparent to the application (sharding sketch below)
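The bullets above presumably describe MongoDB's auto-sharding. A hedged sketch of turning it on through the pymongo driver, assuming a mongos router at mongos-host:27017; the demo.users namespace and the user_id shard key are invented for illustration.

```python
# Hedged sketch of enabling MongoDB auto-sharding via the pymongo driver.
# Assumes a mongos router at mongos-host:27017; the demo.users namespace
# and the user_id shard key are invented.
from pymongo import MongoClient

admin = MongoClient("mongodb://mongos-host:27017").admin
admin.command("enableSharding", "demo")            # shard the database
admin.command("shardCollection", "demo.users",
              key={"user_id": 1})                  # pick a shard key
# The application keeps talking to mongos as if it were one server; chunks
# of demo.users migrate between shards as capacity is added.
```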
42. Replica Sets
• High Availability/Automatic Failover
• Data Redundancy
• Disaster Recovery
• Transparent to the application (connection sketch below)
• Perform maintenance with no downtime
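A minimal sketch of why a replica set is "transparent to the application": the driver is handed a seed list and the set name, discovers the current primary, and re-routes writes after a failover without code changes. Host names and the set name rs0 are invented.

```python
# Sketch of replica-set transparency: the driver gets a seed list and the
# set name, discovers the current primary, and re-routes writes after a
# failover with no application changes. Hosts and "rs0" are invented.
from pymongo import MongoClient

client = MongoClient(
    "mongodb://node1:27017,node2:27017,node3:27017/?replicaSet=rs0"
)
client["demo"]["events"].insert_one({"msg": "hello"})  # always goes to the primary
```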
62. • Content Management
• Operational Intelligence
• E-Commerce
• User Data Management
• High Volume Data Feeds
63. Wordnik uses MongoDB as the foundation for its "live" dictionary that stores its entire text corpus – 3.5T of data in 20 billion records

Problem
• Analyze a staggering amount of data for a system built on a continuous stream of high-quality text pulled from online sources
• Adding too much data too quickly resulted in outages; tables locked for tens of seconds during inserts
• Initially launched entirely on MySQL but quickly hit performance roadblocks

Why MongoDB
• Migrated 5 billion records in a single day with zero downtime
• MongoDB powers every website request: 20m API calls per day
• Ability to eliminate the memcached layer, creating a simplified system that required fewer resources and was less prone to error

Impact
• Reduced code by 75% compared to MySQL
• Fetch time cut from 400ms to 60ms
• Sustained insert speed of 8k words per second, with frequent bursts of up to 50k per second
• Significant cost savings and 15% reduction in servers

Life with MongoDB has been good for Wordnik. Our code is faster, more flexible and dramatically smaller. Since we don't spend time worrying about the database, we can spend more time writing code for our application. – Tony Tam, Vice President of Engineering and Technical Co-founder
64. Intuit relies on a MongoDB-powered real-time analytics tool for small businesses to derive interesting and actionable patterns from their customers' website traffic

Problem
• Intuit hosts more than 500,000 websites
• Wanted to collect and analyze data to recommend conversion and lead generation improvements to customers
• With 10 years' worth of user data, it took several days to process the information using a relational database

Why MongoDB
• MongoDB's querying and Map/Reduce functionality could serve as a simpler, higher-performance solution than a complex Hadoop implementation
• The strength of the MongoDB community

Impact
• In one week Intuit was able to become proficient in MongoDB development
• Developed application features more quickly for MongoDB than for relational databases
• MongoDB was 2.5 times faster than MySQL

We did a prototype for one week, and within one week we had made big progress. Very big progress. It was so amazing that we decided, "Let's go with this." – Nirmala Ranganathan, Intuit
65. Shutterfly uses MongoDB to safeguard more than six billion images for millions of customers in the form of photos and videos, and turn everyday pictures into keepsakes

Problem
• Managing 20TB of data (six billion images for millions of customers), partitioned by function
• Home-grown key-value store on top of their Oracle database offered sub-par performance
• Codebase for this hybrid store became hard to manage
• High licensing and HW costs

Why MongoDB
• JSON-based data structure
• Provided Shutterfly with an agile, high-performance, scalable solution at a low cost
• Works seamlessly with Shutterfly's services-based architecture

Impact
• 500% cost reduction and 900% performance improvement compared to previous Oracle implementation
• Accelerated time-to-market for nearly a dozen projects on MongoDB
• Improved performance by reducing average latency for inserts from 400ms to 2ms

The "really killer reason" for using MongoDB is its rich JSON-based data structure, which offers Shutterfly an agile approach to develop software. With MongoDB, the Shutterfly team can quickly develop and deploy new applications, especially Web 2.0 and social features. – Kenny Gorman, Director of Data Services