Not Just Hadoop:
NoSQL in the Enterprise

Talking about:
  What is BIG Data
  BIG Data & you
  Real world examples
  The future of Big Data

@spf13

AKA Steve Francia
16+ years building the internet
Father, husband, skateboarder
Chief Evangelist @ 10gen, responsible for drivers, integrations, web & writing
What is BIG data?
2000
Google Inc. today announced it has released the largest search engine on the Internet. Google's new index, comprising more than 1 billion URLs.
2008
Our indexing system for processing links indicates that we now count 1 trillion unique URLs (and the number of individual web pages out there is growing by several billion pages per day).
An unprecedented amount of data is being created and is accessible.
[Chart: Data Growth, millions of URLs indexed — 2000: 1, 2001: 4, 2002: 10, 2003: 24, 2004: 55, 2005: 120, 2006: 250, 2007: 500, 2008: 1,000]
Truly Exponential Growth
Is hard for people to grasp.

A BBC reporter recently: "Your current PC is more powerful than the computer they had on board the first flight to the moon."
Moore's Law
Applies to more than just CPUs.

Boiled down, it means that things double at regular intervals.

It's exponential growth, and it applies to big data.
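As a rough back-of-the-envelope sketch (using only the URL counts from the chart above, not figures quoted in the talk), growth from about 1 million indexed URLs in 2000 to about 1 billion in 2008 implies a doubling time of well under a year:

    import math

    # ~1 million URLs in 2000 vs. ~1,000 million in 2008 (from the chart above)
    growth_factor = 1000 / 1                  # ~1,000x growth over 8 years
    doublings = math.log2(growth_factor)      # ~9.97 doublings
    doubling_time_months = 8 * 12 / doublings
    print(f"Doubling roughly every {doubling_time_months:.1f} months")  # ~9.6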
How BIG is it?

[Visualization: relative data sizes for 2001 through 2008 — each year dwarfs the one before, with 2008 by far the largest]
We've had BIG Data needs for a long time.

In 1998 Google won the search race through custom software & infrastructure.
We've had BIG Data needs for a long time.

In 2002 Amazon again wrote custom & proprietary software to handle their BIG Data needs.
We've had BIG Data needs for a long time.

In 2006 Facebook started with off-the-shelf software, but quickly turned to developing their own custom-built solutions.
The ability to handle big data is one of the largest factors in determining winners vs. losers.
For over a decade, BIG Data = custom software.
Why all this talk about BIG Data now?
In the past few years, open source software has emerged that enables 'us' to handle BIG Data.
The Big Data story is actually two stories.
Doers & Tellers are talking about different things.
http://www.slideshare.net/siliconangle/trendconnect-big-data-report-september
Tellers
Doers
Doers talk a lot more about actual solutions.

They know it's a two-sided story:

  Storage

  Processing
Takeaways
MongoDB and Hadoop:
MongoDB for storage & operations
Hadoop for processing & analytics
How MongoDB enables big data
• Flexible schema
• Horizontal scale built in & free
• Operates at near-memory speed
• Optimized for modern apps
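A minimal sketch of the "flexible schema" point, using pymongo (database, collection, and field names here are hypothetical, chosen only for illustration — this is not code from the talk). Two differently shaped documents can land in the same collection with no schema declared up front:

    from pymongo import MongoClient

    client = MongoClient("mongodb://localhost:27017")
    events = client.analytics.events   # database & collection are created lazily

    # No schema to declare up front: documents in one collection can differ.
    events.insert_one({"type": "search", "terms": ["hotel", "chicago"], "ms": 42})
    events.insert_one({"type": "booking", "hotel_id": 1234,
                       "rate": {"amount": 159.00, "currency": "USD"}})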
MongoDB @ Orbitz
    Rob Lancaster
    October 23 | 2012
Use Cases

 • Hotel Data Collection
 • Hotel Rate Feed:
  • Supply hotel rates to Google for their Hotel Finder
  • Uses MongoDB:
   - Maintain state of data sent to Google
   - Identify changes in rates as they occur
  • Makes use of complex querying, secondary indexing
 • EasyLoader:
  • Feature allowing suppliers to easily load inventory to Orbitz
  • Uses MongoDB to persist all changes for auditing purposes
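A hedged sketch of how the rate-feed state tracking described above might look in pymongo. The collection and field names (hotel_id, last_modified, sent_to_google) are hypothetical and serve only to illustrate the secondary-index and change-detection pattern the slide mentions:

    from datetime import datetime, timedelta
    from pymongo import MongoClient, ASCENDING

    client = MongoClient("mongodb://localhost:27017")
    rates = client.ratefeed.rates   # hypothetical database / collection

    # Secondary (compound) index supporting the "identify changes" query pattern.
    rates.create_index([("hotel_id", ASCENDING), ("last_modified", ASCENDING)])

    # Find rates that changed since the last push to Google Hotel Finder.
    last_push_time = datetime.utcnow() - timedelta(hours=1)
    changed = rates.find({
        "last_modified": {"$gt": last_push_time},
        "sent_to_google": True,
    })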



Hotel Data Collection

 • Goals:
  • Better understand performance of
    our hotel path
  • Optimize our hotel rate cache
 • Methods:
  • Capture every action performed
    by our hotel search engine.
  • Persist this data for long periods.
 • Challenges:
  • Need high performance capture.
  • Scalable, inexpensive storage.



Requirements

Collection:
• High write throughput
  • 500 servers
  • > 100 million documents/day
• Flexibility
  • Complex, extendable documents
  • No forced schema
• Scalability
• Simplicity

Storage & Processing:
• High data volume
  • ~500 GB/day
  • 7 TB/month compressed
• Scalable
• Inexpensive
• Proximity with other data
The Solution

 • Utilize MongoDB as a collector:
  • ~ 500 clients
  • Utilize unsafe writes for high
    throughput
  • Heterogeneous documents
  • New collection for each hour
 • HDFS for storage & processing:
  • Data moved via M/R job:
   - One job per collection
   - One mapper per MongoDB instance
  • Additional processing and analysis
    by other jobs
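A hedged sketch of the collector side in pymongo (the host name, database, and document fields are hypothetical): an hourly collection is selected by name and written with w=0, which is the modern spelling of the "unsafe writes" mentioned above — the client skips the server acknowledgement, trading durability guarantees for write throughput.

    from datetime import datetime
    from pymongo import MongoClient
    from pymongo.write_concern import WriteConcern

    client = MongoClient("mongodb://collector-host:27017")  # hypothetical host
    db = client.collector

    # New collection each hour; fire-and-forget writes via w=0.
    hour_name = datetime.utcnow().strftime("events_%Y%m%d%H")
    events = db.get_collection(hour_name, write_concern=WriteConcern(w=0))

    events.insert_one({"server": "web-042", "action": "hotel_search",
                       "ts": datetime.utcnow()})

Each hourly collection then maps cleanly onto the "one job per collection" M/R export to HDFS described above.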




Challenges & Conclusions

• Challenges? None, really.
• Achieved a robust and simple solution
• MongoDB has been entirely worry-free
  • Very high write throughput
  • Reads (well, full collection dumps across the wire) are slower




The Future of BIG Data
What is BIG?
BIG today is normal tomorrow.
[Chart: Data Growth, millions of URLs indexed — 2000: 1, 2001: 4, 2002: 10, 2003: 24, 2004: 55, 2005: 120, 2006: 250, 2007: 500, 2008: 1,000, 2009: 2,150, 2010: 4,400, 2011: 9,000]
How BIG is it?

[Visualization: relative data sizes for 2005 through 2012 — each year dwarfs the one before, with 2012 by far the largest]
2012: generating over 250 million tweets per day.
MongoDB enables us to scale with the redefinition of BIG.

Tools like Hadoop are enabling us to process the new BIG.
MongoDB is committed to working with the best data tools, including Hadoop, Storm, Disco, Spark & more.
http://spf13.com
http://github.com/s
@spf13

Questions?

Download at mongodb.org
We're hiring!! Contact us at jobs@10gen.com