Making Sense of Big Data with Hadoop

● 13 years with a pager
● Oracle ACE Director
● OakTable member
● Senior consultant for Pythian
● @gwenshap
● http://www.pythian.com/news/author/shapira/
● shapira@pythian.com

© 2012 Pythian

Pythian
Recognized Leader:
• Global industry leader in remote database administration services and consulting for
Oracle, Oracle Applications, MySQL and Microsoft SQL Server

• Work with over 165 multinational companies such as LinkShare Corporation, IGN
Entertainment, CrowdTwist, TinyCo and Western Union to help manage their
complex IT deployments

Expertise:
• One of the world’s largest concentrations of dedicated, full-time DBA expertise.
Employ 7 Oracle ACEs/ACE Directors. Heavily involved in the MySQL community,
driving the MySQL Professionals Group and sitting on the IOUG Advisory Board for MySQL.

• Hold 7 Specializations under Oracle Platinum Partner program, including Oracle
Exadata, Oracle GoldenGate & Oracle RAC

Global Reach & Scalability:
• 24/7/365 global remote support for DBA and consulting, systems administration,
special projects or emergency response

MORE DATA THAN YOU CAN HANDLE

MORE DATA THAN RELATIONAL DATABASES CAN HANDLE

MORE DATA THAN RELATIONAL DATABASES CAN HANDLE CHEAPLY

Data arriving at fast rates
Typically unstructured
Stored without aggregation
Analyzed in real time
For reasonable cost

Complex Data Architecture

Your Data is NOT as BIG as you think

BECAUSE WE CAN

More Data Beats Smarter Algorithms

Email, photos, job postings, tweets, video, medical imaging, sensors, blog posts, tags, scanned docs

An Imperial College team found:
• 3,000 patients under 19 were treated in geriatric clinics
• Between 15,000 and 20,000 men have been admitted to obstetric wards
• Almost 10,000 men have been admitted to gynecology wards

http://www.straightstatistics.org/blog/2012/04/06/why-are-so-many-men-pregnant

Unstructured
Eventually Structured Data

Scalable Storage + Massively Parallel Processing + Reasonable Cost

Hadoop: Platform for distributed computing

Hadoop is Scalable. But not fast.

Assumptions
• Lots of data
• Large Files
• Unstructured
• Scan entire files
• Unreliable Hardware
• Adding servers = increased capacity

Principles
• Bring Code to Data
• Share Nothing

HDFS
• Distributed
• Replicated
• Big Files
• Write Once
• Read Entire File

/users/shapira/log-1, blocks {1,4,5}
/users/shapira/log-2, blocks {2,3,6}

[Diagram: blocks 1 to 6 spread across the cluster; each block appears three times]
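
The block layout above, together with the "Bring Code to Data" principle, can be illustrated with a small sketch. The file-to-block mapping and the three copies per block come from the slide; the node names, the round-robin placement, and both function names are assumptions made for illustration. Real HDFS placement is rack-aware and more involved than this.

```python
from collections import defaultdict

# File-to-block mapping as shown on the slide.
FILE_BLOCKS = {
    "/users/shapira/log-1": [1, 4, 5],
    "/users/shapira/log-2": [2, 3, 6],
}

REPLICATION = 3  # each block appears three times in the slide's diagram
NODES = ["node-a", "node-b", "node-c", "node-d", "node-e"]  # hypothetical names


def place_replicas(file_blocks, nodes, replication):
    """Assign each block to `replication` distinct nodes, round-robin.

    Round-robin is a stand-in for the real placement policy (assumption).
    Requires replication <= len(nodes) so the copies land on distinct nodes.
    """
    placement = {}
    cursor = 0
    for blocks in file_blocks.values():
        for block in blocks:
            placement[block] = [
                nodes[(cursor + i) % len(nodes)] for i in range(replication)
            ]
            cursor += 1
    return placement


def schedule_near_data(file_blocks, placement, path):
    """For each block of `path`, pick the least-loaded node that stores it.

    The task moves to the data, so the block is never copied over the network.
    """
    load = defaultdict(int)
    plan = {}
    for block in file_blocks[path]:
        node = min(placement[block], key=lambda n: (load[n], n))
        load[node] += 1
        plan[block] = node
    return plan


placement = place_replicas(FILE_BLOCKS, NODES, REPLICATION)
plan = schedule_near_data(FILE_BLOCKS, placement, "/users/shapira/log-1")
for block, node in plan.items():
    print(f"block {block} -> run map task on {node}")
```

Because every block has three copies, losing one node still leaves two readable replicas, and the scheduler has three candidate nodes on which to run each task locally.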

Map Reduce

[Diagram: a Hadoop job as a sequence of MapReduce jobs. Labels: Start Job 1, Map, Combine, Reduce, Reduce?, Stop Job 1, Start Job 2, …, Results]
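
The Map, Combine, and Reduce stages named in the diagram can be sketched in plain Python as a word count. This is a single-process illustration of the programming model, not Hadoop's API; the function names and the sample input lines are made up for the example, and the explicit shuffle step stands in for the grouping that the framework performs between map and reduce.

```python
from collections import Counter, defaultdict
from itertools import chain


def map_phase(line):
    """Map: emit a (word, 1) pair for every word in one input line."""
    return [(word.lower(), 1) for word in line.split()]


def combine_phase(pairs):
    """Combine: pre-aggregate one mapper's output locally to shrink the shuffle."""
    counts = Counter()
    for word, n in pairs:
        counts[word] += n
    return list(counts.items())


def shuffle(pairs):
    """Shuffle: group all values by key across every mapper's output."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key, values):
    """Reduce: collapse all values for one key into a single result."""
    return key, sum(values)


def run_job(splits):
    """Run map and combine per split, then shuffle and reduce globally."""
    combined = [
        combine_phase(chain.from_iterable(map_phase(line) for line in split))
        for split in splits
    ]
    groups = shuffle(chain.from_iterable(combined))
    return dict(reduce_phase(key, values) for key, values in groups.items())


# Each inner list stands in for one input split handled by one mapper.
splits = [
    ["big data is big", "hadoop scales"],
    ["hadoop is not fast"],
]
result = run_job(splits)
print(result)
```

The combine step is optional: it only reduces the amount of data shuffled between mappers and reducers. Hadoop also allows map-only jobs with no reduce step, which may be what the "Reduce?" labels in the diagram refer to.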

Implementation
• Balance disks, cores and RAM
• High Bandwidth
• More nodes or better nodes?

It’s about the Ecosystem
• Sqoop
• Flume
• Hive
• Pig
• HBase

Use case:
ETL

[Diagram labels: OLTP, DWH, BI]

Use case:
Listening to the crowd

Our customers use Hadoop for:
• Storing lots of pre-processed data
• Merging different data types
• Scalable data processing
• Advanced data processing

Easy case:
Your CTO heard about Big Data
and is eager to invest.
You have a big budget.

Sneak Hadoop into Your Business
• Find an important business problem
• Acquire data (be sneaky!)
• Get the tools: R, Hadoop, Tableau
• Laptops, desktops, test servers
• Analyze data
• Make pretty charts
• Get business used to it
• Wait for an Outage
• PROFIT!

Oracle Big Data
The “ETL Machine”

Software
Oracle NoSQL
Cloudera Hadoop Distribution
Oracle Loader for Hadoop
Data Integrator for Hadoop
Direct Connector for Hadoop
Oracle Connector for R

Thank you & Q&A
To contact us…

sales@pythian.com

1-866-PYTHIAN

To follow us…
http://www.pythian.com/news/

http://www.facebook.com/pages/The-Pythian-Group/

http://twitter.com/pythian

http://www.linkedin.com/company/pythian
