1
What Can Hadoop Do for You?
@EvaAndreasson | Sr. Product Manager, Cloudera
2014
CONFIDENTIAL - RESTRICTED
InfoQ.com: News & Community Site
• 750,000 unique visitors/month
• Published in 4 languages (English, Chinese, Japanese an...
Presented at QCon London
www.qconlondon.com
Purpose of QCon
- to empower software development by facilitating the spread o...
Agenda
• Today’s data challenges
• Apache Hadoop and its ecosystem 101
• Common use cases
• Where to learn more
• Q&A
2
Today’s Data Challenges
3
• “Return On Bytes”
• How to cost efficiently query, manage, and store
TBs (or even PBs) of data?
• Pre-mature data death
...
• Enough time to process raw data before you need it
• Data ingest from sensors, cameras, feeds, streaming, logs,
user int...
• Costly adaption to new data types
• Saving account info, images, videos, url clicks, logs, and
transactional data – toge...
Apache Hadoop and its Ecosystem 101
7
8
A software framework that lets you find and
process data directly where it is stored and
where you only need to apply st...
Hadoop Distributed File System (HDFS)
9
MapReduce: A Parallel, Scalable Processing Framework
HDFS
The Apache Hadoop Ecosystem – a Zoo!
MapReduce
HDFS
The Apache Hadoop Ecosystem – a Zoo!
MapReduce
Yarn
HDFS
The Apache Hadoop Ecosystem – a Zoo!
Flume
MapReduce
Yarn
HDFS
The Apache Hadoop Ecosystem – a Zoo!
Flume
MapReduce
Hive Pig
Mahout,
Oryx, …
Yarn
HDFS
The Apache Hadoop Ecosystem – a Zoo!
Flume
MapReduce
HBase
Hive Pig
Mahout,
Oryx, …
ZooKeeper
Yarn
HDFS
The Apache Hadoop Ecosystem – a Zoo!
Flume
MapReduce
HBase
Hive Pig
Mahout,
Oryx, …
ZooKeeper
Yarn
Oozie
Hue
HDFS
The Apache Hadoop Ecosystem – a Zoo!
Flume
MapReduce
HBase
Hive Pig
Mahout,
Oryx, …
ZooKeeperOozie
Impala Solr
Hue
Ya...
HDFS
The Apache Hadoop Ecosystem – a Zoo!
Flume
MapReduce
HBase
Hive Pig
Mahout,
Oryx, …
ZooKeeperOozie
Hue
Yarn
Impala So...
Inter-
active
SQL
Distributed File System
The Hadoop Ecosystem – Explained!
Stream / event-based data ingest
Batch Process...
Example Enterprise Architecture: Use Hadoop as an EDH
The Enterprise Data Hub: One landing zone for all data
Some Typical Use cases
#1: Scale Data Processing at Low Cost
• Do what I usually do, but on a larger set of data
• Do my complex queries, but wit...
#2: Create an Active Archive
• Eliminate pre-mature data death
• Data as a (back-chargeable) service across the
organizati...
#3: Break Silos and Ask Bigger Questions
• What new insights can we achieve by combining
siloed data sets?
• What else can...
There are a Lot of Use Cases…
• Predictive analysis (event prediction)
• Anomaly detection
• Customer profiling
• Recommen...
Do the Same – and More – to a Lower Cost
• Network and storage solution company
• Create pro-active support
• 600000 “phon...
Increasing Revenue Through an Active Archive
• Global on-line retailer
• Need to correlate online/offline data
• 10+ year ...
Where To Learn More?
28
To Learn More…
1. Read some good stuff
• Order the Hadoop Operations book
(http://shop.oreilly.com/product/0636920025085.d...
Q&A
32 ©2012 Cloudera, Inc.
Watch the video with slide synchronization on
InfoQ.com!
http://www.infoq.com/presentations/hadoop-
usage-examples
What Can Hadoop Do for You?
Upcoming SlideShare
Loading in …5
×

What Can Hadoop Do for You?

1,395 views

Published on

Video and slides synchronized, mp3 and slide download available at URL http://bit.ly/1isP0Rk.

Eva Andreasson presents typical categories of problems that are commonly solved using Hadoop and also some concrete examples in each category. Filmed at qconlondon.com.

Eva Andreasson has been working with Java virtual machine technologies, SOA, Cloud, and other enterprise middleware solutions for the past 10 years. She joined Cloudera in 2012. Recently launched Cloudera Search - a new real-time search workload natively integrated with the Hadoop stack - and drove the efforts that lead to 10x user uptake of Hue - the UI of the Hadoop ecosystem.

Published in: Technology, Education
0 Comments
3 Likes
Statistics
Notes
  • Be the first to comment

No Downloads
Views
Total views
1,395
On SlideShare
0
From Embeds
0
Number of Embeds
6
Actions
Shares
0
Downloads
0
Comments
0
Likes
3
Embeds 0
No embeds

No notes for slide

What Can Hadoop Do for You?

  1. 1. 1 What Can Hadoop Do for You? @EvaAndreasson | Sr. Product Manager, Cloudera 2014 CONFIDENTIAL - RESTRICTED
  2. 2. InfoQ.com: News & Community Site • 750,000 unique visitors/month • Published in 4 languages (English, Chinese, Japanese and Brazilian Portuguese) • Post content from our QCon conferences • News 15-20 / week • Articles 3-4 / week • Presentations (videos) 12-15 / week • Interviews 2-3 / week • Books 1 / month Watch the video with slide synchronization on InfoQ.com! http://www.infoq.com/presentations /hadoop-usage-examples http://www.infoq.com/presentati ons/nasa-big-data http://www.infoq.com/presentati ons/nasa-big-data
  3. 3. Presented at QCon London www.qconlondon.com Purpose of QCon - to empower software development by facilitating the spread of knowledge and innovation Strategy - practitioner-driven conference designed for YOU: influencers of change and innovation in your teams - speakers and topics driving the evolution and innovation - connecting and catalyzing the influencers and innovators Highlights - attended by more than 12,000 delegates since 2007 - held in 9 cities worldwide
  4. 4. Agenda • Today’s data challenges • Apache Hadoop and its ecosystem 101 • Common use cases • Where to learn more • Q&A 2
  5. 5. Today’s Data Challenges 3
  6. 6. • “Return On Bytes” • How to cost efficiently query, manage, and store TBs (or even PBs) of data? • Pre-mature data death • Off-disk and archived data difficult and costly to access • Forced data silos • Costly copies and moves of data • Organizational blind-spots Key Challenge #1: Volume
  7. 7. • Enough time to process raw data before you need it • Data ingest from sensors, cameras, feeds, streaming, logs, user interactions… • Raw data structuring for various ETL and DB models Key Challenge #2: Velocity
  8. 8. • Costly adaption to new data types • Saving account info, images, videos, url clicks, logs, and transactional data – together? • Inflexible data models • Major surgery for future queries • Most data is modeled for questions we know will be asked… • Raw data value loss Key Challenge #3: Variety
  9. 9. Apache Hadoop and its Ecosystem 101 7
  10. 10. 8 A software framework that lets you find and process data directly where it is stored and where you only need to apply structure at query time What is Hadoop?
  11. 11. Hadoop Distributed File System (HDFS) 9
  12. 12. MapReduce: A Parallel, Scalable Processing Framework
  13. 13. HDFS The Apache Hadoop Ecosystem – a Zoo! MapReduce
  14. 14. HDFS The Apache Hadoop Ecosystem – a Zoo! MapReduce Yarn
  15. 15. HDFS The Apache Hadoop Ecosystem – a Zoo! Flume MapReduce Yarn
  16. 16. HDFS The Apache Hadoop Ecosystem – a Zoo! Flume MapReduce Hive Pig Mahout, Oryx, … Yarn
  17. 17. HDFS The Apache Hadoop Ecosystem – a Zoo! Flume MapReduce HBase Hive Pig Mahout, Oryx, … ZooKeeper Yarn
  18. 18. HDFS The Apache Hadoop Ecosystem – a Zoo! Flume MapReduce HBase Hive Pig Mahout, Oryx, … ZooKeeper Yarn Oozie Hue
  19. 19. HDFS The Apache Hadoop Ecosystem – a Zoo! Flume MapReduce HBase Hive Pig Mahout, Oryx, … ZooKeeperOozie Impala Solr Hue Yarn Spark
  20. 20. HDFS The Apache Hadoop Ecosystem – a Zoo! Flume MapReduce HBase Hive Pig Mahout, Oryx, … ZooKeeperOozie Hue Yarn Impala Solr Spark Sentry
  21. 21. Inter- active SQL Distributed File System The Hadoop Ecosystem – Explained! Stream / event-based data ingest Batch Processing KeyValue Store SQL Proc. Oriented Query Machine Learning Process MgmtWorkflow Mgmt GUI Resource and Workload Scheduler Free- Text Search Real Time Processing Access Control
  22. 22. Example Enterprise Architecture: Use Hadoop as an EDH The Enterprise Data Hub: One landing zone for all data
  23. 23. Some Typical Use cases
  24. 24. #1: Scale Data Processing at Low Cost • Do what I usually do, but on a larger set of data • Do my complex queries, but within a reasonable time
  25. 25. #2: Create an Active Archive • Eliminate pre-mature data death • Data as a (back-chargeable) service across the organization
  26. 26. #3: Break Silos and Ask Bigger Questions • What new insights can we achieve by combining siloed data sets? • What else can we find by asking questions over new types of data?
  27. 27. There are a Lot of Use Cases… • Predictive analysis (event prediction) • Anomaly detection • Customer profiling • Recommendation engines • Clickstream analysis • Image processing • Product and process improvements (feedback loops) • Genome sequence processing • Object matching • Path optimization • …
  28. 28. Do the Same – and More – to a Lower Cost • Network and storage solution company • Create pro-active support • 600000 “phone home” logs/week • SLA: 40% done within 18 hours • Complex queries taking weeks or not even possible to run • Expected future data growth of ~7TB a month! • With Hadoop et al and Cloudera’s help • Future proof scale • Faster and more flexible analytic capabilities • Correlation of disk latency with manufacturer (24 billion records) • 64x query performance improvement (from weeks to hours) • Pattern matching queries in the same infrastructure detecting bugs (240 billion records) • TCO freed up budget for other customer-focused projects
  29. 29. Increasing Revenue Through an Active Archive • Global on-line retailer • Need to correlate online/offline data • 10+ year history records, 1,000’s product categories • Large parts of data archived, complex to access • With Hadoop et al and Cloudera’s help • Unified long-term storage and processing over all data • Machine learning and query without data moves • Correlate all customer, product, and sales data ad hoc • Lead to targeted marketing and increased revenue streams
  30. 30. Where To Learn More? 28
  31. 31. To Learn More… 1. Read some good stuff • Order the Hadoop Operations book (http://shop.oreilly.com/product/0636920025085.do) and/or the Definitive Guide to Hadoop (http://shop.oreilly.com/product/0636920021773.do) • Visit Cloudera’s blog: blog.cloudera.com/ 2. Play on your own • Cloudera QuickStart VM: https://ccp.cloudera.com/display/SUPPORT/Cloudera+Manager+Free+Edition+Dem o+VM • View the videos at gethue.com 3. Get help and training • Join or send an email to: cdh-user@cloudera.org • Visit the Cloudera dev center: cloudera.com/content/dev-center/en/home.html • Get training: university.cloudera.com 4. Contact Cloudera • eva@cloudera.com • On-line contact form: http://cloudera.com/content/cloudera/en/about/contact- us/contact-form.html
  32. 32. Q&A
  33. 33. 32 ©2012 Cloudera, Inc.
  34. 34. Watch the video with slide synchronization on InfoQ.com! http://www.infoq.com/presentations/hadoop- usage-examples

×