Hadoop for the disillusioned
Speaker notes
  • Hadoop is not new: NY Times. Source:
  • Wired Source:
  • Source: Gartner Hype Cycle -
    “Big Data is a fad”, “It’s just BI 2.0”, “This is all just hype”, “We can’t figure out how to use it”, “There’s nothing new here”, “It’s not ready”, “Too few support options”, “It’s too hard”
  • - You’re sharding your RDBMS infrastructure, and it’s becoming brittle and a nightmare to maintain.
    - Twitter has a good quote stating that it used to take them two weeks to run an ALTER TABLE statement.
  • Using Hadoop for ETL to save money by displacing ETL vendors
    Using Hive to offload datasets and their corresponding queries from your enterprise data warehouse (EDW) and lower your EDW bill
  • A great way to competitively differentiate with arbitrarily structured data
  • Hadoop’s power is in its single storage repository and its support for arbitrary data structures. You have the technology to ask any question if you just have the data.
  • Transcript

    • 1. Hadoop for the disillusioned Steve Watt, Red Hat CC flickr rubenswieringa @wattsteve
    • 2. @wattsteve
    • 3. Wired Magazine - July 2008
    • 4. Hadoop in 2013: platform layers and technologies CC flickr lowfatbrains
        - Computational runtimes: YARN, Giraph, MapReduce, HBase, Phoenix, Spark/BDAS, Drill, Impala, Stinger & more
        - File systems: Azure, CassandraFS, CephFS, CleverSafe, GlusterFS, GridGain, HDFS, Lustre, MapR FS, S3, SWIFT, Quantcast FS, Symantec VCFS & more
        - Infrastructures: System on a Chip, x86, Virtualization and Cloud
        - Distributions: Cloudera, Hortonworks, IBM, Intel, MapR, WanDisco
    • 5. Source: Gartner Hype Cycle
    • 6. Your data is growing beyond your ability to manage & query it CC flickr kakadu
    • 7. Save money when asking the same questions of your data CC flickr martijnsnels
    • 8. Hadoop customer: “Great, but now what?” Diagram: Geoffrey Moore’s Technology Adoption Lifecycle (Innovators, Early Adopters, the Chasm, Early Majority, Late Majority, Laggards)
    • 9. new and build data products CC flickr cbcastro
    • 10. Ask your domain experts and line-of-business (LOB) folks what unanswered questions they have. Where can you get the data you need to answer those questions? (Your domain experts should know where to get it.) Some of this data may be outside your organization (social media, sensor data, data brokerages/marketplaces, web pages) and some of it may be inside. If the data for a query doesn’t exist, figure out how to instrument or gather it. Pair your domain experts with your data engineers so they can work out how to obtain and massage the data given the types of queries desired. CC flickr birdwatcher63
    • 11. Building data products is a similar exercise, except that it also involves typical product planning, such as identifying a market. This is also a great way for an organization to explore what assets it has within its data. CC flickr syume
    • 12. Mapping the night sky CC flickr bobfamiliar
    • 13. Analyzing farm soil content to predict human conflict CC flickr oxfam
    • 14. Crisis Management for the Chilean Earthquake CC flickr flodigrip
    • 15. Thanks for listening Steve Watt @wattsteve