The power of hadoop in business

What is the future of Hadoop?

What is the new future of Hadoop?

How is that different from the old one?

Here is how Ted Dunning answered these questions at Hadoop Conference Japan 2013 Winter.

Usage Rights

© All Rights Reserved

Report content

Flagged as inappropriate Flag as inappropriate
Flag as inappropriate

Select your reason for flagging this presentation as inappropriate.

Cancel
  • Full Name Full Name Comment goes here.
    Are you sure you want to
    Your message goes here
    Processing…
Post Comment
Edit your comment

    Presentation Transcript

    • 1. The Power of Hadoop to Transform Business (©MapR Technologies - Confidential)
    • 2. My Background
      – University and startups: Aptex, MusicMatch, ID Analytics, Veoh; big data since before it was big
      – Open source, even before the internet: Apache Hadoop, Mahout, ZooKeeper, Drill; bought the beer at the first HUG (Hadoop User Group)
      – MapR
      – Founding member of Apache Drill
    • 3. MapR Technologies
      – Silicon Valley startup: top investors; top technical and management team (Google, Microsoft, EMC, NetApp, Oracle)
      – Enterprise-quality distribution for Hadoop
      – Many extensions to basic Hadoop functionality
      – Strong supporter of Apache Drill
    • 4. Philosophy first: what is history?
    • 5. The study of the past (what came before now)
    • 6. What is the future? (It comes after now.)
    • 10. But the future also has a past!
    • 11. Do you remember the future?
    • 17. Some things turned out as expected
    • 19. Many things are different!
    • 20. Hadoop has a history
    • 21. Hadoop also has a future
    • 22. The Old Future of Hadoop
      – MapReduce and HDFS: more and more, but not really different
      – Ecosystem additions: simpler programming (Hive and Pig), key-value store, ad hoc query
      – Stands apart from other computing: required by HDFS and other limitations
    • 23. The New Future of Hadoop
      – Real-time processing: combines real-time and long-time
      – Integration with traditional IT: no need to stand apart
      – Integration with new technologies: Solr, Node.js, Twisted should all interface directly
      – Fast and flexible computation: Drill logical plan language
    • 24. Example #1: Search Abuse
    • 25. History matrix: one row per user, one column per thing
    • 26. Recommendation based on cooccurrence: cooccurrence gives an item-item mapping, with one row and one column per thing
    • 27. The cooccurrence matrix can also be implemented as a search index
    • 28. Solr indexing (diagram): complete history, cooccurrence (Mahout), item metadata, Solr indexers, index shards
    • 29. Solr search (diagram): user history, web tier, item metadata, Solr indexers, index shards
    • 30. Objective Results
      – At a very large credit card company
      – History is all transactions and all web interactions
      – Processing time cut from 20 hours per day to 3
      – Recommendation engine load time decreased from 8 hours to 3 minutes
    • 31. Example #2: Web Technology
    • 32. Fast analysis (diagram): raw logs, real-time data, fast analysis (Storm), analytic output
    • 33. Large analysis (diagram): raw logs, large analysis (MapReduce), analytic output
    • 34. Presentation tier (diagram): raw logs, analytic output, presentation tier (d3 + Node.js), browser query
    • 35. Objective Results
      – Real-time + long-time analysis is seamless
      – Web tier can be rooted directly on the Hadoop cluster
      – No need to move data
    • 36. Example #3: Apache Drill
    • 37. Big Data Processing: Hadoop
      – Batch processing: query runtime minutes to hours; data volume TBs to PBs; programming model MapReduce; users developers; Google project MapReduce; open source project Hadoop MapReduce
    • 38. Big Data Processing: Hadoop and Storm
      – Batch processing: as on slide 37
      – Stream processing: query runtime never-ending; data volume continuous stream; programming model DAG (pre-programmed); users developers; open source project Storm or Apache S4
    • 39. Big Data Processing: the missing part
      – The table from slide 38, with an empty Interactive analysis column between batch and stream processing
    • 40. Big Data Processing: the missing part
      – Interactive analysis: query runtime milliseconds to minutes; data volume GBs to PBs; programming model queries (ad hoc); users analysts and developers
    • 41. Big Data Processing
      – Adds Dremel as the Google project for interactive analysis
    • 42. Big Data Processing
                           Batch processing    Interactive analysis     Stream processing
      Query runtime        Minutes to hours    Milliseconds to minutes  Never-ending
      Data volume          TBs to PBs          GBs to PBs               Continuous stream
      Programming model    MapReduce           Queries                  DAG
      Users                Developers          Analysts and developers  Developers
      Google project       MapReduce           Dremel
      Open source project  Hadoop MapReduce    Apache Drill             Storm and S4
    • 43. Design Principles
      – Flexible: pluggable query languages; extensible execution engine; pluggable data formats; column-based and row-based; schema and schema-less; pluggable data sources
      – Easy: unzip and run; zero configuration; reverse DNS not needed; IP addresses can change; clear and concise log messages
      – Dependable: no SPOF (single point of failure); instant recovery from crashes
      – Fast: C/C++ core with Java support; Google C++ style guide; minimum latency and maximum throughput (limited only by hardware)
    • 44. Simple Architecture (diagram): query language → transform → logical language → optimize → physical plan → execute
    • 45. Standard Interfaces (diagram): the same pipeline, with interfaces labeled SQL 2003, Drill logical syntax, and Scanner API
    • 46. Logical Plan Syntax:
          query: [ {
            op: "sequence",
            do: [
              { op: "scan", memo: "initial_scan", ref: "donuts", source: "local-logs", selection: {data: "activity"} },
              { op: "transform", transforms: [ { ref: "donuts.quantity", expr: "donuts.sales" } ] },
              { op: "filter", expr: "donuts.ppu < 1.00" },
              …
    • 47. Logical Streaming Example
          { @id: <refnum>, op: "window-frame", input: <input>, keys: [ <name>, ... ], ref: <name>, before: 2, after: here }
        Figure: input rows 0 1 2 3 4 produce the frames {0}, {0, 1}, {0, 1, 2}, {1, 2, 3}, {2, 3, 4}
    • 48. Logical Plan (diagram): operators scan-json, filter, flatten, aggregate, and expressions exp1, exp2 over "table-1"
    • 49. Execution Plan (diagram): the plan spread over node1, node2, and node3; each node runs scan-json, filter, flatten, and exp1 over "table-1", and one of them also runs aggregate and exp2
    • 50. Representing a DAG
          { @id: 19, op: "aggregate", input: 18, type: <simple|running|repeat>, keys: [<name>, ...], aggregations: [ {ref: <name>, expr: <aggexpr>}, ... ] }
        Diagram: aggregate and exp2 nodes numbered 18 and 19, linked through the input field
    • 51. Non-SQL Queries (diagram): two scan-json operators over "table-1", streaming k-means, ball k-means, k, aggregate, exp2, and a join of clusters with features
    • 53. The future is not what we thought it would be
    • 54. It is better!
    • 55. Get Involved! Tweet: #hcj13w #mapr @ted_dunning
    • 56. Get Involved!
      – Download these slides: http://www.mapr.com/company/events/hcj-01-21-2013
      – Join the Drill project: drill-dev-subscribe@incubator.apache.org, #apachedrill
      – Contact me: tdunning@maprtech.com, tdunning@apache.org, @ted_dunning
      – Join MapR (in Japan!): jobs@mapr.com
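Slides 25 and 26 describe a history matrix A with one row per user and one column per item, and a cooccurrence matrix with one row and one column per item, which is the item-item product AᵀA. A minimal sketch, assuming binary user-item interactions and raw counts (the slides do not say which weighting Mahout applies in the talk's pipeline), counts how often two items appear in the same user's history:

```python
from collections import defaultdict
from itertools import combinations

def cooccurrence(histories):
    """Count how often each pair of items appears in the same user's history.

    histories: dict mapping a user id to the set of items that user touched
    (one row of the history matrix A). Returns the off-diagonal entries of
    A^T A as a nested dict: counts[item_a][item_b].
    """
    counts = defaultdict(lambda: defaultdict(int))
    for items in histories.values():
        # Each unordered pair of distinct items in one history adds 1 in both directions.
        for a, b in combinations(sorted(items), 2):
            counts[a][b] += 1
            counts[b][a] += 1
    return counts

if __name__ == "__main__":
    # Hypothetical toy data: three users and the items in their histories.
    histories = {
        "u1": {"apple", "banana"},
        "u2": {"apple", "banana", "cherry"},
        "u3": {"banana", "cherry"},
    }
    counts = cooccurrence(histories)
    print(counts["apple"]["banana"])   # 2
    print(counts["banana"]["cherry"])  # 2
    print(counts["apple"]["cherry"])   # 1
```

At production scale the same counting is what a distributed job has to do, which is why the talk pushes it to Mahout on the cluster.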
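Slide 27 says the cooccurrence matrix can be implemented as a search index, and slides 28 and 29 show Solr doing the indexing and the search. The idea, sketched here in plain Python rather than Solr, is to treat each item as a document whose hypothetical `indicators` field lists the items that cooccur with it, then to use a user's history as the query. The `min_count` threshold and the match-count scoring are assumptions for illustration; Solr's real relevance scoring is different.

```python
from collections import defaultdict

def build_index(cooccurrence, min_count=1):
    """Invert item -> cooccurring items into indicator -> items (a tiny inverted index).

    cooccurrence: dict item -> dict other_item -> count. Each item is a
    "document" whose hypothetical `indicators` field holds the other items
    that cooccurred with it at least `min_count` times.
    """
    postings = defaultdict(set)
    for item, row in cooccurrence.items():
        for other, count in row.items():
            if count >= min_count:
                postings[other].add(item)
    return postings

def recommend(postings, user_history, top_n=3):
    """Score each item by how many of the user's history items appear in its indicators."""
    scores = defaultdict(int)
    for term in user_history:
        for item in postings.get(term, ()):
            if item not in user_history:  # skip items the user already has
                scores[item] += 1
    # Highest score first; ties broken alphabetically for deterministic output.
    ranked = sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))
    return [item for item, _ in ranked[:top_n]]

if __name__ == "__main__":
    # Hypothetical cooccurrence counts.
    cooc = {
        "apple": {"banana": 2, "cherry": 1},
        "banana": {"apple": 2, "cherry": 2},
        "cherry": {"apple": 1, "banana": 2},
    }
    index = build_index(cooc, min_count=2)
    print(recommend(index, {"banana"}))  # ['apple', 'cherry']
```

Serving recommendations this way turns a matrix lookup into an ordinary search query, which a web tier can issue directly.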
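Slides 32 to 35 describe Storm producing fast analytic output from real-time data and MapReduce producing large analytic output from raw logs, with a d3 + Node.js presentation tier reading both. One common way to make that combination look seamless, assumed here rather than taken from the talk, is to merge at query time: the batch result covers events up to a cutoff, and the real-time counts cover only events after it. The event format and function names are hypothetical.

```python
from collections import Counter

def batch_counts(events, cutoff):
    """Stand-in for the large (MapReduce) analysis: count page views with ts <= cutoff."""
    return Counter(page for ts, page in events if ts <= cutoff)

def realtime_counts(events, cutoff):
    """Stand-in for the fast (Storm) analysis: count only page views with ts > cutoff."""
    return Counter(page for ts, page in events if ts > cutoff)

def merged_view(batch, realtime):
    """What a presentation tier might serve: batch totals plus real-time increments."""
    return batch + realtime

if __name__ == "__main__":
    # Hypothetical raw log of (timestamp, page) pairs.
    log = [(1, "/home"), (2, "/docs"), (3, "/home"), (4, "/home"), (5, "/docs")]
    cutoff = 3  # assumption: the last batch job saw events up to timestamp 3
    view = merged_view(batch_counts(log, cutoff), realtime_counts(log, cutoff))
    print(view["/home"], view["/docs"])  # 3 2
```

Because the two halves split on the same cutoff, the merged answer equals a full recount no matter where the cutoff falls.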
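Slide 46 shows the start of a Drill logical plan: a sequence of scan, transform, and filter operators. The toy interpreter below is not Drill. It runs a plan of the same shape over in-memory records, assuming that `scan` wraps each source row under the scan's `ref` name, that `transform` copies one field to another, and that `filter` only handles a `<path> <cmp> <number>` comparison. The operator names and the `ref`, `expr`, and `transforms` keys come from the slide; `memo` and `selection` are ignored, and the data and helper names are hypothetical.

```python
import operator

# Comparison operators this toy filter understands (a hypothetical subset).
COMPARISONS = {"<": operator.lt, ">": operator.gt, "==": operator.eq}

def get_path(record, path):
    """Follow a dotted path such as 'donuts.sales' through nested dicts."""
    value = record
    for part in path.split("."):
        value = value[part]
    return value

def set_path(record, path, value):
    """Assign to a dotted path, creating intermediate dicts as needed."""
    *parents, last = path.split(".")
    target = record
    for part in parents:
        target = target.setdefault(part, {})
    target[last] = value

def run_sequence(ops, sources):
    """Apply each operator in order to the current list of records."""
    records = []
    for op in ops:
        if op["op"] == "scan":
            # Wrap each source row under the scan's ref name, e.g. {"donuts": {...}}.
            records = [{op["ref"]: dict(row)} for row in sources[op["source"]]]
        elif op["op"] == "transform":
            for rec in records:
                for t in op["transforms"]:
                    set_path(rec, t["ref"], get_path(rec, t["expr"]))
        elif op["op"] == "filter":
            path, cmp, number = op["expr"].split()
            test = COMPARISONS[cmp]
            records = [r for r in records if test(get_path(r, path), float(number))]
        else:
            raise ValueError(f"unsupported op: {op['op']}")
    return records

if __name__ == "__main__":
    sources = {"local-logs": [{"sales": 10, "ppu": 0.55}, {"sales": 4, "ppu": 1.25}]}
    plan = [
        {"op": "scan", "ref": "donuts", "source": "local-logs"},
        {"op": "transform", "transforms": [{"ref": "donuts.quantity", "expr": "donuts.sales"}]},
        {"op": "filter", "expr": "donuts.ppu < 1.00"},
    ]
    print(run_sequence(plan, sources))
    # [{'donuts': {'sales': 10, 'ppu': 0.55, 'quantity': 10}}]
```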
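Slide 47's window-frame operator, with `before: 2` and `after: here`, gives each input row a frame holding that row plus up to two preceding rows, which matches the 0 1 2 3 4 sequence and frames drawn in its figure. A minimal sketch of that framing, assuming `after: here` means no rows after the current one and ignoring the `keys` partitioning:

```python
def window_frames(rows, before, after=0):
    """For each position i, return rows[i - before : i + after + 1], clipped to the input."""
    frames = []
    for i in range(len(rows)):
        lo = max(0, i - before)
        hi = min(len(rows), i + after + 1)
        frames.append(rows[lo:hi])
    return frames

if __name__ == "__main__":
    print(window_frames([0, 1, 2, 3, 4], before=2))
    # [[0], [0, 1], [0, 1, 2], [1, 2, 3], [2, 3, 4]]
```

A streaming engine would keep only the last `before` rows in a buffer instead of slicing a list, but the frames it emits are the same.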
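Slide 49 appears to show the logical plan spread over node1, node2, and node3: each node scans, filters, and flattens its own share of "table-1", and a single aggregate combines the results. The sketch below imitates that split with partial aggregation, using Python threads as stand-ins for nodes. The partitioned data, the `valid` flag used by the filter, the nested `items` field expanded by flatten, and the count-per-item aggregate are all hypothetical.

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor

def local_stage(partition):
    """Per-node work: filter rows, flatten the nested list, and pre-aggregate counts."""
    counts = Counter()
    for row in partition:
        if row["valid"]:                 # filter
            for item in row["items"]:    # flatten
                counts[item] += 1        # partial aggregate
    return counts

def run_distributed(partitions):
    """Run local_stage on every partition in parallel, then merge the partial results."""
    with ThreadPoolExecutor(max_workers=max(1, len(partitions))) as pool:
        partials = list(pool.map(local_stage, partitions))
    total = Counter()
    for partial in partials:
        total.update(partial)            # final aggregate in one place
    return total

if __name__ == "__main__":
    # Hypothetical table-1, already split into three partitions (node1..node3).
    partitions = [
        [{"valid": True, "items": ["a", "b"]}],
        [{"valid": False, "items": ["a"]}, {"valid": True, "items": ["a"]}],
        [{"valid": True, "items": ["b", "c"]}],
    ]
    print(sorted(run_distributed(partitions).items()))  # [('a', 2), ('b', 2), ('c', 1)]
```

Pre-aggregating on each node means only small per-key counts, not raw rows, travel to the final aggregate.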
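Slide 50 represents the plan as a DAG in which each operator carries an `@id` and names its upstream operator in `input` (operator 19 reads from operator 18). Because edges are stored as references rather than by nesting, something has to put the operators in dependency order before they run. A minimal topological-sort sketch (Kahn's algorithm), assuming each operator has at most one `input` and using hypothetical ids and operator names:

```python
from collections import defaultdict, deque

def execution_order(ops):
    """Order operator ids so each comes after the operator named in its `input`."""
    by_id = {op["@id"]: op for op in ops}
    children = defaultdict(list)
    indegree = {op_id: 0 for op_id in by_id}
    for op in ops:
        parent = op.get("input")
        if parent is not None:
            if parent not in by_id:
                raise ValueError(f"operator {op['@id']} reads from unknown operator {parent}")
            children[parent].append(op["@id"])
            indegree[op["@id"]] += 1
    # Start from operators with no input (sources such as scans), smallest id first.
    ready = deque(sorted(i for i, d in indegree.items() if d == 0))
    order = []
    while ready:
        current = ready.popleft()
        order.append(current)
        for child in sorted(children[current]):
            indegree[child] -= 1
            if indegree[child] == 0:
                ready.append(child)
    if len(order) != len(by_id):
        raise ValueError("plan contains a cycle")
    return order

if __name__ == "__main__":
    # Hypothetical plan fragment, listed out of order on purpose.
    plan = [
        {"@id": 19, "op": "aggregate", "input": 18},
        {"@id": 17, "op": "scan"},
        {"@id": 18, "op": "filter", "input": 17},
    ]
    print(execution_order(plan))  # [17, 18, 19]
```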
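Slide 51 shows a non-SQL query in the same plan language: streaming k-means and ball k-means produce k clusters from one scan of "table-1", and a join brings clusters together with features from a second scan. Implementing either k-means variant is beyond a short sketch, so the example covers only the final join step. It assumes the centroids are already computed and that the join assigns each feature row to its nearest centroid; both are assumptions, not details given on the slide.

```python
import math

def nearest_cluster(point, centroids):
    """Return the index of the centroid closest to `point` (Euclidean distance)."""
    return min(range(len(centroids)), key=lambda k: math.dist(point, centroids[k]))

def join_clusters(rows, centroids):
    """Join each (row_id, features) pair with the id of its nearest cluster."""
    return [(row_id, nearest_cluster(features, centroids)) for row_id, features in rows]

if __name__ == "__main__":
    # Hypothetical centroids, standing in for the output of the k-means stages.
    centroids = [(0.0, 0.0), (10.0, 10.0)]
    # Hypothetical feature rows from a second scan of the same table.
    rows = [("r1", (1.0, 2.0)), ("r2", (9.0, 8.0)), ("r3", (6.0, 6.0))]
    print(join_clusters(rows, centroids))  # [('r1', 0), ('r2', 1), ('r3', 1)]
```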