Move Beyond ETL: Tapping the True Business Value of Hadoop

Published in: Technology, Business

  • This talk is really about blind spots. I believe there are three that are ultimately keeping many of you from “tapping the true value of Hadoop.”
  • How are we going to store all the information on the internet?
    Google File System (GFS)

    How are we going to analyze it?
    MapReduce (MR)

    How are we going to do something with it?
    BigTable (BT)
  • “I, too, need to store large amounts of data!”

    These are technology companies
    The followers on this wave are in other businesses, but need to use technology to move forward
    They waited to see if these technologies would really work
  • Three things the follower does not see:
    New use cases from a few early adopters
    What changes about the architectures to support new use cases
    Where the early adopters are ultimately going
  • I don’t actually mean the use cases that are way out there.

    I mean the very next ones: the ones that early adopters are doing now, and that you should be doing next (this year or next year).
  • We all know product recommendations
  • Recommendations are not just for products.
    Recommend content
    Recommend people
    Recommend actions
  • Auto-complete
    Recommendations within search
    Personalized search results
    Search within the enterprise
  • Predict energy usage (Opower)
    Predict weather (Climate Corp)
    Predict device returns (Motorola)
  • Deals to tablet and mobile devices
  • Optimizing experiences on each channel
    The key ingredients here are:
    Data consolidation (get everything in one place so it is accessible)
    Experimentation (try different things on live traffic)
    Rapid iteration (optimize by making changes quickly)
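To make the "experimentation" ingredient concrete, here is a minimal sketch of trying different things on live traffic. It is not from the talk: the function names, the experiment name, and the variant labels are all hypothetical, and hash-based bucketing is just one common way to split users.

```python
import hashlib
from collections import defaultdict
from typing import Dict, Iterable, Sequence, Tuple


def assign_variant(user_id: str, experiment: str, variants: Sequence[str]) -> str:
    """Deterministically map a user to one variant (hypothetical helper).

    Hashing the experiment name together with the user id keeps a user in
    the same variant across requests without storing any assignment state.
    """
    digest = hashlib.sha256(f"{experiment}:{user_id}".encode("utf-8")).hexdigest()
    return variants[int(digest, 16) % len(variants)]


def conversion_rates(events: Iterable[Tuple[str, bool]]) -> Dict[str, float]:
    """Summarize (variant, converted) events into a conversion rate per variant."""
    shown: Dict[str, int] = defaultdict(int)
    converted: Dict[str, int] = defaultdict(int)
    for variant, did_convert in events:
        shown[variant] += 1
        if did_convert:
            converted[variant] += 1
    return {variant: converted[variant] / shown[variant] for variant in shown}


if __name__ == "__main__":
    # Illustrative data only; real events would come from the consolidated store.
    variant = assign_variant("user-42", "homepage-layout", ["control", "treatment"])
    print("user-42 sees:", variant)
    print(conversion_rates([("control", True), ("control", False), ("treatment", True)]))
```

Rapid iteration then amounts to changing the variant list or the experiment name and redeploying, which is why deployment speed matters as much as analysis speed.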
  • You should, too. At the very least, you should start doing “traditional BI” on big data.
  • Next generation use cases are in two categories:
    Analysis: Now that we have data, and it is consolidated, let’s ask more questions.
    Action: Now that we have data, and it is consolidated, let’s put it to work.
  • Followers (early majority) are at the Understand phase. Early adopters are going deep into Understand, or moving on to Act.

    I really want to talk about the last phase. What are the key ingredients?
  • Early adopters are changing their system architectures:
    They are adding new-age tools
    They are removing and replacing outdated systems
    They are restructuring and shuffling components
  • Review the difference between building upon understanding versus moving into action.
  • You got data delivered back into the application, but did you include any of the key ingredients?
  • Let’s focus on the early adopters who migrated into action. What have they done?

    We have already added a key-value store, HBase, to connect data back to the frontends.
    We can add a stream processing engine to get real-time.
    We can use the Lambda architecture to get all sorts of nice properties like immutable data sources, and make only incremental additions.
  • What does it look like to go through this process of “going deep” into action?

    Add room for a stream processing system (Storm, Samza)
    Add a query layer on top to join the results from the batch layer with those from the speed layer
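The batch, speed, and query layers described above can be sketched in a few lines. This is an illustrative toy under stated assumptions, not the architecture from the slides: plain Python objects stand in for HDFS/MapReduce (batch), Storm or Samza (speed), and HBase (serving), and the event fields `ts` and `item` are made up.

```python
from collections import Counter


def build_batch_view(master_log, up_to):
    """Batch layer: recompute counts from the immutable log for ts < up_to."""
    return Counter(event["item"] for event in master_log if event["ts"] < up_to)


class SpeedView:
    """Speed layer: incrementally count events the batch view has not covered."""

    def __init__(self, since):
        self.since = since
        self.counts = Counter()

    def on_event(self, event):
        # Only events at or after the batch cutoff belong to the speed layer.
        if event["ts"] >= self.since:
            self.counts[event["item"]] += 1


def query(item, batch_view, speed_view):
    """Query layer: join the batch result with the speed result."""
    return batch_view[item] + speed_view.counts[item]


if __name__ == "__main__":
    # The log is append-only; nothing in it is ever updated in place.
    log = [
        {"ts": 1, "item": "shoes"},
        {"ts": 2, "item": "hats"},
        {"ts": 5, "item": "shoes"},
    ]
    cutoff = 3  # the last batch run covered everything before ts=3
    batch = build_batch_view(log, cutoff)
    speed = SpeedView(since=cutoff)
    for event in log:
        speed.on_event(event)
    print(query("shoes", batch, speed))  # 2
    print(query("hats", batch, speed))   # 1
```

Note that the counting logic appears twice, once in `build_batch_view` and once in `SpeedView.on_event`, and `query` has to know how to combine them. Even in this toy, changing what gets counted means touching all three pieces.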
  • You got data delivered back into the application, but did you include any of the key ingredients?
  • To make a change to something you need to edit the batch layer, the speed layer, and potentially the query that joins the two.
  • You don’t have enough data to see the future of where people are going.
  • What’s next?
  • What’s next?
  • I don’t know how to quantify the business value. I’ll leave that to Gartner.

    But I hope that I can convince you that:
    The intrinsic value of each phase is greater than the previous. What good is collecting data if you don’t do anything with it? What good is it if you don’t understand it?
    The realized value to the business at each phase is even more extreme than what I’ve shown here. What good is understanding unless you do something with it? You can do something with it as a human being, but many more decisions now are made by machines, not humans.
  • How long does this take?

    Testing (that is, experiment design, development, and deployment) is the bottleneck.

    Why are you spending so much money working on increasing the speed of these other phases?
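The bottleneck argument can be checked with simple arithmetic. The sketch below uses the five phases named on the slide (question, hypothesis, prediction, testing, analysis), but the durations are invented purely for illustration; they are not measurements from the talk.

```python
def cycle_time(durations):
    """Total time for one pass through all phases."""
    return sum(durations.values())


def days_saved(durations, phase, factor):
    """Days saved per cycle if a single phase becomes `factor` times faster."""
    faster = dict(durations)
    faster[phase] = durations[phase] / factor
    return cycle_time(durations) - cycle_time(faster)


if __name__ == "__main__":
    # Hypothetical durations in days; testing dominates by assumption.
    durations = {
        "question": 1,
        "hypothesis": 1,
        "prediction": 2,
        "testing": 20,
        "analysis": 4,
    }
    print(cycle_time(durations))                 # 28
    print(days_saved(durations, "analysis", 2))  # 2.0
    print(days_saved(durations, "testing", 2))   # 10.0
```

Under these assumed numbers, doubling the speed of analysis (a faster data warehouse) saves 2 days out of 28, while doubling the speed of testing saves 10.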
  • What you would design to solve the first three phases (up to understanding) is different from what you would build to solve “action.”

    We don’t know what’s coming next. Design for your problem. And do so without just blindly following the early adopters. Instead, start with your requirements, and design with purpose.
  • Move Beyond ETL: Tapping the True Business Value of Hadoop

    1. Photo Credit: http://www.crosseyedlife.com/teaching-resources/
    2. Menu Who am I? Early adopters of Hadoop Next generation use cases Changing big data architectures Art of the possible My request Questions Appetizer Main Dessert
    3. Who am I? Google, Software Engineer Personalized Search Personalized Recommendations WibiData, CTO Real-time Personalization Platform Customer Use Cases
    4. EARLY ADOPTERS OF HADOOP
    5. Early Adopter Early Majority
    6. Collect Everything Keep Everything Ask Anything
    7. Collect Everything Collect Everything Collect Everything Collect Everything Collect Everything Maybe I should, too? Keep Everything
    8. Blind Spots 1. New, high-value use cases 2. Architectural changes to support broader use cases 3. The ultimate strategic goals of early adopters
    9. NEXT GENERATION USE CASES Blind Spot Number 1
    10. Recommendations
    11. Recommendations
    12. Search
    13. Prediction and Prevention
    14. Targeted Offers
    15. Customer Experience Optimization
    16. Clearly, early adopters have moved beyond ETL.
    17. Life After ETL Understanding 360-degree customer views Visualization Graphs Exploration Trends Customer segmentation ROI Prediction Action Recommendations Prevention Mobile Offers Recommendations Localization Search Personalization
    18. Evolution of Enterprise Data Collect Organize Understand Act Understand Understand
    19. CHANGING ARCHITECTURE Blind Spot Number 2
    20. Sometimes, supporting a new use case requires a different architecture.
    21. Evolution of Enterprise Data Collect Organize Understand Act Collect Organize Understand
    22. Key Ingredients Data Consolidation Organization Experimentation Try something! Rapid iteration Tuning Deployment Evaluation Real time Required to Understand Required to Act
    23. Web Web Web HDFS Logs Txns POS Third Party Data 1. Collect
    24. MapReduce Web Web Web HDFS Logs Txns POS Third Party Data 1. Collect 2. Organize Data Warehouse
    25. Web Web Web HDFS POS Third Party 1. Collect 2. Organize 3. Understand MapReduce Data Warehouse
    26. Web Web Web HDFS POS Third Party 1. Collect 2. Organize 3. Understand 4. Act MapReduce HBase Data Warehouse
    27. Key Ingredients Data Consolidation Organization Experimentation Try something! Rapid iteration Tuning Deployment Evaluation Real time Required to Understand Required to Act Did we get any of these?
    28. Early Adopter Migration Strategies Add serving capability Key-value store Indexing Add stream processing Storm Samza Lambda architecture Add both
    29. Web Web Web HDFS POS Third Party 1. Collect 2. Organize 3. Understand 4. Act MapReduce HBase Data Warehouse HBase Storm Query Batch Serving Speed
    30. Key Ingredients Data Consolidation Organization Experimentation Try something! Rapid iteration Tuning Deployment Evaluation Real time Required to Understand Required to Act Did we get any of these?
    31. Web Web Web HDFS POS Third Party 1. Collect 2. Organize 3. Understand 4. Act MapReduce Data Warehouse HBase Storm Query Batch Serving Speed
    32. ART OF THE POSSIBLE Blind Spot Number 3 Photo credit: http://mediahub.olive.co.uk/blog/the-art-of-the-possible
    33. You can’t build a data platform to solve a problem you haven’t identified yet.
    34. What’s Next? Collect Organize Understand Act ?
    35. What’s Next? Collect Organize Understand Act
    36. Where is the Value? Collect Organize Understand Act 0% 20% 40% 60% 80% 100% Collect Organize Understand Act
    37. “As the amount of data goes up, the importance of human judgment should go down” - Andrew McAfee HBR Blog
    38. Question Hypothesis Prediction Testing Analysis Hire smarter people Faster EDW Hire smarter people Faster Deployment Faster EDW Testing
    39. What does this all mean? The real value is in next generation “action” use cases The architecture for “action” is different Design for your problem, since you don’t know the art of the possible. Requirements first, then technology
    40. My Request Stop building faster data warehouses. You already understand your data. Turn your understanding into action.
    41. Questions? Garrett Wu http://www.wibidata.com gwu@wibidata.com