Hadoop meets Cloud with Multi-Tenancy

CTO Kaz's talk at Hadoop Conference Japan 2013 Winter.

  • ContextLogic – Company behind Wish.com – Rapidly growing social rewards platform. MobFox – Europe’s largest mobile advertising network. Cookpad – Japan’s largest recipe discovery service. Getjar – World’s largest free app store. Viki.com – Global video streaming & sharing site. Splurgy – Socially enabled universal promotions platform.
  • Hadoop meets Cloud with Multi-Tenancy

    1. Treasure Data: Hadoop meets Cloud with Multi-Tenancy. Kazuki Ohta, Founder and CTO at Treasure Data, Inc.; Hadoop User Group Japan (Hadoopユーザー会). k@treasure-data.com, @kzk_mover. Friday, April 5, 13
    2. Who are you?
        • Kazuki Ohta (太田一樹): @kzk_mover, k@treasure-data.com
        • Treasure Data, Inc.: Chief Technology Officer, founded July 2011
        • Hadoop User Group Japan: one of the founders; “Hadoop徹底入門” (“Thorough Introduction to Hadoop”)
        • Open-source enthusiast: Hadoop, memcached, jemalloc, MongoDB, uim, etc.
    3. Treasure Data = Cloud + Big Data. [Figure: market map of cloud vs. on-premise offerings against data volume, marked at 1 billion entries or 10 TB. Labels: Big Data-as-a-Service, Database-as-a-Service, Enterprise, Lightweight RDBMS, Traditional RDBMS, Data Warehouse, DB2, $34B market, $10B market. © 2012 Forrester Research, Inc. Reproduction Prohibited]
    4. What is the Problem?
    5. Big Data? NoSQL?
    6. Too Many Solutions
    7. Hadoop Versions: Too Many Variations (+ Ecosystem). From http://marblejenka.blogspot.jp/2013/01/hadoop.html
    8. Current Big Data Solutions: ‘Feature Creep’. http://en.wikipedia.org/wiki/Feature_creep
    9. We need a Machete :) EVERYTHING with ONE interface; Simple & Discoverable. “Machete Design” by James Lindenbaum, Heroku Co-Founder: http://www.youtube.com/watch?v=3BhDLm9jo5Y
    10. ‘Simplicity’ itself is a feature :) (Anand Babu Periasamy, GlusterFS Co-Founder)
    11. Next Topic: Cloud?
    12. http://www.saasblogs.com/saas/demystifying-the-cloud-where-do-saas-paas-and-other-acronyms-fit-in/
    13. Battlefield of IaaS Vendors: SCM (Supply Chain Management). In the near future, most hardware buyers will be cloud providers rather than individual companies. [Figure: hardware performance/price over time for IaaS vendors vs. on-premise. Annotations: decrease with Moore’s Law; battlefield: Supply Chain Management.]
    14. PaaS, SaaS: IT is all about Operation. More Sleep, More Value. “With PaaS, you offload your development operations function and have the PaaS provider handle the tools and components required to deploy and manage applications reliably.” (EngineYard)
    15. PaaS/SaaS Battlefield: ‘Time’ is Money. [Figure: customer value over time, starting at sign-up or PO. Curves: Ideal Expectation and Reality (On-Premise). Annotations: HW/SW selection, PoC, deploy...; obsolete over time; upgrade.]
    16. Introduction to Treasure Data
    17. Company Overview: US team as of July 2012
    18. Company Overview
        • Silicon Valley-based company: all founders are Japanese (Hironobu Yoshikawa, Kazuki Ohta, Sadayuki Furuhashi)
        • OSS enthusiasts: MessagePack, Fluentd, etc.; cloud native
    19. Our 50+ Customers: Fortune Global 500 leaders and start-ups. 250 billion records / month in Feb 2013. 2 million jobs executed.
    20. Vision: Single Analytics Platform for the World
    21. Investors
        • Bill Tai
        • Naren Gupta: Nexus Ventures, Director of Red Hat, TIBCO
        • Othman Laraki: Former VP Growth at Twitter
        • James Lindenbaum, Adam Wiggins, Orion Henry: Heroku Founders
        • Anand Babu Periasamy, Hitesh Chellani: Gluster Founders
        • Yukihiro “Matz” Matsumoto: Creator of Ruby
        • Dan Scheinman: Director of Arista Networks
        • + 10 more people
        • and.... Jerry Yang, Founder of Yahoo!, where Hadoop was invented :) Check out today’s (2013/01/21) morning edition of the Nikkei newspaper (日経新聞)!
    22. Treasure Data’s Philosophy and Architecture
    23. Big Data Adoption Stages. [Figure: stages plotted by intelligence and sophistication, from lowest to highest.]
        • Reporting (Treasure Data’s FOCUS, 80% of needs): Standard Reports (What happened?); Ad-hoc Reports (Where?); Drill Down Query (Where exactly?); Alerts (Error?)
        • Analytics: Statistical Analysis (Why?); Predictive Analysis (What’s a trend?); Optimization (What’s the best?)
    24. Full Stack Support for Big Data Reporting
        • Our best-in-class architecture and operations team ensure the integrity and availability of your data.
        • Data from almost any source can be securely and reliably uploaded using td-agent in streaming or batch mode.
        • Our SQL, REST, JDBC, ODBC and command-line interfaces support all major query tools and approaches.
        • You can store gigabytes to petabytes of data efficiently and securely in our cloud-based columnar datastore.
    25. Treasure Data = Collect + Store + Query
    26. Example in AdTech: MobFox
        • Europe’s largest independent mobile ad exchange.
        • 20 billion impressions/month (circa Jan. 2013).
        • Serving ads for 15,000+ mobile apps (circa Jan. 2013).
        • Needed Big Data analytics infrastructure ASAP.
    27. Two Weeks From Start to Finish!
    28. Our Value was Proven :) Our Value: Save Time! [Figure: customer value over time, starting at sign-up or PO. Curves: Simple Interface and Reality (On-Premise). Annotations: HW/SW selection, PoC, deploy...; obsolete over time; upgrade.]
    29. Architecture Breakdown
        • Data Collection: increasing variety of data sources; no single data schema; lack of a streaming data collection method; 60% of Big Data project resources are consumed here.
        • Data Store/Analytics: remaining complexity in both traditional DWH and Hadoop (very slow time to market); challenges in scaling data volume and expanding cost.
        • Connectivity: required to ensure connectivity with existing BI/visualization/apps by JDBC, REST and ODBC.
    30. 1) Data Collection
        • 60% of BI project resources are consumed here
        • Most ‘underestimated’ and ‘unsexy’, but MOST important
        • Fluentd: OSS lightweight but robust log collector, http://fluentd.org/
        • These talks will cover Fluentd :) 15:40∼ “Log analysis system with Hadoop in livedoor 2013” by Satoshi Tagomori @ NHN Japan; 16:30∼ “How to Collect Data into Hadoop” (いかにしてHadoopにデータを集めるか) by Sadayuki Furuhashi @ Treasure Data, Inc.
    31. 2) Data Store / Analytics: Columnar Storage
    32. 3) Connectivity. [Figure: clients (REST API, td-command, JDBC/ODBC driver, BI apps, web app) send queries to the Query API; the Query Processing Cluster runs them against Treasure Data Columnar Storage; results can go to MySQL or Postgres.]
    33. Most Difficult Challenge: Multi-Tenancy
        • All customers share the Hadoop clusters (4 data centers)
        • Resource sharing (burst cores), rapid improvement, ease of upgrade
        • [Figure: job submissions and plan changes go to a Global Scheduler, which performs on-demand resource allocation across a Local FairScheduler in each of datacenters A, B, C and D.]
    34. Conclusion
        • Big Data is too complex: needs simplicity; Machete vs. Swiss Army Knife (Feature Creep)
        • IT is changing: the value of software itself is decreasing; operation is the key
        • Treasure Data = Cloud + Big Data: currently focusing on Big Data Reporting; instant value with a simple interface
    35. We’re hiring top talent. Please contact me :)
    36. Appendix
    37. Big Data Market Growth (average of IDC, Gartner and Wikibon stats). Big Data Revenue Breakdown: CAGR 38%.
        • “In 2012…BI and Analytics are rated #1 priorities.” (Ravi Kalakota, Gartner)
        • “Big Data is the new definitive source of competitive advantage across all industries.” (Jeff Kelly, Wikibon)
        • “More than half a billion dollars in venture capital has been invested in new big data technology.” (Dan Vesset, IDC)
    38. Big Data Situation. [Figure: customer value over time, starting at sign-up or PO. Curves: Treasure Data; AWS (RedShift, EMR); on-premise solutions (Software A, Software B). Annotation: obsolescence over time.]
    39. Treasure Data Service Architecture. [Figure: user data sources (Apache, apps, RDBMS, other data sources) feed the Treasure Data columnar data warehouse; MapReduce jobs run Hive, with Pig to be supported; td-command, JDBC, REST and BI apps reach the Query Processing Cluster through the Query API.]
    40. Our Own Open Source Technologies. We are open source natives and proud of our heritage. We’ve contributed to Hibernate, Hadoop, Cassandra, Memcached, KDE and MongoDB, among others. Our product reflects our deep commitment to the open-source community and is built on top of open source software we’ve authored and open sourced.
        • Fluentd: a popular data collector daemon written in Ruby, www.fluentd.org (leading users: SlideShare/LinkedIn, One Kings Lane)
        • MessagePack: a fast, compact serializer, www.msgpack.org (leading users: Pinterest, Redis)
        • Substantial commitment (code, packaging, documentation, sponsorship); tech marketing, possible lead gen
    41. Example in Web Industry
    42. Example Use Case – MySQL to TD
    43. Example Use Case – MySQL to TD
    44. Big Data for the Rest of Us. www.treasure-data.com | @TreasureData
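
Slide 33 describes multi-tenancy as a global scheduler dispatching jobs to a local FairScheduler in each datacenter, with burst cores shared between customers. The slides give no implementation details, so what follows is only a minimal Python sketch of that idea under stated assumptions. The names `fair_share`, `Datacenter` and `GlobalScheduler`, the "least-loaded datacenter" routing rule, and the max-min fair split are all hypothetical illustrations, not Treasure Data's scheduler and not the Hadoop FairScheduler API.

```python
from dataclasses import dataclass, field


def fair_share(total_cores, demands):
    """Max-min fair split of total_cores among tenants (illustrative only).

    demands maps tenant -> cores requested. A tenant asking for no more than
    an equal share gets its full demand; the capacity it leaves unused
    (the "burst" capacity) is redistributed among the remaining tenants.
    """
    allocation = {}
    remaining = dict(demands)
    capacity = total_cores
    while remaining:
        share = capacity / len(remaining)
        satisfied = {t: d for t, d in remaining.items() if d <= share}
        if not satisfied:
            # Everyone left wants more than an equal share: split evenly.
            for tenant in remaining:
                allocation[tenant] = share
            break
        for tenant, demand in satisfied.items():
            allocation[tenant] = demand
            capacity -= demand
            del remaining[tenant]
    return allocation


@dataclass
class Datacenter:
    """Hypothetical stand-in for one cluster with a local fair scheduler."""
    name: str
    cores: int  # assumed to be positive
    demands: dict = field(default_factory=dict)  # tenant -> requested cores

    def load(self):
        # Requested cores relative to capacity; may exceed 1.0 when oversubscribed.
        return sum(self.demands.values()) / self.cores

    def submit(self, tenant, cores):
        self.demands[tenant] = self.demands.get(tenant, 0) + cores

    def allocations(self):
        return fair_share(self.cores, self.demands)


class GlobalScheduler:
    """Routes each job to the least-loaded datacenter (an assumed policy)."""

    def __init__(self, datacenters):
        self.datacenters = list(datacenters)

    def submit(self, tenant, cores):
        target = min(self.datacenters, key=lambda dc: dc.load())
        target.submit(tenant, cores)
        return target.name


if __name__ == "__main__":
    scheduler = GlobalScheduler([Datacenter("A", 100), Datacenter("B", 100)])
    for tenant, cores in [("tenant-1", 80), ("tenant-2", 60), ("tenant-3", 10)]:
        print(tenant, "->", scheduler.submit(tenant, cores))
    for dc in scheduler.datacenters:
        print(dc.name, dc.allocations())
```

The two-level split mirrors the slide's diagram: the global step only decides where a job runs, while each datacenter decides on its own how its cores are divided. A plan change could be modeled by adjusting a tenant's demand before calling `allocations()` again.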
