Treasure Data provides a big data analytics platform that runs on Hadoop in the cloud. It aims to simplify big data and make it accessible for more users ("Big Data for the Rest of Us"). Treasure Data collects and stores data from various sources in its cloud-based columnar datastore and allows querying and analysis of data through SQL, REST APIs and other tools. It handles all the operational complexities of Hadoop and provides a simple interface for users.
Multi-Tenant Operations with Cloudera 5.7 & BT (Cloudera, Inc.)
One benefit of Apache Hadoop is the ability to power multiple workloads, across many different users and departments, all within a single, shared cluster. Hear how BT is doing this today and learn about new features in Cloudera Manager to provide better visibility for multi-tenant operations.
Take a peek inside the minds of Salesforce admins, their habits, and their data. We discovered the biggest struggles admins face, the most critical issues, and best practices.
Denodo Data Virtualization Platform architecture: Data Discovery and Data Gov... (Denodo)
Being able to discover and manage your data are essential features of a modern Data Virtualization platform. The Denodo Platform offers a wealth of capabilities for discovering and managing data and metadata and provides advanced data governance functionality, such as data lineage and change impact analysis. This webinar will examine these capabilities.
More information and free registration for this webinar: http://goo.gl/8U4ynC
To learn more, visit: http://go.denodo.com/a2a
Join the conversation at #Architect2Architect
Agenda:
Introspection and Metadata Management
Global Search
Data Governance
Data-Ed Online: Data Architecture Requirements (DATAVERSITY)
Data architecture is foundational to an information-based operational environment. It is your data architecture that organizes your data assets so they can be leveraged in your business strategy to create real business value. Even though this is important, not all data architectures are used effectively. This webinar describes the use of data architecture as a basic analysis method. Various uses of data architecture to inform, clarify, understand, and resolve aspects of a variety of business problems will be demonstrated. As opposed to showing how to architect data, your presenter Dr. Peter Aiken will show how to use data architecting to solve business problems. The goal is for you to be able to envision a number of uses for data architectures that will raise the perceived utility of this analysis method in the eyes of the business.
Takeaways:
Understanding how to contribute to organizational challenges beyond traditional data architecting
How to utilize data architectures in support of business strategy
Understanding foundational data architecture concepts based on the DAMA DMBOK
Data architecture guiding principles & best practices
Building a Global-Scale Multi-Tenant Cloud Platform on AWS and Docker: Lesson... (Felix Gessert)
In this talk we share the lessons learned while building out the Baqend Cloud platform on AWS and Docker. Baqend’s AWS-hosted architecture consists of a caching CDN layer, global and local load balancing, a group of REST and Node.js servers, and a database cluster with Redis and MongoDB. As customers have their own set of containerized REST and Node servers, we needed a cluster that on the one hand is horizontally scalable and on the other hand easily manageable and fault-tolerant from an operational perspective. Today there are at least four popular systems that claim to support this:
- Kubernetes
- Apache Mesos
- Docker Swarm
- AWS Elastic Container Service (ECS)
Thinking that ECS would certainly be the easiest option on AWS, we started building our cluster on it. We quickly came to realize that while ECS was astoundingly stable and easy to use, there were inherent limitations that could not be worked around. An old Docker version, missing network isolation, no means of parameterizing tasks, and forced memory constraints are major limitations of ECS that we will talk about. Seeing the daunting operational overhead of running Kubernetes or Mesos in practice, we turned to Docker’s native clustering solution, Swarm. We will present how Swarm works with both Docker and AWS and highlight the advantages and downsides compared to Amazon’s ECS.
At the StampedeCon 2015 Big Data Conference: YARN enables Hadoop to move beyond pure batch processing. With that, multiple workloads and tenants must now be able to share a single infrastructure for data processing. Features of the Capacity Scheduler enable resource sharing among multiple tenants in a fair manner, with elastic queues to maximize utilization. This talk will focus on the Capacity Scheduler features that enable multi-tenancy and how resource sharing can be rebalanced using features like preemption.
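A minimal sketch of how elastic queues might be configured in `capacity-scheduler.xml` (queue names and percentages are illustrative, not taken from the talk):

```xml
<configuration>
  <!-- Two tenant queues sharing one cluster -->
  <property>
    <name>yarn.scheduler.capacity.root.queues</name>
    <value>analytics,etl</value>
  </property>
  <!-- Guaranteed shares: 60% / 40% of cluster resources -->
  <property>
    <name>yarn.scheduler.capacity.root.analytics.capacity</name>
    <value>60</value>
  </property>
  <property>
    <name>yarn.scheduler.capacity.root.etl.capacity</name>
    <value>40</value>
  </property>
  <!-- Elasticity: analytics may grow beyond its guarantee when etl is idle -->
  <property>
    <name>yarn.scheduler.capacity.root.analytics.maximum-capacity</name>
    <value>100</value>
  </property>
</configuration>
```

With preemption enabled via the scheduler monitor settings in `yarn-site.xml`, YARN can reclaim borrowed capacity for the `etl` queue when its demand returns.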
IoT Meets Big Data: The Opportunities and Challenges by Syed Hoda of ParStream (gogo6)
Download our special report, IoT Tech for the Manager: http://bit.ly/report1-slideshare
IoT Meets Big Data: The Opportunities and Challenges as presented at the IoT Inc Business' Eighth Meetup. See: http://www.iot-inc.com/iot-meets-big-data-the-opportunities-and-challenges/
In our eighth Meetup we have Syed Hoda, Chief Marketing Officer of ParStream presenting “IoT Meets Big Data: The Opportunities and Challenges”. Come meet other business leaders in the IoT ecosystem and discuss the business issues you face in the Internet of Things.
Presentation Abstract
The Internet of Things (IoT) and Big Data have each made press headlines and continue to be board-level priorities. The intersection of IoT and Big Data is a fascinating area of innovation with tremendous scope for business impact. From industrial sensors to vehicles to health monitors, a huge variety of devices connect to the Internet and share information. At the same time, the cost to store data has dropped dramatically while capabilities for analysis have made huge leaps forward. How can analytics drive business benefits from IoT projects? What are the challenges in storing and analyzing huge amounts of real-world information? How can companies generate more value from their data? We will address these questions and also share our perspectives on innovative technologies enabling new IoT use cases.
Data Lake for the Cloud: Extending your Hadoop Implementation (Hortonworks)
As more applications are created using Apache Hadoop that derive value from the new types of data from sensors/machines, server logs, click-streams, and other sources, the enterprise "Data Lake" forms with Hadoop acting as a shared service. While these Data Lakes are important, a broader life-cycle needs to be considered that spans development, test, production, and archival and that is deployed across a hybrid cloud architecture.
If you have already deployed Hadoop on-premise, this session will also provide an overview of the key scenarios and benefits of joining your on-premise Hadoop implementation with the cloud, by doing backup/archive, dev/test or bursting. Learn how you can get the benefits of an on-premise Hadoop that can seamlessly scale with the power of the cloud.
Effective data governance is imperative to the success of Data Lake initiatives. Without governance policies and processes, information discovery and analysis is severely impaired. In this session we will provide an in-depth look into the Data Governance Initiative launched collaboratively between Hortonworks and partners from across industries. We will cover the objectives of Data Governance Initiatives and demonstrate key governance capabilities of the Hortonworks Data Platform.
This presentation elaborates on design decisions and design options when it comes to designing the master data architecture.
The presentation was given at the 16th Americas Conference on Information Systems (AMCIS 2010) in Lima, Peru.
Implementing a Data Lake with Enterprise Grade Data Governance (Hortonworks)
Hadoop provides a powerful platform for data science and analytics, where data engineers and data scientists can leverage myriad data from external and internal sources to uncover new insights. Such power also presents a few new challenges: on the one hand, the business wants more and more self-service; on the other, IT is trying to keep up with the demand for data while maintaining architecture and data governance standards.
In this webinar, Andrew Ahn, Data Governance Initiative Product Manager at Hortonworks, will address the gaps and offer best practices in providing end-to-end data governance in HDP. Andrew Ahn will be followed by Oliver Claude of Waterline Data, who will share a case study of how Waterline Data Inventory works with HDP in the Modern Data Architecture to automate the discovery of business and compliance metadata, data lineage, as well as data quality metrics.
Modern Data Architecture for a Data Lake with Informatica and Hortonworks Dat... (Hortonworks)
How do you turn data from many different sources into actionable insights and manufacture those insights into innovative information-based products and services?
Industry leaders are accomplishing this by adding Hadoop as a critical component in their modern data architecture to build a data lake. A data lake collects and stores data across a wide variety of channels including social media, clickstream data, server logs, customer transactions and interactions, videos, and sensor data from equipment in the field. A data lake cost-effectively scales to collect and retain massive amounts of data over time, and to convert all this data into actionable information that can transform your business.
Join Hortonworks and Informatica as we discuss:
- What is a data lake?
- The modern data architecture for a data lake
- How Hadoop fits into the modern data architecture
- Innovative use-cases for a data lake
This is the presentation for the talk I gave at JavaDay Kiev 2015. It covers the evolution of data processing systems, from simple ones with a single DWH to more complex approaches like Data Lake, Lambda Architecture, and Pipeline architecture.
Tech talk on what Azure Databricks is, why you should learn it, and how to get started. We'll use PySpark and talk about some real-life examples from the trenches, including the pitfalls of accidentally leaving your clusters running and receiving a huge bill ;)
After this you will hopefully switch to Spark-as-a-service and get rid of your HDInsight/Hadoop clusters.
This is part 1 of an 8 part Data Science for Dummies series:
Databricks for dummies
Titanic survival prediction with Databricks + Python + Spark ML
Titanic with Azure Machine Learning Studio
Titanic with Databricks + Azure Machine Learning Service
Titanic with Databricks + MLS + AutoML
Titanic with Databricks + MLFlow
Titanic with DataRobot
Deployment, DevOps/MLops and Operationalization
Apache Bigtop has created the de-facto standard in how Hadoop-based stacks are developed, delivered, and managed. We are at it again! The track will present the composition of the next generation of in-memory computing stack that is completely built out of open-source components. The next generation of the Apache data processing stack will focus on in-memory and transactional processing of large amounts of data. We will also be talking about performance benefits that legacy data-processing software based on MapReduce, Hive, and similar, can derive from in-memory computing. This session will discuss and analyze the benefits of practicing Fast Data in the open.
CloudFoundry and MongoDb, a marriage made in heaven (Patrick Chanezon)
This talk will provide an overview of the PaaS (Platform as a Service) landscape, and will describe the Cloud Foundry open source PaaS, with its multi-framework, multi-service, multi-cloud model. Cloud Foundry allows developers to provision apps in Java/Spring, Ruby/Rails, Ruby/Sinatra, Javascript/Node, and leverage services like MySQL, MongoDB, Redis, Postgres and RabbitMQ. It can be used as a public PaaS on CloudFoundry.com and other service providers (ActiveState, AppFog), to create your own private cloud, or on your laptop using the Micro Cloud Foundry VM. Micro Cloud Foundry is a very easy way for developers to start working on their application using their framework of choice and MongoDB, without the need to set up a development environment, and your app is one command line away (vmc push) from deployment to cloudfoundry.com.
10 concepts the enterprise decision maker needs to understand about Hadoop (Donald Miner)
Way too many enterprise decision makers have clouded and uninformed views of how Hadoop works and what it does. Donald Miner offers high-level observations about Hadoop technologies and explains how Hadoop can shift the paradigms inside of an organization, based on his report Hadoop: What You Need To Know—Hadoop Basics for the Enterprise Decision Maker, forthcoming from O’Reilly Media.
After a basic introduction to Hadoop and the Hadoop ecosystem, Donald outlines 10 basic concepts you need to understand to master Hadoop:
Hadoop masks being a distributed system: what it means for Hadoop to abstract away the details of distributed systems and why that’s a good thing
Hadoop scales out linearly: why Hadoop’s linear scalability is a paradigm shift (but one with a few downsides)
Hadoop runs on commodity hardware: an honest definition of commodity hardware and why this is a good thing for enterprises
Hadoop handles unstructured data: why Hadoop is better for unstructured data than other data systems from a storage and computation perspective
In Hadoop, you load data first and ask questions later: the differences between schema-on-read and schema-on-write and the drawbacks this represents
Hadoop is open source: what it really means for Hadoop to be open source from a practical perspective, not just a “feel good” perspective
HDFS stores the data but has some major limitations: an overview of HDFS (replication, not being able to edit files, and the NameNode)
YARN controls everything going on and is mostly behind the scenes: an overview of YARN and the pitfalls of sharing resources in a distributed environment and the capacity scheduler
MapReduce may be getting a bad rap, but it’s still really important: an overview of MapReduce (what it’s good at and bad at and why, while it isn’t used as much these days, it still plays an important role)
The Hadoop ecosystem is constantly growing and evolving: an overview of current tools such as Spark and Kafka and a glimpse of some things on the horizon
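The schema-on-read point above can be sketched in a few lines of Python. The records and field names here are made up; the point is that raw data is stored as-is, and a schema is imposed only at read time:

```python
import json

# Raw events stored exactly as they arrived. A schema-on-write system
# would have rejected the malformed or differently-shaped records at load time.
raw_lines = [
    '{"user": "alice", "action": "click", "ms": 120}',
    '{"user": "bob", "action": "view"}',  # missing "ms" field
    'not json at all',                    # corrupt record
]

def parse_events(lines):
    """Apply a schema at read time, skipping records that don't fit."""
    for line in lines:
        try:
            rec = json.loads(line)
        except json.JSONDecodeError:
            continue  # schema-on-read: bad records surface now, not at load
        yield {"user": rec.get("user"), "action": rec.get("action"),
               "ms": rec.get("ms", 0)}

events = list(parse_events(raw_lines))
print(events)  # two well-formed records; the corrupt one is skipped
```

The drawback mentioned in the talk shows up here too: data quality problems stay hidden until somebody reads the data.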
Hadoop, SQL & NoSQL: No Longer an Either-or Question (Tony Baer)
It used to be black and white. If you needed MapReduce processing, you chose Hadoop; if you needed standard query and reporting, you chose a SQL data warehouse. The decision is no longer clear cut. With YARN clearing the way for Hadoop to accept multiple workloads, Hadoop is no longer your father’s MapReduce machine – as frameworks are rapidly emerging for interactive SQL, search, streaming and other workloads. We are on the path toward a federated world of analytic and operational decision stores, but as the boundaries between platform types grow fuzzier, deciding what platforms to use and where to run which workloads grow trickier.
Hadoop Vs Spark — Choosing the Right Big Data Framework (Alaina Carter)
Data volumes keep increasing, and many distributed systems are available to digest all this data; Hadoop and Spark are the most famous ones. Choosing between the two depends entirely on the requirements of your project. Read more to find out which of these frameworks is right for you.
The new GDPR regulation went into effect on May 25th. While a majority of conversations have revolved around the security and IT aspects of the law, marketing teams will play a crucial role in helping organizations meet GDPR standards and will take on a more strategic role across the organization. Join us to learn more, engage with your peers, and get prepared.
This webinar will cover:
- How complying with the GDPR will drive better marketing and raise the standard of the quality of your customer engagement
- The GDPR elements marketers must know about
- The elements of PII that will be affected and what marketers need to do about it
- A deep dive on how GDPR regulations will affect your marketing channels - email, programmatic advertising, cold calls, etc.
- Tactical marketing updates needed to meet GDPR guidelines
AR and VR by the Numbers: A Data First Approach to the Technology and Market (Treasure Data, Inc.)
With AR and VR technologies, it’s the first time that data collection has been part of the front-end strategy vs back-end process. As companies compete to create new, interactive experiences, data is the tool of choice to measure all aspects of player engagement and marketing effectiveness. In this webinar, two industry experts, Nicolas Nadeau and Andrew Mayer, will talk about the trends driving AR and VR markets today, and what data-driven approaches companies need to think about to compete in these markets tomorrow.
An overview of Customer Data Platforms (CDP) with the industry leader who coined the term, David Raab. Find out how to use Live Customer Data to create a better customer experience and how Live Data Management can give you a competitive edge with a 360 degree view of your clients.
Learn:
- The definition and requirements for Customer Data Platforms
- The differences between Customer Data Platforms and comparative technologies such as Data Warehousing and Marketing Automation
- Reference architectures/approaches to building CDP
- How Treasure Data is used to build Customer Data Platforms
And here's the song: https://youtu.be/RalMozVq55A
In this hands-on webinar we will cover how to leverage the Treasure Data Javascript SDK library to ensure user stitching of web data into the Treasure Data Customer Data Platform to provide a holistic view of prospects and customers.
We will demo the native SDK, as well as deploying the SDK inside of Adobe DTM and Google Tag Manager.
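The SDK's internals aren't reproduced here, but the core idea of user stitching can be sketched in Python. The event shapes and field names below are hypothetical, not the Treasure Data schema:

```python
# Hypothetical event stream: pageviews carry an anonymous cookie id ("cid");
# a login event links a cid to a known customer id ("uid").
events = [
    {"cid": "c-123", "event": "pageview", "url": "/pricing"},
    {"cid": "c-123", "event": "login", "uid": "u-42"},
    {"cid": "c-123", "event": "pageview", "url": "/docs"},
    {"cid": "c-999", "event": "pageview", "url": "/"},
]

def stitch(events):
    """Resolve each event to a canonical user id where a login links one."""
    cid_to_uid = {e["cid"]: e["uid"] for e in events if e["event"] == "login"}
    return [dict(e, user=cid_to_uid.get(e["cid"], e["cid"])) for e in events]

stitched = stitch(events)
# Events for c-123 now share the canonical id u-42; c-999 stays anonymous.
```

In the real pipeline this join happens inside the CDP over much larger tables, but the mapping step is conceptually the same.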
Hands-On: Managing Slowly Changing Dimensions Using TD Workflow (Treasure Data, Inc.)
In this hands-on webinar we'll explore the data warehousing concept of Slowly Changing Dimensions (SCDs) and common use cases for managing SCDs when dealing with customer data. This webinar will demonstrate different methods for tracking SCDs in a data warehouse, and how Treasure Data Workflow can be used to create robust data pipelines to handle these processes.
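As one illustration of what such a pipeline computes, here is a minimal Python sketch of a Type 2 SCD update, which closes the current dimension row and inserts a new one instead of overwriting history. The table layout and field names are illustrative, not the TD Workflow schema:

```python
from datetime import date

# A Type 2 dimension keeps full history: changes close the current row
# and insert a new one, rather than updating in place.
dim_customer = [
    {"customer_id": 1, "city": "Tokyo", "valid_from": date(2015, 1, 1),
     "valid_to": None, "is_current": True},
]

def apply_scd2(dim, customer_id, new_city, as_of):
    """Record a changed attribute while preserving the old version."""
    for row in dim:
        if row["customer_id"] == customer_id and row["is_current"]:
            if row["city"] == new_city:
                return  # no change, nothing to record
            row["valid_to"] = as_of      # close out the old version
            row["is_current"] = False
    dim.append({"customer_id": customer_id, "city": new_city,
                "valid_from": as_of, "valid_to": None, "is_current": True})

apply_scd2(dim_customer, 1, "Osaka", date(2016, 6, 1))
# The dimension now holds two rows for customer 1: the closed Tokyo row
# and the current Osaka row.
```

A warehouse implementation would express the same close-and-insert logic as SQL MERGE/INSERT steps orchestrated by the workflow engine.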
Brand Analytics Management: Measuring CLV Across Platforms, Devices and Apps (Treasure Data, Inc.)
Gaming companies with multiple products often struggle to calculate accurate Customer Lifetime Value (CLTV) across their portfolio. This is because user data is often analyzed in silos, so companies are unable to get a clear picture of ROI and CLTV across platforms, devices and apps.
In this webinar we’ll look at how you can apply a holistic and complete approach to your CLTV and ROI through the lens of gaming companies, though this technique is applicable for any company who has products spanning platforms.
We’ll also explore:
How the role and power of data in business has shifted over the past 10 years.
The current technologies and processes used to analyze data across different platforms by combining multiple data streams, with examples in brand- and portfolio-based LTV.
How to process and centralize dozens of varying data streams.
Nicolas Nadeau will speak from his extensive experience and show how leveraging data from multiple product strategies spanning many platforms can be highly beneficial for your company.
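Once the silos are joined, the cross-platform CLTV idea reduces to aggregating revenue by a unified player identity. A deliberately simplified Python sketch (the data and field names are made up; real CLTV models also project future revenue):

```python
from collections import defaultdict

# Revenue events from separate product silos, already mapped to a single
# cross-platform player id -- the identity mapping is the hard part in practice.
events = [
    {"player": "p1", "platform": "ios",   "revenue": 4.99},
    {"player": "p1", "platform": "steam", "revenue": 19.99},
    {"player": "p2", "platform": "ios",   "revenue": 0.99},
]

def portfolio_ltv(events):
    """Sum observed revenue per player across every platform and app."""
    ltv = defaultdict(float)
    for e in events:
        ltv[e["player"]] += e["revenue"]
    return dict(ltv)

print(portfolio_ltv(events))
```

Analyzed per silo, p1 would look like two mediocre customers; combined, they are one high-value customer, which is exactly the distortion the webinar describes.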
Do you know what your top ten 'happy' customers look like? Would you like to find ten more just like them? Come learn how to leverage 1st & 3rd party data to map your customer journey and drive users down a path where every interaction is personalized, fun, & data-driven. No more detractors, power your Customer Experience with data!
In this webinar you will learn:
-When, why, and how to leverage 1st, 2nd, and 3rd party data
-Tips & Tricks for marketers to become more data driven when launching their campaigns
-Why all marketers need a 360-degree customer view
The reality is virtual, but successful VR games still require cold, hard data. For wildly popular games like Survios’ Raw Data, the first VR-exclusive game to reach #1 on Steam’s Global Top Sellers list, data and analytics are the key to success.
And now online gaming companies have the full-stack analytics infrastructure and tools to measure every aspect of a virtual reality game and its ecosystem in real time. You can keep tabs on lag, which ruins a VR experience, improve gameplay and identify issues before they become showstoppers, and create fully personalized, completely immersive experiences that blow minds and boost adoption, and more. All with the right tools.
Make success a reality: Register now for our latest interactive VB Live event, where we’ll tap top experts in the industry to share insights into turning data into winning VR games.
Attendees will:
* Understand the role of VR in online gaming
* Find out how VR company Survios successfully leverages the Exostatic analytics infrastructure for commercial and gaming success
* Discover how to deploy full-stack analytics infrastructure and tools
Speakers:
Nicolas Nadeau, President, Exostatic
Kiyoto Tamura, VP Marketing, Treasure Data
Ben Solganik, Producer, Survios
Stewart Rogers, Director of Marketing Technology, VentureBeat
Wendy Schuchart, Moderator, VentureBeat
Harnessing Data for Better Customer Experience and Company Success (Treasure Data, Inc.)
As big data has exploded, the ability for companies to easily leverage it has imploded. Organizations are drowning in their own information, unable to see the forest for the trees, while the big players consistently outperform in their ability to deliver a great customer experience faster and cheaper. As a result, the vast majority of companies are scrambling to catch up and become more agile and data-driven, to use their data more effectively so they can attract and retain their elusive customers.
In this joint deck by 451 Research and Treasure Data, you will learn how to enable your line of business team to own their own data (instead of relying on IT) to be able to:
- Deliver a single, persistent view of your customer based on behavior data
- Make that data accessible to the right people at the right time
- Increase organizational effectiveness by (finally) breaking down silos with data
- Enable powerful marketing tools to enhance the customer experience
How to make your open source project MATTER
Let’s face it: most open source projects die. For every Rails, Docker and React, there are thousands of projects that never take off. They die in the lonely corners of GitHub, only to be discovered by bots scanning for SSH private keys.
Over the last 5 years, I worked on and off on marketing a piece of infrastructure middleware called Fluentd. We tried many things to ensure that it did not die: speaking at events, talking to strangers, giving away stickers, helping people install Fluentd on their laptops. Almost everything I tried had a small, incremental effect, but several initiatives raised awareness of Fluentd to the next level. As I listed these “ideas that worked”, I noticed a common thread: they all brought Fluentd into a new ecosystem via packaging.
* Event: the '데이터야 놀자 (Let's Play with Data)' one-day conference held at MARU180 on October 14, 2016
* Speaker: Dylan Ko (고영혁), Data Scientist / Data Architect at Treasure Data
* Contents:
- Introduction of data scientist Dylan Ko (고영혁)
- Introduction to Treasure Data (트레저데이터)
- Global case study of making money with data #1
>> MUJI: from traditional retail to data-driven O2O
- Global case study of making money with data #2
>> WISH: shopping optimization through personalization & automation
- Global case study of making money with data #3
>> Oisix: predicting & preventing customer churn with machine learning
- Global case study of making money with data #4
>> Warner Bros.: saving time and money through process automation
- Global case study of making money with data #5
>> Adtech companies such as Dentsu
- What you must check before trying to make money with data
Keynote at Fluentd Meetup Summer
Related Slides
- Fluentd ServerEngine Integration & Windows Support http://www.slideshare.net/RittaNarita/fluentd-meetup-2016-serverengine-integration-windows-support
- Fluentd v0.14 Plugin API Details http://www.slideshare.net/tagomoris/fluentd-v014-plugin-api-details
John Hammink's talk at Great Wide Open 2016. We discuss: (1) the need for data analytics infrastructure that can scale exponentially; (2) what such an infrastructure must contain; and (3) the need for an infrastructure that can handle unstructured and semi-structured data.
Treasure Data: Move your data from MySQL to Redshift with (not much more tha... - Treasure Data, Inc.
Migrate your semi-structured data from MySQL to Amazon Redshift in as few steps as possible. From Amazon Web Services Bay Area meetup @ Sumo Logic, December 3, 2015.
This presentation describes common issues with application logging and introduces how to solve most of them by implementing a unified logging layer with Fluentd.
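A unified logging layer of this kind is typically wired up through Fluentd's configuration file. A minimal sketch follows; the plugin names (`forward`, `file`) are standard, but the paths and tags are illustrative, and directive syntax varies slightly across Fluentd versions:

```
# Receive events from application loggers over TCP (the in_forward input).
<source>
  @type forward
  port 24224
</source>

# Route every event tagged app.* to buffered files for later analysis.
<match app.**>
  @type file
  path /var/log/td-agent/app
</match>
```

Applications then emit structured events (a tag plus a JSON record) through a Fluentd logger library instead of writing ad-hoc text logs, so every downstream consumer sees one consistent format.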
Hadoop meets Cloud with Multi-Tenancy
1. Treasure Data
Hadoop meets Cloud with Multi-Tenancy
Kazuki Ohta
Founder and CTO at Treasure Data, Inc.
Hadoop User Group Japan (Hadoopユーザー会)
k@treasure-data.com
@kzk_mover
Friday, April 5, 13
2. Who are you?
Kazuki Ohta (太田一樹)
• @kzk_mover, k@treasure-data.com
Treasure Data, Inc.
• Chief Technology Officer, Founded July 2011
Hadoop User Group Japan
• One of the founders
• “Hadoop徹底入門”
Open-Source Enthusiast
• Hadoop, memcached, jemalloc, MongoDB, uim, etc...
7. Hadoop Versions
Too Many Variations (+Ecosystem)
from http://marblejenka.blogspot.jp/2013/01/hadoop.html
8. Current Big Data Solutions: ‘Feature Creep’
http://en.wikipedia.org/wiki/Feature_creep
9. We need Machete :)
EVERYTHING with ONE interface
Simple & Discoverable
Machete Design by James Lindenbaum, Heroku Co-Founder
http://www.youtube.com/watch?v=3BhDLm9jo5Y
10. ‘Simplicity’ itself is a feature :)
by Anand Babu Periasamy
GlusterFS Co-Founder
13. Battle Field of IaaS Vendors: SCM
Chart: HW performance / price decreases over time with Moore's Law, for IaaS vendors and on-premise alike.
In the near future, most HW buyers won't be individual companies, but clouds.
The IaaS vendors' battle field: Supply Chain Management.
14. PaaS, SaaS:
IT is all about Operation
More Sleep, More Value
“With PaaS, you offload your development operations function and have the PaaS provider handle the tools and components required to deploy and manage applications reliably.” - EngineYard
15. PaaS/SaaS Battle Field: ‘Time’ is Money
Chart: value delivered over time, starting at sign-up or PO.
Ideal: customer expectation is met immediately and value does not become obsolete over time.
Reality (on-premise): value arrives only after HW/SW selection, PoC, deployment and upgrades.
16. Introduction to Treasure Data
18. Company Overview
Silicon Valley-based Company
• All Founders are Japanese
• Hironobu Yoshikawa
• Kazuki Ohta
• Sadayuki Furuhashi
OSS Enthusiasts
• MessagePack, Fluentd, etc.
• Cloud native
19. Our 50+ Customers: Fortune Global 500 leaders and start-ups
250 billion records / month in Feb 2013
2 million jobs executed
21. Investors
Bill Tai
Naren Gupta - Nexus Ventures, Director of Red Hat, TIBCO
Othman Laraki - Former VP Growth at Twitter
James Lindenbaum, Adam Wiggins, Orion Henry - Heroku Founders
Anand Babu Periasamy, Hitesh Chellani - Gluster Founders
Yukihiro “Matz” Matsumoto - Creator of Ruby
Jerry Yang - Founder of Yahoo!, where Hadoop was invented :)
Dan Scheinman - Director of Arista Networks
+ 10 more people
Check out today's (2013/01/21) morning edition of the Nikkei Shimbun (日経新聞)!
• and....
22. Treasure Data’s Philosophy and Architecture
23. Big Data Adoption Stages
Stages (by increasing intelligence sophistication):
Reporting - Treasure Data's FOCUS (80% of needs):
• Standard Reports: What happened?
• Ad-hoc Reports: Where?
• Drill Down Query: Where exactly?
• Alerts: Error?
Analytics:
• Statistical Analysis: Why?
• Predictive Analysis: What's a trend?
• Optimization: What's the best?
24. Full Stack Support for Big Data Reporting
Our best-in-class architecture and operations team ensure the integrity and availability of your data.
Data from almost any source can be securely and reliably uploaded using td-agent in streaming or batch mode.
You can store gigabytes to petabytes of data efficiently and securely in our cloud-based columnar datastore.
Our SQL, REST, JDBC, ODBC and command-line interfaces support all major query tools and approaches.
25. Treasure Data = Collect + Store + Query
26. Example in AdTech: MobFox
1. Europe’s largest independent mobile ad exchange.
2. 20 billion imps/month (circa Jan. 2013)
3. Serving ads for 15,000+ mobile apps (circa Jan. 2013)
4. Needed Big Data Analytics infrastructure ASAP.
28. Our Value was Proven :)
Chart: customer value delivered over time, starting at sign-up or PO.
Our value: save time! Treasure Data's simple interface delivers value right away.
Reality (on-premise): value arrives only after HW/SW selection, PoC, deployment and upgrades, and becomes obsolete over time.
29. Architecture Breakdown
Data Collection:
• Increasing variety of data sources
• No single data schema
• Lack of a streaming data collection method
• 60% of Big Data project resource is consumed here
Data Store/Analytics:
• Remaining complexity in both traditional DWH and Hadoop (very slow time to market)
• Challenges in scaling data volume and expanding cost
Connectivity:
• Required to ensure connectivity with existing BI/visualization apps by JDBC, REST and ODBC
30. 1) Data Collection
60% of BI project resource is consumed here
Most ‘underestimated’ and ‘unsexy’ but MOST important
Fluentd: OSS lightweight but robust Log Collector
• http://fluentd.org/
These talks will cover Fluentd :)
15:40∼ Log analysis system with Hadoop in livedoor 2013
by Satoshi Tagomori @ NHN Japan
16:30∼ いかにしてHadoopにデータを集めるか (How to Collect Data into Hadoop)
by Sadayuki Furuhashi @ Treasure Data, Inc.
31. 2) Data Store / Analytics - Columnar Storage
32. 3) Connectivity
Diagram: clients reach Treasure Data via the REST API (td-command) or via JDBC/ODBC drivers (BI apps, web apps).
Queries go to the Query API and run on the query processing cluster over Treasure Data's columnar storage; results can be written out to MySQL or Postgres.
33. Most Difficult Challenge: Multi-Tenancy
All customers share the Hadoop clusters (4 data centers).
Resource sharing (burst cores), rapid improvement, ease of upgrade.
Diagram: a global scheduler takes job submissions and plan changes and performs on-demand resource allocation across the local FairSchedulers in datacenters A, B, C and D.
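The on-demand allocation across tenants can be thought of as max-min fair sharing. A simplified sketch (not Treasure Data's actual scheduler) of how a fixed pool of task slots might be split across tenant demands:

```python
def fair_share(capacity, demands):
    """Max-min fair split of `capacity` task slots across tenant demands.

    Repeatedly gives each still-hungry tenant an equal share of what is
    left; tenants that need less than their share free slots for others
    (the "burst cores" effect on a shared cluster).
    """
    alloc = {tenant: 0 for tenant in demands}
    active = {t: d for t, d in demands.items() if d > 0}
    left = capacity
    while left > 0 and active:
        share = max(1, left // len(active))  # equal slice of the remainder
        for t in list(active):
            give = min(share, active[t], left)
            alloc[t] += give
            active[t] -= give
            left -= give
            if active[t] == 0:
                del active[t]  # tenant satisfied; stop scheduling it
            if left == 0:
                break
    return alloc

# A small tenant gets all it asked for; big tenants split the rest evenly.
print(fair_share(100, {"A": 20, "B": 100, "C": 100}))
# -> {'A': 20, 'B': 40, 'C': 40}
```

A real deployment layers this idea twice, as the slide suggests: a global scheduler divides capacity across data centers, and each local FairScheduler divides its share across jobs.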
34. Conclusion
Big Data is too complex
• Needs Simplicity
• Machete vs. Swiss Army Knife (Feature Creep)
IT is changing
• The value of Software itself is decreasing
• Operation is the key
Treasure Data = Cloud + Big Data
• Currently Focusing on Big Data Reporting
• Instant Value with Simple Interface
35. We’re Hiring Top Talent, please contact me :)
37. Big Data Market Growth
Chart: Big Data market growth, CAGR 38% (average of IDC, Gartner and Wikibon stats), with Big Data revenue breakdown.
“In 2012…BI and Analytics are rated #1 priorities.” — Ravi Kalakota, Gartner
“More than half a billion dollars in venture capital has been invested in new big data technology.” — Dan Vessett, IDC
“Big Data is the new definitive source of competitive advantage across all industries.” — Jeff Kelly, Wikibon
38. Big Data Situation
Chart: customer value delivered over time, starting at sign-up or PO.
Treasure Data delivers the most value, ahead of RedShift, EMR and other AWS offerings; Software A, Software B and other on-premise solutions deliver less and suffer obsolescence over time.
39. Treasure Data Service Architecture
Diagram: users and apps send data from Apache logs, RDBMSs and other data sources into the Treasure Data columnar data warehouse.
MapReduce jobs run queries via Hive (Pig to be supported).
Clients connect through td-command, JDBC/REST and BI apps to the Query API, which dispatches to the query processing cluster.
40. Our Own Open Source technologies
We are open source natives and proud of our heritage.
We’ve contributed to Hibernate, Hadoop, Cassandra,
Memcached, KDE, MongoDB among others.
Our product reflects our deep commitment to the open-source
community and is built on top of open source software we’ve
authored and open sourced.
• Fluentd - a popular data collector daemon written in Ruby
www.fluentd.org (leading users: SlideShare/LinkedIn, One Kings Lane)
• MessagePack - a fast, compact serializer
www.msgpack.org (leading users: Pinterest, Redis)
Substantial commitment (Code, Packaging, Documentation, Sponsorship)
Tech marketing, Possible lead gen
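MessagePack's compactness comes from packing type and length into single header bytes. A toy encoder (illustration only, not the real msgpack library) for tiny string-to-int maps shows the idea:

```python
import json

def pack_small_map(d):
    """Toy MessagePack encoder for maps of short ASCII keys to ints 0-127.

    Header bytes per the MessagePack spec: fixmap = 0x80 | size,
    fixstr = 0xa0 | length, positive fixint = the value byte itself.
    """
    assert len(d) <= 15
    out = bytearray([0x80 | len(d)])  # fixmap header
    for key, value in d.items():
        kb = key.encode("ascii")
        assert len(kb) <= 31 and 0 <= value <= 127
        out.append(0xA0 | len(kb))    # fixstr header
        out += kb                     # raw key bytes
        out.append(value)             # positive fixint
    return bytes(out)

# 4 bytes in MessagePack vs 8 bytes as JSON text.
print(len(pack_small_map({"a": 1})), len(json.dumps({"a": 1})))
# -> 4 8
```

Over billions of log events per month, that kind of per-record saving is why both Fluentd and Treasure Data's collection path use MessagePack on the wire.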