Hadoop Summit Japan 2011 Fall - LT by IBM

Data Discovery Tool
BigSheets
MapReduce with No Coding?
Atsushi Tsuchiya (eAtsuhsi@JP.ibm.com)
Big Data Tiger Team
IBM Software

Looking at Data
• What would you do with Big data?
• How to make use of it?
• It is difficult! – too vague.
• No specific problem that needs to be solved.
• No specific question that needs to be answered.
• All you know is that you want to improve the business.
• But you have *data*
• So, what would you do first?
Looking at Data!

IBM with Hadoop
• IBM has been working with the open source community for a long time.
– Eclipse, Hadoop and so on …

• BigInsights includes Hadoop

BigInsights
• BigInsights is IBM's Hadoop product for Big data analytics.
– Basic Edition (up to 10 TB): free of charge!
– Enterprise Edition

• Next version of BigInsights coming soon.
– v1.2 available.

• And many more

BigInsights Components
• BigInsights includes:
– IBM Java
– JAQL: a language developed by IBM (open source)
– IBM Distribution of Hadoop
– BigSheets: data exploration tool
– FLEX scheduler for Adaptive MapReduce
– Orchestrator (Workflow Engine)
– SystemT (Text Analytics), SystemML (Machine Learning)
– LDAP
– Web Console / Developer Studio

BigInsights – Basic Edition
(Inc = included. Component versions will be updated in the November release.)

Function | Version | Basic Edition | Enterprise Edition
Integrated Install | | Inc | Inc
Open Source components:
Hadoop (including common utilities, HDFS, MapReduce framework) | 0.20.2 | Inc | Inc
Jaql (programming / query language) | 0.5.2 | Inc | Inc
Pig (programming / query language) | 0.7 | Inc | Inc
Flume (data collection/aggregation) | 0.9.1 | Inc | Inc
Hive (data summarization/querying) | 0.5 | Inc | Inc
Lucene (text search) | 3.0.2 | Inc | Inc
Zookeeper (process coordination) | 3.2.2 | Inc | Inc
Avro (data serialization) | 1.3.0 | Inc | Inc
HBase (real time read/write) | 0.20.6 | Inc | Inc
Oozie (workflow/job orchestration) | 2.2.2 | Inc | Inc
Online documentation | | Inc | Inc
Capability to integrate with DB2, InfoSphere Warehouse (two DB2 UDFs to submit jobs and read results from BigInsights) | | Inc | Inc

BigInsights – Enterprise Edition

Function | Basic Edition | Enterprise Edition
R Connector: Jaql module to invoke R statistical capabilities from BigInsights | n/a | Inc
Netezza Connector: Jaql modules to read/write data from/to Netezza | n/a | Inc
LDAP | n/a | Inc
Web Console | n/a | Inc
Workflow Engine | n/a | Inc
Scheduler (Orchestrator) | n/a | Inc
Text Analytics Module (System T) | n/a | Inc
Eclipse support (for System T)* | n/a | Inc
BigSheets – Data Discovery Tool | n/a | Inc
IBM Optim Development Studio V2.2.1.0 | n/a | Inc
Support by IBM | n/a | Inc

BigSheets
• A data exploration tool for Hadoop
• Available only with BigInsights Enterprise Edition

BigSheets Concept Model
[Diagram: sources (Internet, intranet, logs, other) → Gather → BigSheets (Get/Manipulate, Enrich, Inspect, Explore & Analyze) → Publish; massive results in BigInsights. No coding is required!]
It's like a spreadsheet.

Looks very familiar?!

Visualizations
• Predefined visualizations
• Custom plug-ins

Number of coffee shops in North America, by state.
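Underneath, a per-state count like this is a classic group-and-count job. The sketch below shows the map, shuffle, and reduce steps such an aggregation boils down to, in plain Python with no Hadoop dependency. It illustrates the technique only and is not the code BigSheets generates; the record layout (a `state` field) and the sample rows are hypothetical.

```python
from collections import defaultdict
from typing import Iterable, Iterator


def map_phase(records: Iterable[dict]) -> Iterator[tuple[str, int]]:
    # Mapper: emit (state, 1) for every shop record.
    for record in records:
        yield record["state"], 1


def shuffle(pairs: Iterable[tuple[str, int]]) -> dict[str, list[int]]:
    # Shuffle: group the emitted values by key, as Hadoop does between phases.
    groups: dict[str, list[int]] = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(groups: dict[str, list[int]]) -> dict[str, int]:
    # Reducer: sum the ones emitted for each state.
    return {key: sum(values) for key, values in groups.items()}


def shops_per_state(records: Iterable[dict]) -> dict[str, int]:
    return reduce_phase(shuffle(map_phase(records)))


if __name__ == "__main__":
    # Hypothetical sample rows, not data from the talk.
    shops = [
        {"name": "Shop A", "state": "WA"},
        {"name": "Shop B", "state": "WA"},
        {"name": "Shop C", "state": "CA"},
    ]
    print(shops_per_state(shops))  # {'WA': 2, 'CA': 1}
```

In BigSheets the user gets this result by building a sheet and a chart; the point of the sketch is only to show what "MapReduce with no coding" is saving the user from writing.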

[Diagram: Internet, intranet, logs, and other sources → BigSheets on BigInsights (Gather step)]

• BigInsights can gather data from
– Predefined formats:
• BigSheets data reader
• Basic crawler data reader
• Basic crawler data reader (binary support)
• Character‐delimited data reader
• Tab Separated Value (TSV) data reader
• JavaScript Object Notation (JSON) array reader
• Comma Separated Value (CSV) data reader

– Custom BigSheets readers
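As a rough illustration of what the delimited and JSON readers listed above have to do, here is a minimal sketch that turns CSV, TSV, other single-character-delimited text, or a JSON array of objects into spreadsheet-like rows. The function names and the `READERS` table are hypothetical, not the BigSheets reader API, and the sketch assumes the first line of delimited input is a header row.

```python
import csv
import io
import json
from typing import Callable


def read_delimited(text: str, delimiter: str = ",") -> list[dict[str, str]]:
    # Assumes the first line is a header row; each later line becomes one row.
    reader = csv.DictReader(io.StringIO(text), delimiter=delimiter)
    return [dict(row) for row in reader]


def read_json_array(text: str) -> list[dict]:
    # Expects a top-level JSON array of objects, one object per row.
    data = json.loads(text)
    if not isinstance(data, list):
        raise ValueError("expected a JSON array")
    return [dict(item) for item in data]


# Hypothetical format table; "pipe" stands in for any other
# single-character delimiter (a "character-delimited" reader).
READERS: dict[str, Callable[[str], list[dict]]] = {
    "csv": lambda text: read_delimited(text, ","),
    "tsv": lambda text: read_delimited(text, "\t"),
    "pipe": lambda text: read_delimited(text, "|"),
    "json": read_json_array,
}

if __name__ == "__main__":
    print(READERS["tsv"]("name\tstate\nShop A\tWA\n"))
    # [{'name': 'Shop A', 'state': 'WA'}]
```

Using the standard `csv` module rather than splitting on the delimiter by hand means quoted fields that contain the delimiter (for example `"Shop, Inc"` in a CSV file) stay in one column.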


• BigInsights can import structured and unstructured data
– CSV
– Files
– Network
• http
• hdfs
• AWS (S3n/S3)
– Other
• Custom importer
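The import sources above can be thought of as a dispatch on the URI scheme, with custom importers plugging in for anything else. The sketch below assumes a hypothetical registry (`register_importer`, `import_source`); it is not the BigInsights importer interface, and its importers only return a label where a real one would fetch the data.

```python
from typing import Callable
from urllib.parse import urlparse

Importer = Callable[[str], str]

# Hypothetical registry mapping a URI scheme to an importer function.
_IMPORTERS: dict[str, Importer] = {}


def register_importer(*schemes: str) -> Callable[[Importer], Importer]:
    # Decorator that registers an importer for one or more URI schemes.
    def decorator(func: Importer) -> Importer:
        for scheme in schemes:
            _IMPORTERS[scheme] = func
        return func
    return decorator


@register_importer("http", "https", "hdfs", "s3n", "s3")
def network_importer(uri: str) -> str:
    # Placeholder: a real importer would fetch the data; this only labels it.
    return f"network:{uri}"


@register_importer("", "file")
def file_importer(uri: str) -> str:
    # Plain paths (no scheme) and file:// URIs are treated as local files.
    return f"file:{urlparse(uri).path}"


def import_source(uri: str) -> str:
    scheme = urlparse(uri).scheme.lower()
    importer = _IMPORTERS.get(scheme)
    if importer is None:
        raise ValueError(f"no importer registered for scheme {scheme!r}")
    return importer(uri)
```

For example, `import_source("hdfs://namenode/user/shops.csv")` is routed to the network importer. A custom importer for another kind of source would be added with one more `@register_importer(...)` decorator, without touching `import_source`.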

[Diagram: Internet, intranet, logs, and other sources → BigSheets on BigInsights (Collection step)]

A complete list of McDonald's restaurants in North America.

[Screenshot callouts: Calculate, Reformat, Import]

A complete list of McDonald's restaurants in North America.
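The Reformat and Calculate steps amount to deriving new columns from existing ones. A minimal sketch, assuming a hypothetical `location` column in "City, ST" form, that splits it into `city` and `state` columns the way a formula column in a sheet might:

```python
def split_city_state(value: str) -> tuple[str, str]:
    # Assumes a hypothetical "City, ST" format; without a comma, state is blank.
    city, sep, state = value.rpartition(",")
    if not sep:
        return value.strip(), ""
    return city.strip(), state.strip().upper()


def add_city_state_columns(
    rows: list[dict[str, str]], source: str = "location"
) -> list[dict[str, str]]:
    # Returns new rows with derived "city" and "state" columns;
    # the input rows themselves are left unmodified.
    result = []
    for row in rows:
        city, state = split_city_state(row.get(source, ""))
        result.append({**row, "city": city, "state": state})
    return result


if __name__ == "__main__":
    rows = [{"location": "Seattle, wa"}, {"location": "Toronto"}]
    for row in add_city_state_columns(rows):
        print(row)
```

A derived `state` column like this is what a per-state column chart or heat map would then be grouped on.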

[Screenshots: column chart and heat map]

BigSheets in Action
• Blockbuster: movie sales forecasting
– From ABC News

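Blockbuster: movie sales forecasting
IBM BigInsights/BigSheets
① Take in a feed of the Tweets posted over the weekend (about 200,000), and
② within a few hours (previously, not until Monday morning):
‐ create a sales forecast chart
‐ perform sentiment analysis
For example, this summer X-Men was more popular (more tweeted about) than any other movie → promotion, screening strategy, and so on can be fine-tuned frequently.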
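The core of the tweet analysis described here is counting mentions per title and attaching a sentiment score to each. The sketch below uses naive substring matching and a tiny hypothetical word list; it only illustrates the idea and is not the analysis used in the case above.

```python
import re
from collections import Counter

# Hypothetical, deliberately tiny word lists for illustration only.
POSITIVE = {"great", "love", "awesome", "amazing"}
NEGATIVE = {"boring", "bad", "awful", "hate"}


def tally(tweets: list[str], titles: list[str]) -> tuple[Counter, Counter]:
    # Returns (mentions per title, summed sentiment score per title).
    mentions: Counter = Counter()
    sentiment: Counter = Counter()
    for tweet in tweets:
        text = tweet.lower()
        words = set(re.findall(r"[a-z']+", text))
        # +1 per distinct positive word, -1 per distinct negative word.
        score = len(words & POSITIVE) - len(words & NEGATIVE)
        for title in titles:
            # Naive case-insensitive substring match on the title.
            if title.lower() in text:
                mentions[title] += 1
                sentiment[title] += score
    return mentions, sentiment


if __name__ == "__main__":
    # Made-up example tweets, not data from the case study.
    sample = ["X-Men was awesome!", "X-Men was boring", "Cars 2 is great"]
    mentions, sentiment = tally(sample, ["X-Men", "Cars 2"])
    print(mentions.most_common())  # [('X-Men', 2), ('Cars 2', 1)]
```

Counting per title is the same group-and-count shape as the per-state example, so on a cluster it parallelizes the same way; the sentiment scoring is where a real text analytics component would replace the word lists.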

Conclusion

• We all need to improve the business.

• So, where would you start with Big data?

Data Discovery is a key to start improving
YOUR Business!
