Hadoop Summit Japan 2011 Fall - LT by IBM


Data Discovery Tool for BigInsights (on top of Hadoop) - MapReduce with no coding.

Published in: Technology


  1. Data Discovery Tool BigSheets: MapReduce with No Coding?
     Atsushi Tsuchiya (eAtsuhsi@JP.ibm.com), Big Data Tiger Team, IBM Software
  2. Looking at Data
     • What would you do with Big Data?
     • How do you make use of it?
     • It is difficult because the goal is too vague.
         – No specific problem that needs to be solved.
         – No specific question that needs to be answered.
     • All you know is that you want to improve the business.
     • But you have *data*.
     • So, what would you do first? Look at the data!
  3. IBM with Hadoop
     • IBM has been working with the open source community for a long time.
         – Eclipse, Hadoop and so on …
     • BigInsights includes Hadoop.
  4. BigInsights
     • BigInsights is IBM's Hadoop product for Big Data analytics.
         – Basic Edition (up to 10 TB): free to use!
         – Enterprise Edition
     • Next version of BigInsights is coming soon.
         – v1.2 available.
     • And many more
  5. BigInsights Components
     • BigInsights includes:
         – IBM Java
         – JAQL (a language developed by IBM; open source)
         – IBM Distribution of Hadoop
         – BigSheets (data exploration tool)
         – FLEX scheduler for Adaptive MapReduce
         – Orchestrator (Workflow Engine)
         – SystemT (Text Analytics), SystemML (Machine Learning)
         – LDAP
         – Web Console / Developer Studio
  6. BigInsights – Basic Edition
     Note: versions will be updated in the November release.

     Function                                                          Version  Basic Edition  Enterprise Edition
     Integrated install                                                         Inc            Inc
     Open source components:
       Hadoop (including common utilities, HDFS, MapReduce framework)  0.20.2   Inc            Inc
       Jaql (programming / query language)                             0.5.2    Inc            Inc
       Pig (programming / query language)                              0.7      Inc            Inc
       Flume (data collection/aggregation)                             0.9.1    Inc            Inc
       Hive (data summarization/querying)                              0.5      Inc            Inc
       Lucene (text search)                                            3.0.2    Inc            Inc
       Zookeeper (process coordination)                                3.2.2    Inc            Inc
       Avro (data serialization)                                       1.3.0    Inc            Inc
       HBase (real time read/write)                                    0.20.6   Inc            Inc
       Oozie (workflow / job orchestration)                            2.2.2    Inc            Inc
     Online documentation                                                       Inc            Inc
     Capability to integrate with DB2, InfoSphere Warehouse                     Inc            Inc
       (two DB2 UDFs to submit jobs and read results from BigInsights)
  7. BigInsights – Enterprise Edition

     Function                                Basic Edition  Enterprise Edition
     R Connector                             n/a            Inc
       (Jaql module to invoke R statistical capabilities from BigInsights)
     Netezza Connector                       n/a            Inc
       (Jaql modules to read/write data from/to Netezza)
     LDAP                                    n/a            Inc
     Web Console                             n/a            Inc
     Workflow Engine                         n/a            Inc
     Scheduler (Orchestrator)                n/a            Inc
     Text Analytics Module (System T)        n/a            Inc
     Eclipse support (for System T)*         n/a            Inc
     BigSheets – Data Discovery Tool         n/a            Inc
     IBM Optim Development Studio V2.2.1.0   n/a            Inc
     Support by IBM                          n/a            Inc
  8. BigSheets
     • A data exploration tool for Hadoop
     • Only comes with BigInsights Enterprise Edition
  9. BigSheets Concept Model
     No coding is required!
     [Diagram: BigSheets gathers data from the Internet, intranet, logs, and other sources into BigInsights; the user gets and manipulates the massive results, then enriches, inspects, explores, analyzes, and publishes them.]
  10. It's like a spreadsheet. Looks very familiar?!
  11. Visualizations
     • Predefined visualizations
     • Custom plug-ins
     [Figure: number of coffee shops in North America for each state.]
  12. DEMO
  13. Gather
     [Diagram: Internet, intranet, logs, and other sources are gathered by BigSheets into BigInsights.]
     • BigInsights can gather data from:
         – Predefined formats:
             • BigSheets data reader
             • Basic crawler data reader
             • Basic crawler data reader (binary support)
             • Character-delimited data reader
             • Tab Separated Value (TSV) data reader
             • JavaScript Object Notation (JSON) array reader
             • Comma Separated Value (CSV) data reader
         – Custom BigSheets reader
  14. Gather
     [Diagram: Internet, intranet, logs, and other sources are gathered by BigSheets into BigInsights.]
     • BigInsights can import structured and unstructured data:
         – CSV
         – Files
         – Network
             • http
             • hdfs
             • AWS (S3n/S3)
         – Other
             • Custom importer
  15. Collection
     [Screenshot: a complete list of McDonald's in North America.]
  16. Import, Calculate, Reformat
     [Screenshot: a complete list of McDonald's in North America.]
  17. Visualization examples
     [Screenshots: column chart; heat map.]
  18. BigSheets in Action
     • Blockbuster: movie box-office sales forecasting
         – From ABC News
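  19. Blockbuster: Movie Box-Office Sales Forecasting with IBM BigInsights/BigSheets
     ① Receives a feed of the Tweets posted over the weekend (about 200,000).
     ② Within a few hours (previously, not until Monday morning), it produces:
         – Sales forecast charts
         – Sentiment analysis
     For example, this summer X-Men was more popular (more tweeted about) than any other film. → Advertising and screening strategies can be adjusted frequently.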
  20. Conclusion
     • We all need to improve the business.
     • So, where would you start with Big Data?
     Data Discovery is a key to start improving YOUR business!
  21. Thank you!
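
For readers who want a feel for what "MapReduce with no coding" saves, the sketch below shows roughly what the stores-per-state aggregation from the visualization and collection slides might look like if written by hand. It is plain Python, not BigSheets or BigInsights code: the sample rows, the column names (`name`, `city`, `state`), and the function names are all hypothetical and chosen only for illustration.

```python
import csv
import io
from itertools import groupby

# Hypothetical sample data standing in for an imported CSV collection.
# These rows are invented for illustration and are not from the talk.
SAMPLE_CSV = """name,city,state
Shop A,Austin,TX
Shop B,Dallas,TX
Shop C,Seattle,WA
Shop D,Boston,MA
Shop E,Houston,TX
"""


def map_phase(rows):
    """Map step: emit one (state, 1) pair per store record."""
    for row in rows:
        yield row["state"], 1


def reduce_phase(pairs):
    """Shuffle and reduce step: group pairs by state and sum the counts."""
    counts = {}
    for state, group in groupby(sorted(pairs), key=lambda kv: kv[0]):
        counts[state] = sum(n for _, n in group)
    return counts


def shops_per_state(csv_text):
    """Read CSV text, then run the map and reduce steps in memory."""
    rows = csv.DictReader(io.StringIO(csv_text))
    return reduce_phase(map_phase(rows))


if __name__ == "__main__":
    for state, count in sorted(shops_per_state(SAMPLE_CSV).items()):
        print(state, count)
```

On a real cluster the map and reduce steps would run as distributed tasks over data in HDFS; here they run in a single process so that the shape of the computation stays visible.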