Data Discovery Tool BigSheets: MapReduce with No Coding?
Atsushi Tsuchiya (eAtsuhsi@JP.ibm.com)
Big Data Tiger Team, IBM Software
Looking at Data
• What would you do with big data?
• How would you make use of it?
• It is difficult!
 – Too vague.
 • No specific problem that needs to be solved.
 • No specific question that needs to be answered.
• All you know is that you want to improve the business.
• But you have *data*.
• So, what would you do first? Look at the data!
IBM with Hadoop
• IBM has been working with the open source community for a long time.
 – Eclipse, Hadoop and so on …
• BigInsights includes Hadoop.
BigInsights
• BigInsights is IBM's Hadoop product for big data analytics.
 – Basic Edition (up to 10 TB): free to use!
 – Enterprise Edition
• Next version of BigInsights coming soon.
 – v1.2 available.
• And many more
BigInsights Components
• BigInsights includes:
 – IBM Java
 – Jaql: a language developed by IBM (open source)
 – IBM Distribution of Hadoop
 – BigSheets: a data discovery tool
 – FLEX scheduler for Adaptive MapReduce
 – Orchestrator (workflow engine)
 – SystemT (text analytics), SystemML (machine learning)
 – LDAP
 – Web Console / Developer Studio
BigInsights – Basic Edition
(Versions will be updated in the November release.)
Function | Version | Basic Edition | Enterprise Edition
Integrated install | | Included | Included
Open source components:
Hadoop (including common utilities, HDFS, MapReduce framework) | 0.20.2 | Included | Included
Jaql (programming / query language) | 0.5.2 | Included | Included
Pig (programming / query language) | 0.7 | Included | Included
Flume (data collection/aggregation) | 0.9.1 | Included | Included
Hive (data summarization/querying) | 0.5 | Included | Included
Lucene (text search) | 3.0.2 | Included | Included
ZooKeeper (process coordination) | 3.2.2 | Included | Included
Avro (data serialization) | 1.3.0 | Included | Included
HBase (real-time read/write) | 0.20.6 | Included | Included
Oozie (workflow/job orchestration) | 2.2.2 | Included | Included
Online documentation | | Included | Included
Capability to integrate with DB2, InfoSphere Warehouse: two DB2 UDFs to submit jobs to, and read results from, BigInsights | | Included | Included
BigInsights – Enterprise Edition
Function | Basic Edition | Enterprise Edition
R Connector: Jaql module to invoke R statistical capabilities from BigInsights | n/a | Included
Netezza Connector: Jaql modules to read/write data from/to Netezza | n/a | Included
LDAP | n/a | Included
Web Console | n/a | Included
Workflow Engine | n/a | Included
Scheduler (Orchestrator) | n/a | Included
Text Analytics Module (SystemT) | n/a | Included
Eclipse support (for SystemT)* | n/a | Included
BigSheets – Data Discovery Tool | n/a | Included
IBM Optim Development Studio V126.96.36.199 | n/a | Included
Support by IBM | n/a | Included
BigSheets
• A data exploration tool for Hadoop
• Only comes with BigInsights Enterprise Edition
BigSheets Concept Model
[Diagram: data from the Internet, intranets, logs and other sources is gathered into BigInsights; in BigSheets it is manipulated (enrich, inspect), explored and analyzed, and the results are published.]
No coding is required!
It's like a spreadsheet. Looks very familiar?!
Visualizations
• Predefined visualizations
• Custom plug-ins
[Example: the number of coffee shops in North America for each state.]
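To show roughly what kind of MapReduce job a per-state count like this stands in for, here is a minimal Python sketch that simulates the map, shuffle and reduce phases in memory. The record layout (store name, state) and the sample rows are hypothetical, and BigSheets does not run this code; it generates and runs its own jobs on Hadoop.

```python
from collections import defaultdict
from typing import Dict, Iterable, Iterator, List, Tuple


def map_phase(records: Iterable[Tuple[str, str]]) -> Iterator[Tuple[str, int]]:
    """Map: emit (state, 1) for every hypothetical (store name, state) record."""
    for _name, state in records:
        yield state, 1


def shuffle(pairs: Iterable[Tuple[str, int]]) -> Dict[str, List[int]]:
    """Shuffle: group emitted values by key, as Hadoop does between map and reduce."""
    groups: Dict[str, List[int]] = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return dict(groups)


def reduce_phase(groups: Dict[str, List[int]]) -> Dict[str, int]:
    """Reduce: sum the 1s for each state to get the number of stores."""
    return {key: sum(values) for key, values in groups.items()}


def count_by_state(records: Iterable[Tuple[str, str]]) -> Dict[str, int]:
    """Chain the three phases: records -> (state, 1) -> grouped -> counts."""
    return reduce_phase(shuffle(map_phase(records)))


if __name__ == "__main__":
    # Hypothetical sample rows, for illustration only.
    stores = [("Store A", "WA"), ("Store B", "WA"), ("Store C", "CA")]
    print(count_by_state(stores))  # {'WA': 2, 'CA': 1}
```

In a spreadsheet-style tool, the same result comes from a single "group by state, count" operation, which is the point of "MapReduce with no coding".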
Gather
[Diagram: Internet, intranet, logs and other sources → BigSheets / BigInsights]
• BigInsights can import structured and unstructured data:
 – CSV
 – Files
 – Network
  • http
  • hdfs
  • AWS (S3n/S3)
 – Other
  • Custom importer
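A spreadsheet-style sheet is essentially a header row plus rows of values. As a rough illustration of what importing a CSV file produces, the Python sketch below parses CSV text with the standard `csv` module into column names and rows. The function name `load_sheet`, the column names and the sample data are hypothetical; this is not the BigSheets importer.

```python
import csv
import io
from typing import Dict, List, Tuple


def load_sheet(text: str) -> Tuple[List[str], List[Dict[str, str]]]:
    """Parse CSV text into (column names, rows keyed by column name)."""
    reader = csv.DictReader(io.StringIO(text))
    rows = list(reader)  # reading the rows also populates reader.fieldnames
    columns = list(reader.fieldnames or [])  # fieldnames is None for empty input
    return columns, rows


if __name__ == "__main__":
    # Hypothetical sample; the quoted field contains a comma.
    sample = 'name,state\nStore A,WA\n"Store B, Downtown",CA\n'
    columns, rows = load_sheet(sample)
    print(columns)          # ['name', 'state']
    print(rows[1]["name"])  # Store B, Downtown
```

Using a real CSV parser rather than splitting on commas matters because quoted fields, like the second row above, may contain commas themselves.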
Collection
[Diagram: Internet, intranet, logs and other sources → BigSheets / BigInsights]
A complete list of McDonald's in North America.
Import, Reformat, Calculate
[Diagram: Internet, intranet, logs and other sources → BigSheets / BigInsights]
A complete list of McDonald's in North America.
[Diagram: Internet, intranet, logs and other sources → BigSheets / BigInsights]
Column chart / Heat map
BigSheets in Action
• Predicting blockbuster movie sales
 – From ABC News