Slides pentaho-hadoop-weka
Slides pentaho-hadoop-weka Presentation Transcript

  • F**** around with Big Data and Predictive Analytics Featuring Kettle, Weka & Hadoop. © 2012, Pentaho. All Rights Reserved. pentaho.com. Worldwide +1 (866) 660-7555
  • Pentahuh?
  • What’s Pentaho exactly? [architecture diagram]
    Layers: CENTRAL ADMINISTRATION, AUDITING & MONITORING; DELIVER When & Where Users Need It; STREAMLINE Information Delivery; VISUALIZE & Report Information In Any Style; ACCESS All Enterprise Data Sources
    Delivery: EMBEDDED (ISV & Packaged Applications, SaaS / Cloud Applications); STANDALONE (Web, Mobile, Print, E-Mail)
    Components: DATA MINING (Advanced & Predictive Analytics); REPORTING (Interactive, Operational, Enterprise); ANALYSIS (Ad hoc Exploration, Multi-Dimensional); DASHBOARDS (Interactive Metrics, Rich Visualizations)
    Data sources: ERP / CRM / Enterprise Apps (e.g. SAP, Oracle); Hadoop & NoSQL Data; Unstructured & semi-structured (XML, Excel, Files, etc.); Relational Data Sources; Cloud (e.g. Salesforce, Amazon, Dell)
    Data layer: INTEGRATE, CLEANSE, & ENRICH DATA (Data Integration, Graphical ETL Designer); ANALYTICS ACCELERATOR (In Memory Caching, High Performance); also listed: Direct Access, Hadoop Clustering/Scheduling, Instant OLAP Cubes, Enterprise Scalability
  • We do open source analytics.
  • Why does Pentaho claim to have anything to do with Big Data??
  • Project Kettle: powerful Extraction, Transformation and Loading (ETL) capabilities using an innovative, metadata-driven approach
  • Bring the code to the data [three diagram slides; labels: JDBC, Kettle]
  • Project Weka: a comprehensive set of tools for machine learning and data mining
  • Bring Weka to the data [two diagram slides; labels: Kettle, JDBC]
  • JDBC Services for Kettle: runtime optimization and SQL pushdown
  • A smart(er) JDBC Layer [diagram labels: Kettle, JDBC]
    SELECT CUSTOMER_ID, SUM(UNIT_SALES) FROM SALES_FACT WHERE AGE_GROUP_ID > 3 GROUP BY CUSTOMER_ID;
  • A smart(er) JDBC Layer [diagram labels: Kettle, JDBC]
    SELECT CUSTOMER_ID FROM SALES_FACT;
    SELECT CUSTOMER_ID, SUM(UNIT_SALES) FROM SALES_FACT WHERE AGE_GROUP_ID > 3 GROUP BY CUSTOMER_ID;
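
The slides name SQL pushdown but do not show how it works. As a rough illustration only, the sketch below mimics the idea for the aggregate query on the slide: each node applies the WHERE filter and a local SUM, and the client merges the partial sums. The partition data, function names and row layout are all hypothetical, and this is not Pentaho's actual JDBC layer.

```python
from collections import defaultdict

# Hypothetical partitions of SALES_FACT, one list of rows per storage node.
PARTITIONS = [
    [
        {"CUSTOMER_ID": 1, "AGE_GROUP_ID": 4, "UNIT_SALES": 10.0},
        {"CUSTOMER_ID": 2, "AGE_GROUP_ID": 2, "UNIT_SALES": 5.0},
    ],
    [
        {"CUSTOMER_ID": 1, "AGE_GROUP_ID": 5, "UNIT_SALES": 2.5},
        {"CUSTOMER_ID": 3, "AGE_GROUP_ID": 4, "UNIT_SALES": 7.0},
    ],
]


def node_partial_sums(rows, min_age_group):
    """Node side: apply the WHERE filter and pre-aggregate per customer."""
    partial = defaultdict(float)
    for row in rows:
        if row["AGE_GROUP_ID"] > min_age_group:
            partial[row["CUSTOMER_ID"]] += row["UNIT_SALES"]
    return dict(partial)


def merge_partials(partials):
    """Client side: add up the per-node partial sums (the GROUP BY merge)."""
    totals = defaultdict(float)
    for partial in partials:
        for customer_id, unit_sales in partial.items():
            totals[customer_id] += unit_sales
    return dict(totals)


def pushed_down_query(partitions, min_age_group=3):
    """Filter and aggregate on each node, then ship only the small partial results."""
    return merge_partials(node_partial_sums(p, min_age_group) for p in partitions)


def naive_query(partitions, min_age_group=3):
    """Baseline: pull every row to the client, then filter and aggregate there."""
    all_rows = [row for partition in partitions for row in partition]
    return node_partial_sums(all_rows, min_age_group)


if __name__ == "__main__":
    print(pushed_down_query(PARTITIONS))
```

Both functions return the same totals; the difference is that the pushed-down version moves one small dictionary per node instead of every row.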
  • The gains
      • Job design and administration become trivial.
      • Runs the rich Kettle plugin environment directly on the nodes.
      • Performs much better than Hive.
      • The JDBC layer is pretty neat.
  • The caveats
      • True parallel machine learning algorithms are rare and hard to design.
      • Not an actual production-ready design.
      • Clients might have caches, which must be notified by the BD store of updates.
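
To make the first caveat concrete, here is a minimal sketch of one of the easy cases: a learner whose training reduces to sums that can be computed per partition and then added together. Simple least-squares regression on (x, y) pairs is an assumption chosen for illustration; it is not taken from the slides, and the function names are hypothetical. Most algorithms do not decompose this cleanly, which is the point of the caveat.

```python
from functools import reduce


def partition_stats(points):
    """Map step: summarize one partition of (x, y) pairs as five running sums."""
    n = sx = sy = sxx = sxy = 0.0
    for x, y in points:
        n += 1
        sx += x
        sy += y
        sxx += x * x
        sxy += x * y
    return (n, sx, sy, sxx, sxy)


def combine_stats(a, b):
    """Reduce step: the sums are additive, so partitions merge in any order."""
    return tuple(u + v for u, v in zip(a, b))


def fit_line(stats):
    """Solve the least-squares line y = slope * x + intercept from the merged sums."""
    n, sx, sy, sxx, sxy = stats
    denom = n * sxx - sx * sx
    if denom == 0:
        raise ValueError("need at least two distinct x values")
    slope = (n * sxy - sx * sy) / denom
    intercept = (sy - slope * sx) / n
    return slope, intercept


def fit_distributed(partitions):
    """Summarize each partition independently, merge, then fit once."""
    merged = reduce(combine_stats, (partition_stats(p) for p in partitions))
    return fit_line(merged)


if __name__ == "__main__":
    # Two hypothetical partitions of points lying on y = 2x + 1.
    print(fit_distributed([[(0, 1), (1, 3)], [(2, 5), (3, 7)]]))
```

Only five numbers per partition cross the network, regardless of how many rows each node holds.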
  • Demo!
  • Thank you! Join the conversation. You can find us on:
      • blog.pentaho.com
      • @Pentaho
      • Facebook.com/Pentaho
      • Pentaho Business Analytics
  • Want to learn more?
      • Learning Linear Models in Hadoop with Weka: http://markahall.blogspot.ca/2013/03/learning-linear-models-in-hadoop-with.html
      • Introduction to MapReduce with Pentaho Data Integration: http://www.youtube.com/watch?v=KZe1UugxXcs