Slides pentaho-hadoop-weka

Published in: Technology, Education

  1. F**** around with Big Data and Predictive Analytics. Featuring Kettle, Weka & Hadoop. © 2012, Pentaho. All Rights Reserved. pentaho.com. Worldwide +1 (866) 660-7555
  2. Pentahuh?
  3. What’s Pentaho exactly? Labels from the platform diagram:
       ‣ Layers: DELIVER When & Where Users Need It; STREAMLINE Information Delivery; VISUALIZE & Report Information In Any Style; ACCESS All Enterprise Data Sources
       ‣ CENTRAL ADMINISTRATION, AUDITING & MONITORING
       ‣ EMBEDDED: ISV & Packaged Applications; SaaS / Cloud Applications
       ‣ STANDALONE: Web; Mobile; Print; E-Mail
       ‣ DATA MINING: Advanced & Predictive Analytics
       ‣ REPORTING: Interactive; Operational; Enterprise
       ‣ ANALYSIS: Ad hoc Exploration; Multi-Dimensional
       ‣ DASHBOARDS: Interactive Metrics; Rich Visualizations
       ‣ Data sources: ERP / CRM / Enterprise Apps (e.g. SAP, Oracle); Hadoop & NoSQL Data; Unstructured & semi-structured (XML, Excel, Files, etc.); Relational Data Sources; Cloud (e.g. Salesforce, Amazon, Dell)
       ‣ INTEGRATE, CLEANSE, & ENRICH DATA: Data Integration; Graphical ETL Designer
       ‣ ANALYTICS ACCELERATOR: In Memory Caching; High Performance
       ‣ Further bullets: Direct Access; Hadoop Clustering/Scheduling; Instant OLAP Cubes; Enterprise Scalability
  4. We do open source analytics.
  5. Why does Pentaho claim to have anything to do with Big Data?
  6. Project Kettle: powerful Extraction, Transformation and Loading (ETL) capabilities using an innovative, metadata-driven approach.
  7. Bring the code to the data (diagram label: JDBC)
  8. Bring the code to the data (diagram labels: JDBC, Kettle)
  9. Bring the code to the data (diagram labels: Kettle, Kettle, Kettle)
  10. Project Weka: a comprehensive set of tools for machine learning and data mining.
  11. [no text on slide]
  12. [no text on slide]
  13. Bring Weka to the data (diagram labels: Kettle, Kettle, JDBC, Kettle, Kettle)
  14. Bring Weka to the data
  15. JDBC: services for Kettle runtime optimization and SQL pushdown.
  16. A smart(er) JDBC Layer (diagram labels: Kettle ×4, JDBC). Query: SELECT CUSTOMER_ID, SUM(UNIT_SALES) FROM SALES_FACT WHERE AGE_GROUP_ID > 3 GROUP BY CUSTOMER_ID;
  17. A smart(er) JDBC Layer (diagram labels: Kettle ×6, JDBC). Queries: SELECT CUSTOMER_ID FROM SALES_FACT; and SELECT CUSTOMER_ID, SUM(UNIT_SALES) FROM SALES_FACT WHERE AGE_GROUP_ID > 3 GROUP BY CUSTOMER_ID;
  18. The gains
       • Job design and administration become trivial.
       • Runs the rich Kettle plugin environment directly on the nodes.
       • Performs much better than Hive.
       • The JDBC layer is pretty neat.
  19. The caveats
       • True parallel machine learning algorithms are rare and hard to design.
       • Not an actual production-ready design.
       • Clients might have caches, which the Big Data store must notify of updates.
  20. Demo!
  21. Thank you! Join the conversation. You can find us on: blog.pentaho.com, @Pentaho, Facebook.com/Pentaho, Pentaho Business Analytics.
  22. Want to learn more?
       • Learning Linear Models in Hadoop with Weka: http://markahall.blogspot.ca/2013/03/learning-linear-models-in-hadoop-with.html
       • Introduction to MapReduce with Pentaho Data Integration: http://www.youtube.com/watch?v=KZe1UugxXcs
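
Slides 15 to 17 mention SQL pushdown without showing what it buys. The sketch below is one way to picture it, and it is not Pentaho's JDBC layer: it uses Python's built-in sqlite3 module as a stand-in data store. The table and column names come from the query on slide 16; the sample rows and the function names (build_demo_db, totals_client_side, totals_pushed_down) are invented for illustration. It compares pulling every row to the client with letting the store do the filter and aggregation.

```python
import sqlite3


def build_demo_db():
    """Create an in-memory SALES_FACT table with made-up sample rows."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE SALES_FACT ("
        "CUSTOMER_ID INTEGER, AGE_GROUP_ID INTEGER, UNIT_SALES REAL)"
    )
    rows = [
        (1, 2, 5.0),
        (1, 4, 3.0),
        (1, 5, 2.0),
        (2, 4, 7.0),
        (3, 1, 9.0),
        (3, 6, 1.0),
    ]
    conn.executemany("INSERT INTO SALES_FACT VALUES (?, ?, ?)", rows)
    return conn


def totals_client_side(conn):
    """No pushdown: fetch every row, then filter and sum on the client."""
    fetched = conn.execute(
        "SELECT CUSTOMER_ID, AGE_GROUP_ID, UNIT_SALES FROM SALES_FACT"
    ).fetchall()
    totals = {}
    for customer_id, age_group_id, unit_sales in fetched:
        if age_group_id > 3:
            totals[customer_id] = totals.get(customer_id, 0.0) + unit_sales
    return totals, len(fetched)


def totals_pushed_down(conn):
    """Pushdown: the store filters and aggregates; only results travel."""
    fetched = conn.execute(
        "SELECT CUSTOMER_ID, SUM(UNIT_SALES) FROM SALES_FACT "
        "WHERE AGE_GROUP_ID > 3 GROUP BY CUSTOMER_ID"
    ).fetchall()
    return dict(fetched), len(fetched)


if __name__ == "__main__":
    conn = build_demo_db()
    client_totals, client_rows = totals_client_side(conn)
    pushed_totals, pushed_rows = totals_pushed_down(conn)
    print("client side:", client_totals, "rows transferred:", client_rows)
    print("pushed down:", pushed_totals, "rows transferred:", pushed_rows)
```

Both functions return the same per-customer totals, but the pushed-down version transfers one row per customer instead of one row per fact. On a real cluster the gap grows with the size of the fact table.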
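
Slide 19 warns that truly parallel machine learning algorithms are rare and hard to design. One of the easier cases is least-squares fitting of a straight line, because the fit depends only on a handful of sums that each node can compute on its own data. The sketch below illustrates that idea in plain Python. It is not taken from Weka, Kettle, or Hadoop, and every function name in it (map_partition, combine, fit_line, fit_distributed) is hypothetical.

```python
from functools import reduce


def map_partition(points):
    """Summarise one partition of (x, y) pairs as five running sums."""
    n = len(points)
    sum_x = sum(x for x, _ in points)
    sum_y = sum(y for _, y in points)
    sum_xx = sum(x * x for x, _ in points)
    sum_xy = sum(x * y for x, y in points)
    return (n, sum_x, sum_y, sum_xx, sum_xy)


def combine(left, right):
    """Merge two partition summaries by adding them element-wise."""
    return tuple(a + b for a, b in zip(left, right))


def fit_line(stats):
    """Solve the least-squares line y = intercept + slope * x from the sums."""
    n, sum_x, sum_y, sum_xx, sum_xy = stats
    denom = n * sum_xx - sum_x * sum_x
    if denom == 0:
        raise ValueError("need at least two distinct x values")
    slope = (n * sum_xy - sum_x * sum_y) / denom
    intercept = (sum_y - slope * sum_x) / n
    return slope, intercept


def fit_distributed(partitions):
    """Map each partition to its summary, reduce the summaries, then fit."""
    return fit_line(reduce(combine, map(map_partition, partitions)))


if __name__ == "__main__":
    # Made-up points lying on y = 2x + 1, split across three "nodes".
    partitions = [[(0, 1), (1, 3)], [(2, 5)], [(3, 7), (4, 9)]]
    print(fit_distributed(partitions))
```

Only five numbers leave each partition, and because addition is associative the summaries can be merged in any order. Most learning algorithms do not decompose this cleanly, which is the point of the caveat.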