Slides pentaho-hadoop-weka
Slides pentaho-hadoop-weka Presentation Transcript

  • 1. F**** around with Big Data and Predictive Analytics Featuring Kettle, Weka & Hadoop. © 2012, Pentaho. All Rights Reserved. pentaho.com. Worldwide +1 (866) 660-7555
  • 2. Pentahuh?
  • 3. What’s Pentaho exactly?
      ‣ CENTRAL ADMINISTRATION, AUDITING & MONITORING
      ‣ DELIVER When & Where Users Need It
      ‣ STREAMLINE Information Delivery
      ‣ VISUALIZE & Report Information In Any Style
      ‣ ACCESS All Enterprise Data Sources: ERP / CRM / Enterprise Apps (e.g. SAP, Oracle); Hadoop & NoSQL Data; Unstructured & semi-structured (XML, Excel, Files, etc.); Relational Data Sources; Cloud (e.g. Salesforce, Amazon, Dell)
      ‣ EMBEDDED: ISV & Packaged Applications; SaaS / Cloud Applications
      ‣ STANDALONE: Web; Mobile; Print; E-Mail
      ‣ DATA MINING: Advanced & Predictive Analytics
      ‣ REPORTING: Interactive; Operational; Enterprise
      ‣ ANALYSIS: Ad hoc Exploration; Multi-Dimensional
      ‣ DASHBOARDS: Interactive Metrics; Rich Visualizations
      ‣ INTEGRATE, CLEANSE, & ENRICH DATA: Data Integration; Graphical ETL Designer
      ‣ ANALYTICS ACCELERATOR: In Memory Caching; High Performance
      ‣ Direct Access; Hadoop Clustering/Scheduling; Instant OLAP Cubes; Enterprise Scalability
  • 4. We do open source analytics.
  • 5. Why does Pentaho claim to have anything to do with Big Data?
  • 6. Project Kettle: powerful Extraction, Transformation and Loading (ETL) capabilities using an innovative, metadata-driven approach
  • 7. Bring the code to the data [Diagram labels: JDBC]
  • 8. Bring the code to the data [Diagram labels: JDBC, Kettle]
  • 9. Bring the code to the data [Diagram labels: Kettle, Kettle, Kettle]
  • 10. Project Weka: a comprehensive set of tools for machine learning and data mining
  • 13. Bring Weka to the data [Diagram labels: JDBC, Kettle (×4)]
  • 14. Bring Weka to the data
  • 15. JDBC Services for Kettle: runtime optimization and SQL pushdown
  • 16. A smart(er) JDBC Layer [Diagram labels: JDBC, Kettle (×4)]
      SELECT CUSTOMER_ID, SUM(UNIT_SALES) FROM SALES_FACT WHERE AGE_GROUP_ID > 3 GROUP BY CUSTOMER_ID;
  • 17. A smart(er) JDBC Layer [Diagram labels: JDBC, Kettle (×7)]
      SELECT CUSTOMER_ID FROM SALES_FACT;
      SELECT CUSTOMER_ID, SUM(UNIT_SALES) FROM SALES_FACT WHERE AGE_GROUP_ID > 3 GROUP BY CUSTOMER_ID;
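One way to read the "SQL pushdown" idea on slides 15 to 17 is that the filter and aggregation run where the data lives, so only the grouped result travels back to the client. The sketch below contrasts the two approaches using Python's built-in `sqlite3` module as a stand-in store. It is not Kettle's JDBC layer: the sample rows and the function names are assumptions made up for illustration, and only the table name, column names, and aggregate query come from the slides.

```python
import sqlite3

# Hypothetical sample rows: (CUSTOMER_ID, AGE_GROUP_ID, UNIT_SALES).
# Only the table and column names are taken from the slides.
ROWS = [
    (1, 2, 10.0),
    (1, 4, 5.0),
    (2, 5, 7.0),
    (2, 5, 3.0),
    (3, 1, 9.0),
]


def make_store():
    """Create an in-memory database standing in for the data store."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE SALES_FACT "
        "(CUSTOMER_ID INTEGER, AGE_GROUP_ID INTEGER, UNIT_SALES REAL)"
    )
    conn.executemany("INSERT INTO SALES_FACT VALUES (?, ?, ?)", ROWS)
    return conn


def aggregate_on_client(conn):
    """No pushdown: fetch every row, then filter and sum in the client."""
    totals = {}
    rows_moved = 0
    cursor = conn.execute(
        "SELECT CUSTOMER_ID, AGE_GROUP_ID, UNIT_SALES FROM SALES_FACT"
    )
    for customer_id, age_group_id, unit_sales in cursor:
        rows_moved += 1
        if age_group_id > 3:
            totals[customer_id] = totals.get(customer_id, 0.0) + unit_sales
    return totals, rows_moved


def aggregate_with_pushdown(conn):
    """Pushdown: the store evaluates WHERE and GROUP BY itself."""
    result = conn.execute(
        "SELECT CUSTOMER_ID, SUM(UNIT_SALES) FROM SALES_FACT "
        "WHERE AGE_GROUP_ID > 3 GROUP BY CUSTOMER_ID"
    ).fetchall()
    return dict(result), len(result)


if __name__ == "__main__":
    store = make_store()
    print(aggregate_on_client(store))
    print(aggregate_with_pushdown(store))
```

Both functions return the same totals; the second element of each result counts how many rows had to leave the store, which is the quantity pushdown is meant to reduce.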
  • 18. The gains
      • Job design and administration become trivial.
      • Runs the rich Kettle plugin environment directly on the nodes.
      • Performs much better than Hive.
      • The JDBC layer is pretty neat.
  • 19. The caveats
      • True parallel machine learning algorithms are rare and hard to design.
      • This is not an actual production-ready design.
      • Clients might have caches, which the Big Data store must notify of updates.
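The first caveat is easier to appreciate next to one of the few cases that does parallelize cleanly. Least-squares regression on a single feature needs only five running sums, and those sums can be computed per data partition and then added together. The sketch below is a generic illustration of that pattern, not Weka's or Kettle's implementation; the `Stats` class, the function names, and the sample partitions are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Stats:
    """Sufficient statistics for simple linear regression (y = a*x + b)."""
    n: int = 0
    sx: float = 0.0
    sy: float = 0.0
    sxx: float = 0.0
    sxy: float = 0.0

    def merge(self, other):
        """Combine statistics from two partitions by adding the sums."""
        return Stats(
            self.n + other.n,
            self.sx + other.sx,
            self.sy + other.sy,
            self.sxx + other.sxx,
            self.sxy + other.sxy,
        )


def map_partition(rows):
    """'Map' step: summarize one partition of (x, y) pairs locally."""
    stats = Stats()
    for x, y in rows:
        stats.n += 1
        stats.sx += x
        stats.sy += y
        stats.sxx += x * x
        stats.sxy += x * y
    return stats


def fit(stats):
    """'Reduce' result: closed-form slope and intercept from merged sums."""
    denominator = stats.n * stats.sxx - stats.sx ** 2
    if denominator == 0:
        raise ValueError("x values are constant; slope is undefined")
    slope = (stats.n * stats.sxy - stats.sx * stats.sy) / denominator
    intercept = (stats.sy - slope * stats.sx) / stats.n
    return slope, intercept


if __name__ == "__main__":
    # Hypothetical data following y = 2x + 1, split across two "nodes".
    partitions = [[(0, 1), (1, 3)], [(2, 5), (3, 7), (4, 9)]]
    merged = Stats()
    for part in partitions:
        merged = merged.merge(map_partition(part))
    print(fit(merged))
```

Only the five numbers in `Stats` leave each partition, so the result does not depend on how the rows are split. Most learning algorithms have no such compact, mergeable summary, which is what makes them hard to distribute.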
  • 20. Demo!
  • 21. Thank you! Join the conversation. You can find us on: blog.pentaho.com, @Pentaho, Facebook.com/Pentaho, Pentaho Business Analytics
  • 22. Want to learn more?
      • Learning Linear Models in Hadoop with Weka: http://markahall.blogspot.ca/2013/03/learning-linear-models-in-hadoop-with.html
      • Introduction to MapReduce with Pentaho Data Integration: http://www.youtube.com/watch?v=KZe1UugxXcs