F**** around with Big Data and Predictive Analytics
Featuring Kettle, Weka & Hadoop.
Slides pentaho-hadoop-weka

Published in: Technology, Education
Transcript of "Slides pentaho-hadoop-weka"

  1. F**** around with Big Data and Predictive Analytics. Featuring Kettle, Weka & Hadoop. © 2012, Pentaho. All Rights Reserved. pentaho.com. Worldwide +1 (866) 660-7555
  2. Pentahuh?
  3. What’s Pentaho exactly?
     • CENTRAL ADMINISTRATION, AUDITING & MONITORING
     • DELIVER When & Where Users Need It
     • STREAMLINE Information Delivery
     • VISUALIZE & Report Information In Any Style
     • ACCESS All Enterprise Data Sources
     • EMBEDDED: ISV & Packaged Applications; SaaS / Cloud Applications
     • STANDALONE: Web; Mobile; Print; E-Mail
     • DATA MINING: Advanced & Predictive Analytics
     • REPORTING: Interactive; Operational; Enterprise
     • ANALYSIS: Ad hoc Exploration; Multi-Dimensional
     • DASHBOARDS: Interactive Metrics; Rich Visualizations
     • ERP / CRM / Enterprise Apps (e.g. SAP, Oracle); Hadoop & NoSQL Data; Unstructured & semi-structured (XML, Excel, Files, etc.); Relational Data Sources; Cloud (e.g. Salesforce, Amazon, Dell)
     • INTEGRATE, CLEANSE, & ENRICH DATA: Data Integration; Graphical ETL Designer
     • ANALYTICS ACCELERATOR: In Memory Caching; High Performance
     • Direct Access; Hadoop Clustering/Scheduling; Instant OLAP Cubes; Enterprise Scalability
  4. We do open source analytics.
  5. Why does Pentaho claim to have anything to do with Big Data??
  6. Project Kettle: powerful Extraction, Transformation and Loading (ETL) capabilities using an innovative, metadata-driven approach.
  7. Bring the code to the data. [Diagram labels: JDBC]
  8. Bring the code to the data. [Diagram labels: JDBC, Kettle]
  9. Bring the code to the data. [Diagram labels: Kettle, Kettle, Kettle]
  10. Project Weka: a comprehensive set of tools for machine learning and data mining.
  11. [No text on this slide]
  12. [No text on this slide]
  13. Bring Weka to the data. [Diagram labels: Kettle, Kettle, JDBC, Kettle, Kettle]
  14. Bring Weka to the data.
  15. JDBC Services for Kettle: runtime optimization and SQL pushdown.
  16. A smart(er) JDBC Layer. [Diagram labels: Kettle, Kettle, Kettle, Kettle, JDBC]
      SELECT CUSTOMER_ID, SUM(UNIT_SALES)
      FROM SALES_FACT
      WHERE AGE_GROUP_ID > 3
      GROUP BY CUSTOMER_ID;
  17. A smart(er) JDBC Layer. [Diagram labels: Kettle, Kettle, Kettle, Kettle, Kettle, JDBC, Kettle, Kettle]
      SELECT CUSTOMER_ID
      FROM SALES_FACT;

      SELECT CUSTOMER_ID, SUM(UNIT_SALES)
      FROM SALES_FACT
      WHERE AGE_GROUP_ID > 3
      GROUP BY CUSTOMER_ID;
  18. The gains
      • Job design and administration becomes trivial.
      • Runs the rich Kettle plugin environment directly on the nodes.
      • Performs much better than Hive.
      • The JDBC layer is pretty neat.
  19. The caveats
      • True parallel machine learning algorithms are rare and hard to design.
      • Not an actual production-ready design.
      • Clients might have caches, which must be notified by the BD store for updates.
  20. Demo!
  21. Thank you! Join the conversation. You can find us on: blog.pentaho.com, @Pentaho, Facebook.com/Pentaho, Pentaho Business Analytics.
  22. Want to learn more?
      • Learning Linear Models in Hadoop with Weka: http://markahall.blogspot.ca/2013/03/learning-linear-models-in-hadoop-with.html
      • Introduction to MapReduce with Pentaho Data Integration: http://www.youtube.com/watch?v=KZe1UugxXcs
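
Slides 15 to 17 mention SQL pushdown without showing what it buys. The sketch below is one way to picture the idea, not Pentaho's implementation: it runs the aggregate query from slide 16 against an in-memory SQLite table and compares it with fetching every row and filtering on the client. The column types, the sample rows, and the function names are all assumptions made up for illustration; only the table name, the column names, and the query come from the slides.

```python
import sqlite3


def build_demo_db():
    """Create an in-memory SALES_FACT table with made-up sample rows."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE SALES_FACT ("
        "CUSTOMER_ID INTEGER, AGE_GROUP_ID INTEGER, UNIT_SALES REAL)"
    )
    # Hypothetical data: (CUSTOMER_ID, AGE_GROUP_ID, UNIT_SALES)
    rows = [(1, 2, 5.0), (1, 4, 3.0), (2, 5, 7.0), (2, 5, 1.0), (3, 1, 9.0)]
    conn.executemany("INSERT INTO SALES_FACT VALUES (?, ?, ?)", rows)
    return conn


def without_pushdown(conn):
    """Pull every row to the client, then filter and aggregate there."""
    totals = {}
    query = "SELECT CUSTOMER_ID, AGE_GROUP_ID, UNIT_SALES FROM SALES_FACT"
    for customer_id, age_group_id, unit_sales in conn.execute(query):
        if age_group_id > 3:
            totals[customer_id] = totals.get(customer_id, 0.0) + unit_sales
    return totals


def with_pushdown(conn):
    """Let the data store filter and aggregate; only the totals travel back."""
    query = (
        "SELECT CUSTOMER_ID, SUM(UNIT_SALES) "
        "FROM SALES_FACT "
        "WHERE AGE_GROUP_ID > 3 "
        "GROUP BY CUSTOMER_ID"
    )
    return dict(conn.execute(query).fetchall())


if __name__ == "__main__":
    conn = build_demo_db()
    print(without_pushdown(conn))
    print(with_pushdown(conn))
```

Both functions return the same per-customer totals. The difference is where the work happens: the first ships the whole table to the caller, the second ships one row per customer, which is the gap a pushdown-aware JDBC layer is meant to close.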