
Bay Area Hadoop User Group

Accelerated Analytics for the Big Data Fabric

  • Leveraging PDI (Pentaho Data Integration) to incorporate Big Data into your data fabric provides immediate access to analytics. Examples: batch and ad hoc reporting directly against Big Data sources using familiar BI tools with no coding (Report Designer, Interactive Reporting); an agile framework to quickly generate, house and manage data marts for interactive analysis, data discovery, etc.
  • Transcript of "Bay Area Hadoop User Group"

    1. Accelerated Analytics for the Big Data Fabric. Bay Area Hadoop User Group. © 2012, Pentaho. All Rights Reserved. pentaho.com. Worldwide +1 (866) 660-7555
    2. AGENDA: The Big Data Fabric • Big Data Preparation – An Everyday Challenge • Use-Case Scenario – Call Volume Analysis (Solution Requirements; Solution Workflow; Phase I - Data Preparation & Visualization; Phase II - Pentaho MapReduce & Orchestration) • Summary
    3. The Big Data Fabric (diagram). Big Analytics: Pentaho Business Analytics, 3rd Party Tools, R; Reports, Interactive Analysis, Dashboards, Visualization, 3rd Party BI Tools, Applications. Data Integration: Visual IDE, High Performance Workflow, Job Orchestration, Scheduling. Big Data Mgmt: Hadoop, Analytic Databases, NoSQL Databases.
    4. Preparing Big Data for Analysis is an Everyday Challenge • Very technical skills required • Divide between MapReduce (M-R) developers & analysts • Beyond the reach of many organizations
    5. Pentaho Visual MapReduce • Accessible by any ETL developer, business analyst or data scientist • Executes inside Hadoop as a native Java MapReduce task
    6. Pentaho Reporting & Analytics • Batch Reporting and Ad Hoc Query • Data Visualization, Discovery and Analysis • Hadoop, NoSQL, Hybrid
    7. Use Case Scenario – Call Volume Analysis • A VoIP service provider has excess capacity and is considering expansion to consumer markets • Business Analyst: what are the top 10 states for inbound calls on Fridays, Saturdays and Sundays? • Research data available: – Call records – date/timestamp & destination phone number – NANP (North American Numbering Plan) data – area code by country, state & time zone
    8. Solution Requirements • Data Preparation – Access the call records in HDFS – Extract the destination area code for each call – Read the area code reference data – Look up country, state and time zone by area code, and append them to each record – Filter out records (non-U.S. calls, calls made on M-Tu-W-Th) – Load to a relational database – Generate metadata • Analysis – Explore data multi-dimensionally – Find the top-10 states by inbound call volume – Navigate via a geospatial interface • Deployment – Deploy in MapReduce to handle larger data volumes
    9. Solution Workflow • Phase I - Business Analysts – Use a data extract to prepare and validate their analyses – Iterate over requirements with executives and stakeholders • Phase II - MapReduce Developers/Analysts – Create production Pentaho MapReduce transformations – Manage the deployment and orchestration between the Hadoop cluster and the production database
    10. Data Preparation (Phase I) • The data pipeline implements the data preparation logic • Each component has a “personality”: access, calculate, join, filter … • Free-form design – As many or as few inputs, transformations and outputs as needed • Schema contract exists only between connected components • Pipelined, multi-threaded for performance • 100% Java-based for deployment flexibility
    11. Data Pipeline – Input from HDFS
    12. Data Pipeline – Calculator
    13. Data Pipeline – Stream Lookup
    14. Data Pipeline – Row Filter
    15. Data Pipeline – Table Output
    16. Visualization – Multi-Dimensional UX
    17. Visualization – Geographic
    18. Visualization – Heatmap
    19. Deployment to Hadoop (Phase II) • To process a larger set of data, we can deploy the data pipeline via MapReduce – Input and output streams are encoded as key-value pairs – Two specialized components provide an interface: – A special job component deploys the data pipeline to the Hadoop cluster:
    20. Pentaho MapReduce – Inputs/Outputs: The core logic of the data pipeline is identical … only the ends change
    21. Pentaho MapReduce – Orchestration
    22. Instant Analytics (Roadmap): Choose a Big Data Source • Answer a Few Questions • Publish to Pentaho • Report, Explore and Analyze • Customize Model (Optional)
    23. SUMMARY: 1. The Big Data Fabric encompasses a large collection of Hadoop distributions, NoSQL and analytical databases. 2. A component-based approach to data access and integration can: – Allow business analysts and data scientists to perform their own data preparation – Result in more rapid validation of business requirements & metrics – Be used to create data pipelines that can be deployed directly to a cluster, enabling analytics against much larger data sets – Support orchestration across environments
    24. Summary
    25. Thank You. Join the conversation. You can find us on: http://blog.pentaho.com, @Pentaho, Facebook.com/Pentaho. Pentaho Business Analytics
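Slide 10 describes the pipeline as free-form, connected components, each with a single "personality", where the schema contract exists only between connected components. As a rough illustration of that idea (not Pentaho's engine), the Python sketch below chains generator stages so that each stage sees only the rows of its upstream neighbour. The function names, the dict-per-row representation and the sample phone numbers are assumptions; unlike the multi-threaded pipeline on the slide, these generators run lazily in a single thread.

```python
from typing import Callable, Dict, Iterable, Iterator

Row = Dict[str, str]
Stage = Callable[[Iterable[Row]], Iterator[Row]]


def connect(source: Iterable[Row], *stages: Stage) -> Iterator[Row]:
    """Chain stages so each one consumes only its upstream neighbour's rows."""
    rows: Iterable[Row] = source
    for stage in stages:
        rows = stage(rows)
    return iter(rows)


def calculate_area_code(rows: Iterable[Row]) -> Iterator[Row]:
    """Hypothetical 'calculate' stage: derive a new field from an existing one."""
    for row in rows:
        yield {**row, "area_code": row["phone"][:3]}


def filter_area_code(wanted: str) -> Stage:
    """Hypothetical 'filter' stage factory: pass only rows with one area code."""
    def stage(rows: Iterable[Row]) -> Iterator[Row]:
        for row in rows:
            if row["area_code"] == wanted:
                yield row
    return stage


if __name__ == "__main__":
    source = [{"phone": "4155550123"}, {"phone": "2125550199"}]
    print(list(connect(source, calculate_area_code, filter_area_code("415"))))
```

Here `filter_area_code` relies on the `area_code` field that only `calculate_area_code` adds, which is the contract between connected components in miniature: reverse the two stages and the filter raises a `KeyError`.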
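Slide 12's Calculator step derives the destination area code for each call, the second data-preparation requirement on slide 8. A North American Numbering Plan number is ten digits (three-digit area code, three-digit exchange, four-digit line number), optionally preceded by the country code 1, and area codes never begin with 0 or 1. A minimal Python sketch of that calculation follows, assuming destination numbers arrive as free-form strings; the function name and input formats are illustrative assumptions, not the configuration shown on the slide.

```python
import re
from typing import Optional


def destination_area_code(phone: str) -> Optional[str]:
    """Return the 3-digit NANP area code of a destination number, or None.

    Hypothetical helper: accepts any mix of digits and punctuation,
    e.g. '+1 (415) 555-0123'.
    """
    digits = re.sub(r"\D", "", phone)  # keep only the digits
    if len(digits) == 11 and digits.startswith("1"):
        digits = digits[1:]  # drop the NANP country code
    if len(digits) != 10 or digits[0] in "01":
        return None  # not a 10-digit number with a valid area code
    return digits[:3]


if __name__ == "__main__":
    for raw in ["+1 (415) 555-0123", "212-555-0199", "555-0199"]:
        print(raw, "->", destination_area_code(raw))
```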
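Slide 13's Stream Lookup appends country, state and time zone to each call by matching its area code against the NANP reference data. A minimal Python sketch of that kind of join is below, assuming the reference data is small enough to hold in memory as a dictionary. The three reference rows, the column names, the `USA`/`CAN` country values and the use of `None` for unknown area codes are illustrative assumptions, not the step's actual settings.

```python
import csv
import io
from typing import Dict, Iterable, Iterator, Tuple

# Hypothetical excerpt of NANP reference data (illustrative columns).
NANP_CSV = """area_code,country,state,time_zone
212,USA,NY,Eastern
415,USA,CA,Pacific
416,CAN,ON,Eastern
"""

Reference = Dict[str, Tuple[str, str, str]]


def load_reference(text: str) -> Reference:
    """Index the reference rows by area code for constant-time lookups."""
    reader = csv.DictReader(io.StringIO(text))
    return {r["area_code"]: (r["country"], r["state"], r["time_zone"]) for r in reader}


def stream_lookup(rows: Iterable[dict], ref: Reference) -> Iterator[dict]:
    """Append country, state and time zone to each row; unknown codes get None."""
    for row in rows:
        country, state, time_zone = ref.get(row["area_code"], (None, None, None))
        yield {**row, "country": country, "state": state, "time_zone": time_zone}


if __name__ == "__main__":
    ref = load_reference(NANP_CSV)
    for out in stream_lookup([{"area_code": "415"}, {"area_code": "999"}], ref):
        print(out)
```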
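Slide 14's Row Filter removes non-U.S. calls and calls made Monday through Thursday, so that only the Friday, Saturday and Sunday traffic asked about on slide 7 remains. Python's `datetime.weekday()` numbers Monday as 0 and Sunday as 6, so the day test is a weekday of 4, 5 or 6. The sketch below assumes each row already carries a parsed timestamp and a country value from the lookup step; the `"USA"` value and the field names are hypothetical.

```python
from datetime import datetime
from typing import Iterable, Iterator

FRI_SAT_SUN = {4, 5, 6}  # datetime.weekday(): Monday=0 ... Sunday=6


def keep_row(row: dict) -> bool:
    """True for U.S. destinations called on a Friday, Saturday or Sunday."""
    return row["country"] == "USA" and row["timestamp"].weekday() in FRI_SAT_SUN


def row_filter(rows: Iterable[dict]) -> Iterator[dict]:
    """Pass through only the rows that satisfy keep_row."""
    return (row for row in rows if keep_row(row))


if __name__ == "__main__":
    rows = [
        {"timestamp": datetime(2012, 6, 1, 20, 15), "country": "USA", "state": "CA"},  # Friday: kept
        {"timestamp": datetime(2012, 6, 4, 9, 0), "country": "USA", "state": "NY"},    # Monday: dropped
        {"timestamp": datetime(2012, 6, 2, 11, 30), "country": "CAN", "state": "ON"},  # non-U.S.: dropped
    ]
    print([row["state"] for row in row_filter(rows)])
```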
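Slide 15's Table Output loads the prepared rows into a relational database. The slide does not say which database, so the sketch below uses Python's built-in `sqlite3` module with an in-memory database as a stand-in; the `calls` table and its columns are hypothetical. `executemany` with named placeholders inserts one row per dict, and for `executemany` SQLite's `rowcount` is the total number of rows modified.

```python
import sqlite3
from typing import Iterable, Mapping


def table_output(conn: sqlite3.Connection, rows: Iterable[Mapping[str, str]]) -> int:
    """Create the hypothetical calls table if needed and bulk-insert the rows."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS calls ("
        "call_ts TEXT, area_code TEXT, state TEXT, time_zone TEXT)"
    )
    cur = conn.executemany(
        "INSERT INTO calls (call_ts, area_code, state, time_zone) "
        "VALUES (:call_ts, :area_code, :state, :time_zone)",
        rows,
    )
    conn.commit()
    return cur.rowcount  # rows inserted by this batch


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    rows = [
        {"call_ts": "2012-06-01 20:15:00", "area_code": "415", "state": "CA", "time_zone": "Pacific"},
        {"call_ts": "2012-06-03 18:45:00", "area_code": "212", "state": "NY", "time_zone": "Eastern"},
    ]
    print(table_output(conn, rows), "rows loaded")
```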
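The multi-dimensional view on slide 16 is where the analyst answers slide 7's question: the top 10 states by inbound call volume. Assuming the prepared data sits in a hypothetical SQLite `calls` table with a `state` column, that question reduces to a grouped count ordered by volume. The sketch below shows the query; breaking ties alphabetically by state is an arbitrary choice.

```python
import sqlite3
from typing import List, Tuple

TOP_STATES_SQL = """
SELECT state, COUNT(*) AS inbound_calls
FROM calls
GROUP BY state
ORDER BY inbound_calls DESC, state
LIMIT ?
"""


def top_states(conn: sqlite3.Connection, n: int = 10) -> List[Tuple[str, int]]:
    """Return up to n (state, call count) pairs, busiest state first."""
    return conn.execute(TOP_STATES_SQL, (n,)).fetchall()


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE calls (call_ts TEXT, state TEXT)")
    conn.executemany(
        "INSERT INTO calls VALUES (?, ?)",
        [
            ("2012-06-01 20:15:00", "CA"),
            ("2012-06-02 10:00:00", "CA"),
            ("2012-06-03 18:45:00", "NY"),
        ],
    )
    print(top_states(conn))
```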
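A heatmap like the one on slide 18 needs a state-by-day grid of call counts. The Python sketch below builds that grid from `(timestamp, state)` pairs, keeping only Friday, Saturday and Sunday. The input shape and the `heatmap_counts` name are assumptions, and turning the counts into colours is left to the charting tool.

```python
from collections import Counter
from datetime import datetime
from typing import Dict, Iterable, Tuple

DAY_NAMES = {4: "Fri", 5: "Sat", 6: "Sun"}  # datetime.weekday() -> column label


def heatmap_counts(calls: Iterable[Tuple[datetime, str]]) -> Dict[str, Dict[str, int]]:
    """Return {state: {day: call_count}} for Friday, Saturday and Sunday calls."""
    counts: Counter = Counter()
    for ts, state in calls:
        day = DAY_NAMES.get(ts.weekday())
        if day is not None:  # Monday to Thursday calls are ignored
            counts[(state, day)] += 1
    states = sorted({state for state, _ in counts})
    return {s: {d: counts[(s, d)] for d in DAY_NAMES.values()} for s in states}


if __name__ == "__main__":
    sample = [
        (datetime(2012, 6, 1, 20, 15), "CA"),  # Friday
        (datetime(2012, 6, 2, 10, 0), "CA"),   # Saturday
        (datetime(2012, 6, 3, 18, 45), "NY"),  # Sunday
    ]
    for state, row in heatmap_counts(sample).items():
        print(state, row)
```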
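Slide 19 notes that, once deployed via MapReduce, the pipeline's input and output streams are encoded as key-value pairs. The single-process Python simulation below illustrates that shape for the call-volume question; it does not use the Hadoop or Pentaho APIs. The mapper receives one call record as its value (a line index stands in for the byte-offset key that Hadoop's `TextInputFormat` supplies) and emits `(state, 1)` for U.S. calls made Friday to Sunday; after a sort-and-group "shuffle", the reducer sums the counts per state. The record format, the three-entry reference table and all function names are assumptions.

```python
from datetime import datetime
from itertools import groupby
from operator import itemgetter
from typing import Dict, Iterable, Iterator, List, Tuple

# Hypothetical reference data: area code -> (country, state).
NANP = {"212": ("USA", "NY"), "415": ("USA", "CA"), "416": ("CAN", "ON")}
FRI_SAT_SUN = {4, 5, 6}  # datetime.weekday() values


def mapper(key: int, value: str) -> Iterator[Tuple[str, int]]:
    """value is assumed to be 'YYYY-MM-DD HH:MM:SS,<10-digit phone>'."""
    ts_text, phone = value.split(",", 1)
    ts = datetime.strptime(ts_text, "%Y-%m-%d %H:%M:%S")
    country, state = NANP.get(phone[:3], (None, None))
    if country == "USA" and ts.weekday() in FRI_SAT_SUN:
        yield state, 1


def reducer(key: str, values: Iterable[int]) -> Iterator[Tuple[str, int]]:
    """Sum the per-call counts emitted for one state."""
    yield key, sum(values)


def run_local(lines: Iterable[str]) -> Dict[str, int]:
    """Simulate map -> shuffle (sort and group by key) -> reduce in one process."""
    mapped: List[Tuple[str, int]] = []
    for index, line in enumerate(lines):  # index stands in for Hadoop's byte offset
        mapped.extend(mapper(index, line))
    mapped.sort(key=itemgetter(0))
    result: Dict[str, int] = {}
    for state, group in groupby(mapped, key=itemgetter(0)):
        for out_key, total in reducer(state, (count for _, count in group)):
            result[out_key] = total
    return result
```

Because the mapper only ever sees one record and the reducer only one key's values, the same two functions could in principle run on many machines; only the shuffle between them needs the whole data set.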
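Slide 20's point is that the core logic of the pipeline stays identical between Phase I and Phase II; only the ends change. One way to picture this, sketched in Python under assumed names and an assumed `timestamp,phone` record format, is a single `core_logic` function called by two different ends: a batch run over a local extract, and a mapper-style wrapper over `(key, value)` pairs.

```python
from typing import Iterable, Iterator, List, Optional, Tuple


def core_logic(record: str) -> Optional[str]:
    """Shared transformation: the destination area code, or None to drop the record.

    Assumes hypothetical 'timestamp,phone' records with a bare 10-digit phone number.
    """
    _, phone = record.split(",", 1)
    phone = phone.strip()
    if len(phone) == 10 and phone.isdigit():
        return phone[:3]
    return None


def run_on_extract(records: Iterable[str]) -> List[str]:
    """Phase I end: process a local extract in one batch."""
    return [code for code in map(core_logic, records) if code is not None]


def map_key_values(pairs: Iterable[Tuple[int, str]]) -> Iterator[Tuple[str, int]]:
    """Phase II end: the same logic, wrapped to consume and emit key-value pairs."""
    for _, record in pairs:
        code = core_logic(record)
        if code is not None:
            yield code, 1


if __name__ == "__main__":
    extract = ["2012-06-01 20:15:00,4155550123", "2012-06-02 10:00:00,555"]
    print(run_on_extract(extract))
    print(list(map_key_values(enumerate(extract))))
```

Keeping the transformation free of any input or output concerns is what lets it move from the analyst's extract to the cluster unchanged.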