Bay Area Hadoop User Group

Accelerated Analytics for the Big Data Fabric

  • Slide note: Using PDI (Pentaho Data Integration) to incorporate Big Data into your data fabric provides immediate access to analytics. Examples:
    • Batch and ad hoc reporting directly against Big Data sources, using familiar BI tools with no coding (Report Designer, Interactive Reporting)
    • An agile framework to quickly generate, house and manage data marts for interactive analysis, data discovery, etc.

Transcript

  • 1. Accelerated Analytics for the Big Data Fabric Bay Area Hadoop User Group © 2012, Pentaho. All Rights Reserved. pentaho.com. Worldwide +1 (866) 660-7555
  • 2. AGENDA
    • The Big Data Fabric
    • Big Data Preparation – An Everyday Challenge
    • Use-Case Scenario – Call Volume Analysis
        – Solution Requirements
        – Solution Workflow
        – Phase I - Data Preparation & Visualization
        – Phase II - Pentaho MapReduce & Orchestration
    • Summary
  • 3. The Big Data Fabric [diagram; labels: Data Integration; Big Analytics; Pentaho Business Analytics; 3rd Party Tools; R; Visualization; Dashboards; 3rd Party BI Tools; Interactive Analysis; Reports; Applications; Data Integration; Scheduling; Job Orchestration; High Performance Workflow; Visual IDE; Hadoop; Analytic Databases; NoSQL Databases; Big Data Mgmt]
  • 4. Preparing Big Data for Analysis is an Everyday Challenge
    • Very technical skills required
    • Divide between M-R developers & analysts
    • Beyond the reach of many organizations
  • 5. Pentaho Visual MapReduce
    • Accessible by any ETL developer, business analyst or data scientist
    • Executes inside Hadoop as a native Java MapReduce task
  • 6. Pentaho Reporting & Analytics
    • Batch Reporting and Ad Hoc Query
    • Data Visualization, Discovery and Analysis
    • Data sources: Hadoop, NoSQL, Hybrid
  • 7. Use Case Scenario – Call Volume Analysis
    • A VoIP service provider has excess capacity and is considering expansion to consumer markets
    • Business Analyst: what are the top 10 states for inbound calls on Fridays, Saturdays and Sundays?
    • Research data available:
        – Call records – date/timestamp & destination phone #
        – NANP (North American Numbering Plan) data – area code by country, state & time zone
  • 8. Solution Requirements
    • Data Preparation
        – Access the call records in HDFS
        – Extract the destination area code for each call
        – Read the area code reference data
        – Look up country, state and time zone by area code, append to each record
        – Filter out records (non-U.S. calls, calls made Monday through Thursday)
        – Load to a relational database
        – Generate metadata
    • Analysis
        – Explore data multi-dimensionally
        – Find the top-10 states by inbound call volume
        – Navigate via a geospatial interface
    • Deployment
        – Deploy in MapReduce to handle larger data volumes
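
The data preparation and analysis requirements on slide 8 can be sketched in a few lines of plain Python. This is only an illustration of the logic, not how Pentaho implements it: the record layout (`timestamp,destination number`), the sample rows, the tiny `NANP` table and every function name below are assumptions made for the example, an in-memory SQLite database stands in for the relational target, and the "generate metadata" step is left out.

```python
import sqlite3
from datetime import datetime

# Hypothetical NANP reference rows: area code -> (country, state, time zone).
NANP = {
    "415": ("US", "CA", "Pacific"),
    "212": ("US", "NY", "Eastern"),
    "312": ("US", "IL", "Central"),
    "416": ("CA", "ON", "Eastern"),
}

# Hypothetical call records; the real layout is not given in the slides.
CALLS = [
    "2012-03-01 09:00:00,4155550101",  # Thursday: filtered out
    "2012-03-02 18:15:00,4155550102",  # Friday
    "2012-03-03 11:30:00,2125550103",  # Saturday
    "2012-03-03 12:00:00,4165550104",  # Saturday, Canada: filtered out
    "2012-03-04 20:45:00,4155550105",  # Sunday
    "2012-03-05 08:00:00,3125550106",  # Monday: filtered out
]

KEEP_DAYS = {4, 5, 6}  # datetime.weekday(): Friday=4, Saturday=5, Sunday=6


def prepare(lines, nanp):
    """Extract the area code, look up reference data, and filter rows."""
    for line in lines:
        stamp, number = line.strip().split(",")
        when = datetime.strptime(stamp, "%Y-%m-%d %H:%M:%S")
        area_code = number[:3]  # assumes a bare 10-digit NANP number
        country, state, time_zone = nanp.get(area_code, (None, None, None))
        if country != "US" or when.weekday() not in KEEP_DAYS:
            continue  # drop non-U.S. calls and Monday-Thursday calls
        yield (stamp, number, area_code, country, state, time_zone)


def load(rows):
    """Load prepared rows into an in-memory relational table."""
    db = sqlite3.connect(":memory:")
    db.execute(
        "CREATE TABLE calls (call_time TEXT, number TEXT, area_code TEXT, "
        "country TEXT, state TEXT, time_zone TEXT)"
    )
    db.executemany("INSERT INTO calls VALUES (?, ?, ?, ?, ?, ?)", rows)
    return db


def top_states(db, n=10):
    """Return the n states with the most inbound calls."""
    query = (
        "SELECT state, COUNT(*) AS call_count FROM calls "
        "GROUP BY state ORDER BY call_count DESC, state LIMIT ?"
    )
    return db.execute(query, (n,)).fetchall()


db = load(prepare(CALLS, NANP))
print(top_states(db))
```

With the sample rows above, only the three U.S. calls placed Friday through Sunday reach the table, so the query ranks CA ahead of NY.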
  • 9. Solution Workflow
    • Phase I - Business Analysts
        – Use a data extract to prepare and validate their analyses
        – Iterate over requirements with executives and stakeholders
    • Phase II - MapReduce Developers/Analysts
        – Create production Pentaho MapReduce transformations
        – Manage the deployment and orchestration between the Hadoop cluster and the production database
  • 10. Data Preparation (Phase I)
    • The data pipeline implements the data preparation logic
    • Each component has a “personality” – access, calculate, join, filter …
    • Free-form design – As many or as few inputs, transformations and outputs as needed
    • Schema contract exists only between connected components
    • Pipelined, multi-threaded for performance
    • 100% Java-based for deployment flexibility
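
As a rough analogy for the component model on slide 10, each component can be pictured as a small function that consumes a stream of rows and produces another. The sketch below uses Python generators, which stream rows one at a time but run in a single thread, unlike the multi-threaded pipeline the slide describes. The factory names (`calculator`, `stream_lookup`, `row_filter`, `pipeline`) echo the step titles on the following slides but are hypothetical helpers written for this example, not a Pentaho API.

```python
from functools import reduce


def calculator(field, fn):
    """Component factory: add a computed field to every row."""
    def step(rows):
        for row in rows:
            yield {**row, field: fn(row)}
    return step


def stream_lookup(key, table, fields):
    """Component factory: append reference fields found by row[key]."""
    def step(rows):
        for row in rows:
            hit = table.get(row[key])
            values = hit if hit is not None else (None,) * len(fields)
            yield {**row, **dict(zip(fields, values))}
    return step


def row_filter(predicate):
    """Component factory: keep only the rows that satisfy the predicate."""
    def step(rows):
        return (row for row in rows if predicate(row))
    return step


def pipeline(*steps):
    """Chain components; each one sees only the rows of its predecessor."""
    return lambda rows: reduce(lambda stream, step: step(stream), steps, rows)


# Each component touches only the fields it needs, so the "schema contract"
# is just the set of fields passed between two connected components.
flow = pipeline(
    calculator("area_code", lambda r: r["number"][:3]),
    stream_lookup("area_code", {"415": ("US", "CA")}, ("country", "state")),
    row_filter(lambda r: r["country"] == "US"),
)

rows = list(flow([{"number": "4155550101"}, {"number": "4165550104"}]))
print(rows)
```

Adding, removing or reordering components only changes the argument list of `pipeline`, which is the free-form property the slide points to.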
  • 11. Data Pipeline – Input from HDFS
  • 12. Data Pipeline – Calculator
  • 13. Data Pipeline – Stream Lookup
  • 14. Data Pipeline – Row Filter
  • 15. Data Pipeline – Table Output
  • 16. Visualization – Multi-Dimensional UX
  • 17. Visualization – Geographic
  • 18. Visualization – Heatmap
  • 19. Deployment to Hadoop (Phase II)
    • To process a larger set of data we can deploy the data pipeline via MapReduce
        – Input and output streams are encoded as key-value pairs
        – Two specialized components provide an interface:
        – A special job component deploys the data pipeline to the Hadoop cluster:
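
To make the key-value encoding on slide 19 concrete, here is one way the call-count question could be phrased as a map function and a reduce function. It is a local simulation under stated assumptions: the `mapper`, `reducer` and `run_local` names are hypothetical, the input rows are assumed to be already prepared (each carries a `state` field), and the sort-and-group step merely stands in for Hadoop's shuffle. It does not use Hadoop or Pentaho APIs.

```python
from itertools import groupby
from operator import itemgetter


def mapper(key, value):
    """Map: key is the record offset (unused), value is one prepared row."""
    yield value["state"], 1


def reducer(key, values):
    """Reduce: sum the counts emitted for one state."""
    yield key, sum(values)


def run_local(records):
    """Simulate map, shuffle (sort and group by key) and reduce in-process."""
    mapped = [
        pair
        for offset, record in enumerate(records)
        for pair in mapper(offset, record)
    ]
    mapped.sort(key=itemgetter(0))
    output = []
    for key, group in groupby(mapped, key=itemgetter(0)):
        output.extend(reducer(key, (count for _, count in group)))
    return output


counts = run_local([{"state": "CA"}, {"state": "NY"}, {"state": "CA"}])
print(counts)
```

On a real cluster the map and reduce calls would run on different nodes; the point here is only that both ends of the pipeline exchange plain (key, value) pairs.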
  • 20. Pentaho MapReduce – Inputs/Outputs
    • The core logic of the data pipeline is identical … only the ends change
  • 21. Pentaho MapReduce – Orchestration
  • 22. Instant Analytics (Roadmap): Choose a Big Data Source, Answer a Few Questions, Publish to Pentaho; Report, Explore and Analyze; Customize Model (Optional)
  • 23. SUMMARY
    1. The Big Data Fabric encompasses a large collection of Hadoop distributions, NoSQL and analytical databases
    2. A component-based approach to data access and integration can:
        – Allow business analysts and data scientists to perform their own data preparation
        – Result in more rapid validation of business requirements & metrics
        – Be used to create data pipelines that can be deployed directly to a cluster, enabling analytics against much larger data sets
        – Support orchestration across environments
  • 24. Summary
  • 25. Thank You
    • Join the conversation. You can find us on: http://blog.pentaho.com, @Pentaho, Facebook.com/Pentaho, Pentaho Business Analytics