Bay Area Hadoop User Group
 

Accelerated Analytics for the Big Data Fabric

Leveraging PDI (Pentaho Data Integration) to incorporate Big Data into your data fabric provides immediate access to analytics. Examples:
  • Batch and ad hoc reporting directly against Big Data sources using familiar BI tools, with no coding (Report Designer, Interactive Reporting)
  • An agile framework to quickly generate, house and manage data marts for interactive analysis, data discovery, etc.

Presentation Transcript

  • Accelerated Analytics for the Big Data Fabric
    Bay Area Hadoop User Group
    © 2012, Pentaho. All Rights Reserved. pentaho.com. Worldwide +1 (866) 660-7555
  • AGENDA The Big Data Fabric Big Data Preparation – An Everyday Challenge Use-Case Scenario – Call Volume Analysis  Solution Requirements  Solution Workflow  Phase I - Data Preparation & Visualization  Phase II - Pentaho MapReduce & Orchestration Summary 2 © 2012, Pentaho. All Rights Reserved. pentaho.com. Worldwide +1 (866) 660-7555
  • The Big Data Fabric
    • Big Analytics: Pentaho Business Analytics; 3rd-party tools (R, visualization); dashboards; 3rd-party BI tools; interactive analysis; reports; applications
    • Data Integration: scheduling, job orchestration, high-performance workflow, visual IDE
    • Big Data Management: Hadoop, analytic databases, NoSQL databases
  • Preparing Big Data for Analysis Is an Everyday Challenge
    • Very technical skills required
    • Divide between MapReduce (M-R) developers and analysts
    • Beyond the reach of many organizations
  • Pentaho Visual MapReduce
    • Accessible to any ETL developer, business analyst or data scientist
    • Executes inside Hadoop as a native Java MapReduce task
  • Pentaho Reporting & Analytics
    • Batch reporting and ad hoc query
    • Data visualization, discovery and analysis
    • Hadoop, NoSQL, hybrid
  • Use-Case Scenario: Call Volume Analysis
    • A VoIP service provider has excess capacity and is considering expansion into consumer markets
    • Business analyst's question: what are the top 10 states for inbound calls on Fridays, Saturdays and Sundays?
    • Research data available:
      – Call records: date/timestamp and destination phone number
      – NANP (North American Numbering Plan) data: area code by country, state and time zone
  • Solution Requirements
    • Data preparation
      – Access the call records in HDFS
      – Extract the destination area code for each call
      – Read the area code reference data
      – Look up country, state and time zone by area code, and append them to each record
      – Filter out non-U.S. calls and calls made Monday through Thursday
      – Load the results into a relational database
      – Generate metadata
    • Analysis
      – Explore the data multi-dimensionally
      – Find the top-10 states by inbound call volume
      – Navigate via a geospatial interface
    • Deployment
      – Deploy in MapReduce to handle larger data volumes
  • Solution Workflow
    • Phase I: Business analysts
      – Use a data extract to prepare and validate their analyses
      – Iterate over requirements with executives and stakeholders
    • Phase II: MapReduce developers/analysts
      – Create production Pentaho MapReduce transformations
      – Manage deployment and orchestration between the Hadoop cluster and the production database
  • Data Preparation (Phase I)
    • The data pipeline implements the data preparation logic
    • Each component has a “personality”: access, calculate, join, filter …
    • Free-form design: as many or as few inputs, transformations and outputs as needed
    • A schema contract exists only between connected components
    • Pipelined and multi-threaded for performance
    • 100% Java-based for deployment flexibility
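The pipelined, multi-threaded design described above can be illustrated with a minimal Python sketch. It is not PDI's engine; it only assumes, loosely following the slide, that each component runs in its own thread, reads rows from a bounded buffer and hands its output to the next component, and that each component relies only on the fields its upstream neighbor provides. The helper names (`run_step`, `run_pipeline`, `extract_area_code`, `keep_fri_sat_sun`) and the row fields (`dest`, `day`) are hypothetical, and there is no error handling.

```python
import queue
import threading

_END = object()  # sentinel that marks the end of a row stream


def run_step(step, inbox, outbox):
    """Run one component: read rows from inbox, apply step, write results to outbox.

    A step returns a list of zero or more rows, so the same wrapper serves
    "calculate" steps (one row out) and "filter" steps (zero or one row out).
    """
    while True:
        row = inbox.get()
        if row is _END:
            outbox.put(_END)  # pass end-of-stream downstream and stop
            return
        for out in step(row):
            outbox.put(out)


def run_pipeline(rows, steps, buffer_size=100):
    """Connect steps with bounded queues; every step runs in its own thread."""
    queues = [queue.Queue(maxsize=buffer_size) for _ in range(len(steps) + 1)]
    threads = [
        threading.Thread(target=run_step, args=(step, queues[i], queues[i + 1]))
        for i, step in enumerate(steps)
    ]

    def feed():
        # Input component: push every source row, then signal end-of-stream.
        for row in rows:
            queues[0].put(row)
        queues[0].put(_END)

    threads.append(threading.Thread(target=feed))
    for t in threads:
        t.start()

    results = []
    while (row := queues[-1].get()) is not _END:  # drain the last buffer
        results.append(row)
    for t in threads:
        t.join()
    return results


# Two example components; each depends only on fields its upstream step provides.
def extract_area_code(row):
    return [{**row, "area_code": row["dest"][:3]}]


def keep_fri_sat_sun(row):
    return [row] if row["day"] in ("Fri", "Sat", "Sun") else []


rows = [{"dest": "4155550100", "day": "Fri"}, {"dest": "2125550199", "day": "Tue"}]
print(run_pipeline(rows, [extract_area_code, keep_fri_sat_sun]))
# [{'dest': '4155550100', 'day': 'Fri', 'area_code': '415'}]
```

Because every buffer is bounded, a slow downstream step throttles the steps upstream of it instead of letting rows pile up in memory, while all steps still work on different rows at the same time.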
  • Data Pipeline: Input from HDFS
  • Data Pipeline: Calculator
  • Data Pipeline: Stream Lookup
  • Data Pipeline: Row Filter
  • Data Pipeline: Table Output
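The five pipeline steps above (HDFS input, Calculator, Stream Lookup, Row Filter, Table Output) can be approximated in plain Python, with an in-memory SQLite database standing in for the relational target. This is a sketch under stated assumptions, not the transformation shown on the slides: the `timestamp,destination` record format, the three-entry NANP sample and all function names are hypothetical, reading from HDFS is replaced by a list of lines, and timestamps are used as-is without time-zone conversion.

```python
import sqlite3
from datetime import datetime

# Hypothetical sample of the NANP reference data: area code -> (country, state, time zone)
NANP = {
    "415": ("US", "CA", "Pacific"),
    "212": ("US", "NY", "Eastern"),
    "416": ("CA", "ON", "Eastern"),
}

WEEKEND_DAYS = {4, 5, 6}  # datetime.weekday(): Friday=4, Saturday=5, Sunday=6


def parse_record(line):
    """Input step: split a hypothetical 'timestamp,destination' CSV line."""
    ts, dest = line.strip().split(",")
    return {"ts": datetime.fromisoformat(ts), "dest": dest}


def add_area_code(rec):
    """Calculator step: keep only digits and take the 3-digit NANP area code."""
    digits = "".join(ch for ch in rec["dest"] if ch.isdigit())
    if len(digits) == 11 and digits.startswith("1"):
        digits = digits[1:]  # drop the +1 country code
    return {**rec, "area_code": digits[:3]}


def lookup(rec, ref):
    """Stream Lookup step: append country, state and time zone by area code."""
    country, state, tz = ref.get(rec["area_code"], (None, None, None))
    return {**rec, "country": country, "state": state, "tz": tz}


def keep(rec):
    """Row Filter step: U.S. calls made on Friday, Saturday or Sunday only."""
    return rec["country"] == "US" and rec["ts"].weekday() in WEEKEND_DAYS


def load_and_rank(lines, ref, limit=10):
    """Table Output step into SQLite, then the top-N states by call volume."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE calls (ts TEXT, area_code TEXT, state TEXT, tz TEXT)")
    rows = (lookup(add_area_code(parse_record(line)), ref) for line in lines)
    conn.executemany(
        "INSERT INTO calls VALUES (?, ?, ?, ?)",
        ((r["ts"].isoformat(), r["area_code"], r["state"], r["tz"])
         for r in rows if keep(r)),
    )
    result = conn.execute(
        "SELECT state, COUNT(*) AS n FROM calls "
        "GROUP BY state ORDER BY n DESC, state LIMIT ?",
        (limit,),
    ).fetchall()
    conn.close()
    return result


calls = [
    "2012-06-01T18:30:00,+1-415-555-0100",  # Friday, CA
    "2012-06-02T09:15:00,4155550123",       # Saturday, CA
    "2012-06-03T20:00:00,212-555-0199",     # Sunday, NY
    "2012-06-04T12:00:00,2125550100",       # Monday: filtered out
    "2012-06-01T10:00:00,4165550100",       # Ontario (non-U.S.): filtered out
]
print(load_and_rank(calls, NANP))  # [('CA', 2), ('NY', 1)]
```

With a full NANP table and the real call records, the default `limit=10` would return the analyst's top-10 states.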
  • Visualization: Multi-Dimensional UX
  • Visualization: Geographic
  • Visualization: Heatmap
  • Deployment to Hadoop (Phase II)
    • To process a larger data set, we can deploy the data pipeline via MapReduce:
      – Input and output streams are encoded as key-value pairs
      – Two specialized components provide an interface:
      – A special job component deploys the data pipeline to the Hadoop cluster:
  • Pentaho MapReduce: Inputs/Outputs
    • The core logic of the data pipeline is identical; only the ends change.
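To illustrate "only the ends change", the sketch below keeps one core function and wraps it two ways: a local loop over an extract (Phase I) and a mapper/reducer pair that exchanges key-value pairs (Phase II). The map, shuffle/sort and reduce phases are simulated in a single Python process; this is not Pentaho MapReduce or the Hadoop API, and the `timestamp,10-digit-number` record format, the reference data and all function names are hypothetical.

```python
from collections import Counter
from datetime import datetime
from itertools import groupby
from operator import itemgetter

# Hypothetical reference data: area code -> (country, state)
AREA_CODES = {"415": ("US", "CA"), "212": ("US", "NY"), "416": ("CA", "ON")}


def core(line):
    """Shared pipeline logic: the U.S. state for a Fri/Sat/Sun call, else None."""
    ts, dest = line.split(",")
    country, state = AREA_CODES.get(dest[:3], (None, None))
    if country == "US" and datetime.fromisoformat(ts).weekday() >= 4:  # Fri=4..Sun=6
        return state
    return None


# End 1 (Phase I): run the core logic directly over a local extract.
def run_local(lines):
    return Counter(s for s in map(core, lines) if s is not None)


# End 2 (Phase II): the same core, wrapped to consume and emit key-value pairs.
def mapper(key, value):
    # key: line position (stands in for Hadoop's byte offset), value: the record
    state = core(value)
    if state is not None:
        yield state, 1


def reducer(key, values):
    yield key, sum(values)


def run_mapreduce(lines):
    """Simulate map -> shuffle/sort -> reduce in a single process."""
    mapped = [kv for offset, line in enumerate(lines) for kv in mapper(offset, line)]
    mapped.sort(key=itemgetter(0))  # the shuffle brings equal keys together
    out = {}
    for key, group in groupby(mapped, key=itemgetter(0)):
        for k, total in reducer(key, (v for _, v in group)):
            out[k] = total
    return out


lines = [
    "2012-06-01T18:30:00,4155550100",  # Friday, CA
    "2012-06-02T09:15:00,4155550123",  # Saturday, CA
    "2012-06-03T20:00:00,2125550199",  # Sunday, NY
    "2012-06-04T12:00:00,2125550100",  # Monday: dropped
    "2012-06-01T10:00:00,4165550100",  # non-U.S.: dropped
]
print(dict(run_local(lines)))  # {'CA': 2, 'NY': 1}
print(run_mapreduce(lines))    # {'CA': 2, 'NY': 1}
```

Both ends produce the same counts because `core` is untouched; only the code that feeds it records and collects its results differs.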
  • Pentaho MapReduce: Orchestration
  • Instant Analytics (Roadmap)
    • Choose a Big Data source
    • Answer a few questions
    • Publish to Pentaho Report
    • Explore and analyze
    • Customize the model (optional)
  • SUMMARY
    1. The Big Data Fabric encompasses a large collection of Hadoop distributions, NoSQL databases and analytic databases.
    2. A component-based approach to data access and integration can:
      – Allow business analysts and data scientists to perform their own data preparation
      – Result in more rapid validation of business requirements and metrics
      – Be used to create data pipelines that can be deployed directly to a cluster, enabling analytics against much larger data sets
      – Support orchestration across environments
  • Summary
  • Thank You
    Join the conversation. You can find us on: http://blog.pentaho.com, @Pentaho, Facebook.com/Pentaho, Pentaho Business Analytics