Reducing the Implementation Efforts ofHadoop, NoSQL & Analytical Databases© 2013, Pentaho. All Rights Reserved. pentaho.co...
Goals for Today2To understand:•  Landscape of Big DataDatabases Today•  Current Approaches toImplementation•  How PentahoR...
The Big Data Landscape3- Most businesses have1 or more of thesedatabases- Often used togetherto form a completesolution- T...
The Big Data Landscape4- Most businesseshave 1 or more ofthese databases- Often used togetherto form a completesolution- T...
Transactional Databases•  Your everyday, garden-variety database•  Used to store data for line of business applications (E...
Hadoop, HDFS and Hive•  Hadoop is a set of related open source Apache projects•  MapReduce is a programming framework•  HD...
NoSQL Databases•  Non-relational databases–  No (or very limited) support for the SQL language•  Developers use programs, ...
Analytical Databases•  Databases designed for Business Intelligence–  Offer full support for SQL and relational metadata– ...
Current Implementation ApproachesComplex and Time Consuming9Load/IngestManipulate/Process Access ModelVisualize/Analyze Pr...
A Smarter Approach10Load/IngestManipulate/Process Access ModelVisualize/Analyze Predict DecisionsIngest datainto bigdata s...
Complete platform from data to analytics–  Continuity of one platform–  Eliminate disjointed steps–  Days instead of weeks...
Sample Scenario•  A telecommunication company needs to:–  Ingest, store and retrieve call detail records (CDRs)from multip...
Pentaho Data Integration for Big Data13•  Transform the data for reporting without coding•  Execute in the Hadoop Cluster ...
Complete Visual Data Managementand Big Data AnalyticsHadoop NoSQL Analytic DatabasesData Discovery,VisualizationEnterprise...
Upcoming SlideShare
Loading in …5
×

Big Data Integration Webinar: Reducing Implementation Efforts of Hadoop, NoSQL and Analytical Databases

927 views

Published on

Published in: Technology
0 Comments
0 Likes
Statistics
Notes
  • Be the first to comment

  • Be the first to like this

No Downloads
Views
Total views
927
On SlideShare
0
From Embeds
0
Number of Embeds
1
Actions
Shares
0
Downloads
24
Comments
0
Likes
0
Embeds 0
No embeds

No notes for slide

Big Data Integration Webinar: Reducing Implementation Efforts of Hadoop, NoSQL and Analytical Databases

  1. 1. Reducing the Implementation Efforts ofHadoop, NoSQL & Analytical Databases© 2013, Pentaho. All Rights Reserved. pentaho.com. Worldwide +1 (866) 660-7555
  2. 2. Goals for Today2To understand:•  Landscape of Big DataDatabases Today•  Current Approaches toImplementation•  How PentahoReduces Time, Effort,and Complexity
  3. 3. The Big Data Landscape3- Most businesses have1 or more of thesedatabases- Often used togetherto form a completesolution- Typically created /sold / supported bydifferent vendor- Different programminginterfaces fordevelopment andadministrationNoSQLHadoop AnalyticalTransactional /DW
  4. 4. The Big Data Landscape4- Most businesseshave 1 or more ofthese databases- Often used togetherto form a completesolution- Typically created /sold / supported bydifferent vendor- Differentprogramminginterfaces fordevelopment andadministrationNoSQLHadoopAnalyticalTransactional /DW
  5. 5. Transactional Databases•  Your everyday, garden-variety database•  Used to store data for line of business applications (ERP, CRM, HR)•  Serves as the “system of record” for business facts, e.g.–  Inventory levels–  Customer account status–  Payroll data•  Offers rich sources of analytical information (potential facts anddimensions)•  Fast and reliable for INSERT|UPDATE|DELETE operations5
  6. 6. Hadoop, HDFS and Hive•  Hadoop is a set of related open source Apache projects•  MapReduce is a programming framework•  HDFS is the Hadoop Distributed File System–  Used to store structured (tabular) and unstructured files•  Structured files in HDFS can be queried via Hive–  Hive provides a basic SQL interface on top of HDFS files–  Hive generates MapReduce programs to fulfill the SQL requests•  Ideal for:–  Unstructured data that doesn’t fit in a Transactional or NoSQL database–  Email, twitter posts, images, voice logs, etc.6
  7. 7. NoSQL Databases•  Non-relational databases–  No (or very limited) support for the SQL language•  Developers use programs, not queries to access data–  Flexible definition of structure (“extend as you go”)•  Limited metadata–  No referential integrity / limited support for transactions•  But:–  They can ingest (capture and persist) very large amounts of data withvery low overhead, across multiple servers•  1000’s (to 100’s of thousands) of inserts per second–  Excellent for large numbers of reads based on key values•  Millions of reads per second7
  8. 8. Analytical Databases•  Databases designed for Business Intelligence–  Offer full support for SQL and relational metadata–  Use specialized techniques to improve query performance•  Compression•  Column-based storage (e.g. “M” or “F”)•  Clusters of independent servers•  Advanced mathematics for index & query optimization–  Support high-speed “bulk” inserts of structured data–  May be offered as standalone software or via an appliance•  Ideal for:–  Answering summary queries requiring complex filters, joins & groupings–  Online Analytical Processing (OLAP) clients8
  9. 9. Current Implementation ApproachesComplex and Time Consuming9Load/IngestManipulate/Process Access ModelVisualize/Analyze Predict DecisionsIngest datainto bigdata storeMake dataready foranalysis• Transform• Aggregate• Strucure-izeDirect tobig datastore ORMove sliceto DM/DWBuildmetadatamodel:•  Dimensions•  Measures•  HierarchiesReportDashboardAnalyzePivotDrillAct, theniteratebased oninsightsSTRUCTURED DATACRM, POS, ERP,etc.UNSTRUCTURED DATAForecastingClusteringClassificationAssociationRegressionCoding Coding Coding Coding CodingCodingCurrentDeveloperApproach:ETL Tool BI Tool ModelBuilderBI Tool DataMining ToolCurrentITApproach:Coding Coding
  10. 10. A Smarter Approach10Load/IngestManipulate/Process Access ModelVisualize/Analyze Predict DecisionsIngest datainto bigdata storeMake dataready foranalysis• Transform• Aggregate• Strucure-izeDirect tobig datastore ORMove sliceto DM/DWBuildmetadatamodel:•  Dimensions•  Measures•  HierarchiesReportDashboardAnalyzePivotDrillAct, theniteratebased oninsightsSTRUCTURED DATACRM, POS, ERP,etc.UNSTRUCTURED DATAForecastingClusteringClassificationAssociationRegressionCoding Coding Coding Coding CodingCodingCurrentDeveloperApproach:ETL Tool BI Tool ModelBuilderBI Tool DataMining ToolCurrentITApproach:Coding CodingBusiness Analytics for Big Data
  11. 11. Complete platform from data to analytics–  Continuity of one platform–  Eliminate disjointed steps–  Days instead of weeksHigh productivity visual development–  Use existing skills for Hadoop–  Visual development for MapReduce jobs–  Visual development for Sqoop, OozieFast performance within Hadoop–  Parallel execution of MapReducejobs across the Hadoop clusterVisual DataPreparationCoding VS. Pentaho MapReduce11Pentaho Visual Development for Big Data15x Faster to Design & Deploy Big Data Analytics
  12. 12. Sample Scenario•  A telecommunication company needs to:–  Ingest, store and retrieve call detail records (CDRs)from multiple digital switches–  Maintain a detailed profile of its commercialcustomers–  Analyze the volume of calls over time by SIC code,region, state, county and city•  Which database technology would it use for each ofthese applications?12
  13. 13. Pentaho Data Integration for Big Data13•  Transform the data for reporting without coding•  Execute in the Hadoop Cluster without java coding•  Visually orchestrate job workflows•  Visualize and analyze the dataLeverage Native Capabilities of Big Data Vendors
  14. 14. Complete Visual Data Managementand Big Data AnalyticsHadoop NoSQL Analytic DatabasesData Discovery,VisualizationEnterprise & AdHoc ReportingData Ingestion,Manipulation,IntegrationPredictiveAnalytics14Big Data Fabric

×