SlideShare a Scribd company logo
Reducing the Implementation Efforts of
Hadoop, NoSQL & Analytical Databases
© 2013, Pentaho. All Rights Reserved. pentaho.com. Worldwide +1 (866) 660-7555
Goals for Today
2
To understand:
•  Landscape of Big Data
Databases Today
•  Current Approaches to
Implementation
•  How Pentaho
Reduces Time, Effort,
and Complexity
The Big Data Landscape
3
- Most businesses have
1 or more of these
databases
- Often used together
to form a complete
solution
- Typically created /
sold / supported by
different vendor
- Different programming
interfaces for
development and
administration
NoSQL
Hadoop Analytical
Transactional /DW
The Big Data Landscape
4
- Most businesses
have 1 or more of
these databases
- Often used together
to form a complete
solution
- Typically created /
sold / supported by
different vendor
- Different
programming
interfaces for
development and
administration
NoSQL
Hadoop
Analytical
Transactional /DW
Transactional Databases
•  Your everyday, garden-variety database
•  Used to store data for line of business applications (ERP, CRM, HR)
•  Serves as the “system of record” for business facts, e.g.
–  Inventory levels
–  Customer account status
–  Payroll data
•  Offers rich sources of analytical information (potential facts and
dimensions)
•  Fast and reliable for INSERT|UPDATE|DELETE operations
5
Hadoop, HDFS and Hive
•  Hadoop is a set of related open source Apache projects
•  MapReduce is a programming framework
•  HDFS is the Hadoop Distributed File System
–  Used to store structured (tabular) and unstructured files
•  Structured files in HDFS can be queried via Hive
–  Hive provides a basic SQL interface on top of HDFS files
–  Hive generates MapReduce programs to fulfill the SQL requests
•  Ideal for:
–  Unstructured data that doesn’t fit in a Transactional or NoSQL database
–  Email, twitter posts, images, voice logs, etc.
6
NoSQL Databases
•  Non-relational databases
–  No (or very limited) support for the SQL language
•  Developers use programs, not queries to access data
–  Flexible definition of structure (“extend as you go”)
•  Limited metadata
–  No referential integrity / limited support for transactions
•  But:
–  They can ingest (capture and persist) very large amounts of data with
very low overhead, across multiple servers
•  1000’s (to 100’s of thousands) of inserts per second
–  Excellent for large numbers of reads based on key values
•  Millions of reads per second
7
Analytical Databases
•  Databases designed for Business Intelligence
–  Offer full support for SQL and relational metadata
–  Use specialized techniques to improve query performance
•  Compression
•  Column-based storage (e.g. “M” or “F”)
•  Clusters of independent servers
•  Advanced mathematics for index & query optimization
–  Support high-speed “bulk” inserts of structured data
–  May be offered as standalone software or via an appliance
•  Ideal for:
–  Answering summary queries requiring complex filters, joins & groupings
–  Online Analytical Processing (OLAP) clients
8
Current Implementation Approaches
Complex and Time Consuming
9
Load/
Ingest
Manipulate/
Process Access Model
Visualize/
Analyze Predict Decisions
Ingest data
into big
data store
Make data
ready for
analysis
• Transform
• Aggregate
• Strucure-ize
Direct to
big data
store OR
Move slice
to DM/DW
Build
metadata
model:
•  Dimensions
•  Measures
•  Hierarchies
Report
Dashboard
Analyze
Pivot
Drill
Act, then
iterate
based on
insights
STRUCTURED DATA
CRM, POS, ERP,
etc.
UNSTRUCTURED DATA
Forecasting
Clustering
Classification
Association
Regression
Coding Coding Coding Coding CodingCoding
Current
Developer
Approach:
ETL Tool BI Tool Model
Builder
BI Tool Data
Mining Tool
Current
IT
Approach:
Coding Coding
A Smarter Approach
10
Load/
Ingest
Manipulate/
Process Access Model
Visualize/
Analyze Predict Decisions
Ingest data
into big
data store
Make data
ready for
analysis
• Transform
• Aggregate
• Strucure-ize
Direct to
big data
store OR
Move slice
to DM/DW
Build
metadata
model:
•  Dimensions
•  Measures
•  Hierarchies
Report
Dashboard
Analyze
Pivot
Drill
Act, then
iterate
based on
insights
STRUCTURED DATA
CRM, POS, ERP,
etc.
UNSTRUCTURED DATA
Forecasting
Clustering
Classification
Association
Regression
Coding Coding Coding Coding CodingCoding
Current
Developer
Approach:
ETL Tool BI Tool Model
Builder
BI Tool Data
Mining Tool
Current
IT
Approach:
Coding Coding
Business Analytics for Big Data
Complete platform from data to analytics
–  Continuity of one platform
–  Eliminate disjointed steps
–  Days instead of weeks
High productivity visual development
–  Use existing skills for Hadoop
–  Visual development for MapReduce jobs
–  Visual development for Sqoop, Oozie
Fast performance within Hadoop
–  Parallel execution of MapReduce
jobs across the Hadoop cluster
Visual Data
Preparation
Coding VS. Pentaho MapReduce
11
Pentaho Visual Development for Big Data
15x Faster to Design & Deploy Big Data Analytics
Sample Scenario
•  A telecommunication company needs to:
–  Ingest, store and retrieve call detail records (CDRs)
from multiple digital switches
–  Maintain a detailed profile of its commercial
customers
–  Analyze the volume of calls over time by SIC code,
region, state, county and city
•  Which database technology would it use for each of
these applications?
12
Pentaho Data Integration for Big Data
13
•  Transform the data for reporting without coding
•  Execute in the Hadoop Cluster without java coding
•  Visually orchestrate job workflows
•  Visualize and analyze the data
Leverage Native Capabilities of Big Data Vendors
Complete Visual Data Management
and Big Data Analytics
Hadoop NoSQL Analytic Databases
Data Discovery,
Visualization
Enterprise & Ad
Hoc Reporting
Data Ingestion,
Manipulation,
Integration
Predictive
Analytics
14
Big Data Fabric

More Related Content

What's hot

Pentaho Analytics for MongoDB - presentation from MongoDB World 2014
Pentaho Analytics for MongoDB - presentation from MongoDB World 2014Pentaho Analytics for MongoDB - presentation from MongoDB World 2014
Pentaho Analytics for MongoDB - presentation from MongoDB World 2014
Pentaho
 
Embedded Analytics in Customer Success
Embedded Analytics in Customer SuccessEmbedded Analytics in Customer Success
Embedded Analytics in Customer Success
Pentaho
 
Big Data Predictions for 2015
Big Data Predictions for 2015 Big Data Predictions for 2015
Big Data Predictions for 2015
Pentaho
 
Up Your Analytics Game with Pentaho and Vertica
Up Your Analytics Game with Pentaho and Vertica Up Your Analytics Game with Pentaho and Vertica
Up Your Analytics Game with Pentaho and Vertica
Pentaho
 
Competitive edgewithmongod bandpentaho_2014sep_v3[1]
Competitive edgewithmongod bandpentaho_2014sep_v3[1]Competitive edgewithmongod bandpentaho_2014sep_v3[1]
Competitive edgewithmongod bandpentaho_2014sep_v3[1]
Pentaho
 
Data Integration and Advanced Analytics for MongoDB: Blend, Enrich and Analyz...
Data Integration and Advanced Analytics for MongoDB: Blend, Enrich and Analyz...Data Integration and Advanced Analytics for MongoDB: Blend, Enrich and Analyz...
Data Integration and Advanced Analytics for MongoDB: Blend, Enrich and Analyz...MongoDB
 
MongoDB IoT City Tour EINDHOVEN: Analysing the Internet of Things: Davy Nys, ...
MongoDB IoT City Tour EINDHOVEN: Analysing the Internet of Things: Davy Nys, ...MongoDB IoT City Tour EINDHOVEN: Analysing the Internet of Things: Davy Nys, ...
MongoDB IoT City Tour EINDHOVEN: Analysing the Internet of Things: Davy Nys, ...
MongoDB
 
MongoDB IoT City Tour LONDON: Analysing the Internet of Things: Davy Nys, Pen...
MongoDB IoT City Tour LONDON: Analysing the Internet of Things: Davy Nys, Pen...MongoDB IoT City Tour LONDON: Analysing the Internet of Things: Davy Nys, Pen...
MongoDB IoT City Tour LONDON: Analysing the Internet of Things: Davy Nys, Pen...MongoDB
 
Unlocking data science in the enterprise - with Oracle and Cloudera
Unlocking data science in the enterprise - with Oracle and ClouderaUnlocking data science in the enterprise - with Oracle and Cloudera
Unlocking data science in the enterprise - with Oracle and Cloudera
Cloudera, Inc.
 
Open Analytics 2014 - Pedro Alves - Innovation though Open Source
Open Analytics 2014 - Pedro Alves - Innovation though Open SourceOpen Analytics 2014 - Pedro Alves - Innovation though Open Source
Open Analytics 2014 - Pedro Alves - Innovation though Open Source
OpenAnalytics Spain
 
BI congres 2016-4: Hoe groei je als organisatie in analytische maturiteit? - ...
BI congres 2016-4: Hoe groei je als organisatie in analytische maturiteit? - ...BI congres 2016-4: Hoe groei je als organisatie in analytische maturiteit? - ...
BI congres 2016-4: Hoe groei je als organisatie in analytische maturiteit? - ...
BICC Thomas More
 
MongoDB IoT City Tour STUTTGART: Analysing the Internet of Things. By, Pentaho
MongoDB IoT City Tour STUTTGART: Analysing the Internet of Things. By, PentahoMongoDB IoT City Tour STUTTGART: Analysing the Internet of Things. By, Pentaho
MongoDB IoT City Tour STUTTGART: Analysing the Internet of Things. By, Pentaho
MongoDB
 
Hybrid Data Architecture: Integrating Hadoop with a Data Warehouse
Hybrid Data Architecture: Integrating Hadoop with a Data WarehouseHybrid Data Architecture: Integrating Hadoop with a Data Warehouse
Hybrid Data Architecture: Integrating Hadoop with a Data Warehouse
DataWorks Summit
 
BI congres 2016-2: Diving into weblog data with SAS on Hadoop - Lisa Truyers...
BI congres 2016-2: Diving into weblog data with SAS on Hadoop -  Lisa Truyers...BI congres 2016-2: Diving into weblog data with SAS on Hadoop -  Lisa Truyers...
BI congres 2016-2: Diving into weblog data with SAS on Hadoop - Lisa Truyers...
BICC Thomas More
 
BI congres 2014-5: from BI to big data - Jan Aertsen - Pentaho
BI congres 2014-5: from BI to big data - Jan Aertsen - PentahoBI congres 2014-5: from BI to big data - Jan Aertsen - Pentaho
BI congres 2014-5: from BI to big data - Jan Aertsen - Pentaho
BICC Thomas More
 
Ask bigger questions
Ask bigger questionsAsk bigger questions
Ask bigger questions
South West Data Meetup
 
Traditional BI vs. Business Data Lake – A Comparison
Traditional BI vs. Business Data Lake – A ComparisonTraditional BI vs. Business Data Lake – A Comparison
Traditional BI vs. Business Data Lake – A Comparison
Capgemini
 
Big Data Discovery
Big Data DiscoveryBig Data Discovery
Big Data Discovery
Harald Erb
 
Stream based Data Integration
Stream based Data IntegrationStream based Data Integration
Stream based Data Integration
Jeffrey T. Pollock
 

What's hot (20)

Pentaho Analytics for MongoDB - presentation from MongoDB World 2014
Pentaho Analytics for MongoDB - presentation from MongoDB World 2014Pentaho Analytics for MongoDB - presentation from MongoDB World 2014
Pentaho Analytics for MongoDB - presentation from MongoDB World 2014
 
Embedded Analytics in Customer Success
Embedded Analytics in Customer SuccessEmbedded Analytics in Customer Success
Embedded Analytics in Customer Success
 
Big Data Predictions for 2015
Big Data Predictions for 2015 Big Data Predictions for 2015
Big Data Predictions for 2015
 
Up Your Analytics Game with Pentaho and Vertica
Up Your Analytics Game with Pentaho and Vertica Up Your Analytics Game with Pentaho and Vertica
Up Your Analytics Game with Pentaho and Vertica
 
Big Data for BI - Beyond the Hype - Pentaho
Big Data for BI - Beyond the Hype - PentahoBig Data for BI - Beyond the Hype - Pentaho
Big Data for BI - Beyond the Hype - Pentaho
 
Competitive edgewithmongod bandpentaho_2014sep_v3[1]
Competitive edgewithmongod bandpentaho_2014sep_v3[1]Competitive edgewithmongod bandpentaho_2014sep_v3[1]
Competitive edgewithmongod bandpentaho_2014sep_v3[1]
 
Data Integration and Advanced Analytics for MongoDB: Blend, Enrich and Analyz...
Data Integration and Advanced Analytics for MongoDB: Blend, Enrich and Analyz...Data Integration and Advanced Analytics for MongoDB: Blend, Enrich and Analyz...
Data Integration and Advanced Analytics for MongoDB: Blend, Enrich and Analyz...
 
MongoDB IoT City Tour EINDHOVEN: Analysing the Internet of Things: Davy Nys, ...
MongoDB IoT City Tour EINDHOVEN: Analysing the Internet of Things: Davy Nys, ...MongoDB IoT City Tour EINDHOVEN: Analysing the Internet of Things: Davy Nys, ...
MongoDB IoT City Tour EINDHOVEN: Analysing the Internet of Things: Davy Nys, ...
 
MongoDB IoT City Tour LONDON: Analysing the Internet of Things: Davy Nys, Pen...
MongoDB IoT City Tour LONDON: Analysing the Internet of Things: Davy Nys, Pen...MongoDB IoT City Tour LONDON: Analysing the Internet of Things: Davy Nys, Pen...
MongoDB IoT City Tour LONDON: Analysing the Internet of Things: Davy Nys, Pen...
 
Unlocking data science in the enterprise - with Oracle and Cloudera
Unlocking data science in the enterprise - with Oracle and ClouderaUnlocking data science in the enterprise - with Oracle and Cloudera
Unlocking data science in the enterprise - with Oracle and Cloudera
 
Open Analytics 2014 - Pedro Alves - Innovation though Open Source
Open Analytics 2014 - Pedro Alves - Innovation though Open SourceOpen Analytics 2014 - Pedro Alves - Innovation though Open Source
Open Analytics 2014 - Pedro Alves - Innovation though Open Source
 
BI congres 2016-4: Hoe groei je als organisatie in analytische maturiteit? - ...
BI congres 2016-4: Hoe groei je als organisatie in analytische maturiteit? - ...BI congres 2016-4: Hoe groei je als organisatie in analytische maturiteit? - ...
BI congres 2016-4: Hoe groei je als organisatie in analytische maturiteit? - ...
 
MongoDB IoT City Tour STUTTGART: Analysing the Internet of Things. By, Pentaho
MongoDB IoT City Tour STUTTGART: Analysing the Internet of Things. By, PentahoMongoDB IoT City Tour STUTTGART: Analysing the Internet of Things. By, Pentaho
MongoDB IoT City Tour STUTTGART: Analysing the Internet of Things. By, Pentaho
 
Hybrid Data Architecture: Integrating Hadoop with a Data Warehouse
Hybrid Data Architecture: Integrating Hadoop with a Data WarehouseHybrid Data Architecture: Integrating Hadoop with a Data Warehouse
Hybrid Data Architecture: Integrating Hadoop with a Data Warehouse
 
BI congres 2016-2: Diving into weblog data with SAS on Hadoop - Lisa Truyers...
BI congres 2016-2: Diving into weblog data with SAS on Hadoop -  Lisa Truyers...BI congres 2016-2: Diving into weblog data with SAS on Hadoop -  Lisa Truyers...
BI congres 2016-2: Diving into weblog data with SAS on Hadoop - Lisa Truyers...
 
BI congres 2014-5: from BI to big data - Jan Aertsen - Pentaho
BI congres 2014-5: from BI to big data - Jan Aertsen - PentahoBI congres 2014-5: from BI to big data - Jan Aertsen - Pentaho
BI congres 2014-5: from BI to big data - Jan Aertsen - Pentaho
 
Ask bigger questions
Ask bigger questionsAsk bigger questions
Ask bigger questions
 
Traditional BI vs. Business Data Lake – A Comparison
Traditional BI vs. Business Data Lake – A ComparisonTraditional BI vs. Business Data Lake – A Comparison
Traditional BI vs. Business Data Lake – A Comparison
 
Big Data Discovery
Big Data DiscoveryBig Data Discovery
Big Data Discovery
 
Stream based Data Integration
Stream based Data IntegrationStream based Data Integration
Stream based Data Integration
 

Viewers also liked

Data Is Your Next Product Opportunity
Data Is Your Next Product Opportunity Data Is Your Next Product Opportunity
Data Is Your Next Product Opportunity
Pentaho
 
Introduction to streaming and messaging flume,kafka,SQS,kinesis
Introduction to streaming and messaging  flume,kafka,SQS,kinesis Introduction to streaming and messaging  flume,kafka,SQS,kinesis
Introduction to streaming and messaging flume,kafka,SQS,kinesis
Omid Vahdaty
 
Moving and Transforming Data with Pentaho Data Integration 5.0 CE (aka Kettle)
Moving and Transforming Data with Pentaho Data Integration 5.0 CE (aka Kettle)Moving and Transforming Data with Pentaho Data Integration 5.0 CE (aka Kettle)
Moving and Transforming Data with Pentaho Data Integration 5.0 CE (aka Kettle)
Roland Bouman
 
Kettle: Pentaho Data Integration tool
Kettle: Pentaho Data Integration toolKettle: Pentaho Data Integration tool
Kettle: Pentaho Data Integration tool
Alex Rayón Jerez
 
Kinesis and Spark Streaming - Advanced AWS Meetup - August 2014
Kinesis and Spark Streaming - Advanced AWS Meetup - August 2014Kinesis and Spark Streaming - Advanced AWS Meetup - August 2014
Kinesis and Spark Streaming - Advanced AWS Meetup - August 2014
Chris Fregly
 
Pentaho Data Integration Introduction
Pentaho Data Integration IntroductionPentaho Data Integration Introduction
Pentaho Data Integration Introduction
mattcasters
 
Data Mashups for Analytics
Data Mashups for AnalyticsData Mashups for Analytics
Data Mashups for Analytics
Pentaho
 
Introduction To Pentaho
Introduction To PentahoIntroduction To Pentaho
Introduction To Pentaho
DataminingTools Inc
 

Viewers also liked (8)

Data Is Your Next Product Opportunity
Data Is Your Next Product Opportunity Data Is Your Next Product Opportunity
Data Is Your Next Product Opportunity
 
Introduction to streaming and messaging flume,kafka,SQS,kinesis
Introduction to streaming and messaging  flume,kafka,SQS,kinesis Introduction to streaming and messaging  flume,kafka,SQS,kinesis
Introduction to streaming and messaging flume,kafka,SQS,kinesis
 
Moving and Transforming Data with Pentaho Data Integration 5.0 CE (aka Kettle)
Moving and Transforming Data with Pentaho Data Integration 5.0 CE (aka Kettle)Moving and Transforming Data with Pentaho Data Integration 5.0 CE (aka Kettle)
Moving and Transforming Data with Pentaho Data Integration 5.0 CE (aka Kettle)
 
Kettle: Pentaho Data Integration tool
Kettle: Pentaho Data Integration toolKettle: Pentaho Data Integration tool
Kettle: Pentaho Data Integration tool
 
Kinesis and Spark Streaming - Advanced AWS Meetup - August 2014
Kinesis and Spark Streaming - Advanced AWS Meetup - August 2014Kinesis and Spark Streaming - Advanced AWS Meetup - August 2014
Kinesis and Spark Streaming - Advanced AWS Meetup - August 2014
 
Pentaho Data Integration Introduction
Pentaho Data Integration IntroductionPentaho Data Integration Introduction
Pentaho Data Integration Introduction
 
Data Mashups for Analytics
Data Mashups for AnalyticsData Mashups for Analytics
Data Mashups for Analytics
 
Introduction To Pentaho
Introduction To PentahoIntroduction To Pentaho
Introduction To Pentaho
 

Similar to Big Data Integration Webinar: Reducing Implementation Efforts of Hadoop, NoSQL and Analytical Databases

Prague data management meetup 2018-03-27
Prague data management meetup 2018-03-27Prague data management meetup 2018-03-27
Prague data management meetup 2018-03-27
Martin Bém
 
Bring Your SAP and Enterprise Data to Hadoop, Kafka, and the Cloud
Bring Your SAP and Enterprise Data to Hadoop, Kafka, and the CloudBring Your SAP and Enterprise Data to Hadoop, Kafka, and the Cloud
Bring Your SAP and Enterprise Data to Hadoop, Kafka, and the Cloud
DataWorks Summit
 
Skilwise Big data
Skilwise Big dataSkilwise Big data
Skilwise Big data
Skillwise Group
 
Enterprise Data Warehousing Positioning
Enterprise Data Warehousing PositioningEnterprise Data Warehousing Positioning
Enterprise Data Warehousing Positioning
EdenH6
 
Skillwise Big Data part 2
Skillwise Big Data part 2Skillwise Big Data part 2
Skillwise Big Data part 2
Skillwise Group
 
Atlanta Data Science Meetup | Qubole slides
Atlanta Data Science Meetup | Qubole slidesAtlanta Data Science Meetup | Qubole slides
Atlanta Data Science Meetup | Qubole slides
Qubole
 
Is the traditional data warehouse dead?
Is the traditional data warehouse dead?Is the traditional data warehouse dead?
Is the traditional data warehouse dead?
James Serra
 
Using the Power of Big SQL 3.0 to Build a Big Data-Ready Hybrid Warehouse
Using the Power of Big SQL 3.0 to Build a Big Data-Ready Hybrid WarehouseUsing the Power of Big SQL 3.0 to Build a Big Data-Ready Hybrid Warehouse
Using the Power of Big SQL 3.0 to Build a Big Data-Ready Hybrid Warehouse
Rizaldy Ignacio
 
Accelerating Data Warehouse Modernization
Accelerating Data Warehouse ModernizationAccelerating Data Warehouse Modernization
Accelerating Data Warehouse Modernization
DataWorks Summit/Hadoop Summit
 
Big data architectures and the data lake
Big data architectures and the data lakeBig data architectures and the data lake
Big data architectures and the data lake
James Serra
 
Introduction To Big Data & Hadoop
Introduction To Big Data & HadoopIntroduction To Big Data & Hadoop
Introduction To Big Data & Hadoop
Blackvard
 
Microsoft Data Platform - What's included
Microsoft Data Platform - What's includedMicrosoft Data Platform - What's included
Microsoft Data Platform - What's included
James Serra
 
ADV Slides: When and How Data Lakes Fit into a Modern Data Architecture
ADV Slides: When and How Data Lakes Fit into a Modern Data ArchitectureADV Slides: When and How Data Lakes Fit into a Modern Data Architecture
ADV Slides: When and How Data Lakes Fit into a Modern Data Architecture
DATAVERSITY
 
The Practice of Big Data - The Hadoop ecosystem explained with usage scenarios
The Practice of Big Data - The Hadoop ecosystem explained with usage scenariosThe Practice of Big Data - The Hadoop ecosystem explained with usage scenarios
The Practice of Big Data - The Hadoop ecosystem explained with usage scenarios
kcmallu
 
Transform your DBMS to drive engagement innovation with Big Data
Transform your DBMS to drive engagement innovation with Big DataTransform your DBMS to drive engagement innovation with Big Data
Transform your DBMS to drive engagement innovation with Big Data
Ashnikbiz
 
Trafodion overview
Trafodion overviewTrafodion overview
Trafodion overview
Rohit Jain
 
Business Intelligence Architecture
Business Intelligence ArchitectureBusiness Intelligence Architecture
Business Intelligence Architecture
Philippe Julio
 
Engineering Machine Learning Data Pipelines Series: Streaming New Data as It ...
Engineering Machine Learning Data Pipelines Series: Streaming New Data as It ...Engineering Machine Learning Data Pipelines Series: Streaming New Data as It ...
Engineering Machine Learning Data Pipelines Series: Streaming New Data as It ...
Precisely
 
Choosing technologies for a big data solution in the cloud
Choosing technologies for a big data solution in the cloudChoosing technologies for a big data solution in the cloud
Choosing technologies for a big data solution in the cloud
James Serra
 

Similar to Big Data Integration Webinar: Reducing Implementation Efforts of Hadoop, NoSQL and Analytical Databases (20)

Prague data management meetup 2018-03-27
Prague data management meetup 2018-03-27Prague data management meetup 2018-03-27
Prague data management meetup 2018-03-27
 
Bring Your SAP and Enterprise Data to Hadoop, Kafka, and the Cloud
Bring Your SAP and Enterprise Data to Hadoop, Kafka, and the CloudBring Your SAP and Enterprise Data to Hadoop, Kafka, and the Cloud
Bring Your SAP and Enterprise Data to Hadoop, Kafka, and the Cloud
 
Skilwise Big data
Skilwise Big dataSkilwise Big data
Skilwise Big data
 
Enterprise Data Warehousing Positioning
Enterprise Data Warehousing PositioningEnterprise Data Warehousing Positioning
Enterprise Data Warehousing Positioning
 
Skillwise Big Data part 2
Skillwise Big Data part 2Skillwise Big Data part 2
Skillwise Big Data part 2
 
Atlanta Data Science Meetup | Qubole slides
Atlanta Data Science Meetup | Qubole slidesAtlanta Data Science Meetup | Qubole slides
Atlanta Data Science Meetup | Qubole slides
 
Is the traditional data warehouse dead?
Is the traditional data warehouse dead?Is the traditional data warehouse dead?
Is the traditional data warehouse dead?
 
Using the Power of Big SQL 3.0 to Build a Big Data-Ready Hybrid Warehouse
Using the Power of Big SQL 3.0 to Build a Big Data-Ready Hybrid WarehouseUsing the Power of Big SQL 3.0 to Build a Big Data-Ready Hybrid Warehouse
Using the Power of Big SQL 3.0 to Build a Big Data-Ready Hybrid Warehouse
 
Accelerating Data Warehouse Modernization
Accelerating Data Warehouse ModernizationAccelerating Data Warehouse Modernization
Accelerating Data Warehouse Modernization
 
Big data architectures and the data lake
Big data architectures and the data lakeBig data architectures and the data lake
Big data architectures and the data lake
 
Introduction To Big Data & Hadoop
Introduction To Big Data & HadoopIntroduction To Big Data & Hadoop
Introduction To Big Data & Hadoop
 
Microsoft Data Platform - What's included
Microsoft Data Platform - What's includedMicrosoft Data Platform - What's included
Microsoft Data Platform - What's included
 
ADV Slides: When and How Data Lakes Fit into a Modern Data Architecture
ADV Slides: When and How Data Lakes Fit into a Modern Data ArchitectureADV Slides: When and How Data Lakes Fit into a Modern Data Architecture
ADV Slides: When and How Data Lakes Fit into a Modern Data Architecture
 
The Practice of Big Data - The Hadoop ecosystem explained with usage scenarios
The Practice of Big Data - The Hadoop ecosystem explained with usage scenariosThe Practice of Big Data - The Hadoop ecosystem explained with usage scenarios
The Practice of Big Data - The Hadoop ecosystem explained with usage scenarios
 
4AA6-4492ENW
4AA6-4492ENW4AA6-4492ENW
4AA6-4492ENW
 
Transform your DBMS to drive engagement innovation with Big Data
Transform your DBMS to drive engagement innovation with Big DataTransform your DBMS to drive engagement innovation with Big Data
Transform your DBMS to drive engagement innovation with Big Data
 
Trafodion overview
Trafodion overviewTrafodion overview
Trafodion overview
 
Business Intelligence Architecture
Business Intelligence ArchitectureBusiness Intelligence Architecture
Business Intelligence Architecture
 
Engineering Machine Learning Data Pipelines Series: Streaming New Data as It ...
Engineering Machine Learning Data Pipelines Series: Streaming New Data as It ...Engineering Machine Learning Data Pipelines Series: Streaming New Data as It ...
Engineering Machine Learning Data Pipelines Series: Streaming New Data as It ...
 
Choosing technologies for a big data solution in the cloud
Choosing technologies for a big data solution in the cloudChoosing technologies for a big data solution in the cloud
Choosing technologies for a big data solution in the cloud
 

More from Pentaho

Filling the Data Lake - Strata + HadoopWorld San Jose 2016 Preview Presentation
Filling the Data Lake - Strata + HadoopWorld San Jose 2016 Preview PresentationFilling the Data Lake - Strata + HadoopWorld San Jose 2016 Preview Presentation
Filling the Data Lake - Strata + HadoopWorld San Jose 2016 Preview Presentation
Pentaho
 
The Next Big Thing in Big Data
The Next Big Thing in Big DataThe Next Big Thing in Big Data
The Next Big Thing in Big Data
Pentaho
 
Improving the Business of Healthcare through Better Analytics
Improving the Business of Healthcare through Better Analytics Improving the Business of Healthcare through Better Analytics
Improving the Business of Healthcare through Better Analytics
Pentaho
 
Predictive Analytics with Pentaho Data Mining - Análisis Predictivo con Penta...
Predictive Analytics with Pentaho Data Mining - Análisis Predictivo con Penta...Predictive Analytics with Pentaho Data Mining - Análisis Predictivo con Penta...
Predictive Analytics with Pentaho Data Mining - Análisis Predictivo con Penta...
Pentaho
 
Pentaho Healthcare Solutions
Pentaho Healthcare SolutionsPentaho Healthcare Solutions
Pentaho Healthcare SolutionsPentaho
 
Pentaho Business Analytics for ISVs and SaaS providers in healthcare
Pentaho Business Analytics for ISVs and SaaS providers in healthcarePentaho Business Analytics for ISVs and SaaS providers in healthcare
Pentaho Business Analytics for ISVs and SaaS providers in healthcarePentaho
 
Bay Area Hadoop User Group
Bay Area Hadoop User GroupBay Area Hadoop User Group
Bay Area Hadoop User Group
Pentaho
 

More from Pentaho (7)

Filling the Data Lake - Strata + HadoopWorld San Jose 2016 Preview Presentation
Filling the Data Lake - Strata + HadoopWorld San Jose 2016 Preview PresentationFilling the Data Lake - Strata + HadoopWorld San Jose 2016 Preview Presentation
Filling the Data Lake - Strata + HadoopWorld San Jose 2016 Preview Presentation
 
The Next Big Thing in Big Data
The Next Big Thing in Big DataThe Next Big Thing in Big Data
The Next Big Thing in Big Data
 
Improving the Business of Healthcare through Better Analytics
Improving the Business of Healthcare through Better Analytics Improving the Business of Healthcare through Better Analytics
Improving the Business of Healthcare through Better Analytics
 
Predictive Analytics with Pentaho Data Mining - Análisis Predictivo con Penta...
Predictive Analytics with Pentaho Data Mining - Análisis Predictivo con Penta...Predictive Analytics with Pentaho Data Mining - Análisis Predictivo con Penta...
Predictive Analytics with Pentaho Data Mining - Análisis Predictivo con Penta...
 
Pentaho Healthcare Solutions
Pentaho Healthcare SolutionsPentaho Healthcare Solutions
Pentaho Healthcare Solutions
 
Pentaho Business Analytics for ISVs and SaaS providers in healthcare
Pentaho Business Analytics for ISVs and SaaS providers in healthcarePentaho Business Analytics for ISVs and SaaS providers in healthcare
Pentaho Business Analytics for ISVs and SaaS providers in healthcare
 
Bay Area Hadoop User Group
Bay Area Hadoop User GroupBay Area Hadoop User Group
Bay Area Hadoop User Group
 

Recently uploaded

Generative AI Deep Dive: Advancing from Proof of Concept to Production
Generative AI Deep Dive: Advancing from Proof of Concept to ProductionGenerative AI Deep Dive: Advancing from Proof of Concept to Production
Generative AI Deep Dive: Advancing from Proof of Concept to Production
Aggregage
 
Assure Contact Center Experiences for Your Customers With ThousandEyes
Assure Contact Center Experiences for Your Customers With ThousandEyesAssure Contact Center Experiences for Your Customers With ThousandEyes
Assure Contact Center Experiences for Your Customers With ThousandEyes
ThousandEyes
 
The Art of the Pitch: WordPress Relationships and Sales
The Art of the Pitch: WordPress Relationships and SalesThe Art of the Pitch: WordPress Relationships and Sales
The Art of the Pitch: WordPress Relationships and Sales
Laura Byrne
 
FIDO Alliance Osaka Seminar: FIDO Security Aspects.pdf
FIDO Alliance Osaka Seminar: FIDO Security Aspects.pdfFIDO Alliance Osaka Seminar: FIDO Security Aspects.pdf
FIDO Alliance Osaka Seminar: FIDO Security Aspects.pdf
FIDO Alliance
 
DevOps and Testing slides at DASA Connect
DevOps and Testing slides at DASA ConnectDevOps and Testing slides at DASA Connect
DevOps and Testing slides at DASA Connect
Kari Kakkonen
 
Assuring Contact Center Experiences for Your Customers With ThousandEyes
Assuring Contact Center Experiences for Your Customers With ThousandEyesAssuring Contact Center Experiences for Your Customers With ThousandEyes
Assuring Contact Center Experiences for Your Customers With ThousandEyes
ThousandEyes
 
Transcript: Selling digital books in 2024: Insights from industry leaders - T...
Transcript: Selling digital books in 2024: Insights from industry leaders - T...Transcript: Selling digital books in 2024: Insights from industry leaders - T...
Transcript: Selling digital books in 2024: Insights from industry leaders - T...
BookNet Canada
 
PHP Frameworks: I want to break free (IPC Berlin 2024)
PHP Frameworks: I want to break free (IPC Berlin 2024)PHP Frameworks: I want to break free (IPC Berlin 2024)
PHP Frameworks: I want to break free (IPC Berlin 2024)
Ralf Eggert
 
Accelerate your Kubernetes clusters with Varnish Caching
Accelerate your Kubernetes clusters with Varnish CachingAccelerate your Kubernetes clusters with Varnish Caching
Accelerate your Kubernetes clusters with Varnish Caching
Thijs Feryn
 
Observability Concepts EVERY Developer Should Know -- DeveloperWeek Europe.pdf
Observability Concepts EVERY Developer Should Know -- DeveloperWeek Europe.pdfObservability Concepts EVERY Developer Should Know -- DeveloperWeek Europe.pdf
Observability Concepts EVERY Developer Should Know -- DeveloperWeek Europe.pdf
Paige Cruz
 
Key Trends Shaping the Future of Infrastructure.pdf
Key Trends Shaping the Future of Infrastructure.pdfKey Trends Shaping the Future of Infrastructure.pdf
Key Trends Shaping the Future of Infrastructure.pdf
Cheryl Hung
 
Elizabeth Buie - Older adults: Are we really designing for our future selves?
Elizabeth Buie - Older adults: Are we really designing for our future selves?Elizabeth Buie - Older adults: Are we really designing for our future selves?
Elizabeth Buie - Older adults: Are we really designing for our future selves?
Nexer Digital
 
When stars align: studies in data quality, knowledge graphs, and machine lear...
When stars align: studies in data quality, knowledge graphs, and machine lear...When stars align: studies in data quality, knowledge graphs, and machine lear...
When stars align: studies in data quality, knowledge graphs, and machine lear...
Elena Simperl
 
Le nuove frontiere dell'AI nell'RPA con UiPath Autopilot™
Le nuove frontiere dell'AI nell'RPA con UiPath Autopilot™Le nuove frontiere dell'AI nell'RPA con UiPath Autopilot™
Le nuove frontiere dell'AI nell'RPA con UiPath Autopilot™
UiPathCommunity
 
GraphRAG is All You need? LLM & Knowledge Graph
GraphRAG is All You need? LLM & Knowledge GraphGraphRAG is All You need? LLM & Knowledge Graph
GraphRAG is All You need? LLM & Knowledge Graph
Guy Korland
 
Welocme to ViralQR, your best QR code generator.
Welocme to ViralQR, your best QR code generator.Welocme to ViralQR, your best QR code generator.
Welocme to ViralQR, your best QR code generator.
ViralQR
 
Bits & Pixels using AI for Good.........
Bits & Pixels using AI for Good.........Bits & Pixels using AI for Good.........
Bits & Pixels using AI for Good.........
Alison B. Lowndes
 
LF Energy Webinar: Electrical Grid Modelling and Simulation Through PowSyBl -...
LF Energy Webinar: Electrical Grid Modelling and Simulation Through PowSyBl -...LF Energy Webinar: Electrical Grid Modelling and Simulation Through PowSyBl -...
LF Energy Webinar: Electrical Grid Modelling and Simulation Through PowSyBl -...
DanBrown980551
 
Quantum Computing: Current Landscape and the Future Role of APIs
Quantum Computing: Current Landscape and the Future Role of APIsQuantum Computing: Current Landscape and the Future Role of APIs
Quantum Computing: Current Landscape and the Future Role of APIs
Vlad Stirbu
 
Introduction to CHERI technology - Cybersecurity
Introduction to CHERI technology - CybersecurityIntroduction to CHERI technology - Cybersecurity
Introduction to CHERI technology - Cybersecurity
mikeeftimakis1
 

Recently uploaded (20)

Generative AI Deep Dive: Advancing from Proof of Concept to Production
Generative AI Deep Dive: Advancing from Proof of Concept to ProductionGenerative AI Deep Dive: Advancing from Proof of Concept to Production
Generative AI Deep Dive: Advancing from Proof of Concept to Production
 
Assure Contact Center Experiences for Your Customers With ThousandEyes
Assure Contact Center Experiences for Your Customers With ThousandEyesAssure Contact Center Experiences for Your Customers With ThousandEyes
Assure Contact Center Experiences for Your Customers With ThousandEyes
 
The Art of the Pitch: WordPress Relationships and Sales
The Art of the Pitch: WordPress Relationships and SalesThe Art of the Pitch: WordPress Relationships and Sales
The Art of the Pitch: WordPress Relationships and Sales
 
FIDO Alliance Osaka Seminar: FIDO Security Aspects.pdf
FIDO Alliance Osaka Seminar: FIDO Security Aspects.pdfFIDO Alliance Osaka Seminar: FIDO Security Aspects.pdf
FIDO Alliance Osaka Seminar: FIDO Security Aspects.pdf
 
DevOps and Testing slides at DASA Connect
DevOps and Testing slides at DASA ConnectDevOps and Testing slides at DASA Connect
DevOps and Testing slides at DASA Connect
 
Assuring Contact Center Experiences for Your Customers With ThousandEyes
Assuring Contact Center Experiences for Your Customers With ThousandEyesAssuring Contact Center Experiences for Your Customers With ThousandEyes
Assuring Contact Center Experiences for Your Customers With ThousandEyes
 
Transcript: Selling digital books in 2024: Insights from industry leaders - T...
Transcript: Selling digital books in 2024: Insights from industry leaders - T...Transcript: Selling digital books in 2024: Insights from industry leaders - T...
Transcript: Selling digital books in 2024: Insights from industry leaders - T...
 
PHP Frameworks: I want to break free (IPC Berlin 2024)
PHP Frameworks: I want to break free (IPC Berlin 2024)PHP Frameworks: I want to break free (IPC Berlin 2024)
PHP Frameworks: I want to break free (IPC Berlin 2024)
 
Accelerate your Kubernetes clusters with Varnish Caching
Accelerate your Kubernetes clusters with Varnish CachingAccelerate your Kubernetes clusters with Varnish Caching
Accelerate your Kubernetes clusters with Varnish Caching
 
Observability Concepts EVERY Developer Should Know -- DeveloperWeek Europe.pdf
Observability Concepts EVERY Developer Should Know -- DeveloperWeek Europe.pdfObservability Concepts EVERY Developer Should Know -- DeveloperWeek Europe.pdf
Observability Concepts EVERY Developer Should Know -- DeveloperWeek Europe.pdf
 
Key Trends Shaping the Future of Infrastructure.pdf
Key Trends Shaping the Future of Infrastructure.pdfKey Trends Shaping the Future of Infrastructure.pdf
Key Trends Shaping the Future of Infrastructure.pdf
 
Elizabeth Buie - Older adults: Are we really designing for our future selves?
Elizabeth Buie - Older adults: Are we really designing for our future selves?Elizabeth Buie - Older adults: Are we really designing for our future selves?
Elizabeth Buie - Older adults: Are we really designing for our future selves?
 
When stars align: studies in data quality, knowledge graphs, and machine lear...
When stars align: studies in data quality, knowledge graphs, and machine lear...When stars align: studies in data quality, knowledge graphs, and machine lear...
When stars align: studies in data quality, knowledge graphs, and machine lear...
 
Le nuove frontiere dell'AI nell'RPA con UiPath Autopilot™
Le nuove frontiere dell'AI nell'RPA con UiPath Autopilot™Le nuove frontiere dell'AI nell'RPA con UiPath Autopilot™
Le nuove frontiere dell'AI nell'RPA con UiPath Autopilot™
 
GraphRAG is All You need? LLM & Knowledge Graph
GraphRAG is All You need? LLM & Knowledge GraphGraphRAG is All You need? LLM & Knowledge Graph
GraphRAG is All You need? LLM & Knowledge Graph
 
Welocme to ViralQR, your best QR code generator.
Welocme to ViralQR, your best QR code generator.Welocme to ViralQR, your best QR code generator.
Welocme to ViralQR, your best QR code generator.
 
Bits & Pixels using AI for Good.........
Bits & Pixels using AI for Good.........Bits & Pixels using AI for Good.........
Bits & Pixels using AI for Good.........
 
LF Energy Webinar: Electrical Grid Modelling and Simulation Through PowSyBl -...
LF Energy Webinar: Electrical Grid Modelling and Simulation Through PowSyBl -...LF Energy Webinar: Electrical Grid Modelling and Simulation Through PowSyBl -...
LF Energy Webinar: Electrical Grid Modelling and Simulation Through PowSyBl -...
 
Quantum Computing: Current Landscape and the Future Role of APIs
Quantum Computing: Current Landscape and the Future Role of APIsQuantum Computing: Current Landscape and the Future Role of APIs
Quantum Computing: Current Landscape and the Future Role of APIs
 
Introduction to CHERI technology - Cybersecurity
Introduction to CHERI technology - CybersecurityIntroduction to CHERI technology - Cybersecurity
Introduction to CHERI technology - Cybersecurity
 

Big Data Integration Webinar: Reducing Implementation Efforts of Hadoop, NoSQL and Analytical Databases

  • 1. Reducing the Implementation Efforts of Hadoop, NoSQL & Analytical Databases © 2013, Pentaho. All Rights Reserved. pentaho.com. Worldwide +1 (866) 660-7555
  • 2. Goals for Today 2 To understand: •  Landscape of Big Data Databases Today •  Current Approaches to Implementation •  How Pentaho Reduces Time, Effort, and Complexity
  • 3. The Big Data Landscape 3 - Most businesses have 1 or more of these databases - Often used together to form a complete solution - Typically created / sold / supported by different vendor - Different programming interfaces for development and administration NoSQL Hadoop Analytical Transactional /DW
  • 4. The Big Data Landscape 4 - Most businesses have 1 or more of these databases - Often used together to form a complete solution - Typically created / sold / supported by different vendor - Different programming interfaces for development and administration NoSQL Hadoop Analytical Transactional /DW
  • 5. Transactional Databases •  Your everyday, garden-variety database •  Used to store data for line of business applications (ERP, CRM, HR) •  Serves as the “system of record” for business facts, e.g. –  Inventory levels –  Customer account status –  Payroll data •  Offers rich sources of analytical information (potential facts and dimensions) •  Fast and reliable for INSERT|UPDATE|DELETE operations 5
  • 6. Hadoop, HDFS and Hive •  Hadoop is a set of related open source Apache projects •  MapReduce is a programming framework •  HDFS is the Hadoop Distributed File System –  Used to store structured (tabular) and unstructured files •  Structured files in HDFS can be queried via Hive –  Hive provides a basic SQL interface on top of HDFS files –  Hive generates MapReduce programs to fulfill the SQL requests •  Ideal for: –  Unstructured data that doesn’t fit in a Transactional or NoSQL database –  Email, twitter posts, images, voice logs, etc. 6
  • 7. NoSQL Databases •  Non-relational databases –  No (or very limited) support for the SQL language •  Developers use programs, not queries to access data –  Flexible definition of structure (“extend as you go”) •  Limited metadata –  No referential integrity / limited support for transactions •  But: –  They can ingest (capture and persist) very large amounts of data with very low overhead, across multiple servers •  1000’s (to 100’s of thousands) of inserts per second –  Excellent for large numbers of reads based on key values •  Millions of reads per second 7
  • 8. Analytical Databases •  Databases designed for Business Intelligence –  Offer full support for SQL and relational metadata –  Use specialized techniques to improve query performance •  Compression •  Column-based storage (e.g. “M” or “F”) •  Clusters of independent servers •  Advanced mathematics for index & query optimization –  Support high-speed “bulk” inserts of structured data –  May be offered as standalone software or via an appliance •  Ideal for: –  Answering summary queries requiring complex filters, joins & groupings –  Online Analytical Processing (OLAP) clients 8
  • 9. Current Implementation Approaches Complex and Time Consuming 9 Load/ Ingest Manipulate/ Process Access Model Visualize/ Analyze Predict Decisions Ingest data into big data store Make data ready for analysis • Transform • Aggregate • Strucure-ize Direct to big data store OR Move slice to DM/DW Build metadata model: •  Dimensions •  Measures •  Hierarchies Report Dashboard Analyze Pivot Drill Act, then iterate based on insights STRUCTURED DATA CRM, POS, ERP, etc. UNSTRUCTURED DATA Forecasting Clustering Classification Association Regression Coding Coding Coding Coding CodingCoding Current Developer Approach: ETL Tool BI Tool Model Builder BI Tool Data Mining Tool Current IT Approach: Coding Coding
  • 10. A Smarter Approach 10 Load/ Ingest Manipulate/ Process Access Model Visualize/ Analyze Predict Decisions Ingest data into big data store Make data ready for analysis • Transform • Aggregate • Strucure-ize Direct to big data store OR Move slice to DM/DW Build metadata model: •  Dimensions •  Measures •  Hierarchies Report Dashboard Analyze Pivot Drill Act, then iterate based on insights STRUCTURED DATA CRM, POS, ERP, etc. UNSTRUCTURED DATA Forecasting Clustering Classification Association Regression Coding Coding Coding Coding CodingCoding Current Developer Approach: ETL Tool BI Tool Model Builder BI Tool Data Mining Tool Current IT Approach: Coding Coding Business Analytics for Big Data
  • 11. Complete platform from data to analytics –  Continuity of one platform –  Eliminate disjointed steps –  Days instead of weeks High productivity visual development –  Use existing skills for Hadoop –  Visual development for MapReduce jobs –  Visual development for Sqoop, Oozie Fast performance within Hadoop –  Parallel execution of MapReduce jobs across the Hadoop cluster Visual Data Preparation Coding VS. Pentaho MapReduce 11 Pentaho Visual Development for Big Data 15x Faster to Design & Deploy Big Data Analytics
  • 12. Sample Scenario •  A telecommunication company needs to: –  Ingest, store and retrieve call detail records (CDRs) from multiple digital switches –  Maintain a detailed profile of its commercial customers –  Analyze the volume of calls over time by SIC code, region, state, county and city •  Which database technology would it use for each of these applications? 12
  • 13. Pentaho Data Integration for Big Data 13 •  Transform the data for reporting without coding •  Execute in the Hadoop Cluster without java coding •  Visually orchestrate job workflows •  Visualize and analyze the data Leverage Native Capabilities of Big Data Vendors
  • 14. Complete Visual Data Management and Big Data Analytics Hadoop NoSQL Analytic Databases Data Discovery, Visualization Enterprise & Ad Hoc Reporting Data Ingestion, Manipulation, Integration Predictive Analytics 14 Big Data Fabric