Getting Started & Successful
with Big Data
© 2013, Pentaho. All Rights Reserved. pentaho.com. Worldwide +1 (866) 660-7555
@Pentaho #BigDataWebSeries
Your Hosts Today
Paul Brook
Cloud EMEA Program Manager
Dell
Davy Nys
VP EMEA & APAC
Pentaho
Chuck Yarbrough
Technical Solutions Marketing
Pentaho
Pentaho Webinar Series
Sign-up at: pentaho.com
Goals for Today
To Understand:
• How to get a Hadoop cluster up and running
• Where Hadoop and other pieces fit into the architecture
• How you can easily get data in & out of Hadoop
• How to leverage Hadoop with Pentaho
• Initial Best Practices
Complete Analytics and Visual Data Management
Hadoop
NoSQL Databases
Data Discovery & Visualization
Enterprise & Ad Hoc Reporting
Predictive Analytics & Machine Learning
Data Ingestion, Manipulation & Integration
Analytic Databases
Data Warehouse Optimization
Data Sources
Big Data Architecture
Data Warehouse (Master & Transactional Data)
ERP
CRM
CDR
Analytic Data Mart(s)
Logs
Other Data
Raw Data
Parsed Data
Analytic Datasets
Master Data
Tape Archive
Steps To Start with Hadoop
1. Hadoop Installation
• Install locally – as Pseudo-Distributed mode
• Leverage tools like Dell Crowbar
• Cloud sandbox
2. Pentaho Installation
• Easy download & installation
• Start on desktop
3. Start Loading
• Extract or access data from source systems
• Load it (in its raw form) into Hadoop
• Tokenize & parse as required
• Transform & enrich
• Load into destination
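The loading steps listed for "Start Loading" can be sketched in plain Python to make the order of operations concrete. This is only an illustrative sketch, not Pentaho's implementation: the sample log lines, the field layout, and every function name are hypothetical, and a dictionary stands in for Hadoop storage.

```python
# Hypothetical raw web-server log lines standing in for data pulled from a source system.
RAW_LINES = [
    "2013-03-01 10:00:01 GET /products/42 200",
    "2013-03-01 10:00:05 GET /cart 500",
    "malformed line",
    "2013-03-01 10:01:12 POST /checkout 200",
]


def extract():
    """Extract: return raw records from the source (here, an in-memory list)."""
    return list(RAW_LINES)


def load_raw(lines, store):
    """Load the data unchanged into a landing area (a dict standing in for Hadoop)."""
    store["raw"] = list(lines)
    return store


def parse(line):
    """Tokenize one line into named fields; return None if it does not fit the layout."""
    tokens = line.split()
    if len(tokens) != 5 or not tokens[4].isdigit():
        return None
    date, time, method, path, status = tokens
    return {"date": date, "time": time, "method": method,
            "path": path, "status": int(status)}


def enrich(record):
    """Transform & enrich: add a derived field without mutating the parsed record."""
    enriched = dict(record)
    enriched["is_error"] = record["status"] >= 500
    return enriched


def run_pipeline():
    """Run extract -> load raw -> parse -> enrich and keep every stage in the store."""
    store = load_raw(extract(), {})
    store["parsed"] = [r for r in map(parse, store["raw"]) if r is not None]
    store["analytic"] = [enrich(r) for r in store["parsed"]]
    return store
```

Keeping the raw lines next to the parsed and enriched records means a parsing mistake can be fixed and re-run without going back to the source system.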
Data Architecture and Integration Challenges
Data Sources
Big Data Architecture
Data Warehouse (Master & Transactional Data)
ERP
CRM
CDR
Analytic Data Mart(s)
Logs
Other Data
Raw Data
Parsed Data
Analytic Datasets
Master Data
NoSQL
Extract
Transform
Load
Orchestration & Integration
MR
Solution Architecture & Demo
Fast and easy way to deploy Hadoop clusters with Dell
Global Marketing
Well, we are ready, but how will the hardware team know how to size and design the Hadoop cluster?
I don’t know… and it may take a long time to build the Hadoop cluster.
Time is a critical factor; we need to get this project moving.
Reduce time to cluster sizing, design & deployment
Faster time to productive operations
Optimize and adapt for your needs
Deliver the best return on investment
Reduce risk & increase flexibility with Dell
Dell.com/Crowbar
Dell | Hadoop Solution
“Dell … was one of the first of the hardware vendors to grasp the fact that cloud is about provisioning services, not about the hardware.”
Maxwell Cooter, Cloud Pro
Excels at supporting complex big data analyses across large collections of structured and unstructured data
• Hadoop handles a variety of workloads, including search, log processing, data warehousing, recommendation systems and video/image analysis
• Work on the most modern scale-out architectures using a clean-sheet design data framework
• Without vendor lock-in
Apache Hadoop software
Crowbar software framework with a
Hadoop barclamp
PowerEdge C8000 Series, C6220, R720, R720XD
Force10 or PowerConnect switches
Reference Architecture
Deployment Guide
Joint Service and Support
Proven solutions
Proven components
Partner Ecosystem
Crowbar
• Accelerates multi-node deployments
• Simplifies maintenance
• Streamlines ongoing updates
Built with DevOps
• Provides an operational model for managing big
data clusters and cloud
Field-proven technologies
• Built on locally deployed Chef Server
• Raw servers to full cluster in <2 hours
• Hardened with more than a year of deployments
Apache 2 open source
• Multi-apps (Hadoop & OpenStack)
• Multi-OS (Ubuntu, RHEL, CentOS, SUSE)
NOT limited to Dell hardware
Crowbar Software Framework
A modular, open source framework
Deploy a Hadoop cluster in ~2 hours
Reduce software
licensing fees
100%
Use Crowbar to:
• Automate the deployment and
configuration of a Hadoop cluster
• Quickly provision bare-metal
servers from box to cluster with
minimal intervention
• Maintain, upgrade and evolve your
Hadoop cluster over time
• Leverage an open source
framework backed by a growing
global developer ecosystem
Reduce
development time
4-6 mo.
Crowbar software framework
Evolve to meet your needs over time with built in DevOps
Crowbar dashboard provides visibility
Leverage developer expertise worldwide
Download the open source software:
https://github.com/dellcloudedge/crowbar
Participate in an active community
http://lists.us.dell.com/mailman/listinfo/crowbar
Get resources on the Wiki:
https://github.com/dellcloudedge/crowbar/wiki
Visit Dell.com/Crowbar, Crowbar@Dell.com
Dell.com/Crowbar
2. Pentaho Installation
pentaho.com/download
• Install on a local desktop – no need for a cluster
• “Managed Code”: no additional installations
• Pentaho will write to the Hadoop Distributed Cache for execution
Scheduling Integration Manipulation Orchestration
3. Start Loading
Loading into HDFS & Hive
– Hadoop Copy Files
– Specify source files / destination
Loading into HBase
– ZooKeeper host & port
– Specify HBase mapping
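In Pentaho these loading steps are configured graphically, so the following is not Pentaho's API. It is a minimal sketch of the two underlying ideas: a file copy into HDFS amounts to a source path plus a destination (shown here as the argument list for the standard `hadoop fs -put` shell command), and an HBase mapping assigns each incoming field to a `family:qualifier` column. The function names, paths, field names, and the `req` column family are all hypothetical.

```python
def hdfs_put_command(local_path, hdfs_dir):
    """Build the argument list for copying one local file into an HDFS directory.

    The list can be passed to subprocess.run() on a machine with a Hadoop client.
    """
    return ["hadoop", "fs", "-put", local_path, hdfs_dir]


def to_hbase_cells(row, key_field, mapping):
    """Apply a field-to-column mapping to one flat record.

    Returns the row key and a dict of {"family:qualifier": value} cells,
    with values rendered as strings.
    """
    row_key = str(row[key_field])
    cells = {column: str(row[field]) for field, column in mapping.items()}
    return row_key, cells


# Hypothetical mapping: incoming field name -> HBase "family:qualifier" column.
WEBLOG_MAPPING = {"path": "req:path", "status": "req:status"}
```

For example, `to_hbase_cells({"id": 7, "path": "/cart", "status": 500}, "id", WEBLOG_MAPPING)` yields the row key `"7"` and the cells `{"req:path": "/cart", "req:status": "500"}`. A real client would also need the ZooKeeper host and port to locate the cluster, as the slide notes.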
Solution Architecture & Demo
Demo
Maximize Performance
As much as 15x faster than hand-written code.
Parallel execution as MapReduce in the Hadoop cluster.
Additional Best Practices
Leverage Hadoop
• Don’t do database lookups inside a Mapper/Reducer – bring the data set into HDFS
• Don’t transfer data between two clustering technologies – network overload
Don’t Boil the Ocean
• Start with a small data set and validate logic & performance outside the cluster
• Gradually increase volumes and fine tune the application, cluster, data stores & network
It’s AND…AND
• Leverage the various technologies available
• A combination of easy to use tools, powerful scripting and custom coding provides the best mix
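Two of these practices can be illustrated together in a small Python sketch: the mapper reads its reference data from an in-memory table loaded once (standing in for a data set already brought into HDFS) instead of querying a database per record, and the whole map, shuffle, and reduce flow runs in a single process so the logic can be checked on a small sample before it goes near a cluster. The data, names, and the single-process runner are hypothetical and are not part of Hadoop or Pentaho.

```python
from collections import defaultdict

# Hypothetical reference data. On a cluster this would be a file already in HDFS,
# loaded once per task, not a database table queried once per record.
PRODUCT_CATEGORIES = {"42": "books", "77": "music"}

# Small, hypothetical sample of (product_id, amount) input records.
SAMPLE = [("42", 10.0), ("77", 5.5), ("42", 2.5), ("99", 1.0)]


def mapper(record, lookup):
    """Emit (category, amount); the lookup is a local dict, not a database call."""
    product_id, amount = record
    yield lookup.get(product_id, "unknown"), amount


def reducer(key, values):
    """Sum all amounts that were grouped under one category."""
    return key, sum(values)


def run_local(records, lookup):
    """Simulate map -> shuffle -> reduce in one process to validate the logic."""
    groups = defaultdict(list)
    for record in records:
        for key, value in mapper(record, lookup):
            groups[key].append(value)
    return dict(reducer(key, values) for key, values in groups.items())
```

Running `run_local(SAMPLE, PRODUCT_CATEGORIES)` on the sample groups the amounts by category, with unmatched product ids collected under `"unknown"`. Once the output looks right, the same mapper and reducer logic can be moved to the cluster and the volumes increased gradually.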
Solution Architecture & Demo
Q & A
Contact Us or Sign-up at:
pentaho.com

More Related Content

What's hot

Embedded Analytics in CRM and Marketing
Embedded Analytics in CRM and Marketing Embedded Analytics in CRM and Marketing
Embedded Analytics in CRM and Marketing
Pentaho
 
30 for 30: Quick Start Your Pentaho Evaluation
30 for 30: Quick Start Your Pentaho Evaluation30 for 30: Quick Start Your Pentaho Evaluation
30 for 30: Quick Start Your Pentaho Evaluation
Pentaho
 
Pentaho Business Analytics for ISVs and SaaS providers in healthcare
Pentaho Business Analytics for ISVs and SaaS providers in healthcarePentaho Business Analytics for ISVs and SaaS providers in healthcare
Pentaho Business Analytics for ISVs and SaaS providers in healthcarePentaho
 
Big Data Predictions for 2015
Big Data Predictions for 2015 Big Data Predictions for 2015
Big Data Predictions for 2015
Pentaho
 
Pentaho Analytics for MongoDB - presentation from MongoDB World 2014
Pentaho Analytics for MongoDB - presentation from MongoDB World 2014Pentaho Analytics for MongoDB - presentation from MongoDB World 2014
Pentaho Analytics for MongoDB - presentation from MongoDB World 2014
Pentaho
 
Bay Area Hadoop User Group
Bay Area Hadoop User GroupBay Area Hadoop User Group
Bay Area Hadoop User Group
Pentaho
 
Up Your Analytics Game with Pentaho and Vertica
Up Your Analytics Game with Pentaho and Vertica Up Your Analytics Game with Pentaho and Vertica
Up Your Analytics Game with Pentaho and Vertica
Pentaho
 
Pentaho big data camp - 5 min
Pentaho   big data camp - 5 minPentaho   big data camp - 5 min
Pentaho big data camp - 5 min
ianfyfe
 
MongoDB IoT City Tour EINDHOVEN: Analysing the Internet of Things: Davy Nys, ...
MongoDB IoT City Tour EINDHOVEN: Analysing the Internet of Things: Davy Nys, ...MongoDB IoT City Tour EINDHOVEN: Analysing the Internet of Things: Davy Nys, ...
MongoDB IoT City Tour EINDHOVEN: Analysing the Internet of Things: Davy Nys, ...
MongoDB
 
Pentaho - Jake Cornelius - Hadoop World 2010
Pentaho - Jake Cornelius - Hadoop World 2010Pentaho - Jake Cornelius - Hadoop World 2010
Pentaho - Jake Cornelius - Hadoop World 2010
Cloudera, Inc.
 
Slides pentaho-hadoop-weka
Slides pentaho-hadoop-wekaSlides pentaho-hadoop-weka
Slides pentaho-hadoop-wekalucboudreau
 
Data Integration and Advanced Analytics for MongoDB: Blend, Enrich and Analyz...
Data Integration and Advanced Analytics for MongoDB: Blend, Enrich and Analyz...Data Integration and Advanced Analytics for MongoDB: Blend, Enrich and Analyz...
Data Integration and Advanced Analytics for MongoDB: Blend, Enrich and Analyz...MongoDB
 
MongoDB IoT City Tour LONDON: Analysing the Internet of Things: Davy Nys, Pen...
MongoDB IoT City Tour LONDON: Analysing the Internet of Things: Davy Nys, Pen...MongoDB IoT City Tour LONDON: Analysing the Internet of Things: Davy Nys, Pen...
MongoDB IoT City Tour LONDON: Analysing the Internet of Things: Davy Nys, Pen...MongoDB
 
Webinar | Using Hadoop Analytics to Gain a Big Data Advantage
Webinar | Using Hadoop Analytics to Gain a Big Data AdvantageWebinar | Using Hadoop Analytics to Gain a Big Data Advantage
Webinar | Using Hadoop Analytics to Gain a Big Data Advantage
Cloudera, Inc.
 
Competitive edgewithmongod bandpentaho_2014sep_v3[1]
Competitive edgewithmongod bandpentaho_2014sep_v3[1]Competitive edgewithmongod bandpentaho_2014sep_v3[1]
Competitive edgewithmongod bandpentaho_2014sep_v3[1]
Pentaho
 
Big Data Solutions Executive Overview
Big Data Solutions Executive OverviewBig Data Solutions Executive Overview
Big Data Solutions Executive Overview
RCG Global Services
 
BI congres 2016-2: Diving into weblog data with SAS on Hadoop - Lisa Truyers...
BI congres 2016-2: Diving into weblog data with SAS on Hadoop -  Lisa Truyers...BI congres 2016-2: Diving into weblog data with SAS on Hadoop -  Lisa Truyers...
BI congres 2016-2: Diving into weblog data with SAS on Hadoop - Lisa Truyers...
BICC Thomas More
 
Hadoop uk user group meeting final
Hadoop uk user group meeting finalHadoop uk user group meeting final
Hadoop uk user group meeting finalSkills Matter
 
The Value of the Modern Data Architecture with Apache Hadoop and Teradata
The Value of the Modern Data Architecture with Apache Hadoop and Teradata The Value of the Modern Data Architecture with Apache Hadoop and Teradata
The Value of the Modern Data Architecture with Apache Hadoop and Teradata
Hortonworks
 
Open Analytics 2014 - Pedro Alves - Innovation though Open Source
Open Analytics 2014 - Pedro Alves - Innovation though Open SourceOpen Analytics 2014 - Pedro Alves - Innovation though Open Source
Open Analytics 2014 - Pedro Alves - Innovation though Open Source
OpenAnalytics Spain
 

What's hot (20)

Embedded Analytics in CRM and Marketing
Embedded Analytics in CRM and Marketing Embedded Analytics in CRM and Marketing
Embedded Analytics in CRM and Marketing
 
30 for 30: Quick Start Your Pentaho Evaluation
30 for 30: Quick Start Your Pentaho Evaluation30 for 30: Quick Start Your Pentaho Evaluation
30 for 30: Quick Start Your Pentaho Evaluation
 
Pentaho Business Analytics for ISVs and SaaS providers in healthcare
Pentaho Business Analytics for ISVs and SaaS providers in healthcarePentaho Business Analytics for ISVs and SaaS providers in healthcare
Pentaho Business Analytics for ISVs and SaaS providers in healthcare
 
Big Data Predictions for 2015
Big Data Predictions for 2015 Big Data Predictions for 2015
Big Data Predictions for 2015
 
Pentaho Analytics for MongoDB - presentation from MongoDB World 2014
Pentaho Analytics for MongoDB - presentation from MongoDB World 2014Pentaho Analytics for MongoDB - presentation from MongoDB World 2014
Pentaho Analytics for MongoDB - presentation from MongoDB World 2014
 
Bay Area Hadoop User Group
Bay Area Hadoop User GroupBay Area Hadoop User Group
Bay Area Hadoop User Group
 
Up Your Analytics Game with Pentaho and Vertica
Up Your Analytics Game with Pentaho and Vertica Up Your Analytics Game with Pentaho and Vertica
Up Your Analytics Game with Pentaho and Vertica
 
Pentaho big data camp - 5 min
Pentaho   big data camp - 5 minPentaho   big data camp - 5 min
Pentaho big data camp - 5 min
 
MongoDB IoT City Tour EINDHOVEN: Analysing the Internet of Things: Davy Nys, ...
MongoDB IoT City Tour EINDHOVEN: Analysing the Internet of Things: Davy Nys, ...MongoDB IoT City Tour EINDHOVEN: Analysing the Internet of Things: Davy Nys, ...
MongoDB IoT City Tour EINDHOVEN: Analysing the Internet of Things: Davy Nys, ...
 
Pentaho - Jake Cornelius - Hadoop World 2010
Pentaho - Jake Cornelius - Hadoop World 2010Pentaho - Jake Cornelius - Hadoop World 2010
Pentaho - Jake Cornelius - Hadoop World 2010
 
Slides pentaho-hadoop-weka
Slides pentaho-hadoop-wekaSlides pentaho-hadoop-weka
Slides pentaho-hadoop-weka
 
Data Integration and Advanced Analytics for MongoDB: Blend, Enrich and Analyz...
Data Integration and Advanced Analytics for MongoDB: Blend, Enrich and Analyz...Data Integration and Advanced Analytics for MongoDB: Blend, Enrich and Analyz...
Data Integration and Advanced Analytics for MongoDB: Blend, Enrich and Analyz...
 
MongoDB IoT City Tour LONDON: Analysing the Internet of Things: Davy Nys, Pen...
MongoDB IoT City Tour LONDON: Analysing the Internet of Things: Davy Nys, Pen...MongoDB IoT City Tour LONDON: Analysing the Internet of Things: Davy Nys, Pen...
MongoDB IoT City Tour LONDON: Analysing the Internet of Things: Davy Nys, Pen...
 
Webinar | Using Hadoop Analytics to Gain a Big Data Advantage
Webinar | Using Hadoop Analytics to Gain a Big Data AdvantageWebinar | Using Hadoop Analytics to Gain a Big Data Advantage
Webinar | Using Hadoop Analytics to Gain a Big Data Advantage
 
Competitive edgewithmongod bandpentaho_2014sep_v3[1]
Competitive edgewithmongod bandpentaho_2014sep_v3[1]Competitive edgewithmongod bandpentaho_2014sep_v3[1]
Competitive edgewithmongod bandpentaho_2014sep_v3[1]
 
Big Data Solutions Executive Overview
Big Data Solutions Executive OverviewBig Data Solutions Executive Overview
Big Data Solutions Executive Overview
 
BI congres 2016-2: Diving into weblog data with SAS on Hadoop - Lisa Truyers...
BI congres 2016-2: Diving into weblog data with SAS on Hadoop -  Lisa Truyers...BI congres 2016-2: Diving into weblog data with SAS on Hadoop -  Lisa Truyers...
BI congres 2016-2: Diving into weblog data with SAS on Hadoop - Lisa Truyers...
 
Hadoop uk user group meeting final
Hadoop uk user group meeting finalHadoop uk user group meeting final
Hadoop uk user group meeting final
 
The Value of the Modern Data Architecture with Apache Hadoop and Teradata
The Value of the Modern Data Architecture with Apache Hadoop and Teradata The Value of the Modern Data Architecture with Apache Hadoop and Teradata
The Value of the Modern Data Architecture with Apache Hadoop and Teradata
 
Open Analytics 2014 - Pedro Alves - Innovation though Open Source
Open Analytics 2014 - Pedro Alves - Innovation though Open SourceOpen Analytics 2014 - Pedro Alves - Innovation though Open Source
Open Analytics 2014 - Pedro Alves - Innovation though Open Source
 

Viewers also liked

Cisco cdr reporting it’s easy if you do it smart
Cisco cdr reporting  it’s easy if you do it smartCisco cdr reporting  it’s easy if you do it smart
Cisco cdr reporting it’s easy if you do it smart
Reza Mousavi
 
Gsm cell analysis
Gsm cell analysisGsm cell analysis
Gsm cell analysisMuxi ESL
 
CDR template revised Ver 3.1.1 presentation
CDR template revised Ver 3.1.1 presentationCDR template revised Ver 3.1.1 presentation
CDR template revised Ver 3.1.1 presentationArya dash
 
DaStor/Cassandra report for CDR solution
DaStor/Cassandra report for CDR solutionDaStor/Cassandra report for CDR solution
DaStor/Cassandra report for CDR solution
Schubert Zhang
 
Hadoop World 2011: Data Ingestion, Egression, and Preparation for Hadoop - Sa...
Hadoop World 2011: Data Ingestion, Egression, and Preparation for Hadoop - Sa...Hadoop World 2011: Data Ingestion, Egression, and Preparation for Hadoop - Sa...
Hadoop World 2011: Data Ingestion, Egression, and Preparation for Hadoop - Sa...
Cloudera, Inc.
 
Open Bank Project September 2014 at Open Data CH
Open Bank Project September 2014  at Open Data CHOpen Bank Project September 2014  at Open Data CH
Open Bank Project September 2014 at Open Data CH
TESOBE
 
Oracle GoldenGate 12c CDR Presentation for ECO
Oracle GoldenGate 12c CDR Presentation for ECOOracle GoldenGate 12c CDR Presentation for ECO
Oracle GoldenGate 12c CDR Presentation for ECO
Bobby Curtis
 
Big Data Expo 2015 - Pentaho The Future of Analytics
Big Data Expo 2015 - Pentaho The Future of AnalyticsBig Data Expo 2015 - Pentaho The Future of Analytics
Big Data Expo 2015 - Pentaho The Future of Analytics
BigDataExpo
 
CDR-Stats : VoIP Analytics Solution for Asterisk and FreeSWITCH with MongoDB
CDR-Stats : VoIP Analytics Solution for Asterisk and FreeSWITCH with MongoDBCDR-Stats : VoIP Analytics Solution for Asterisk and FreeSWITCH with MongoDB
CDR-Stats : VoIP Analytics Solution for Asterisk and FreeSWITCH with MongoDB
Areski Belaid
 
Re-identification of Anomized CDR datasets using Social networlk Data
Re-identification of Anomized CDR datasets using Social networlk DataRe-identification of Anomized CDR datasets using Social networlk Data
Re-identification of Anomized CDR datasets using Social networlk Data
Alket Cecaj
 
Big Data CDR Analyzer - Kanthaka
Big Data CDR Analyzer - KanthakaBig Data CDR Analyzer - Kanthaka
Big Data CDR Analyzer - Kanthaka
Pushpalanka Jayawardhana
 
Integrating Hadoop Into the Enterprise
Integrating Hadoop Into the EnterpriseIntegrating Hadoop Into the Enterprise
Integrating Hadoop Into the EnterpriseDataWorks Summit
 
Asterisk (IP-PBX) CDR Log Rotation
Asterisk (IP-PBX) CDR Log RotationAsterisk (IP-PBX) CDR Log Rotation
Asterisk (IP-PBX) CDR Log Rotation
William Lee
 
Paging and Location Update
Paging and Location UpdatePaging and Location Update
Paging and Location Update
Abidullah Zarghoon
 

Viewers also liked (15)

AltiGen Cdr Manual
AltiGen Cdr ManualAltiGen Cdr Manual
AltiGen Cdr Manual
 
Cisco cdr reporting it’s easy if you do it smart
Cisco cdr reporting  it’s easy if you do it smartCisco cdr reporting  it’s easy if you do it smart
Cisco cdr reporting it’s easy if you do it smart
 
Gsm cell analysis
Gsm cell analysisGsm cell analysis
Gsm cell analysis
 
CDR template revised Ver 3.1.1 presentation
CDR template revised Ver 3.1.1 presentationCDR template revised Ver 3.1.1 presentation
CDR template revised Ver 3.1.1 presentation
 
DaStor/Cassandra report for CDR solution
DaStor/Cassandra report for CDR solutionDaStor/Cassandra report for CDR solution
DaStor/Cassandra report for CDR solution
 
Hadoop World 2011: Data Ingestion, Egression, and Preparation for Hadoop - Sa...
Hadoop World 2011: Data Ingestion, Egression, and Preparation for Hadoop - Sa...Hadoop World 2011: Data Ingestion, Egression, and Preparation for Hadoop - Sa...
Hadoop World 2011: Data Ingestion, Egression, and Preparation for Hadoop - Sa...
 
Open Bank Project September 2014 at Open Data CH
Open Bank Project September 2014  at Open Data CHOpen Bank Project September 2014  at Open Data CH
Open Bank Project September 2014 at Open Data CH
 
Oracle GoldenGate 12c CDR Presentation for ECO
Oracle GoldenGate 12c CDR Presentation for ECOOracle GoldenGate 12c CDR Presentation for ECO
Oracle GoldenGate 12c CDR Presentation for ECO
 
Big Data Expo 2015 - Pentaho The Future of Analytics
Big Data Expo 2015 - Pentaho The Future of AnalyticsBig Data Expo 2015 - Pentaho The Future of Analytics
Big Data Expo 2015 - Pentaho The Future of Analytics
 
CDR-Stats : VoIP Analytics Solution for Asterisk and FreeSWITCH with MongoDB
CDR-Stats : VoIP Analytics Solution for Asterisk and FreeSWITCH with MongoDBCDR-Stats : VoIP Analytics Solution for Asterisk and FreeSWITCH with MongoDB
CDR-Stats : VoIP Analytics Solution for Asterisk and FreeSWITCH with MongoDB
 
Re-identification of Anomized CDR datasets using Social networlk Data
Re-identification of Anomized CDR datasets using Social networlk DataRe-identification of Anomized CDR datasets using Social networlk Data
Re-identification of Anomized CDR datasets using Social networlk Data
 
Big Data CDR Analyzer - Kanthaka
Big Data CDR Analyzer - KanthakaBig Data CDR Analyzer - Kanthaka
Big Data CDR Analyzer - Kanthaka
 
Integrating Hadoop Into the Enterprise
Integrating Hadoop Into the EnterpriseIntegrating Hadoop Into the Enterprise
Integrating Hadoop Into the Enterprise
 
Asterisk (IP-PBX) CDR Log Rotation
Asterisk (IP-PBX) CDR Log RotationAsterisk (IP-PBX) CDR Log Rotation
Asterisk (IP-PBX) CDR Log Rotation
 
Paging and Location Update
Paging and Location UpdatePaging and Location Update
Paging and Location Update
 

Similar to Big Data Integration Webinar: Getting Started With Hadoop Big Data

2013 05 Oracle big_dataapplianceoverview
2013 05 Oracle big_dataapplianceoverview2013 05 Oracle big_dataapplianceoverview
2013 05 Oracle big_dataapplianceoverview
jdijcks
 
Hadoop Summit San Jose 2015: What it Takes to Run Hadoop at Scale Yahoo Persp...
Hadoop Summit San Jose 2015: What it Takes to Run Hadoop at Scale Yahoo Persp...Hadoop Summit San Jose 2015: What it Takes to Run Hadoop at Scale Yahoo Persp...
Hadoop Summit San Jose 2015: What it Takes to Run Hadoop at Scale Yahoo Persp...
Sumeet Singh
 
Oracle Big Data Appliance and Big Data SQL for advanced analytics
Oracle Big Data Appliance and Big Data SQL for advanced analyticsOracle Big Data Appliance and Big Data SQL for advanced analytics
Oracle Big Data Appliance and Big Data SQL for advanced analytics
jdijcks
 
What it takes to run Hadoop at Scale: Yahoo! Perspectives
What it takes to run Hadoop at Scale: Yahoo! PerspectivesWhat it takes to run Hadoop at Scale: Yahoo! Perspectives
What it takes to run Hadoop at Scale: Yahoo! Perspectives
DataWorks Summit
 
Oracle Openworld Presentation with Paul Kent (SAS) on Big Data Appliance and ...
Oracle Openworld Presentation with Paul Kent (SAS) on Big Data Appliance and ...Oracle Openworld Presentation with Paul Kent (SAS) on Big Data Appliance and ...
Oracle Openworld Presentation with Paul Kent (SAS) on Big Data Appliance and ...
jdijcks
 
What it takes to bring Hadoop to a production-ready state
What it takes to bring Hadoop to a production-ready stateWhat it takes to bring Hadoop to a production-ready state
What it takes to bring Hadoop to a production-ready state
ClouderaUserGroups
 
Big Data Infrastructure
Big Data InfrastructureBig Data Infrastructure
Big Data Infrastructure
Trivadis
 
Web Briefing: Unlock the power of Hadoop to enable interactive analytics
Web Briefing: Unlock the power of Hadoop to enable interactive analyticsWeb Briefing: Unlock the power of Hadoop to enable interactive analytics
Web Briefing: Unlock the power of Hadoop to enable interactive analyticsKognitio
 
Carpe Datum: Building Big Data Analytical Applications with HP Haven
Carpe Datum: Building Big Data Analytical Applications with HP HavenCarpe Datum: Building Big Data Analytical Applications with HP Haven
Carpe Datum: Building Big Data Analytical Applications with HP Haven
DataWorks Summit
 
Journey to SAS Analytics Grid with SAS, R, Python
Journey to SAS Analytics Grid with SAS, R, PythonJourney to SAS Analytics Grid with SAS, R, Python
Journey to SAS Analytics Grid with SAS, R, Python
Sumit Sarkar
 
Hitachi Data Systems Hadoop Solution
Hitachi Data Systems Hadoop SolutionHitachi Data Systems Hadoop Solution
Hitachi Data Systems Hadoop Solution
Hitachi Vantara
 
Cw13 big data and apache hadoop by amr awadallah-cloudera
Cw13 big data and apache hadoop by amr awadallah-clouderaCw13 big data and apache hadoop by amr awadallah-cloudera
Cw13 big data and apache hadoop by amr awadallah-clouderainevitablecloud
 
Intro to Big Data and Apache Hadoop by Dr. Amr Awadallah at CLOUD WEEKEND '13...
Intro to Big Data and Apache Hadoop by Dr. Amr Awadallah at CLOUD WEEKEND '13...Intro to Big Data and Apache Hadoop by Dr. Amr Awadallah at CLOUD WEEKEND '13...
Intro to Big Data and Apache Hadoop by Dr. Amr Awadallah at CLOUD WEEKEND '13...
TheInevitableCloud
 
Hp Converged Systems and Hortonworks - Webinar Slides
Hp Converged Systems and Hortonworks - Webinar SlidesHp Converged Systems and Hortonworks - Webinar Slides
Hp Converged Systems and Hortonworks - Webinar Slides
Hortonworks
 
Pentaho Big Data Analytics with Vertica and Hadoop
Pentaho Big Data Analytics with Vertica and HadoopPentaho Big Data Analytics with Vertica and Hadoop
Pentaho Big Data Analytics with Vertica and Hadoop
Mark Kromer
 
SQLDay2013_MarcinSzeliga_SQLServer2012FastTrackDWReferenceArchitectures
SQLDay2013_MarcinSzeliga_SQLServer2012FastTrackDWReferenceArchitecturesSQLDay2013_MarcinSzeliga_SQLServer2012FastTrackDWReferenceArchitectures
SQLDay2013_MarcinSzeliga_SQLServer2012FastTrackDWReferenceArchitecturesPolish SQL Server User Group
 
Turning Petabytes of Data into Profit with Hadoop for the World’s Biggest Ret...
Turning Petabytes of Data into Profit with Hadoop for the World’s Biggest Ret...Turning Petabytes of Data into Profit with Hadoop for the World’s Biggest Ret...
Turning Petabytes of Data into Profit with Hadoop for the World’s Biggest Ret...
Cloudera, Inc.
 
clusterstor-hadoop-data-sheet
clusterstor-hadoop-data-sheetclusterstor-hadoop-data-sheet
clusterstor-hadoop-data-sheetAndrei Khurshudov
 

Similar to Big Data Integration Webinar: Getting Started With Hadoop Big Data (20)

2013 05 Oracle big_dataapplianceoverview
2013 05 Oracle big_dataapplianceoverview2013 05 Oracle big_dataapplianceoverview
2013 05 Oracle big_dataapplianceoverview
 
Hadoop Summit San Jose 2015: What it Takes to Run Hadoop at Scale Yahoo Persp...
Hadoop Summit San Jose 2015: What it Takes to Run Hadoop at Scale Yahoo Persp...Hadoop Summit San Jose 2015: What it Takes to Run Hadoop at Scale Yahoo Persp...
Hadoop Summit San Jose 2015: What it Takes to Run Hadoop at Scale Yahoo Persp...
 
Oracle Big Data Appliance and Big Data SQL for advanced analytics
Oracle Big Data Appliance and Big Data SQL for advanced analyticsOracle Big Data Appliance and Big Data SQL for advanced analytics
Oracle Big Data Appliance and Big Data SQL for advanced analytics
 
What it takes to run Hadoop at Scale: Yahoo! Perspectives
What it takes to run Hadoop at Scale: Yahoo! PerspectivesWhat it takes to run Hadoop at Scale: Yahoo! Perspectives
What it takes to run Hadoop at Scale: Yahoo! Perspectives
 
Oracle Openworld Presentation with Paul Kent (SAS) on Big Data Appliance and ...
Oracle Openworld Presentation with Paul Kent (SAS) on Big Data Appliance and ...Oracle Openworld Presentation with Paul Kent (SAS) on Big Data Appliance and ...
Oracle Openworld Presentation with Paul Kent (SAS) on Big Data Appliance and ...
 
What it takes to bring Hadoop to a production-ready state
What it takes to bring Hadoop to a production-ready stateWhat it takes to bring Hadoop to a production-ready state
What it takes to bring Hadoop to a production-ready state
 
Big Data Infrastructure
Big Data InfrastructureBig Data Infrastructure
Big Data Infrastructure
 
Robin_Hadoop
Robin_HadoopRobin_Hadoop
Robin_Hadoop
 
Web Briefing: Unlock the power of Hadoop to enable interactive analytics
Web Briefing: Unlock the power of Hadoop to enable interactive analyticsWeb Briefing: Unlock the power of Hadoop to enable interactive analytics
Web Briefing: Unlock the power of Hadoop to enable interactive analytics
 
Carpe Datum: Building Big Data Analytical Applications with HP Haven
Carpe Datum: Building Big Data Analytical Applications with HP HavenCarpe Datum: Building Big Data Analytical Applications with HP Haven
Carpe Datum: Building Big Data Analytical Applications with HP Haven
 
Hortonworks.bdb
Hortonworks.bdbHortonworks.bdb
Hortonworks.bdb
 
Journey to SAS Analytics Grid with SAS, R, Python
Journey to SAS Analytics Grid with SAS, R, PythonJourney to SAS Analytics Grid with SAS, R, Python
Journey to SAS Analytics Grid with SAS, R, Python
 
Hitachi Data Systems Hadoop Solution
Hitachi Data Systems Hadoop SolutionHitachi Data Systems Hadoop Solution
Hitachi Data Systems Hadoop Solution
 
Cw13 big data and apache hadoop by amr awadallah-cloudera
Cw13 big data and apache hadoop by amr awadallah-clouderaCw13 big data and apache hadoop by amr awadallah-cloudera
Cw13 big data and apache hadoop by amr awadallah-cloudera
 
Intro to Big Data and Apache Hadoop by Dr. Amr Awadallah at CLOUD WEEKEND '13...
Intro to Big Data and Apache Hadoop by Dr. Amr Awadallah at CLOUD WEEKEND '13...Intro to Big Data and Apache Hadoop by Dr. Amr Awadallah at CLOUD WEEKEND '13...
Intro to Big Data and Apache Hadoop by Dr. Amr Awadallah at CLOUD WEEKEND '13...
 
Hp Converged Systems and Hortonworks - Webinar Slides
Hp Converged Systems and Hortonworks - Webinar SlidesHp Converged Systems and Hortonworks - Webinar Slides
Hp Converged Systems and Hortonworks - Webinar Slides
 
Pentaho Big Data Analytics with Vertica and Hadoop
Pentaho Big Data Analytics with Vertica and HadoopPentaho Big Data Analytics with Vertica and Hadoop
Pentaho Big Data Analytics with Vertica and Hadoop
 
SQLDay2013_MarcinSzeliga_SQLServer2012FastTrackDWReferenceArchitectures
SQLDay2013_MarcinSzeliga_SQLServer2012FastTrackDWReferenceArchitecturesSQLDay2013_MarcinSzeliga_SQLServer2012FastTrackDWReferenceArchitectures
SQLDay2013_MarcinSzeliga_SQLServer2012FastTrackDWReferenceArchitectures
 
Turning Petabytes of Data into Profit with Hadoop for the World’s Biggest Ret...
Turning Petabytes of Data into Profit with Hadoop for the World’s Biggest Ret...Turning Petabytes of Data into Profit with Hadoop for the World’s Biggest Ret...
Turning Petabytes of Data into Profit with Hadoop for the World’s Biggest Ret...
 
clusterstor-hadoop-data-sheet
clusterstor-hadoop-data-sheetclusterstor-hadoop-data-sheet
clusterstor-hadoop-data-sheet
 

Big Data Integration Webinar: Getting Started With Hadoop Big Data

  • 1. Getting Started & Successful with Big Data © 2013, Pentaho. All Rights Reserved. pentaho.com. Worldwide +1 (866) 660-7555 @Pentaho #BigDataWebSeries
  • 2. Your Hosts Today: Paul Brook, Cloud EMEA Program Manager, Dell • Davy Nys, VP EMEA & APAC, Pentaho • Chuck Yarbrough, Technical Solutions Marketing, Pentaho
  • 3. Pentaho Webinar Series. Sign-up at: pentaho.com
  • 4. Goals for Today. To Understand: • How to get a Hadoop cluster up and running • Where Hadoop and other pieces fit into the architecture • How you can easily get data in & out of Hadoop • How to leverage Hadoop with Pentaho • Initial Best Practices
  • 5. Complete Analytics and Visual Data Management: Hadoop • NoSQL Databases • Data Discovery & Visualization • Enterprise & Ad Hoc Reporting • Predictive Analytics & Machine Learning • Data Ingestion, Manipulation & Integration • Analytic Databases
  • 6. Data Warehouse Optimization (diagram labels): Data Sources • Big Data Architecture • Data Warehouse (Master & Transactional Data) • ERP • CRM • CDR • Analytic Data Mart(s) (x3) • Logs (x2) • Other Data • Raw Data • Parsed Data • Analytic Datasets • Master Data • Tape Archive
  • 7. Steps To Start with Hadoop. 1: Hadoop Installation • Install locally in Pseudo-Distributed mode • Leverage tools like Dell Crowbar • Cloud sandbox. 2: Pentaho Installation • Easy download & installation • Start on desktop. 3: Start Loading • Extract or access data from source systems • Load it (in its raw form) into Hadoop • Tokenize & parse as required • Transform & enrich • Load into destination
  • 8. Data Architecture and Integration Challenges (diagram labels): Data Sources • Big Data Architecture • Data Warehouse (Master & Transactional Data) • ERP • CRM • CDR • Analytic Data Mart(s) (x3) • Logs (x2) • Other Data • Raw Data • Parsed Data • Analytic Datasets • Master Data • NOSQL
  • 9. Data Architecture and Integration Challenges (diagram labels): as on slide 8, plus Extract • Transform • Load • Orchestration & Integration • MR
  • 10. Data Architecture and Integration Challenges (diagram labels): same labels as slide 9
  • 11. Solution Architecture & Demo. 1: Fast and easy way to deploy Hadoop clusters with Dell
  • 12. Global Marketing Fast and easy way to deploy Hadoop clusters with Dell
  • 13. “Well, we are ready, but how will the Hardware Team know how to size and design the Hadoop cluster?” “I don’t know, and it may take a long time to build the Hadoop cluster.” “Time is a critical factor; we need to get this project moving.”
  • 14. Reduce time to Cluster Sizing, Design & Deployment • Faster time to productive operations • Optimize and adapt for your needs • Deliver the best return on investment • Reduce risk & increase flexibility with Dell • Dell.com/Crowbar
  • 15. Dell | Hadoop Solution. “Dell … was one of the first of the hardware vendors to grasp the fact that cloud is about provisioning services, not about the hardware.” (Maxwell Cooter, Cloud Pro) Excels at supporting complex big data analyses across large collections of structured and unstructured data • Hadoop handles a variety of workloads, including search, log processing, data warehousing, recommendation systems and video/image analysis • Work on the most modern scale-out architectures using a clean-sheet design data framework • Without vendor lock-in. Apache Hadoop software • Crowbar software framework with a Hadoop barclamp • PowerEdge C8000 Series, C6220, R720, R720XD • Force10 or PowerConnect switches • Reference Architecture • Deployment Guide • Joint Service and Support • Proven solutions • Proven components • Partner Ecosystem
  • 16. Crowbar Software Framework: a modular, open source framework. Crowbar • Accelerates multi-node deployments • Simplifies maintenance • Streamlines ongoing updates. Built with DevOps • Provides an operational model for managing big data clusters and cloud. Field-proven technologies • Build on locally deployed Chef Server • Raw servers to full cluster in <2 hours • Hardened with more than a year of deployments. Apache 2 open source • Multi-apps (Hadoop & OpenStack) • Multi-OS (Ubuntu, RHEL, CentOS, SUSE) • NOT limited to Dell hardware
  • 17. Deploy a Hadoop cluster in ~2 hours • Reduce software licensing fees 100% • Reduce development time 4-6 mo. Use Crowbar to: • Automate the deployment and configuration of a Hadoop cluster • Quickly provision bare-metal servers from box to cluster with minimal intervention • Maintain, upgrade and evolve your Hadoop cluster over time • Leverage an open source framework backed by a growing global developer ecosystem. Crowbar software framework: evolve to meet your needs over time with built-in DevOps
  • 18. Crowbar dashboard provides visibility
  • 19. Leverage developer expertise worldwide • Download the open source software: https://github.com/dellcloudedge/crowbar • Participate in an active community: http://lists.us.dell.com/mailman/listinfo/crowbar • Get resources on the Wiki: https://github.com/dellcloudedge/crowbar/wiki • Visit Dell.com/Crowbar, Crowbar@Dell.com
  • 21. Pentaho Installation (step 2): pentaho.com/download • Install on a local desktop; no need for a cluster • “Managed Code”: no additional installations • Pentaho will write to the Hadoop Distributed Cache for execution. Scheduling • Integration • Manipulation • Orchestration
  • 22. Start Loading (step 3). Loading into HDFS & Hive: Hadoop Copy Files • Specify source files / destination. Loading into HBase: ZooKeeper host & port • Specify HBase mapping
  • 23. Solution Architecture & Demo: Demo
  • 24. Maximize Performance: As much as 15x faster than hand-written code. Parallel execution as MapReduce in the Hadoop cluster.
  • 25. Additional Best Practices. Leverage Hadoop • Don’t do database lookups inside a Mapper/Reducer; bring the data set into HDFS • Don’t transfer data between two clustering technologies (network overload). Don’t Boil the Ocean • Start with a small data set and validate logic & performance outside the cluster • Gradually increase volumes and fine-tune the application, cluster, data stores & network. It’s AND…AND • Leverage the various technologies available • A combination of easy-to-use tools, powerful scripting and custom coding provides the best mix
  • 26. Solution Architecture & Demo: Q & A
  • 27. Contact Us or Sign-up at: pentaho.com
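
The slide 25 advice to avoid database lookups inside a Mapper/Reducer can be illustrated with a small sketch. It assumes the reference data is small enough to be loaded once into memory (for example from a file already copied into HDFS) and that records are tab-separated with the lookup key in the first field. The function names, the sample data and the record layout are all hypothetical; this is plain Python in the style of a streaming mapper, not Pentaho's or Hadoop's own API.

```python
import csv
import io


def load_lookup(lines):
    """Build an in-memory dict once from a small reference file (key,value per row)."""
    return {row[0]: row[1] for row in csv.reader(lines) if len(row) >= 2}


def map_records(records, lookup):
    """Enrich each tab-separated record from the in-memory table.

    No database connection is opened per record; unknown keys get a default.
    """
    for line in records:
        fields = line.rstrip("\n").split("\t")
        if not fields[0]:
            continue  # skip blank lines
        region = lookup.get(fields[0], "UNKNOWN")
        yield "\t".join(fields + [region])


# Hypothetical sample data: country code -> country name, and (code, duration) records.
reference = io.StringIO("44,UK\n32,Belgium\n")
sample = ["44\t120\n", "32\t45\n", "99\t10\n"]

lookup = load_lookup(reference)
enriched = list(map_records(sample, lookup))
for row in enriched:
    print(row)
```

Because the mapper is an ordinary function over an iterable of lines, it can also be run on a small sample outside the cluster first, which matches the "start with a small data set" advice on the same slide.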

Editor's Notes

  1. Two major
  2. TAKE-AWAYS: Pentaho provides complete integrated DI+BI for every leading big data platform.
  3. The company decided to invest in Hadoop to ingest the raw CDR data into Hadoop along with other data. This frees DW capacity for high-value transactional data while also lowering cost and meeting compliance requirements. And since the data is rescued from tape, it becomes available for reporting and analysis.
  4. The company decided to invest in Hadoop to ingest the raw CDR data into Hadoop along with other data. This frees DW capacity for high-value transactional data while also lowering cost and meeting compliance requirements. And since the data is rescued from tape, it becomes available for reporting and analysis.
  5. The company decided to invest in Hadoop to ingest the raw CDR data into Hadoop along with other data. This frees DW capacity for high-value transactional data while also lowering cost and meeting compliance requirements. And since the data is rescued from tape, it becomes available for reporting and analysis.
  6. The company decided to invest in Hadoop to ingest the raw CDR data into Hadoop along with other data. This frees DW capacity for high-value transactional data while also lowering cost and meeting compliance requirements. And since the data is rescued from tape, it becomes available for reporting and analysis.
  7. Delivered as a hardware, software, and services Reference Architecture (RA) which can scale from 6 nodes up to 720 nodes. Currently utilizes PowerEdge C2100/C6100/C6105, R720, R720XD servers and PowerConnect 6248 or Force10 switches. Dell Crowbar: automated solution deployment and configuration (bare metal, OS, solution stack, and monitoring). CDH3 Enterprise: Cloudera Hadoop Distribution, Cloudera Management Tools, Cloudera Support. Partner Ecosystem: software and services capabilities to address broader customer needs around Hadoop; enabling non-technical business users to leverage Hadoop; simplifying getting data into Hadoop; intuitive analytics reporting and dashboards. Solution provided via: Reference Architecture, Deployment Guide, Dell Digital Locker, Dell Deployment Services.
  8. First OpenStack cloud solution provider in market. Pioneer OpenStack partner (only Day 1 hardware provider). Most history with the OpenStack technology = expertise + RAs that have been tested longer and more fully than those of newcomers. Dell offers a deep partnership ecosystem. Single point of support and purchase to reduce the problem of dealing with multiple vendors. ONLY company providing automated software to do multi-node OpenStack provisioning: Crowbar. Dell developed software that we open-sourced in the community. OpenStack expertise.
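
As an illustration of the "Start Loading" steps on slide 7 (load raw data, tokenize & parse, transform & enrich) applied to the CDR data mentioned in notes 3 to 6, here is a minimal sketch. The pipe-delimited record layout (caller, callee, ISO start time, duration in seconds), the field names and the sample lines are assumptions made for the example; the deck does not describe the actual CDR format or the transformations used in the demo.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass
class CallRecord:
    caller: str
    callee: str
    start: datetime
    duration_s: int


def parse_cdr(line, sep="|"):
    """Tokenize one raw line; return None for malformed rows instead of failing the load."""
    parts = line.strip().split(sep)
    if len(parts) != 4:
        return None
    caller, callee, start, duration = parts
    try:
        return CallRecord(
            caller,
            callee,
            datetime.strptime(start, "%Y-%m-%dT%H:%M:%S"),
            int(duration),
        )
    except ValueError:
        return None


def enrich(rec):
    """Derive analysis-friendly fields: hour of day and duration in minutes."""
    return {
        "caller": rec.caller,
        "callee": rec.callee,
        "hour": rec.start.hour,
        "minutes": round(rec.duration_s / 60, 2),
    }


# Hypothetical raw lines, kept exactly as they would arrive from the source system.
raw = [
    "4415550100|3225550111|2013-03-01T09:15:00|125",
    "garbled line",
    "4415550100|3225550111|not-a-date|30",
]

parsed = [r for r in (parse_cdr(line) for line in raw) if r is not None]
rows = [enrich(r) for r in parsed]
print(rows)
```

Keeping the raw lines untouched and parsing them in a separate step mirrors the slide's "load it (in its raw form)" advice: malformed rows can be counted or re-parsed later without going back to the source system.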