SlideShare a Scribd company logo
© Hortonworks Inc. 2013
Modern Data Architecture
… and the Data Lake
John Haddad
Senior Director Product Marketing - Informatica
Jim Walker
Director Product Marketing - Hortonworks
Page 1
© Hortonworks Inc. 2013
Your Presenters
• John Haddad
– Senior Director Product Marketing, Informatica
– Over 25 years experience developing and
marketing enterprise applications
– Enjoys art, science, and the great outdoors
• Jim Walker
– Director Product Marketing, Hortonworks
– Over 20 years in data management as a
developer and a marketer
– Amateur Photographer
Page 2
© Hortonworks Inc. 2013
Today’s Topics
• Introduction
• Drivers for the Modern Data Architecture (MDA)
• Apache Hadoop in the MDA
• Informatica’s role in the MDA
• Q&A
Page 3
© Hortonworks Inc. 2013
Enterprise Data Architecture
Page 4
APPLICATIONS	
  DATA	
  SYSTEMS	
  
REPOSITORIES	
  
RDBMS	
   EDW	
   MPP	
  
DATA	
  SOURCES	
  
OLTP,	
  POS	
  
SYSTEMS	
  
Tradi8onal	
  Sources	
  	
  
(RDBMS,	
  OLTP,	
  OLAP)	
  
Business	
  
Analy8cs	
  
Custom	
  
Applica8ons	
  
Packaged	
  
Applica8ons	
  
© Hortonworks Inc. 2013
Traditional Approach – Under Pressure
Page 5
APPLICATIONS	
  DATA	
  SYSTEMS	
  
REPOSITORIES	
  
RDBMS	
   EDW	
   MPP	
  
DATA	
  SOURCES	
  
OLTP,	
  POS	
  
SYSTEMS	
  
Tradi8onal	
  Sources	
  	
  
(RDBMS,	
  OLTP,	
  OLAP)	
  
Business	
  
Analy8cs	
  
Custom	
  
Applica8ons	
  
Packaged	
  
Applica8ons	
  
New	
  Sources	
  	
  
(sen8ment,	
  clickstream,	
  geo,	
  sensor,	
  …)	
  
Source: IDC
2.8	
  ZB	
  in	
  2012	
  
85%	
  from	
  New	
  Data	
  Types	
  
15x	
  Machine	
  Data	
  by	
  2020	
  
40	
  ZB	
  by	
  2020	
  
© Hortonworks Inc. 2013
Modern Data Architecture Enabled
Page 6
APPLICATIONS	
  DATA	
  SYSTEMS	
  
REPOSITORIES	
  
RDBMS	
   EDW	
   MPP	
  
DATA	
  SOURCES	
  
OLTP,	
  POS	
  
SYSTEMS	
  
Tradi8onal	
  Sources	
  	
  
(RDBMS,	
  OLTP,	
  OLAP)	
  
Business	
  
Analy8cs	
  
Custom	
  
Applica8ons	
  
Packaged	
  
Applica8ons	
  
New	
  Sources	
  	
  
(sen8ment,	
  clickstream,	
  geo,	
  sensor,	
  …)	
  
OPERATIONAL	
  
TOOLS	
  
MANAGE	
  &	
  
MONITOR	
  
DEV	
  &	
  DATA	
  
TOOLS	
  
BUILD	
  &	
  
TEST	
  
© Hortonworks Inc. 2013
Hadoop Powers Modern Data Architecture
Page 7
Apache Hadoop is an open source project
governed by the Apache Software Foundation
(ASF) that allows you to gain insight from massive
amounts of structured and unstructured data
quickly and without significant investment.
Hadoop Cluster
compute
&
storage
. . .
. . .
. .
compute
&
storage
.
.
Hadoop clusters provide
scale-out storage and
distributed data processing
on commodity hardware
© Hortonworks Inc. 2013
Driving Efficiency Driving Opportunity
Drivers for Hadoop Adoption
Modern Data Architecture
Hadoop has a central role in next
generation data architectures while
integrating with existing data systems
Business Applications
Use Hadoop to extract insights that
enable new customer value and
competitive edge
Existing
Traditional
Server log
Clickstream
Big Data Sets
Emerging
Sentiment/Social
Machine/Sensor
Geo-locations
© Hortonworks Inc. 2013
Opportunity in types of data
1.  Sentiment
Understand how your customers feel about your brand and
products – right now
2.  Clickstream
Capture and analyze website visitors’ data trails and
optimize your website
3.  Sensor/Machine
Discover patterns in data streaming automatically from
remote sensors and machines
4.  Geographic
Analyze location-based data to manage operations where
they occur
5.  Server Logs
Research logs to diagnose process failures and prevent
security breaches
6.  Unstructured (txt, video, pictures, etc..)
Understand patterns in files across millions of web pages,
emails, and documents
Value
Page 9
© Hortonworks Inc. 2013
Efficiency in Modern Data Architecture
•  Drive efficiency via
modern data
architecture
•  Store data once and
access it in many
ways
•  Often referred to a
data lake or data
repository
•  Infrastructure
platform driven
•  IT-oriented, TCO
based
Page 10
APPLICATIONS	
  DATA	
  SYSTEMS	
  
REPOSITORIES	
  
RDBMS	
   EDW	
   MPP	
  
DATA	
  SOURCES	
  
OLTP,	
  POS	
  
SYSTEMS	
  
Tradi8onal	
  Sources	
  	
  
(RDBMS,	
  OLTP,	
  OLAP)	
  
Business	
  
Analy8cs	
  
Custom	
  
Applica8ons	
  
Packaged	
  
Applica8ons	
  
New	
  Sources	
  	
  
(sen8ment,	
  clickstream,	
  geo,	
  sensor,	
  …)	
  
© Hortonworks Inc. 2013
Page 11
APPLICATIONS	
  DATA	
  SYSTEMS	
  
TRADITIONAL	
  REPOS	
  
DEV	
  &	
  DATA	
  
TOOLS	
  
OPERATIONAL	
  
TOOLS	
  
Viewpoint
Microsoft Applications
DATA	
  SOURCES	
  
DATA	
  INTEGRATION	
  
Engineered for Interoperability
Tradi8onal	
  Sources	
  	
  
(RDBMS,	
  OLTP,	
  OLAP)	
  
New	
  Sources	
  	
  
(sen8ment,	
  clickstream,	
  geo,	
  sensor,	
  …)	
  
© Hortonworks Inc. 2013
Integrated
Interoperable with
existing data center
investments Skills
Leverage your existing
skills: development,
operations, analytics
Requirements for Hadoop Adoption
Page 12
Key Services
Platform, operational and
data services essential for
the enterprise
3Requirements for Hadoop’s Role
in the Modern Data Architecture
© Hortonworks Inc. 2013
Today’s Topics
• Introduction
• Drivers for the Modern Data Architecture (MDA)
• Apache Hadoop’s role in the MDA
• Informatica’s role in the MDA
• Q&A
Page 13
© Hortonworks Inc. 2013
Hortonworks & Informatica
Visual Development Environment
Enterprise
Repositories
EDW
LOAD
Data
Virtualization
Batch
CEP
MDM
INTERFACE
HIVE
JDBC
HDFS API
AMBARI
MAPREDUCE
YARN
HDFS
DATA REFINEMENT
HIVE (HiveQL and UDFs)
ProfileProfile
Parse
ETL
Cleanse
Match
HDFSAPI
LOAD
Reference
Architecture
SOURCE
DATA
Batch
Replicate
Stream
Archive
JMS Queue’s
Servers &
Mainframe
Files
Databases
Sensor data
Social
Data Lake Processes
Mobile Apps
Transactions,
OLTP, OLAP
Social Media, Web Logs
Machine Device,
Scientific
Documents and Emails
9. Govern & enrich
with metadata
3. Stream real-time
data
8. Explore & validate
data
4. Mask
sensitive data
2. Replicate changed
data & schemas
Visualization
& Analytics
11. Subscribe to
datasets
Data
Integration
Hub
1. Load or archive
batch data
Data
Virtualization
5. Access customer
“golden record
MDM
Enterprise
Data Warehouse
10. Correlate real-time
events with historical
patterns & trends
6. Refine &
curate data
7. Move results
to EDW
Telco Call Detail Record (CDR)
Use-Case
Use-Case: CDR Processing
•  Each job picks up a number of files containing Text CDRs
(Call Detail Records)
•  First task is to merge partial call records
•  Some records may be partial – ex. multiple records for a single
call due to a dropped line or switching cell towers
•  Partial records need to be merged and total call time needs to be
added to duration for the merged record
•  Partial records for a single call may reside in multiple files or be
included in different jobs.
•  Incomplete partial records need to be reprocessed by
consecutive jobs
•  Second task is to sort all processed CDRs by calling number
Input CDR File Example
These 3 numbers
uniquely identify a call Partial calls starts with
1 and end with 0
Some partial
records are
incomplete
Processed
completed records
are sorted by caller
Output CDR Files
Completed Calls
Partial Calls
Duration times are
added to the
merged records
Partial records are
merged into a single
completed record
Partial records will
be reprocessed
Logical Design
Partial records only
Separate partial records
from completed records
Completed
records only
Separate
incomplete and
complete partial
records
Select incomplete
partial records
Aggregate all
completed and
partial-completed
records
Viewing Data at Design Time
More Data at Design Time
Constructing Logical Expressions
More Logic
Check Outcomes
Choose Where to Process
Hadoop Execution Plan
Monitor Processing
Results in HDFS
CDR Pipeline
Sort records
by Key Summarize
by Key Group
Filter by
Province ID
Filter by
Collection
Date
City Code
Lookup
Read Files
Write report
•  Scenario – Filter records by Date, City and Province;
Aggregate and summarize records by a composite Key
Design Environment
Adding Transactional Source
•  Scenario - Report website use (Facebook, Twitter, etc.)
by Age and by Postal Code
Read WAP
log records
Get MSISDN
and URL fields
Lookup Age,
Postal Code by
MSISDN
Count URL
frequency Calculate
percentages
Connecting to Relational Source
Result
•  Easily combine big data sources with transactional data
•  Example – Report website use (Facebook, Twitter, etc.) by Age and by
Region
Look-up of
Age, Region
by MSISDN
CRM
EDW
Log
Files,
HDFS
© Hortonworks Inc. 2013
Integrated
Interoperable with
existing data center
investments Skills
Leverage your existing
skills: development,
operations, analytics
Requirements for Hadoop Adoption
Page 35
Key Services
Platform, operational and
data services essential for
the enterprise
3Requirements for Hadoop’s Role
in the Modern Data Architecture
© Hortonworks Inc. 2013
Next Steps:
Page 36
Learn more about Informatica and Hadoop
http://www.informatica.com/us/vision/harnessing-big-
data/hadoop/
Get started on Hadoop with Hortonworks
Sandbox
http://hortonworks.com/products/hortonworks-
sandbox/
Follow us:
@hortonworks, @informatica

More Related Content

What's hot

Building the Data Lake with Azure Data Factory and Data Lake Analytics
Building the Data Lake with Azure Data Factory and Data Lake AnalyticsBuilding the Data Lake with Azure Data Factory and Data Lake Analytics
Building the Data Lake with Azure Data Factory and Data Lake Analytics
Khalid Salama
 
Time to Talk about Data Mesh
Time to Talk about Data MeshTime to Talk about Data Mesh
Time to Talk about Data Mesh
LibbySchulze
 
Data Lakehouse Symposium | Day 4
Data Lakehouse Symposium | Day 4Data Lakehouse Symposium | Day 4
Data Lakehouse Symposium | Day 4
Databricks
 
Introducing Databricks Delta
Introducing Databricks DeltaIntroducing Databricks Delta
Introducing Databricks Delta
Databricks
 
Building Lakehouses on Delta Lake with SQL Analytics Primer
Building Lakehouses on Delta Lake with SQL Analytics PrimerBuilding Lakehouses on Delta Lake with SQL Analytics Primer
Building Lakehouses on Delta Lake with SQL Analytics Primer
Databricks
 
Achieving Lakehouse Models with Spark 3.0
Achieving Lakehouse Models with Spark 3.0Achieving Lakehouse Models with Spark 3.0
Achieving Lakehouse Models with Spark 3.0
Databricks
 
Big Data: Architecture and Performance Considerations in Logical Data Lakes
Big Data: Architecture and Performance Considerations in Logical Data LakesBig Data: Architecture and Performance Considerations in Logical Data Lakes
Big Data: Architecture and Performance Considerations in Logical Data Lakes
Denodo
 
Data platform architecture
Data platform architectureData platform architecture
Data platform architecture
Sudheer Kondla
 
Data Architecture for Solutions.pdf
Data Architecture for Solutions.pdfData Architecture for Solutions.pdf
Data Architecture for Solutions.pdf
Alan McSweeney
 
Databricks: A Tool That Empowers You To Do More With Data
Databricks: A Tool That Empowers You To Do More With DataDatabricks: A Tool That Empowers You To Do More With Data
Databricks: A Tool That Empowers You To Do More With Data
Databricks
 
Intuit's Data Mesh - Data Mesh Leaning Community meetup 5.13.2021
Intuit's Data Mesh - Data Mesh Leaning Community meetup 5.13.2021Intuit's Data Mesh - Data Mesh Leaning Community meetup 5.13.2021
Intuit's Data Mesh - Data Mesh Leaning Community meetup 5.13.2021
Tristan Baker
 
Enabling a Data Mesh Architecture with Data Virtualization
Enabling a Data Mesh Architecture with Data VirtualizationEnabling a Data Mesh Architecture with Data Virtualization
Enabling a Data Mesh Architecture with Data Virtualization
Denodo
 
Data Lake Overview
Data Lake OverviewData Lake Overview
Data Lake Overview
James Serra
 
Business Intelligence (BI) and Data Management Basics
Business Intelligence (BI) and Data Management  Basics Business Intelligence (BI) and Data Management  Basics
Business Intelligence (BI) and Data Management Basics
amorshed
 
Data Lakehouse, Data Mesh, and Data Fabric (r2)
Data Lakehouse, Data Mesh, and Data Fabric (r2)Data Lakehouse, Data Mesh, and Data Fabric (r2)
Data Lakehouse, Data Mesh, and Data Fabric (r2)
James Serra
 
Big data architectures and the data lake
Big data architectures and the data lakeBig data architectures and the data lake
Big data architectures and the data lake
James Serra
 
Data Mesh in Practice - How Europe's Leading Online Platform for Fashion Goes...
Data Mesh in Practice - How Europe's Leading Online Platform for Fashion Goes...Data Mesh in Practice - How Europe's Leading Online Platform for Fashion Goes...
Data Mesh in Practice - How Europe's Leading Online Platform for Fashion Goes...
Dr. Arif Wider
 
Learn to Use Databricks for Data Science
Learn to Use Databricks for Data ScienceLearn to Use Databricks for Data Science
Learn to Use Databricks for Data Science
Databricks
 
Databricks Fundamentals
Databricks FundamentalsDatabricks Fundamentals
Databricks Fundamentals
Dalibor Wijas
 
Data Architecture Strategies: Data Architecture for Digital Transformation
Data Architecture Strategies: Data Architecture for Digital TransformationData Architecture Strategies: Data Architecture for Digital Transformation
Data Architecture Strategies: Data Architecture for Digital Transformation
DATAVERSITY
 

What's hot (20)

Building the Data Lake with Azure Data Factory and Data Lake Analytics
Building the Data Lake with Azure Data Factory and Data Lake AnalyticsBuilding the Data Lake with Azure Data Factory and Data Lake Analytics
Building the Data Lake with Azure Data Factory and Data Lake Analytics
 
Time to Talk about Data Mesh
Time to Talk about Data MeshTime to Talk about Data Mesh
Time to Talk about Data Mesh
 
Data Lakehouse Symposium | Day 4
Data Lakehouse Symposium | Day 4Data Lakehouse Symposium | Day 4
Data Lakehouse Symposium | Day 4
 
Introducing Databricks Delta
Introducing Databricks DeltaIntroducing Databricks Delta
Introducing Databricks Delta
 
Building Lakehouses on Delta Lake with SQL Analytics Primer
Building Lakehouses on Delta Lake with SQL Analytics PrimerBuilding Lakehouses on Delta Lake with SQL Analytics Primer
Building Lakehouses on Delta Lake with SQL Analytics Primer
 
Achieving Lakehouse Models with Spark 3.0
Achieving Lakehouse Models with Spark 3.0Achieving Lakehouse Models with Spark 3.0
Achieving Lakehouse Models with Spark 3.0
 
Big Data: Architecture and Performance Considerations in Logical Data Lakes
Big Data: Architecture and Performance Considerations in Logical Data LakesBig Data: Architecture and Performance Considerations in Logical Data Lakes
Big Data: Architecture and Performance Considerations in Logical Data Lakes
 
Data platform architecture
Data platform architectureData platform architecture
Data platform architecture
 
Data Architecture for Solutions.pdf
Data Architecture for Solutions.pdfData Architecture for Solutions.pdf
Data Architecture for Solutions.pdf
 
Databricks: A Tool That Empowers You To Do More With Data
Databricks: A Tool That Empowers You To Do More With DataDatabricks: A Tool That Empowers You To Do More With Data
Databricks: A Tool That Empowers You To Do More With Data
 
Intuit's Data Mesh - Data Mesh Leaning Community meetup 5.13.2021
Intuit's Data Mesh - Data Mesh Leaning Community meetup 5.13.2021Intuit's Data Mesh - Data Mesh Leaning Community meetup 5.13.2021
Intuit's Data Mesh - Data Mesh Leaning Community meetup 5.13.2021
 
Enabling a Data Mesh Architecture with Data Virtualization
Enabling a Data Mesh Architecture with Data VirtualizationEnabling a Data Mesh Architecture with Data Virtualization
Enabling a Data Mesh Architecture with Data Virtualization
 
Data Lake Overview
Data Lake OverviewData Lake Overview
Data Lake Overview
 
Business Intelligence (BI) and Data Management Basics
Business Intelligence (BI) and Data Management  Basics Business Intelligence (BI) and Data Management  Basics
Business Intelligence (BI) and Data Management Basics
 
Data Lakehouse, Data Mesh, and Data Fabric (r2)
Data Lakehouse, Data Mesh, and Data Fabric (r2)Data Lakehouse, Data Mesh, and Data Fabric (r2)
Data Lakehouse, Data Mesh, and Data Fabric (r2)
 
Big data architectures and the data lake
Big data architectures and the data lakeBig data architectures and the data lake
Big data architectures and the data lake
 
Data Mesh in Practice - How Europe's Leading Online Platform for Fashion Goes...
Data Mesh in Practice - How Europe's Leading Online Platform for Fashion Goes...Data Mesh in Practice - How Europe's Leading Online Platform for Fashion Goes...
Data Mesh in Practice - How Europe's Leading Online Platform for Fashion Goes...
 
Learn to Use Databricks for Data Science
Learn to Use Databricks for Data ScienceLearn to Use Databricks for Data Science
Learn to Use Databricks for Data Science
 
Databricks Fundamentals
Databricks FundamentalsDatabricks Fundamentals
Databricks Fundamentals
 
Data Architecture Strategies: Data Architecture for Digital Transformation
Data Architecture Strategies: Data Architecture for Digital TransformationData Architecture Strategies: Data Architecture for Digital Transformation
Data Architecture Strategies: Data Architecture for Digital Transformation
 

Viewers also liked

Architecting an Open Data Lake for the Enterprise
Architecting an Open Data Lake for the EnterpriseArchitecting an Open Data Lake for the Enterprise
Architecting an Open Data Lake for the Enterprise
Amazon Web Services
 
The Future of Data
The Future of DataThe Future of Data
The Future of Data
blynnbuckley
 
Top Trends in Building Data Lakes for Machine Learning and AI
Top Trends in Building Data Lakes for Machine Learning and AI Top Trends in Building Data Lakes for Machine Learning and AI
Top Trends in Building Data Lakes for Machine Learning and AI
Holden Ackerman
 
CWIN17 India / Insights platform architecture v1 0 virtual - subhadeep dutta
CWIN17 India / Insights platform architecture v1 0   virtual - subhadeep duttaCWIN17 India / Insights platform architecture v1 0   virtual - subhadeep dutta
CWIN17 India / Insights platform architecture v1 0 virtual - subhadeep dutta
Capgemini
 
(BDT317) Building A Data Lake On AWS
(BDT317) Building A Data Lake On AWS(BDT317) Building A Data Lake On AWS
(BDT317) Building A Data Lake On AWS
Amazon Web Services
 
Ai big dataconference_eugene_polonichko_azure data lake
Ai big dataconference_eugene_polonichko_azure data lake Ai big dataconference_eugene_polonichko_azure data lake
Ai big dataconference_eugene_polonichko_azure data lake
Olga Zinkevych
 
50 Shades of Data - how, when and why Big,Relational,NoSQL,Elastic,Event,CQRS...
50 Shades of Data - how, when and why Big,Relational,NoSQL,Elastic,Event,CQRS...50 Shades of Data - how, when and why Big,Relational,NoSQL,Elastic,Event,CQRS...
50 Shades of Data - how, when and why Big,Relational,NoSQL,Elastic,Event,CQRS...
Lucas Jellema
 
A beginners guide to Cloudera Hadoop
A beginners guide to Cloudera HadoopA beginners guide to Cloudera Hadoop
A beginners guide to Cloudera Hadoop
David Yahalom
 

Viewers also liked (8)

Architecting an Open Data Lake for the Enterprise
Architecting an Open Data Lake for the EnterpriseArchitecting an Open Data Lake for the Enterprise
Architecting an Open Data Lake for the Enterprise
 
The Future of Data
The Future of DataThe Future of Data
The Future of Data
 
Top Trends in Building Data Lakes for Machine Learning and AI
Top Trends in Building Data Lakes for Machine Learning and AI Top Trends in Building Data Lakes for Machine Learning and AI
Top Trends in Building Data Lakes for Machine Learning and AI
 
CWIN17 India / Insights platform architecture v1 0 virtual - subhadeep dutta
CWIN17 India / Insights platform architecture v1 0   virtual - subhadeep duttaCWIN17 India / Insights platform architecture v1 0   virtual - subhadeep dutta
CWIN17 India / Insights platform architecture v1 0 virtual - subhadeep dutta
 
(BDT317) Building A Data Lake On AWS
(BDT317) Building A Data Lake On AWS(BDT317) Building A Data Lake On AWS
(BDT317) Building A Data Lake On AWS
 
Ai big dataconference_eugene_polonichko_azure data lake
Ai big dataconference_eugene_polonichko_azure data lake Ai big dataconference_eugene_polonichko_azure data lake
Ai big dataconference_eugene_polonichko_azure data lake
 
50 Shades of Data - how, when and why Big,Relational,NoSQL,Elastic,Event,CQRS...
50 Shades of Data - how, when and why Big,Relational,NoSQL,Elastic,Event,CQRS...50 Shades of Data - how, when and why Big,Relational,NoSQL,Elastic,Event,CQRS...
50 Shades of Data - how, when and why Big,Relational,NoSQL,Elastic,Event,CQRS...
 
A beginners guide to Cloudera Hadoop
A beginners guide to Cloudera HadoopA beginners guide to Cloudera Hadoop
A beginners guide to Cloudera Hadoop
 

Similar to Modern Data Architecture for a Data Lake with Informatica and Hortonworks Data Platform

Modern Data Architecture: In-Memory with Hadoop - the new BI
Modern Data Architecture: In-Memory with Hadoop - the new BIModern Data Architecture: In-Memory with Hadoop - the new BI
Modern Data Architecture: In-Memory with Hadoop - the new BI
Kognitio
 
Hortonworks kognitio webinar 10 dec 2013
Hortonworks kognitio webinar 10 dec 2013Hortonworks kognitio webinar 10 dec 2013
Hortonworks kognitio webinar 10 dec 2013
Michael Hiskey
 
The Value of the Modern Data Architecture with Apache Hadoop and Teradata
The Value of the Modern Data Architecture with Apache Hadoop and Teradata The Value of the Modern Data Architecture with Apache Hadoop and Teradata
The Value of the Modern Data Architecture with Apache Hadoop and Teradata
Hortonworks
 
Non-Stop Hadoop for Hortonworks
Non-Stop Hadoop for Hortonworks Non-Stop Hadoop for Hortonworks
Non-Stop Hadoop for Hortonworks
Hortonworks
 
The Modern Data Architecture for Advanced Business Intelligence with Hortonwo...
The Modern Data Architecture for Advanced Business Intelligence with Hortonwo...The Modern Data Architecture for Advanced Business Intelligence with Hortonwo...
The Modern Data Architecture for Advanced Business Intelligence with Hortonwo...
Hortonworks
 
The Modern Data Architecture for Predictive Analytics with Hortonworks and Re...
The Modern Data Architecture for Predictive Analytics with Hortonworks and Re...The Modern Data Architecture for Predictive Analytics with Hortonworks and Re...
The Modern Data Architecture for Predictive Analytics with Hortonworks and Re...
Revolution Analytics
 
Apache Hadoop on the Open Cloud
Apache Hadoop on the Open CloudApache Hadoop on the Open Cloud
Apache Hadoop on the Open Cloud
Hortonworks
 
Apache Hadoop and its role in Big Data architecture - Himanshu Bari
Apache Hadoop and its role in Big Data architecture - Himanshu BariApache Hadoop and its role in Big Data architecture - Himanshu Bari
Apache Hadoop and its role in Big Data architecture - Himanshu Bari
jaxconf
 
Hortonworks Oracle Big Data Integration
Hortonworks Oracle Big Data Integration Hortonworks Oracle Big Data Integration
Hortonworks Oracle Big Data Integration
Hortonworks
 
Building a Modern Data Architecture with Enterprise Hadoop
Building a Modern Data Architecture with Enterprise HadoopBuilding a Modern Data Architecture with Enterprise Hadoop
Building a Modern Data Architecture with Enterprise Hadoop
Slim Baltagi
 
Combine Apache Hadoop and Elasticsearch to Get the Most of Your Big Data
Combine Apache Hadoop and Elasticsearch to Get the Most of Your Big DataCombine Apache Hadoop and Elasticsearch to Get the Most of Your Big Data
Combine Apache Hadoop and Elasticsearch to Get the Most of Your Big Data
Hortonworks
 
OOP 2014
OOP 2014OOP 2014
Hortonworks Big Data & Hadoop
Hortonworks Big Data & HadoopHortonworks Big Data & Hadoop
Hortonworks Big Data & Hadoop
Mark Ginnebaugh
 
Web Briefing: Unlock the power of Hadoop to enable interactive analytics
Web Briefing: Unlock the power of Hadoop to enable interactive analyticsWeb Briefing: Unlock the power of Hadoop to enable interactive analytics
Web Briefing: Unlock the power of Hadoop to enable interactive analytics
Kognitio
 
Splunk-hortonworks-risk-management-oct-2014
Splunk-hortonworks-risk-management-oct-2014Splunk-hortonworks-risk-management-oct-2014
Splunk-hortonworks-risk-management-oct-2014
Hortonworks
 
Open-BDA Hadoop Summit 2014 - Mr. Slim Baltagi (Building a Modern Data Archit...
Open-BDA Hadoop Summit 2014 - Mr. Slim Baltagi (Building a Modern Data Archit...Open-BDA Hadoop Summit 2014 - Mr. Slim Baltagi (Building a Modern Data Archit...
Open-BDA Hadoop Summit 2014 - Mr. Slim Baltagi (Building a Modern Data Archit...
Innovative Management Services
 
A Comprehensive Approach to Building your Big Data - with Cisco, Hortonworks ...
A Comprehensive Approach to Building your Big Data - with Cisco, Hortonworks ...A Comprehensive Approach to Building your Big Data - with Cisco, Hortonworks ...
A Comprehensive Approach to Building your Big Data - with Cisco, Hortonworks ...
Hortonworks
 
Hortonworks Hadoop @ Oslo Hadoop User Group
Hortonworks Hadoop @ Oslo Hadoop User GroupHortonworks Hadoop @ Oslo Hadoop User Group
Hortonworks Hadoop @ Oslo Hadoop User Group
Mats Johansson
 
Meetup oslo hortonworks HDP
Meetup oslo hortonworks HDPMeetup oslo hortonworks HDP
Meetup oslo hortonworks HDP
Alexander Bakos Leirvåg
 
Big Data Made Easy: A Simple, Scalable Solution for Getting Started with Hadoop
Big Data Made Easy:  A Simple, Scalable Solution for Getting Started with HadoopBig Data Made Easy:  A Simple, Scalable Solution for Getting Started with Hadoop
Big Data Made Easy: A Simple, Scalable Solution for Getting Started with Hadoop
Precisely
 

Similar to Modern Data Architecture for a Data Lake with Informatica and Hortonworks Data Platform (20)

Modern Data Architecture: In-Memory with Hadoop - the new BI
Modern Data Architecture: In-Memory with Hadoop - the new BIModern Data Architecture: In-Memory with Hadoop - the new BI
Modern Data Architecture: In-Memory with Hadoop - the new BI
 
Hortonworks kognitio webinar 10 dec 2013
Hortonworks kognitio webinar 10 dec 2013Hortonworks kognitio webinar 10 dec 2013
Hortonworks kognitio webinar 10 dec 2013
 
The Value of the Modern Data Architecture with Apache Hadoop and Teradata
The Value of the Modern Data Architecture with Apache Hadoop and Teradata The Value of the Modern Data Architecture with Apache Hadoop and Teradata
The Value of the Modern Data Architecture with Apache Hadoop and Teradata
 
Non-Stop Hadoop for Hortonworks
Non-Stop Hadoop for Hortonworks Non-Stop Hadoop for Hortonworks
Non-Stop Hadoop for Hortonworks
 
The Modern Data Architecture for Advanced Business Intelligence with Hortonwo...
The Modern Data Architecture for Advanced Business Intelligence with Hortonwo...The Modern Data Architecture for Advanced Business Intelligence with Hortonwo...
The Modern Data Architecture for Advanced Business Intelligence with Hortonwo...
 
The Modern Data Architecture for Predictive Analytics with Hortonworks and Re...
The Modern Data Architecture for Predictive Analytics with Hortonworks and Re...The Modern Data Architecture for Predictive Analytics with Hortonworks and Re...
The Modern Data Architecture for Predictive Analytics with Hortonworks and Re...
 
Apache Hadoop on the Open Cloud
Apache Hadoop on the Open CloudApache Hadoop on the Open Cloud
Apache Hadoop on the Open Cloud
 
Apache Hadoop and its role in Big Data architecture - Himanshu Bari
Apache Hadoop and its role in Big Data architecture - Himanshu BariApache Hadoop and its role in Big Data architecture - Himanshu Bari
Apache Hadoop and its role in Big Data architecture - Himanshu Bari
 
Hortonworks Oracle Big Data Integration
Hortonworks Oracle Big Data Integration Hortonworks Oracle Big Data Integration
Hortonworks Oracle Big Data Integration
 
Building a Modern Data Architecture with Enterprise Hadoop
Building a Modern Data Architecture with Enterprise HadoopBuilding a Modern Data Architecture with Enterprise Hadoop
Building a Modern Data Architecture with Enterprise Hadoop
 
Combine Apache Hadoop and Elasticsearch to Get the Most of Your Big Data
Combine Apache Hadoop and Elasticsearch to Get the Most of Your Big DataCombine Apache Hadoop and Elasticsearch to Get the Most of Your Big Data
Combine Apache Hadoop and Elasticsearch to Get the Most of Your Big Data
 
OOP 2014
OOP 2014OOP 2014
OOP 2014
 
Hortonworks Big Data & Hadoop
Hortonworks Big Data & HadoopHortonworks Big Data & Hadoop
Hortonworks Big Data & Hadoop
 
Web Briefing: Unlock the power of Hadoop to enable interactive analytics
Web Briefing: Unlock the power of Hadoop to enable interactive analyticsWeb Briefing: Unlock the power of Hadoop to enable interactive analytics
Web Briefing: Unlock the power of Hadoop to enable interactive analytics
 
Splunk-hortonworks-risk-management-oct-2014
Splunk-hortonworks-risk-management-oct-2014Splunk-hortonworks-risk-management-oct-2014
Splunk-hortonworks-risk-management-oct-2014
 
Open-BDA Hadoop Summit 2014 - Mr. Slim Baltagi (Building a Modern Data Archit...
Open-BDA Hadoop Summit 2014 - Mr. Slim Baltagi (Building a Modern Data Archit...Open-BDA Hadoop Summit 2014 - Mr. Slim Baltagi (Building a Modern Data Archit...
Open-BDA Hadoop Summit 2014 - Mr. Slim Baltagi (Building a Modern Data Archit...
 
A Comprehensive Approach to Building your Big Data - with Cisco, Hortonworks ...
A Comprehensive Approach to Building your Big Data - with Cisco, Hortonworks ...A Comprehensive Approach to Building your Big Data - with Cisco, Hortonworks ...
A Comprehensive Approach to Building your Big Data - with Cisco, Hortonworks ...
 
Hortonworks Hadoop @ Oslo Hadoop User Group
Hortonworks Hadoop @ Oslo Hadoop User GroupHortonworks Hadoop @ Oslo Hadoop User Group
Hortonworks Hadoop @ Oslo Hadoop User Group
 
Meetup oslo hortonworks HDP
Meetup oslo hortonworks HDPMeetup oslo hortonworks HDP
Meetup oslo hortonworks HDP
 
Big Data Made Easy: A Simple, Scalable Solution for Getting Started with Hadoop
Big Data Made Easy:  A Simple, Scalable Solution for Getting Started with HadoopBig Data Made Easy:  A Simple, Scalable Solution for Getting Started with Hadoop
Big Data Made Easy: A Simple, Scalable Solution for Getting Started with Hadoop
 

More from Hortonworks

Hortonworks DataFlow (HDF) 3.3 - Taking Stream Processing to the Next Level
Hortonworks DataFlow (HDF) 3.3 - Taking Stream Processing to the Next LevelHortonworks DataFlow (HDF) 3.3 - Taking Stream Processing to the Next Level
Hortonworks DataFlow (HDF) 3.3 - Taking Stream Processing to the Next Level
Hortonworks
 
IoT Predictions for 2019 and Beyond: Data at the Heart of Your IoT Strategy
IoT Predictions for 2019 and Beyond: Data at the Heart of Your IoT StrategyIoT Predictions for 2019 and Beyond: Data at the Heart of Your IoT Strategy
IoT Predictions for 2019 and Beyond: Data at the Heart of Your IoT Strategy
Hortonworks
 
Getting the Most Out of Your Data in the Cloud with Cloudbreak
Getting the Most Out of Your Data in the Cloud with CloudbreakGetting the Most Out of Your Data in the Cloud with Cloudbreak
Getting the Most Out of Your Data in the Cloud with Cloudbreak
Hortonworks
 
Johns Hopkins - Using Hadoop to Secure Access Log Events
Johns Hopkins - Using Hadoop to Secure Access Log EventsJohns Hopkins - Using Hadoop to Secure Access Log Events
Johns Hopkins - Using Hadoop to Secure Access Log Events
Hortonworks
 
Catch a Hacker in Real-Time: Live Visuals of Bots and Bad Guys
Catch a Hacker in Real-Time: Live Visuals of Bots and Bad GuysCatch a Hacker in Real-Time: Live Visuals of Bots and Bad Guys
Catch a Hacker in Real-Time: Live Visuals of Bots and Bad Guys
Hortonworks
 
HDF 3.2 - What's New
HDF 3.2 - What's NewHDF 3.2 - What's New
HDF 3.2 - What's New
Hortonworks
 
Curing Kafka Blindness with Hortonworks Streams Messaging Manager
Curing Kafka Blindness with Hortonworks Streams Messaging ManagerCuring Kafka Blindness with Hortonworks Streams Messaging Manager
Curing Kafka Blindness with Hortonworks Streams Messaging Manager
Hortonworks
 
Interpretation Tool for Genomic Sequencing Data in Clinical Environments
Interpretation Tool for Genomic Sequencing Data in Clinical EnvironmentsInterpretation Tool for Genomic Sequencing Data in Clinical Environments
Interpretation Tool for Genomic Sequencing Data in Clinical Environments
Hortonworks
 
IBM+Hortonworks = Transformation of the Big Data Landscape
IBM+Hortonworks = Transformation of the Big Data LandscapeIBM+Hortonworks = Transformation of the Big Data Landscape
IBM+Hortonworks = Transformation of the Big Data Landscape
Hortonworks
 
Premier Inside-Out: Apache Druid
Premier Inside-Out: Apache DruidPremier Inside-Out: Apache Druid
Premier Inside-Out: Apache Druid
Hortonworks
 
Accelerating Data Science and Real Time Analytics at Scale
Accelerating Data Science and Real Time Analytics at ScaleAccelerating Data Science and Real Time Analytics at Scale
Accelerating Data Science and Real Time Analytics at Scale
Hortonworks
 
TIME SERIES: APPLYING ADVANCED ANALYTICS TO INDUSTRIAL PROCESS DATA
TIME SERIES: APPLYING ADVANCED ANALYTICS TO INDUSTRIAL PROCESS DATATIME SERIES: APPLYING ADVANCED ANALYTICS TO INDUSTRIAL PROCESS DATA
TIME SERIES: APPLYING ADVANCED ANALYTICS TO INDUSTRIAL PROCESS DATA
Hortonworks
 
Blockchain with Machine Learning Powered by Big Data: Trimble Transportation ...
Blockchain with Machine Learning Powered by Big Data: Trimble Transportation ...Blockchain with Machine Learning Powered by Big Data: Trimble Transportation ...
Blockchain with Machine Learning Powered by Big Data: Trimble Transportation ...
Hortonworks
 
Delivering Real-Time Streaming Data for Healthcare Customers: Clearsense
Delivering Real-Time Streaming Data for Healthcare Customers: ClearsenseDelivering Real-Time Streaming Data for Healthcare Customers: Clearsense
Delivering Real-Time Streaming Data for Healthcare Customers: Clearsense
Hortonworks
 
Making Enterprise Big Data Small with Ease
Making Enterprise Big Data Small with EaseMaking Enterprise Big Data Small with Ease
Making Enterprise Big Data Small with Ease
Hortonworks
 
Webinewbie to Webinerd in 30 Days - Webinar World Presentation
Webinewbie to Webinerd in 30 Days - Webinar World PresentationWebinewbie to Webinerd in 30 Days - Webinar World Presentation
Webinewbie to Webinerd in 30 Days - Webinar World Presentation
Hortonworks
 
Driving Digital Transformation Through Global Data Management
Driving Digital Transformation Through Global Data ManagementDriving Digital Transformation Through Global Data Management
Driving Digital Transformation Through Global Data Management
Hortonworks
 
HDF 3.1 pt. 2: A Technical Deep-Dive on New Streaming Features
HDF 3.1 pt. 2: A Technical Deep-Dive on New Streaming FeaturesHDF 3.1 pt. 2: A Technical Deep-Dive on New Streaming Features
HDF 3.1 pt. 2: A Technical Deep-Dive on New Streaming Features
Hortonworks
 
Hortonworks DataFlow (HDF) 3.1 - Redefining Data-In-Motion with Modern Data A...
Hortonworks DataFlow (HDF) 3.1 - Redefining Data-In-Motion with Modern Data A...Hortonworks DataFlow (HDF) 3.1 - Redefining Data-In-Motion with Modern Data A...
Hortonworks DataFlow (HDF) 3.1 - Redefining Data-In-Motion with Modern Data A...
Hortonworks
 
Unlock Value from Big Data with Apache NiFi and Streaming CDC
Unlock Value from Big Data with Apache NiFi and Streaming CDCUnlock Value from Big Data with Apache NiFi and Streaming CDC
Unlock Value from Big Data with Apache NiFi and Streaming CDC
Hortonworks
 

More from Hortonworks (20)

Hortonworks DataFlow (HDF) 3.3 - Taking Stream Processing to the Next Level
Hortonworks DataFlow (HDF) 3.3 - Taking Stream Processing to the Next LevelHortonworks DataFlow (HDF) 3.3 - Taking Stream Processing to the Next Level
Hortonworks DataFlow (HDF) 3.3 - Taking Stream Processing to the Next Level
 
IoT Predictions for 2019 and Beyond: Data at the Heart of Your IoT Strategy
IoT Predictions for 2019 and Beyond: Data at the Heart of Your IoT StrategyIoT Predictions for 2019 and Beyond: Data at the Heart of Your IoT Strategy
IoT Predictions for 2019 and Beyond: Data at the Heart of Your IoT Strategy
 
Getting the Most Out of Your Data in the Cloud with Cloudbreak
Getting the Most Out of Your Data in the Cloud with CloudbreakGetting the Most Out of Your Data in the Cloud with Cloudbreak
Getting the Most Out of Your Data in the Cloud with Cloudbreak
 
Johns Hopkins - Using Hadoop to Secure Access Log Events
Johns Hopkins - Using Hadoop to Secure Access Log EventsJohns Hopkins - Using Hadoop to Secure Access Log Events
Johns Hopkins - Using Hadoop to Secure Access Log Events
 
Catch a Hacker in Real-Time: Live Visuals of Bots and Bad Guys
Catch a Hacker in Real-Time: Live Visuals of Bots and Bad GuysCatch a Hacker in Real-Time: Live Visuals of Bots and Bad Guys
Catch a Hacker in Real-Time: Live Visuals of Bots and Bad Guys
 
HDF 3.2 - What's New
HDF 3.2 - What's NewHDF 3.2 - What's New
HDF 3.2 - What's New
 
Curing Kafka Blindness with Hortonworks Streams Messaging Manager
Curing Kafka Blindness with Hortonworks Streams Messaging ManagerCuring Kafka Blindness with Hortonworks Streams Messaging Manager
Curing Kafka Blindness with Hortonworks Streams Messaging Manager
 
Interpretation Tool for Genomic Sequencing Data in Clinical Environments
Interpretation Tool for Genomic Sequencing Data in Clinical EnvironmentsInterpretation Tool for Genomic Sequencing Data in Clinical Environments
Interpretation Tool for Genomic Sequencing Data in Clinical Environments
 
IBM+Hortonworks = Transformation of the Big Data Landscape
IBM+Hortonworks = Transformation of the Big Data LandscapeIBM+Hortonworks = Transformation of the Big Data Landscape
IBM+Hortonworks = Transformation of the Big Data Landscape
 
Premier Inside-Out: Apache Druid
Premier Inside-Out: Apache DruidPremier Inside-Out: Apache Druid
Premier Inside-Out: Apache Druid
 
Accelerating Data Science and Real Time Analytics at Scale
Accelerating Data Science and Real Time Analytics at ScaleAccelerating Data Science and Real Time Analytics at Scale
Accelerating Data Science and Real Time Analytics at Scale
 
TIME SERIES: APPLYING ADVANCED ANALYTICS TO INDUSTRIAL PROCESS DATA
TIME SERIES: APPLYING ADVANCED ANALYTICS TO INDUSTRIAL PROCESS DATATIME SERIES: APPLYING ADVANCED ANALYTICS TO INDUSTRIAL PROCESS DATA
TIME SERIES: APPLYING ADVANCED ANALYTICS TO INDUSTRIAL PROCESS DATA
 
Blockchain with Machine Learning Powered by Big Data: Trimble Transportation ...
Blockchain with Machine Learning Powered by Big Data: Trimble Transportation ...Blockchain with Machine Learning Powered by Big Data: Trimble Transportation ...
Blockchain with Machine Learning Powered by Big Data: Trimble Transportation ...
 
Delivering Real-Time Streaming Data for Healthcare Customers: Clearsense
Delivering Real-Time Streaming Data for Healthcare Customers: ClearsenseDelivering Real-Time Streaming Data for Healthcare Customers: Clearsense
Delivering Real-Time Streaming Data for Healthcare Customers: Clearsense
 
Making Enterprise Big Data Small with Ease
Making Enterprise Big Data Small with EaseMaking Enterprise Big Data Small with Ease
Making Enterprise Big Data Small with Ease
 
Webinewbie to Webinerd in 30 Days - Webinar World Presentation
Webinewbie to Webinerd in 30 Days - Webinar World PresentationWebinewbie to Webinerd in 30 Days - Webinar World Presentation
Webinewbie to Webinerd in 30 Days - Webinar World Presentation
 
Driving Digital Transformation Through Global Data Management
Driving Digital Transformation Through Global Data ManagementDriving Digital Transformation Through Global Data Management
Driving Digital Transformation Through Global Data Management
 
HDF 3.1 pt. 2: A Technical Deep-Dive on New Streaming Features
HDF 3.1 pt. 2: A Technical Deep-Dive on New Streaming FeaturesHDF 3.1 pt. 2: A Technical Deep-Dive on New Streaming Features
HDF 3.1 pt. 2: A Technical Deep-Dive on New Streaming Features
 
Hortonworks DataFlow (HDF) 3.1 - Redefining Data-In-Motion with Modern Data A...
Hortonworks DataFlow (HDF) 3.1 - Redefining Data-In-Motion with Modern Data A...Hortonworks DataFlow (HDF) 3.1 - Redefining Data-In-Motion with Modern Data A...
Hortonworks DataFlow (HDF) 3.1 - Redefining Data-In-Motion with Modern Data A...
 
Unlock Value from Big Data with Apache NiFi and Streaming CDC
Unlock Value from Big Data with Apache NiFi and Streaming CDCUnlock Value from Big Data with Apache NiFi and Streaming CDC
Unlock Value from Big Data with Apache NiFi and Streaming CDC
 

Recently uploaded

How to Get CNIC Information System with Paksim Ga.pptx
How to Get CNIC Information System with Paksim Ga.pptxHow to Get CNIC Information System with Paksim Ga.pptx
How to Get CNIC Information System with Paksim Ga.pptx
danishmna97
 
20240605 QFM017 Machine Intelligence Reading List May 2024
20240605 QFM017 Machine Intelligence Reading List May 202420240605 QFM017 Machine Intelligence Reading List May 2024
20240605 QFM017 Machine Intelligence Reading List May 2024
Matthew Sinclair
 
Removing Uninteresting Bytes in Software Fuzzing
Removing Uninteresting Bytes in Software FuzzingRemoving Uninteresting Bytes in Software Fuzzing
Removing Uninteresting Bytes in Software Fuzzing
Aftab Hussain
 
20240607 QFM018 Elixir Reading List May 2024
20240607 QFM018 Elixir Reading List May 202420240607 QFM018 Elixir Reading List May 2024
20240607 QFM018 Elixir Reading List May 2024
Matthew Sinclair
 
“Building and Scaling AI Applications with the Nx AI Manager,” a Presentation...
“Building and Scaling AI Applications with the Nx AI Manager,” a Presentation...“Building and Scaling AI Applications with the Nx AI Manager,” a Presentation...
“Building and Scaling AI Applications with the Nx AI Manager,” a Presentation...
Edge AI and Vision Alliance
 
Uni Systems Copilot event_05062024_C.Vlachos.pdf
Uni Systems Copilot event_05062024_C.Vlachos.pdfUni Systems Copilot event_05062024_C.Vlachos.pdf
Uni Systems Copilot event_05062024_C.Vlachos.pdf
Uni Systems S.M.S.A.
 
Taking AI to the Next Level in Manufacturing.pdf
Taking AI to the Next Level in Manufacturing.pdfTaking AI to the Next Level in Manufacturing.pdf
Taking AI to the Next Level in Manufacturing.pdf
ssuserfac0301
 
Things to Consider When Choosing a Website Developer for your Website | FODUU
Things to Consider When Choosing a Website Developer for your Website | FODUUThings to Consider When Choosing a Website Developer for your Website | FODUU
Things to Consider When Choosing a Website Developer for your Website | FODUU
FODUU
 
Full-RAG: A modern architecture for hyper-personalization
Full-RAG: A modern architecture for hyper-personalizationFull-RAG: A modern architecture for hyper-personalization
Full-RAG: A modern architecture for hyper-personalization
Zilliz
 
AI-Powered Food Delivery Transforming App Development in Saudi Arabia.pdf
AI-Powered Food Delivery Transforming App Development in Saudi Arabia.pdfAI-Powered Food Delivery Transforming App Development in Saudi Arabia.pdf
AI-Powered Food Delivery Transforming App Development in Saudi Arabia.pdf
Techgropse Pvt.Ltd.
 
Ocean lotus Threat actors project by John Sitima 2024 (1).pptx
Ocean lotus Threat actors project by John Sitima 2024 (1).pptxOcean lotus Threat actors project by John Sitima 2024 (1).pptx
Ocean lotus Threat actors project by John Sitima 2024 (1).pptx
SitimaJohn
 
How to use Firebase Data Connect For Flutter
How to use Firebase Data Connect For FlutterHow to use Firebase Data Connect For Flutter
How to use Firebase Data Connect For Flutter
Daiki Mogmet Ito
 
Mariano G Tinti - Decoding SpaceX
Mariano G Tinti - Decoding SpaceXMariano G Tinti - Decoding SpaceX
Mariano G Tinti - Decoding SpaceX
Mariano Tinti
 
Climate Impact of Software Testing at Nordic Testing Days
Climate Impact of Software Testing at Nordic Testing DaysClimate Impact of Software Testing at Nordic Testing Days
Climate Impact of Software Testing at Nordic Testing Days
Kari Kakkonen
 
Presentation of the OECD Artificial Intelligence Review of Germany
Presentation of the OECD Artificial Intelligence Review of GermanyPresentation of the OECD Artificial Intelligence Review of Germany
Presentation of the OECD Artificial Intelligence Review of Germany
innovationoecd
 
“I’m still / I’m still / Chaining from the Block”
“I’m still / I’m still / Chaining from the Block”“I’m still / I’m still / Chaining from the Block”
“I’m still / I’m still / Chaining from the Block”
Claudio Di Ciccio
 
Driving Business Innovation: Latest Generative AI Advancements & Success Story
Driving Business Innovation: Latest Generative AI Advancements & Success StoryDriving Business Innovation: Latest Generative AI Advancements & Success Story
Driving Business Innovation: Latest Generative AI Advancements & Success Story
Safe Software
 
Cosa hanno in comune un mattoncino Lego e la backdoor XZ?
Cosa hanno in comune un mattoncino Lego e la backdoor XZ?Cosa hanno in comune un mattoncino Lego e la backdoor XZ?
Cosa hanno in comune un mattoncino Lego e la backdoor XZ?
Speck&Tech
 
Best 20 SEO Techniques To Improve Website Visibility In SERP
Best 20 SEO Techniques To Improve Website Visibility In SERPBest 20 SEO Techniques To Improve Website Visibility In SERP
Best 20 SEO Techniques To Improve Website Visibility In SERP
Pixlogix Infotech
 
Essentials of Automations: The Art of Triggers and Actions in FME
Essentials of Automations: The Art of Triggers and Actions in FMEEssentials of Automations: The Art of Triggers and Actions in FME
Essentials of Automations: The Art of Triggers and Actions in FME
Safe Software
 

Recently uploaded (20)

How to Get CNIC Information System with Paksim Ga.pptx
How to Get CNIC Information System with Paksim Ga.pptxHow to Get CNIC Information System with Paksim Ga.pptx
How to Get CNIC Information System with Paksim Ga.pptx
 
20240605 QFM017 Machine Intelligence Reading List May 2024
20240605 QFM017 Machine Intelligence Reading List May 202420240605 QFM017 Machine Intelligence Reading List May 2024
20240605 QFM017 Machine Intelligence Reading List May 2024
 
Removing Uninteresting Bytes in Software Fuzzing
Removing Uninteresting Bytes in Software FuzzingRemoving Uninteresting Bytes in Software Fuzzing
Removing Uninteresting Bytes in Software Fuzzing
 
20240607 QFM018 Elixir Reading List May 2024
20240607 QFM018 Elixir Reading List May 202420240607 QFM018 Elixir Reading List May 2024
20240607 QFM018 Elixir Reading List May 2024
 
“Building and Scaling AI Applications with the Nx AI Manager,” a Presentation...
“Building and Scaling AI Applications with the Nx AI Manager,” a Presentation...“Building and Scaling AI Applications with the Nx AI Manager,” a Presentation...
“Building and Scaling AI Applications with the Nx AI Manager,” a Presentation...
 
Uni Systems Copilot event_05062024_C.Vlachos.pdf
Uni Systems Copilot event_05062024_C.Vlachos.pdfUni Systems Copilot event_05062024_C.Vlachos.pdf
Uni Systems Copilot event_05062024_C.Vlachos.pdf
 
Taking AI to the Next Level in Manufacturing.pdf
Taking AI to the Next Level in Manufacturing.pdfTaking AI to the Next Level in Manufacturing.pdf
Taking AI to the Next Level in Manufacturing.pdf
 
Things to Consider When Choosing a Website Developer for your Website | FODUU
Things to Consider When Choosing a Website Developer for your Website | FODUUThings to Consider When Choosing a Website Developer for your Website | FODUU
Things to Consider When Choosing a Website Developer for your Website | FODUU
 
Full-RAG: A modern architecture for hyper-personalization
Full-RAG: A modern architecture for hyper-personalizationFull-RAG: A modern architecture for hyper-personalization
Full-RAG: A modern architecture for hyper-personalization
 
AI-Powered Food Delivery Transforming App Development in Saudi Arabia.pdf
AI-Powered Food Delivery Transforming App Development in Saudi Arabia.pdfAI-Powered Food Delivery Transforming App Development in Saudi Arabia.pdf
AI-Powered Food Delivery Transforming App Development in Saudi Arabia.pdf
 
Ocean lotus Threat actors project by John Sitima 2024 (1).pptx
Ocean lotus Threat actors project by John Sitima 2024 (1).pptxOcean lotus Threat actors project by John Sitima 2024 (1).pptx
Ocean lotus Threat actors project by John Sitima 2024 (1).pptx
 
How to use Firebase Data Connect For Flutter
How to use Firebase Data Connect For FlutterHow to use Firebase Data Connect For Flutter
How to use Firebase Data Connect For Flutter
 
Mariano G Tinti - Decoding SpaceX
Mariano G Tinti - Decoding SpaceXMariano G Tinti - Decoding SpaceX
Mariano G Tinti - Decoding SpaceX
 
Climate Impact of Software Testing at Nordic Testing Days
Climate Impact of Software Testing at Nordic Testing DaysClimate Impact of Software Testing at Nordic Testing Days
Climate Impact of Software Testing at Nordic Testing Days
 
Presentation of the OECD Artificial Intelligence Review of Germany
Presentation of the OECD Artificial Intelligence Review of GermanyPresentation of the OECD Artificial Intelligence Review of Germany
Presentation of the OECD Artificial Intelligence Review of Germany
 
“I’m still / I’m still / Chaining from the Block”
“I’m still / I’m still / Chaining from the Block”“I’m still / I’m still / Chaining from the Block”
“I’m still / I’m still / Chaining from the Block”
 
Driving Business Innovation: Latest Generative AI Advancements & Success Story
Driving Business Innovation: Latest Generative AI Advancements & Success StoryDriving Business Innovation: Latest Generative AI Advancements & Success Story
Driving Business Innovation: Latest Generative AI Advancements & Success Story
 
Cosa hanno in comune un mattoncino Lego e la backdoor XZ?
Cosa hanno in comune un mattoncino Lego e la backdoor XZ?Cosa hanno in comune un mattoncino Lego e la backdoor XZ?
Cosa hanno in comune un mattoncino Lego e la backdoor XZ?
 
Best 20 SEO Techniques To Improve Website Visibility In SERP
Best 20 SEO Techniques To Improve Website Visibility In SERPBest 20 SEO Techniques To Improve Website Visibility In SERP
Best 20 SEO Techniques To Improve Website Visibility In SERP
 
Essentials of Automations: The Art of Triggers and Actions in FME
Essentials of Automations: The Art of Triggers and Actions in FMEEssentials of Automations: The Art of Triggers and Actions in FME
Essentials of Automations: The Art of Triggers and Actions in FME
 

Modern Data Architecture for a Data Lake with Informatica and Hortonworks Data Platform

  • 1. © Hortonworks Inc. 2013 Modern Data Architecture … and the Data Lake John Haddad Senior Director Product Marketing - Informatica Jim Walker Director Product Marketing - Hortonworks Page 1
  • 2. © Hortonworks Inc. 2013 Your Presenters • John Haddad – Senior Director Product Marketing, Informatica – Over 25 years experience developing and marketing enterprise applications – Enjoys art, science, and the great outdoors • Jim Walker – Director Product Marketing, Hortonworks – Over 20 years in data management as a developer and a marketer – Amateur Photographer Page 2
  • 3. © Hortonworks Inc. 2013 Today’s Topics • Introduction • Drivers for the Modern Data Architecture (MDA) • Apache Hadoop in the MDA • Informatica’s role in the MDA • Q&A Page 3
  • 4. © Hortonworks Inc. 2013 Enterprise Data Architecture Page 4 APPLICATIONS  DATA  SYSTEMS   REPOSITORIES   RDBMS   EDW   MPP   DATA  SOURCES   OLTP,  POS   SYSTEMS   Tradi8onal  Sources     (RDBMS,  OLTP,  OLAP)   Business   Analy8cs   Custom   Applica8ons   Packaged   Applica8ons  
  • 5. © Hortonworks Inc. 2013 Traditional Approach – Under Pressure Page 5 APPLICATIONS  DATA  SYSTEMS   REPOSITORIES   RDBMS   EDW   MPP   DATA  SOURCES   OLTP,  POS   SYSTEMS   Tradi8onal  Sources     (RDBMS,  OLTP,  OLAP)   Business   Analy8cs   Custom   Applica8ons   Packaged   Applica8ons   New  Sources     (sen8ment,  clickstream,  geo,  sensor,  …)   Source: IDC 2.8  ZB  in  2012   85%  from  New  Data  Types   15x  Machine  Data  by  2020   40  ZB  by  2020  
  • 6. © Hortonworks Inc. 2013 Modern Data Architecture Enabled Page 6 APPLICATIONS  DATA  SYSTEMS   REPOSITORIES   RDBMS   EDW   MPP   DATA  SOURCES   OLTP,  POS   SYSTEMS   Tradi8onal  Sources     (RDBMS,  OLTP,  OLAP)   Business   Analy8cs   Custom   Applica8ons   Packaged   Applica8ons   New  Sources     (sen8ment,  clickstream,  geo,  sensor,  …)   OPERATIONAL   TOOLS   MANAGE  &   MONITOR   DEV  &  DATA   TOOLS   BUILD  &   TEST  
  • 7. © Hortonworks Inc. 2013 Hadoop Powers Modern Data Architecture Page 7 Apache Hadoop is an open source project governed by the Apache Software Foundation (ASF) that allows you to gain insight from massive amounts of structured and unstructured data quickly and without significant investment. Hadoop Cluster compute & storage . . . . . . . . compute & storage . . Hadoop clusters provide scale-out storage and distributed data processing on commodity hardware
  • 8. © Hortonworks Inc. 2013 Driving Efficiency Driving Opportunity Drivers for Hadoop Adoption Modern Data Architecture Hadoop has a central role in next generation data architectures while integrating with existing data systems Business Applications Use Hadoop to extract insights that enable new customer value and competitive edge Existing Traditional Server log Clickstream Big Data Sets Emerging Sentiment/Social Machine/Sensor Geo-locations
  • 9. © Hortonworks Inc. 2013 Opportunity in types of data 1.  Sentiment Understand how your customers feel about your brand and products – right now 2.  Clickstream Capture and analyze website visitors’ data trails and optimize your website 3.  Sensor/Machine Discover patterns in data streaming automatically from remote sensors and machines 4.  Geographic Analyze location-based data to manage operations where they occur 5.  Server Logs Research logs to diagnose process failures and prevent security breaches 6.  Unstructured (txt, video, pictures, etc..) Understand patterns in files across millions of web pages, emails, and documents Value Page 9
  • 10. © Hortonworks Inc. 2013 Efficiency in Modern Data Architecture •  Drive efficiency via modern data architecture •  Store data once and access it in many ways •  Often referred to a data lake or data repository •  Infrastructure platform driven •  IT-oriented, TCO based Page 10 APPLICATIONS  DATA  SYSTEMS   REPOSITORIES   RDBMS   EDW   MPP   DATA  SOURCES   OLTP,  POS   SYSTEMS   Tradi8onal  Sources     (RDBMS,  OLTP,  OLAP)   Business   Analy8cs   Custom   Applica8ons   Packaged   Applica8ons   New  Sources     (sen8ment,  clickstream,  geo,  sensor,  …)  
  • 11. © Hortonworks Inc. 2013 Page 11 APPLICATIONS  DATA  SYSTEMS   TRADITIONAL  REPOS   DEV  &  DATA   TOOLS   OPERATIONAL   TOOLS   Viewpoint Microsoft Applications DATA  SOURCES   DATA  INTEGRATION   Engineered for Interoperability Tradi8onal  Sources     (RDBMS,  OLTP,  OLAP)   New  Sources     (sen8ment,  clickstream,  geo,  sensor,  …)  
  • 12. © Hortonworks Inc. 2013 Integrated Interoperable with existing data center investments Skills Leverage your existing skills: development, operations, analytics Requirements for Hadoop Adoption Page 12 Key Services Platform, operational and data services essential for the enterprise 3Requirements for Hadoop’s Role in the Modern Data Architecture
  • 13. © Hortonworks Inc. 2013 Today’s Topics • Introduction • Drivers for the Modern Data Architecture (MDA) • Apache Hadoop’s role in the MDA • Informatica’s role in the MDA • Q&A Page 13
  • 14. © Hortonworks Inc. 2013 Hortonworks & Informatica Visual Development Environment Enterprise Repositories EDW LOAD Data Virtualization Batch CEP MDM INTERFACE HIVE JDBC HDFS API AMBARI MAPREDUCE YARN HDFS DATA REFINEMENT HIVE (HiveQL and UDFs) ProfileProfile Parse ETL Cleanse Match HDFSAPI LOAD Reference Architecture SOURCE DATA Batch Replicate Stream Archive JMS Queue’s Servers & Mainframe Files Databases Sensor data Social
  • 15. Data Lake Processes Mobile Apps Transactions, OLTP, OLAP Social Media, Web Logs Machine Device, Scientific Documents and Emails 9. Govern & enrich with metadata 3. Stream real-time data 8. Explore & validate data 4. Mask sensitive data 2. Replicate changed data & schemas Visualization & Analytics 11. Subscribe to datasets Data Integration Hub 1. Load or archive batch data Data Virtualization 5. Access customer “golden record MDM Enterprise Data Warehouse 10. Correlate real-time events with historical patterns & trends 6. Refine & curate data 7. Move results to EDW
  • 16. Telco Call Detail Record (CDR) Use-Case
  • 17. Use-Case: CDR Processing •  Each job picks up a number of files containing Text CDRs (Call Detail Records) •  First task is to merge partial call records •  Some records may be partial – ex. multiple records for a single call due to a dropped line or switching cell towers •  Partial records need to be merged and total call time needs to be added to duration for the merged record •  Partial records for a single call may reside in multiple files or be included in different jobs. •  Incomplete partial records need to be reprocessed by consecutive jobs •  Second task is to sort all processed CDRs by calling number
  • 18. Input CDR File Example These 3 numbers uniquely identify a call Partial calls starts with 1 and end with 0 Some partial records are incomplete Processed completed records are sorted by caller
  • 19. Output CDR Files Completed Calls Partial Calls Duration times are added to the merged records Partial records are merged into a single completed record Partial records will be reprocessed
  • 20. Logical Design Partial records only Separate partial records from completed records Completed records only Separate incomplete and complete partial records Select incomplete partial records Aggregate all completed and partial-completed records
  • 21. Viewing Data at Design Time
  • 22. More Data at Design Time
  • 26. Choose Where to Process
  • 30. CDR Pipeline Sort records by Key Summarize by Key Group Filter by Province ID Filter by Collection Date City Code Lookup Read Files Write report •  Scenario – Filter records by Date, City and Province; Aggregate and summarize records by a composite Key
  • 32. Adding Transactional Source •  Scenario - Report website use (Facebook, Twitter, etc.) by Age and by Postal Code Read WAP log records Get MSISDN and URL fields Lookup Age, Postal Code by MSISDN Count URL frequency Calculate percentages
  • 34. Result •  Easily combine big data sources with transactional data •  Example – Report website use (Facebook, Twitter, etc.) by Age and by Region Look-up of Age, Region by MSISDN CRM EDW Log Files, HDFS
  • 35. © Hortonworks Inc. 2013 Integrated Interoperable with existing data center investments Skills Leverage your existing skills: development, operations, analytics Requirements for Hadoop Adoption Page 35 Key Services Platform, operational and data services essential for the enterprise 3Requirements for Hadoop’s Role in the Modern Data Architecture
  • 36. © Hortonworks Inc. 2013 Next Steps: Page 36 Learn more about Informatica and Hadoop http://www.informatica.com/us/vision/harnessing-big- data/hadoop/ Get started on Hadoop with Hortonworks Sandbox http://hortonworks.com/products/hortonworks- sandbox/ Follow us: @hortonworks, @informatica