Page 1 © Hortonworks Inc. 2011 – 2014. All Rights Reserved
Building A Modern Data Architecture (MDA)
Using Enterprise Hadoop
Slim Baltagi, Systems Architect
Hortonworks Inc.
Open-BDA Hadoop Summit 2014
November 18th, 2014
Your Presenter
Slim Baltagi
•  Currently a Systems Architect in the Professional Services Organization of
Hortonworks in the central region (US and Canada).
•  Over 4 years of Hadoop experience working on 9 Big Data projects.
•  Slim has over 16 years of IT experience working in various architecture,
design, development and consulting roles.
•  Slim Baltagi holds a master’s degree in Mathematics and is an ABD in
computer science from Université Laval, Québec, Canada.
•  Twitter: @SlimBaltagi
Outline
1. Drivers for an MDA
2. What’s an MDA
3. Hadoop’s role in an MDA
4. Use Cases related to an MDA
5. Learn More
6. Q&A
Traditional Data Architecture Under Pressure
APPLICATIONS: Business Analytics, Custom Applications, Packaged Applications
DATA SYSTEM: siloed RDBMS, EDW, MPP
SOURCES: Existing Sources (CRM, ERP, Clickstream, Logs)
Data growth: New Data Types, 85% (Source: IDC)
OLTP, ERP, CRM Systems; Unstructured docs, emails; Clickstream; Server logs; Social/Web Data; Sensor, Machine Data; Geolocation
•  Can’t manage new data paradigm
•  Constrains data to specific schema
•  Siloed data
•  Limited scalability
•  Economically unfeasible
•  Limited analytics
A Modern Data Architecture for New Data
APPLICATIONS: Business Analytics, Custom Applications, Packaged Applications
DATA SYSTEM / REPOSITORIES: RDBMS, EDW, MPP (Traditional Data Architecture)
SOURCES: Existing Sources (CRM, ERP, Clickstream, Logs)
OLTP, ERP, CRM Systems; Unstructured documents, emails; Clickstream; Server logs; Sentiment, Web Data; Sensor, Machine Data; Geolocation
New Data Requirements:
•  Scale
•  Economics
•  Flexibility
Enterprise Goals for the Modern Data Architecture
✔  Centrally manage new and existing data
✔  Provide single view of the customer, product, supply chain
✔  Run batch, interactive & real time analytic applications on shared datasets
✔  Assure enterprise-grade security, operations and governance
✔  Leverage new and existing data center infrastructure investments
✔  Scalable and affordable; low cost per TB
✔  Deployment flexibility
APPLICATIONS: Business Analytics, Custom Applications, Packaged Applications
DATA SYSTEM: RDBMS, EDW, MPP; YARN: Data Operating System (Batch, Interactive, Real-Time) on nodes 1 to N over HDFS (Hadoop Distributed File System)
SOURCES: EXISTING Systems (CRM, ERP, Other), Clickstream, Web & Social, Geolocation, Sensor & Machine, Server Logs, Unstructured
1. Drivers for a Modern Data Architecture (MDA)
•  Semi-Structured and Unstructured – NEW DATA
Unstructured documents, emails, Sentiment, Web Data, Sensor, Machine Data,
Geolocation, ...
•  Enterprise Data Warehouse Optimization – REDUCED COSTS
Low-value computing tasks such as ETL consume significant EDW resources.
When offloaded to Hadoop, these ETL processes can be performed much
more efficiently, freeing up your data warehouse to perform high-value
functions like analytics and operations.
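As a rough illustration of the kind of ETL work that can be offloaded, the sketch below expresses a clickstream roll-up as a map phase followed by a reduce phase, the style of batch processing that Hadoop distributes across a cluster. It is plain Python running in a single process; the log format, the sample records, and the `map_phase`, `reduce_phase` and `run_etl` helpers are assumptions made for this example and are not part of any Hadoop API.

```python
from collections import defaultdict

# Hypothetical raw clickstream lines: "timestamp<TAB>user_id<TAB>url"
RAW_LOG = [
    "2014-11-18T10:00:00\tu1\t/home",
    "2014-11-18T10:00:05\tu2\t/products",
    "2014-11-18T10:00:09\tu1\t/products",
    "malformed line",
]

def map_phase(line):
    """Extract and transform: parse one line, emit (url, 1) or nothing."""
    parts = line.split("\t")
    if len(parts) != 3:
        return []  # drop records that do not parse
    _, _, url = parts
    return [(url, 1)]

def reduce_phase(pairs):
    """Aggregate: sum the counts emitted for each key."""
    totals = defaultdict(int)
    for key, count in pairs:
        totals[key] += count
    return dict(totals)

def run_etl(lines):
    """Run the map phase over every line, then reduce the emitted pairs."""
    pairs = [pair for line in lines for pair in map_phase(line)]
    return reduce_phase(pairs)

page_views = run_etl(RAW_LOG)
print(page_views)
```

On a real cluster the map and reduce steps would run in parallel on many nodes; the point here is only that the cleansing and aggregation happen before the result ever reaches the warehouse.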
1. Drivers for a Modern Data Architecture (Continued)
•  Advanced Analytics – NEW ANALYTICS APPS
Unlike schema-on-write, which transforms data into specified schema upon
load, Hadoop empowers you to store data in any format, and then create
schema at that moment when you choose to analyze your data. This
unprecedented flexibility opens up new possibilities for iterative analytics and
delivers new business value.
•  Single Cluster, Multiple Workloads – ANY WORKLOAD
With Apache Hadoop YARN supporting multiple access methods (such as
batch, interactive, streaming and real-time) on a common data set, Hadoop
enables you to transform and view data in multiple ways simultaneously,
dramatically reducing time to insight.
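Schema on read can be sketched as follows: raw records are stored exactly as they arrive, and each analysis applies its own schema at the moment it reads them. This is a minimal plain-Python illustration, not Hive or HCatalog; the sample events, field names, and the `read_with_schema` helper are hypothetical.

```python
import json

# Raw events stored as-is; no schema was enforced when they were written.
RAW_EVENTS = [
    '{"user": "u1", "amount": "19.99", "country": "US"}',
    '{"user": "u2", "amount": "5.00"}',
    '{"user": "u3", "amount": "12.50", "country": "CA", "coupon": "FALL"}',
]

def read_with_schema(raw_lines, schema):
    """Apply a schema at read time.

    schema maps field name -> (converter, default). Fields absent from a
    record get the default; fields outside the schema are ignored.
    """
    for line in raw_lines:
        record = json.loads(line)
        yield {
            field: convert(record[field]) if field in record else default
            for field, (convert, default) in schema.items()
        }

# Two analyses read the same stored data through different schemas.
revenue_schema = {"user": (str, None), "amount": (float, 0.0)}
geo_schema = {"user": (str, None), "country": (str, "unknown")}

total_revenue = sum(row["amount"] for row in read_with_schema(RAW_EVENTS, revenue_schema))
countries = [row["country"] for row in read_with_schema(RAW_EVENTS, geo_schema)]
print(round(total_revenue, 2), countries)
```

Neither analysis required the stored data to be reshaped first, which is the flexibility the slide contrasts with schema on write.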
2. What’s a Modern Data Architecture (MDA)?
•  Apache Hadoop is a core component of a Modern Data Architecture,
allowing organizations to collect, store, analyze and manipulate massive
quantities of data on their own terms—regardless of the source of that data,
how old it is, where it is stored, or in what format.
•  The Hortonworks Data Platform (HDP) delivers Enterprise Apache Hadoop,
deeply integrated with existing systems to create a highly efficient, highly
scalable way to manage all your enterprise data.
•  Integrate new & existing data sets, with existing tools & skills.
•  Make all data available for shared access and processing in multitenant
infrastructure
•  Batch, interactive & real-time use cases
3. Hadoop’s role in an MDA
2. What’s a Modern Data Architecture (MDA)?
Key Drivers of Hadoop
A. Unlock New Approach to Analytics
•  Agile analytics via “Schema on Read” with ability to store all data in native format
•  Create new apps from new types of data
B. Optimize Investments, Cut Costs
•  Focus EDW on high value workloads
•  Use commodity servers & storage to enable all data (original and historical) to be accessible for ongoing exploration
C. Enable a Modern Data Architecture
•  Integrate new & existing data sets
•  Make all data available for shared access and processing in multitenant infrastructure
•  Batch, interactive & real-time use cases
•  Integrated with existing tools & skills
OPERATIONS TOOLS: Provision, Manage & Monitor
DEV & DATA TOOLS: Build & Test
APPLICATIONS: Business Analytics, Custom Applications, Packaged Applications
DATA SYSTEM / REPOSITORIES: RDBMS, EDW, MPP; YARN: Data Operating System (Batch, Interactive, Real-Time); HDFS: Hadoop Distributed File System
SOURCES: EXISTING Systems, Clickstream, Web & Social, Geolocation, Sensor & Machine, Server Logs, Unstructured
Hadoop: It’s About Scale & Structure
SCALE (storage & processing): Traditional RDBMS vs. Hadoop
•  schema: Required on write (RDBMS); Required on read (Hadoop)
•  governance: Standards and structured (RDBMS); Multiple Structures (Hadoop)
•  processing: Limited, no data processing (RDBMS); Processing coupled with data (Hadoop)
•  data types: Structured (RDBMS); Multi and unstructured (Hadoop)
•  transactions: Optimized, reliable (RDBMS); Optimized for analytics (Hadoop)
•  best fit use: Complex ACID Transactions, Operational Data Store (RDBMS); Data Discovery, Processing unstructured data, Interactive Analytics (Hadoop)
YARN and HDP Enable the Modern Data Architecture
YARN is the architectural center of Hadoop and HDP
•  YARN enables a common data set across all applications
•  Batch, interactive & real-time workloads
•  Support multi-tenant access & processing
HDP enables Apache Hadoop to become an enterprise-viable data platform with centralized services
•  Security
•  Governance
•  Operations
•  Productization
Enabled broad ecosystem adoption
Hortonworks drove this innovation of Hadoop through YARN
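The idea of batch, interactive and real-time workloads sharing one cluster can be pictured with a toy capacity split. This sketch is a deliberately simplified model and not YARN's scheduler: the queue names, the percentages, and the `allocate` function are hypothetical, and a real YARN scheduler does considerably more than divide memory by fixed shares.

```python
def allocate(total_memory_gb, queue_shares):
    """Split cluster memory among queues by guaranteed percentage share."""
    if sum(queue_shares.values()) != 100:
        raise ValueError("queue shares must add up to 100 percent")
    return {
        queue: total_memory_gb * share // 100
        for queue, share in queue_shares.items()
    }

# Hypothetical shares for three workload types on one shared cluster.
shares = {"batch": 50, "interactive": 30, "real_time": 20}
allocation = allocate(1000, shares)
print(allocation)
```

All three workloads draw from the same pool and can therefore operate on the same data set, rather than each requiring its own cluster and its own copy of the data.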
Hortonworks Data Platform 2.2
•  BATCH, INTERACTIVE & REAL-TIME DATA ACCESS: Script: Pig (Tez); SQL: Hive (Tez); Java, Scala: Cascading (Tez); Stream: Storm; Search: Solr; NoSQL: HBase, Accumulo; Slider; In-Memory: Spark; Others: ISV Engines
•  YARN: Data Operating System (Cluster Resource Management) on nodes 1 to N
•  HDFS (Hadoop Distributed File System)
•  GOVERNANCE: Data Workflow, Lifecycle & Governance: Falcon, Sqoop, Flume, Kafka, NFS, WebHDFS
•  SECURITY: Authentication, Authorization, Audit, Data Protection; Storage: HDFS; Resources: YARN; Access: Hive; Pipeline: Falcon; Cluster: Ranger; Cluster: Knox
•  OPERATIONS: Provision, Manage & Monitor: Ambari, Zookeeper; Scheduling: Oozie
•  Deployment Choice: Linux, Windows, On-Premises, Cloud
Shift to Data-driven Means Treating Data like Capital
Hadoop enables organizations to cost effectively store and use all of the data available in a way that shifts the business from Reactive to Proactive:
•  A shift in Advertising: from mass branding to 1x1 targeting
•  A shift in Financial Services: from educated investing to automated algorithms
•  A shift in Healthcare: from mass treatment to designer medicine
•  A shift in Retail: from static branding to real-time personalization
•  A shift in Manufacturing: from break then fix to repair before break
Create New Applications from New Types of Data
INDUSTRY | USE CASE | Sentiment & Web | Clickstream & Behavior | Machine & Sensor | Geographic | Server Logs | Structured & Unstructured
Financial Services
New Account Risk Screens ✔ ✔
Trading Risk ✔ ✔
Insurance Underwriting ✔ ✔ ✔
Telecom
Call Detail Records (CDR) ✔ ✔
Infrastructure Investment ✔ ✔
Real-time Bandwidth Allocation ✔ ✔ ✔ ✔ ✔
Retail
360° View of the Customer ✔ ✔ ✔
Localized, Personalized Promotions ✔
Website Optimization ✔
Manufacturing
Supply Chain and Logistics ✔
Assembly Line Quality Assurance ✔
Crowd-sourced Quality Assurance ✔
Healthcare
Use Genomic Data in Medical Trials ✔ ✔
Monitor Patient Vitals in Real-Time ✔ ✔
Pharmaceuticals
Recruit and Retain Patients for Drug Trials ✔ ✔
Improve Prescription Adherence ✔ ✔ ✔ ✔
Oil & Gas
Unify Exploration & Production Data ✔ ✔ ✔ ✔
Monitor Rig Safety in Real-Time ✔ ✔ ✔
Government
ETL Offload/Federal Budgetary Pressures ✔ ✔
Sentiment Analysis for Government Programs ✔
4.1 Advertising
•  Mine Grocery & Drug Store POS Data to Identify High-Value Shoppers
•  Target Ads to Customers in Specific Cultural or Linguistic Segments
•  Syndicate Videos According to Behavior, Demographics & Channel
•  ETL Toy Market Research Data for Longer Retention & Deeper Insight
•  Optimize Online Ad Placement for Retail Websites
4. Use Cases related to an MDA (Continued)
4.2 Financial Services
•  Screen New Account Applications for Risk of Default
•  Monetize Anonymous Banking Data in Secondary Markets
•  Improve Underwriting Efficiency for Usage-Based Auto Insurance
•  Analyze Insurance Claims with a Shared Data Lake
•  Maintain Sub-Second SLAs with a Hadoop “Ticker Plant”
•  Surveillance of Trading Logs for Anti-Laundering Analysis
4.3 Healthcare
•  Access Genomic Data for Medical Trials
•  Monitor Patient Vitals in Real-Time
•  Reduce Cardiac Re-Admittance Rates
•  Machine Learning to Screen for Autism with In-Home Testing
•  Store Medical Research Data Forever
•  Recruit Research Cohorts for Pharmaceutical Trials
•  Track Equipment and Medicines with RFID Data
•  Improve Prescription Adherence
4.4 Manufacturing
•  Assure Just-In-Time Delivery of Raw Materials
•  Control Quality with Real-Time & Historical Assembly Line Data
•  Avoid Stoppages with Proactive Equipment Maintenance
•  Increase Yields in Drug Manufacturing
•  Crowdsource Quality Assurance
4.5 Oil & Gas
•  Slow Decline Curves with Production Parameter Optimization
•  Define Operational Set Points for Each Well & Receive Alerts on Deviations
•  Optimize Lease Bidding with Reliable Yield Predictions
•  Report on Compliance with Environmental, Health and Safety Regulations
•  Repair Equipment Preventatively with Targeted Maintenance
•  Integrate Exploration with Seismic Image Processing
4.6 Public Sector
•  Understand Public Sentiment About Government Performance
•  Protect Critical Networks from Threats (Both Internal and External)
•  Prevent Fraud and Waste
•  Analyze Social Media to Identify Terrorist Threats
•  Decrease Budget Pressures by Offloading Expensive SQL Workloads
•  Crowdsource Reporting for Repairs to Roads and Public Infrastructure
•  Fulfill “Open Records” and Freedom of Information Requests
4.7 Retail
•  Build a 360° View of the Customer
•  Analyze Brand Sentiment
•  Localize & Personalize Promotions
•  Optimize Websites
•  Optimize Store Layouts
4.8 Telecom
•  Analyze Call Detail Records (CDRs)
•  Service Equipment Proactively
•  Rationalize Infrastructure Investments
•  Recommend Next Product to Buy (NPTB)
•  Allocate Bandwidth in Real-time
•  Develop New Products
What is a Data Lake?
•  An architectural pattern in the data center that uses Hadoop to deliver deeper insight across a large, broad, diverse set of data at efficient scale
•  But what is it?
   –  It is a PLATFORM for your data. (It is not a database.)
   –  Multipurpose open PLATFORM to land all data in a single place and interact with it in many ways.
•  A platform that allows for the ecosystem to provide higher level services (SAS, SAP, Microsoft, Streaming, MPP, In-memory, etc.)
•  Provides first class APIs and frameworks to enable this integration
•  Provides first class data management capabilities (metadata management, security, transformation pipelines, replication, retention, etc.)
HDP Data Lake Reference Architecture
•  SOURCE DATA: ClickStream Data; Sales Transaction/Data; Product Data; Marketing/Inventory; Social Data; EDW (delivered via File, JMS, REST, HTTP, Streaming)
•  Step 1: Extract & Load. Ingestion: SQOOP, FLUME, Web HDFS, NFS; Streaming: STORM
•  Step 2: Model/Apply Metadata. HCATALOG (table & user-defined metadata)
•  Step 3: Transform, Aggregate & Materialize. Data processing: HIVE, PIG, Mahout on MR2 and TEZ
•  Step 4: Schedule and Orchestrate. Oozie (Batch scheduler)
•  Manage Steps 1-4: Data Management with Falcon. FALCON (data pipeline & flow management)
•  Use Case Type 1: Materialize & Exchange. Exchange through HBase Client and Sqoop/Hive to Downstream Data Sources: OLTP, HBase, EDW (Teradata)
•  Use Case Type 2: Explore/Visualize. INTERACTIVE: Hive Server (Tez/Stinger); Query/Analytics/Reporting Tools: Tableau/Excel, Datameer/Platfora/SAP
•  YARN Apps: opens up Hadoop to many new use cases (Stream Processing, Real-time Search, MPI): Storm, SAS, SOLR
•  Data Lake HDP Grid: compute & storage nodes on YARN; AMBARI; Knox – Perimeter Level Security
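The four steps of the reference architecture can be sketched as a small pipeline. This is a plain-Python stand-in meant only to show how the steps hand off to one another; the function names, the sample records and the in-memory "lake" are hypothetical, and nothing here calls Sqoop, HCatalog, Hive, Oozie or Falcon.

```python
from collections import Counter

def extract_and_load(sources):
    """Step 1: land raw records from every source in one place, unchanged."""
    return [(name, record) for name, records in sources.items() for record in records]

def apply_metadata(lake):
    """Step 2: describe what landed (here, just a record count per source)."""
    return dict(Counter(name for name, _ in lake))

def transform_and_aggregate(lake):
    """Step 3: materialize a derived dataset (total sales per product)."""
    totals = Counter()
    for name, record in lake:
        if name == "sales":
            totals[record["product"]] += record["amount"]
    return dict(totals)

def run_pipeline(sources):
    """Step 4: schedule and orchestrate by running the steps in order."""
    lake = extract_and_load(sources)
    catalog = apply_metadata(lake)
    sales_by_product = transform_and_aggregate(lake)
    return catalog, sales_by_product

# Hypothetical source data standing in for clickstream and sales feeds.
sources = {
    "clickstream": [{"url": "/home"}, {"url": "/products"}],
    "sales": [
        {"product": "widget", "amount": 10},
        {"product": "gadget", "amount": 7},
        {"product": "widget", "amount": 5},
    ],
}
catalog, sales_by_product = run_pipeline(sources)
print(catalog, sales_by_product)
```

The raw records stay in the lake untouched after step 1, so a later analysis could derive a different dataset from the same landed data without re-ingesting it.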
5. Learn More …
Resource: Location
•  MDA White Paper: http://info.hortonworks.com/data-lake-hadoop-whitepaper.html (Learn more about Modern Data Architecture (MDA))
•  MDA Web Page: http://hortonworks.com/hadoop-modern-data-architecture/ (Explore Use Cases by Industry)
•  Hortonworks Sandbox: http://hortonworks.com/products/hortonworks-sandbox/ (Get Started on Hadoop with Hortonworks Sandbox)
•  Hadoop Tutorials: http://info.hortonworks.com/On-demand-Tutorials_Sign-Up-Page.html (On-Demand Hadoop Tutorials Delivered to Your Inbox)
•  Enterprise Data Lake: http://hortonworks.com/blog/enterprise-hadoop-journey-data-lake/ (Enterprise Hadoop and the journey to Data Lake)
6. Q&A…
Thank you!

More Related Content

What's hot

Ambari Meetup: 2nd April 2013: Teradata Viewpoint Hadoop Integration with Ambari
Ambari Meetup: 2nd April 2013: Teradata Viewpoint Hadoop Integration with AmbariAmbari Meetup: 2nd April 2013: Teradata Viewpoint Hadoop Integration with Ambari
Ambari Meetup: 2nd April 2013: Teradata Viewpoint Hadoop Integration with AmbariHortonworks
 
The Value of the Modern Data Architecture with Apache Hadoop and Teradata
The Value of the Modern Data Architecture with Apache Hadoop and Teradata The Value of the Modern Data Architecture with Apache Hadoop and Teradata
The Value of the Modern Data Architecture with Apache Hadoop and Teradata Hortonworks
 
Oncrawl elasticsearch meetup france #12
Oncrawl elasticsearch meetup france #12Oncrawl elasticsearch meetup france #12
Oncrawl elasticsearch meetup france #12Tanguy MOAL
 
The Modern Data Architecture for Predictive Analytics with Hortonworks and Re...
The Modern Data Architecture for Predictive Analytics with Hortonworks and Re...The Modern Data Architecture for Predictive Analytics with Hortonworks and Re...
The Modern Data Architecture for Predictive Analytics with Hortonworks and Re...Revolution Analytics
 
Planing and optimizing data lake architecture
Planing and optimizing data lake architecturePlaning and optimizing data lake architecture
Planing and optimizing data lake architectureMilos Milovanovic
 
Apache Hadoop on the Open Cloud
Apache Hadoop on the Open CloudApache Hadoop on the Open Cloud
Apache Hadoop on the Open CloudHortonworks
 
Verizon: Finance Data Lake implementation as a Self Service Discovery Big Dat...
Verizon: Finance Data Lake implementation as a Self Service Discovery Big Dat...Verizon: Finance Data Lake implementation as a Self Service Discovery Big Dat...
Verizon: Finance Data Lake implementation as a Self Service Discovery Big Dat...DataWorks Summit
 
Webinar turbo charging_data_science_hawq_on_hdp_final
Webinar turbo charging_data_science_hawq_on_hdp_finalWebinar turbo charging_data_science_hawq_on_hdp_final
Webinar turbo charging_data_science_hawq_on_hdp_finalHortonworks
 
IlOUG Tech Days 2016 - Unlock the Value in your Data Reservoir using Oracle B...
IlOUG Tech Days 2016 - Unlock the Value in your Data Reservoir using Oracle B...IlOUG Tech Days 2016 - Unlock the Value in your Data Reservoir using Oracle B...
IlOUG Tech Days 2016 - Unlock the Value in your Data Reservoir using Oracle B...Mark Rittman
 
Top Three Big Data Governance Issues and How Apache ATLAS resolves it for the...
Top Three Big Data Governance Issues and How Apache ATLAS resolves it for the...Top Three Big Data Governance Issues and How Apache ATLAS resolves it for the...
Top Three Big Data Governance Issues and How Apache ATLAS resolves it for the...DataWorks Summit/Hadoop Summit
 
Cloudian 451-hortonworks - webinar
Cloudian 451-hortonworks - webinarCloudian 451-hortonworks - webinar
Cloudian 451-hortonworks - webinarHortonworks
 
A Reference Architecture for ETL 2.0
A Reference Architecture for ETL 2.0 A Reference Architecture for ETL 2.0
A Reference Architecture for ETL 2.0 DataWorks Summit
 
Hadoop 2.0: YARN to Further Optimize Data Processing
Hadoop 2.0: YARN to Further Optimize Data ProcessingHadoop 2.0: YARN to Further Optimize Data Processing
Hadoop 2.0: YARN to Further Optimize Data ProcessingHortonworks
 
Open-BDA Hadoop Summit 2014 - Mr. Slim Baltagi (Building a Modern Data Archit...
Open-BDA Hadoop Summit 2014 - Mr. Slim Baltagi (Building a Modern Data Archit...Open-BDA Hadoop Summit 2014 - Mr. Slim Baltagi (Building a Modern Data Archit...
Open-BDA Hadoop Summit 2014 - Mr. Slim Baltagi (Building a Modern Data Archit...Innovative Management Services
 
Big Data: Architecture and Performance Considerations in Logical Data Lakes
Big Data: Architecture and Performance Considerations in Logical Data LakesBig Data: Architecture and Performance Considerations in Logical Data Lakes
Big Data: Architecture and Performance Considerations in Logical Data LakesDenodo
 
Swimming Across the Data Lake, Lessons learned and keys to success
Swimming Across the Data Lake, Lessons learned and keys to success Swimming Across the Data Lake, Lessons learned and keys to success
Swimming Across the Data Lake, Lessons learned and keys to success DataWorks Summit/Hadoop Summit
 
The Big Data Journey – How Companies Adopt Hadoop - StampedeCon 2016
The Big Data Journey – How Companies Adopt Hadoop - StampedeCon 2016The Big Data Journey – How Companies Adopt Hadoop - StampedeCon 2016
The Big Data Journey – How Companies Adopt Hadoop - StampedeCon 2016StampedeCon
 
Big Data & Data Lakes Building Blocks
Big Data & Data Lakes Building BlocksBig Data & Data Lakes Building Blocks
Big Data & Data Lakes Building BlocksAmazon Web Services
 
Getting to What Matters: Accelerating Your Path Through the Big Data Lifecycl...
Getting to What Matters: Accelerating Your Path Through the Big Data Lifecycl...Getting to What Matters: Accelerating Your Path Through the Big Data Lifecycl...
Getting to What Matters: Accelerating Your Path Through the Big Data Lifecycl...Hortonworks
 

What's hot (20)

Ambari Meetup: 2nd April 2013: Teradata Viewpoint Hadoop Integration with Ambari
Ambari Meetup: 2nd April 2013: Teradata Viewpoint Hadoop Integration with AmbariAmbari Meetup: 2nd April 2013: Teradata Viewpoint Hadoop Integration with Ambari
Ambari Meetup: 2nd April 2013: Teradata Viewpoint Hadoop Integration with Ambari
 
Solving Big Data Problems using Hortonworks
Solving Big Data Problems using Hortonworks Solving Big Data Problems using Hortonworks
Solving Big Data Problems using Hortonworks
 
The Value of the Modern Data Architecture with Apache Hadoop and Teradata
The Value of the Modern Data Architecture with Apache Hadoop and Teradata The Value of the Modern Data Architecture with Apache Hadoop and Teradata
The Value of the Modern Data Architecture with Apache Hadoop and Teradata
 
Oncrawl elasticsearch meetup france #12
Oncrawl elasticsearch meetup france #12Oncrawl elasticsearch meetup france #12
Oncrawl elasticsearch meetup france #12
 
The Modern Data Architecture for Predictive Analytics with Hortonworks and Re...
The Modern Data Architecture for Predictive Analytics with Hortonworks and Re...The Modern Data Architecture for Predictive Analytics with Hortonworks and Re...
The Modern Data Architecture for Predictive Analytics with Hortonworks and Re...
 
Planing and optimizing data lake architecture
Planing and optimizing data lake architecturePlaning and optimizing data lake architecture
Planing and optimizing data lake architecture
 
Apache Hadoop on the Open Cloud
Apache Hadoop on the Open CloudApache Hadoop on the Open Cloud
Apache Hadoop on the Open Cloud
 
Verizon: Finance Data Lake implementation as a Self Service Discovery Big Dat...
Verizon: Finance Data Lake implementation as a Self Service Discovery Big Dat...Verizon: Finance Data Lake implementation as a Self Service Discovery Big Dat...
Verizon: Finance Data Lake implementation as a Self Service Discovery Big Dat...
 
Webinar turbo charging_data_science_hawq_on_hdp_final
Webinar turbo charging_data_science_hawq_on_hdp_finalWebinar turbo charging_data_science_hawq_on_hdp_final
Webinar turbo charging_data_science_hawq_on_hdp_final
 
IlOUG Tech Days 2016 - Unlock the Value in your Data Reservoir using Oracle B...
IlOUG Tech Days 2016 - Unlock the Value in your Data Reservoir using Oracle B...IlOUG Tech Days 2016 - Unlock the Value in your Data Reservoir using Oracle B...
IlOUG Tech Days 2016 - Unlock the Value in your Data Reservoir using Oracle B...
 
Top Three Big Data Governance Issues and How Apache ATLAS resolves it for the...
Top Three Big Data Governance Issues and How Apache ATLAS resolves it for the...Top Three Big Data Governance Issues and How Apache ATLAS resolves it for the...
Top Three Big Data Governance Issues and How Apache ATLAS resolves it for the...
 
Cloudian 451-hortonworks - webinar
Cloudian 451-hortonworks - webinarCloudian 451-hortonworks - webinar
Cloudian 451-hortonworks - webinar
 
A Reference Architecture for ETL 2.0
A Reference Architecture for ETL 2.0 A Reference Architecture for ETL 2.0
A Reference Architecture for ETL 2.0
 
Hadoop 2.0: YARN to Further Optimize Data Processing
Hadoop 2.0: YARN to Further Optimize Data ProcessingHadoop 2.0: YARN to Further Optimize Data Processing
Hadoop 2.0: YARN to Further Optimize Data Processing
 
Open-BDA Hadoop Summit 2014 - Mr. Slim Baltagi (Building a Modern Data Archit...
Open-BDA Hadoop Summit 2014 - Mr. Slim Baltagi (Building a Modern Data Archit...Open-BDA Hadoop Summit 2014 - Mr. Slim Baltagi (Building a Modern Data Archit...
Open-BDA Hadoop Summit 2014 - Mr. Slim Baltagi (Building a Modern Data Archit...
 
Big Data: Architecture and Performance Considerations in Logical Data Lakes
Big Data: Architecture and Performance Considerations in Logical Data LakesBig Data: Architecture and Performance Considerations in Logical Data Lakes
Big Data: Architecture and Performance Considerations in Logical Data Lakes
 
Swimming Across the Data Lake, Lessons learned and keys to success
Swimming Across the Data Lake, Lessons learned and keys to success Swimming Across the Data Lake, Lessons learned and keys to success
Swimming Across the Data Lake, Lessons learned and keys to success
 
The Big Data Journey – How Companies Adopt Hadoop - StampedeCon 2016
The Big Data Journey – How Companies Adopt Hadoop - StampedeCon 2016The Big Data Journey – How Companies Adopt Hadoop - StampedeCon 2016
The Big Data Journey – How Companies Adopt Hadoop - StampedeCon 2016
 
Big Data & Data Lakes Building Blocks
Big Data & Data Lakes Building BlocksBig Data & Data Lakes Building Blocks
Big Data & Data Lakes Building Blocks
 
Getting to What Matters: Accelerating Your Path Through the Big Data Lifecycl...
Getting to What Matters: Accelerating Your Path Through the Big Data Lifecycl...Getting to What Matters: Accelerating Your Path Through the Big Data Lifecycl...
Getting to What Matters: Accelerating Your Path Through the Big Data Lifecycl...
 

Viewers also liked

Aljoscha Krettek - Portable stateful big data processing in Apache Beam
Aljoscha Krettek - Portable stateful big data processing in Apache BeamAljoscha Krettek - Portable stateful big data processing in Apache Beam
Aljoscha Krettek - Portable stateful big data processing in Apache BeamVerverica
 
Fundamentals of Stream Processing with Apache Beam, Tyler Akidau, Frances Perry
Fundamentals of Stream Processing with Apache Beam, Tyler Akidau, Frances Perry Fundamentals of Stream Processing with Apache Beam, Tyler Akidau, Frances Perry
Fundamentals of Stream Processing with Apache Beam, Tyler Akidau, Frances Perry confluent
 
Hadoop or Spark: is it an either-or proposition? By Slim Baltagi
Hadoop or Spark: is it an either-or proposition? By Slim BaltagiHadoop or Spark: is it an either-or proposition? By Slim Baltagi
Hadoop or Spark: is it an either-or proposition? By Slim BaltagiSlim Baltagi
 
Apache Beam: A unified model for batch and stream processing data
Apache Beam: A unified model for batch and stream processing dataApache Beam: A unified model for batch and stream processing data
Apache Beam: A unified model for batch and stream processing dataDataWorks Summit/Hadoop Summit
 
Analysis-of-Major-Trends-in-big-data-analytics-slim-baltagi-hadoop-summit
Analysis-of-Major-Trends-in-big-data-analytics-slim-baltagi-hadoop-summitAnalysis-of-Major-Trends-in-big-data-analytics-slim-baltagi-hadoop-summit
Analysis-of-Major-Trends-in-big-data-analytics-slim-baltagi-hadoop-summitSlim Baltagi
 
Kafka Streams for Java enthusiasts
Kafka Streams for Java enthusiastsKafka Streams for Java enthusiasts
Kafka Streams for Java enthusiastsSlim Baltagi
 
Apache-Flink-What-How-Why-Who-Where-by-Slim-Baltagi
Apache-Flink-What-How-Why-Who-Where-by-Slim-BaltagiApache-Flink-What-How-Why-Who-Where-by-Slim-Baltagi
Apache-Flink-What-How-Why-Who-Where-by-Slim-BaltagiSlim Baltagi
 
Apache Flink: Real-World Use Cases for Streaming Analytics
Apache Flink: Real-World Use Cases for Streaming AnalyticsApache Flink: Real-World Use Cases for Streaming Analytics
Apache Flink: Real-World Use Cases for Streaming AnalyticsSlim Baltagi
 
Building Streaming Data Applications Using Apache Kafka
Building Streaming Data Applications Using Apache KafkaBuilding Streaming Data Applications Using Apache Kafka
Building Streaming Data Applications Using Apache KafkaSlim Baltagi
 
Flink Case Study: Amadeus
Flink Case Study: AmadeusFlink Case Study: Amadeus
Flink Case Study: AmadeusFlink Forward
 
Flink Case Study: OKKAM
Flink Case Study: OKKAMFlink Case Study: OKKAM
Flink Case Study: OKKAMFlink Forward
 
Flink Case Study: Capital One
Flink Case Study: Capital OneFlink Case Study: Capital One
Flink Case Study: Capital OneFlink Forward
 
Making Great User Experiences, Pittsburgh Scrum MeetUp, Oct 17, 2017
Making Great User Experiences, Pittsburgh Scrum MeetUp, Oct 17, 2017Making Great User Experiences, Pittsburgh Scrum MeetUp, Oct 17, 2017
Making Great User Experiences, Pittsburgh Scrum MeetUp, Oct 17, 2017Carol Smith
 

Building a Modern Data Architecture with Enterprise Hadoop

  • 1. Building A Modern Data Architecture (MDA) Using Enterprise Hadoop. Slim Baltagi, Systems Architect, Hortonworks Inc. Open-BDA Hadoop Summit 2014, November 18th, 2014. © Hortonworks Inc. 2011 – 2014. All Rights Reserved
  • 2. Your Presenter: Slim Baltagi
      •  Currently a Systems Architect in the Professional Services Organization of Hortonworks in the central region (US and Canada).
      •  Over 4 years of Hadoop experience working on 9 Big Data projects.
      •  Over 16 years of IT experience working in various architecture, design, development and consulting roles.
      •  Holds a master’s degree in Mathematics and is an ABD in computer science from Université Laval, Québec, Canada.
      •  Twitter: @SlimBaltagi
  • 3. Outline: 1. Drivers for an MDA; 2. What’s an MDA; 3. Hadoop’s role in an MDA; 4. Use Cases related to an MDA; 5. Learn More; 6. Q&A
  • 4. Traditional Data Architecture Under Pressure
      [Diagram: APPLICATIONS (Business Analytics, Custom Applications, Packaged Applications); DATA SYSTEM (RDBMS, EDW, MPP, with multiple SILO blocks); SOURCES (Existing Sources: CRM, ERP, Clickstream, Logs)]
      Data growth: New Data Types: OLTP, ERP, CRM Systems; Unstructured docs, emails; Clickstream; Server logs; Social/Web Data; Sensor, Machine Data; Geolocation. 85% (Source: IDC)
      •  Can’t manage new data paradigm
      •  Constrains data to specific schema
      •  Siloed data
      •  Limited scalability
      •  Economically unfeasible
      •  Limited analytics
  • 5. A Modern Data Architecture for New Data
      [Diagram: Traditional Data Architecture. APPLICATIONS (Business Analytics, Custom Applications, Packaged Applications); DATA SYSTEM REPOSITORIES (RDBMS, EDW, MPP); SOURCES (Existing Sources: CRM, ERP, Clickstream, Logs)]
      New data sources: OLTP, ERP, CRM Systems; Unstructured documents, emails; Clickstream; Server logs; Sentiment, Web Data; Sensor, Machine Data; Geolocation
      New Data Requirements:
      •  Scale
      •  Economics
      •  Flexibility
  • 6. Enterprise Goals for the Modern Data Architecture
      •  Centrally manage new and existing data
      •  Provide single view of the customer, product, supply chain
      •  Run batch, interactive & real time analytic applications on shared datasets
      •  Assure enterprise-grade security, operations and governance
      •  Leverage new and existing data center infrastructure investments
      •  Scalable and affordable; low cost per TB
      •  Deployment flexibility
      [Diagram: APPLICATIONS (Business Analytics, Custom Applications, Packaged Applications); DATA SYSTEM (RDBMS, EDW, MPP; YARN: Data Operating System with Batch, Interactive and Real-Time processing across nodes 1 to N; HDFS (Hadoop Distributed File System)); SOURCES (Existing Systems: CRM, ERP, Other; Clickstream; Web & Social; Geolocation; Sensor & Machine; Server Logs; Unstructured)]
  • 7. 1. Drivers for a Modern Data Architecture (MDA) • Semi-Structured and Unstructured – NEW DATA: Unstructured documents, emails, Sentiment, Web Data, Sensor, Machine Data, Geolocation, ... • Enterprise Data Warehouse Optimization – REDUCED COSTS: Low-value computing tasks such as ETL consume significant EDW resources. When offloaded to Hadoop, these ETL processes can be performed much more efficiently, freeing up your data warehouse to perform high-value functions like analytics and operations.
  • 8. 1. Drivers for a Modern Data Architecture (Continued) • Advanced Analytics – NEW ANALYTICS APPS: Unlike schema-on-write, which transforms data into a specified schema upon load, Hadoop lets you store data in any format and then create the schema at the moment you choose to analyze the data. This flexibility opens up new possibilities for iterative analytics and delivers new business value. • Single Cluster, Multiple Workloads – ANY WORKLOAD: With Apache Hadoop YARN supporting multiple access methods (such as batch, interactive, streaming and real-time) on a common data set, Hadoop enables you to transform and view data in multiple ways simultaneously, dramatically reducing time to insight.
  • 9. Outline 1. Drivers for an MDA 2. What’s an MDA 3. Hadoop’s role in an MDA 4. Use Cases related to an MDA 5. Learn More 6. Q&A
  • 10. 2. What’s a Modern Data Architecture (MDA)? • Apache Hadoop is a core component of a Modern Data Architecture, allowing organizations to collect, store, analyze and manipulate massive quantities of data on their own terms, regardless of the source of that data, how old it is, where it is stored, or in what format. • The Hortonworks Data Platform (HDP) delivers Enterprise Apache Hadoop, deeply integrated with existing systems to create a highly efficient, highly scalable way to manage all your enterprise data. • Integrate new & existing data sets, with existing tools & skills. • Make all data available for shared access and processing in multitenant infrastructure. • Batch, interactive & real-time use cases.
  • 11. 3. Hadoop’s role in an MDA
  • 12. 2. What’s a Modern Data Architecture (MDA)?
  • 15. Outline 1. Drivers for an MDA 2. What’s an MDA 3. Hadoop’s role in an MDA 4. Use Cases related to an MDA 5. Learn More 6. Q&A
  • 16. Key Drivers of Hadoop. A. Unlock New Approach to Analytics • Agile analytics via “Schema on Read” with ability to store all data in native format • Create new apps from new types of data. B. Optimize Investments, Cut Costs • Focus EDW on high value workloads • Use commodity servers & storage to enable all data (original and historical) to be accessible for ongoing exploration. C. Enable a Modern Data Architecture • Integrate new & existing data sets • Make all data available for shared access and processing in multitenant infrastructure • Batch, interactive & real-time use cases • Integrated with existing tools & skills. [Diagram: OPERATIONS TOOLS (Provision, Manage & Monitor); DEV & DATA TOOLS (Build & Test); APPLICATIONS (Business Analytics, Custom Applications, Packaged Applications); DATA SYSTEM, REPOSITORIES (RDBMS, EDW, MPP; YARN: Data Operating System with Batch, Interactive and Real-Time access over HDFS: Hadoop Distributed File System); SOURCES (Existing Systems; Clickstream; Web & Social; Geolocation; Sensor & Machine; Server Logs; Unstructured)]
  • 17. Hadoop: It’s About Scale & Structure. [Comparison chart, Traditional RDBMS vs. Hadoop, along the axis SCALE (storage & processing)] schema: Required on write vs. Required on read; governance: Standards and structured vs. Multiple Structures; processing: Limited, no data processing vs. Processing coupled with data; data types: Structured vs. Multi and unstructured; best fit use: Complex ACID Transactions, Operational Data Store vs. Data Discovery, Processing unstructured data, Interactive Analytics; transactions: Optimized, reliable vs. Optimized for analytics
  • 18. YARN and HDP Enable the Modern Data Architecture. YARN is the architectural center of Hadoop and HDP • YARN enables a common data set across all applications • Batch, interactive & real-time workloads • Support for multi-tenant access & processing. HDP enables Apache Hadoop to become an enterprise-viable data platform with centralized services • Security • Governance • Operations • Productization. Enabled broad ecosystem adoption. Hortonworks drove this innovation of Hadoop through YARN. [Diagram: Hortonworks Data Platform 2.2. YARN: Data Operating System (Cluster Resource Management) over HDFS (Hadoop Distributed File System). BATCH, INTERACTIVE & REAL-TIME DATA ACCESS: Script (Pig), SQL (Hive) and Java/Scala (Cascading) on Tez; Stream (Storm); Search (Solr); NoSQL (HBase, Accumulo); Slider; In-Memory (Spark); Others (ISV Engines). GOVERNANCE: Data Workflow, Lifecycle & Governance (Falcon, Sqoop, Flume, Kafka, NFS, WebHDFS). SECURITY: Authentication, Authorization, Audit, Data Protection (Storage: HDFS; Resources: YARN; Access: Hive; Pipeline: Falcon; Cluster: Ranger; Cluster: Knox). OPERATIONS: Provision, Manage & Monitor (Ambari, Zookeeper); Scheduling (Oozie). Deployment Choice: Linux, Windows, On-Premises, Cloud]
  • 20. Outline 1. Drivers for an MDA 2. What’s an MDA 3. Hadoop’s role in an MDA 4. Use Cases related to an MDA 5. Learn More 6. Q&A
  • 21. Shift to Data-driven Means Treating Data like Capital. Hadoop enables organizations to cost-effectively store and use all of the data available in a way that shifts the business from Reactive to Proactive. • A shift in Advertising: from mass branding to 1x1 targeting • A shift in Financial Services: from educated investing to automated algorithms • A shift in Healthcare: from mass treatment to designer medicine • A shift in Retail: from static branding to real-time personalization • A shift in Manufacturing: from break then fix to repair before break
  • 22. Create New Applications from New Types of Data. [Table: INDUSTRY, USE CASE, and one ✔ column per data type: Sentiment & Web, Clickstream & Behavior, Machine & Sensor, Geographic, Server Logs, Structured & Unstructured] Financial Services: New Account Risk Screens ✔ ✔; Trading Risk ✔ ✔; Insurance Underwriting ✔ ✔ ✔. Telecom: Call Detail Records (CDR) ✔ ✔; Infrastructure Investment ✔ ✔; Real-time Bandwidth Allocation ✔ ✔ ✔ ✔ ✔. Retail: 360° View of the Customer ✔ ✔ ✔; Localized, Personalized Promotions ✔; Website Optimization ✔. Manufacturing: Supply Chain and Logistics ✔; Assembly Line Quality Assurance ✔; Crowd-sourced Quality Assurance ✔. Healthcare: Use Genomic Data in Medical Trials ✔ ✔; Monitor Patient Vitals in Real-Time ✔ ✔. Pharmaceuticals: Recruit and Retain Patients for Drug Trials ✔ ✔; Improve Prescription Adherence ✔ ✔ ✔ ✔. Oil & Gas: Unify Exploration & Production Data ✔ ✔ ✔ ✔; Monitor Rig Safety in Real-Time ✔ ✔ ✔. Government: ETL Offload/Federal Budgetary Pressures ✔ ✔; Sentiment Analysis for Government Programs ✔
  • 23. 4.1 Advertising • Mine Grocery & Drug Store POS Data to Identify High-Value Shoppers • Target Ads to Customers in Specific Cultural or Linguistic Segments • Syndicate Videos According to Behavior, Demographics & Channel • ETL Toy Market Research Data for Longer Retention & Deeper Insight • Optimize Online Ad Placement for Retail Websites
  • 24. 4. Use Cases related to an MDA (Continued)
  • 25. 4.2 Financial Services • Screen New Account Applications for Risk of Default • Monetize Anonymous Banking Data in Secondary Markets • Improve Underwriting Efficiency for Usage-Based Auto Insurance • Analyze Insurance Claims with a Shared Data Lake • Maintain Sub-Second SLAs with a Hadoop “Ticker Plant” • Surveillance of Trading Logs for Anti-Laundering Analysis
  • 27. 4.3 Healthcare • Access Genomic Data for Medical Trials • Monitor Patient Vitals in Real-Time • Reduce Cardiac Re-Admittance Rates • Machine Learning to Screen for Autism with In-Home Testing • Store Medical Research Data Forever • Recruit Research Cohorts for Pharmaceutical Trials • Track Equipment and Medicines with RFID Data • Improve Prescription Adherence
  • 29. 4.4 Manufacturing • Assure Just-In-Time Delivery of Raw Materials • Control Quality with Real-Time & Historical Assembly Line Data • Avoid Stoppages with Proactive Equipment Maintenance • Increase Yields in Drug Manufacturing • Crowdsource Quality Assurance
  • 31. 4.5 Oil & Gas • Slow Decline Curves with Production Parameter Optimization • Define Operational Set Points for Each Well & Receive Alerts on Deviations • Optimize Lease Bidding with Reliable Yield Predictions • Report on Compliance with Environmental, Health and Safety Regulations • Repair Equipment Preventatively with Targeted Maintenance • Integrate Exploration with Seismic Image Processing
  • 33. 4.6 Public Sector • Understand Public Sentiment About Government Performance • Protect Critical Networks from Threats (Both Internal and External) • Prevent Fraud and Waste • Analyze Social Media to Identify Terrorist Threats • Decrease Budget Pressures by Offloading Expensive SQL Workloads • Crowdsource Reporting for Repairs to Roads and Public Infrastructure • Fulfill “Open Records” and Freedom of Information Requests
  • 34. 4.7 Retail • Build a 360° View of the Customer • Analyze Brand Sentiment • Localize & Personalize Promotions • Optimize Websites • Optimize Store Layouts
  • 36. 4.8 Telecom • Analyze Call Detail Records (CDRs) • Service Equipment Proactively • Rationalize Infrastructure Investments • Recommend Next Product to Buy (NPTB) • Allocate Bandwidth in Real-time • Develop New Products
  • 38. What is a Data Lake? • An architectural pattern in the data center that uses Hadoop to deliver deeper insight across a large, broad, diverse set of data at efficient scale. • But what is it? It is a PLATFORM for your data (it is not a database): a multipurpose open PLATFORM to land all data in a single place and interact with it in many ways. • A platform that allows the ecosystem to provide higher-level services (SAS, SAP, Microsoft, Streaming, MPP, In-memory, etc.) • Provides first-class APIs and frameworks to enable this integration • Provides first-class data management capabilities (metadata management, security, transformation pipelines, replication, retention, etc.)
  • 39. HDP Data Lake Reference Architecture. [Diagram: Data Lake HDP Grid (compute & storage nodes) running YARN, with AMBARI and Knox (Perimeter Level Security).] Step 1: Extract & Load. SOURCE DATA (ClickStream Data; Sales Transaction/Data; Product Data; Marketing/Inventory; Social Data; EDW) arrives via File, JMS, REST, HTTP and Streaming; Ingestion through SQOOP, FLUME, WebHDFS, NFS and STORM. Step 2: Model/Apply Metadata. HCATALOG (table & user-defined metadata). Step 3: Transform, Aggregate & Materialize. Data processing with HIVE, PIG and Mahout; TEZ, MR2. Step 4: Schedule and Orchestrate. Oozie (batch scheduler). Manage Steps 1-4: Data Management with Falcon. FALCON (data pipeline & flow management). Use Case Type 1: Materialize & Exchange. Exchange via HBase Client, Sqoop/Hive to Downstream Data Sources (OLTP, HBase, EDW (Teradata)). Use Case Type 2: Explore/Visualize. INTERACTIVE Hive Server (Tez/Stinger) serving Query/Analytics/Reporting Tools (Tableau/Excel, Datameer/Platfora/SAP). YARN Apps (Storm, SAS, SOLR): opens up Hadoop to many new use cases (Stream Processing, Real-time Search, MPI)
  • 40. Outline 1. Drivers for an MDA 2. What’s an MDA 3. Hadoop’s role in an MDA 4. Use Cases related to an MDA 5. Learn More 6. Q&A
  • 41. 5. Learn More … [Table: Resource, Location, Description] • MDA White Paper: http://info.hortonworks.com/data-lake-hadoop-whitepaper.html (Learn more about Modern Data Architecture (MDA)) • MDA Web Page: http://hortonworks.com/hadoop-modern-data-architecture/ (Explore Use Cases by Industry) • Hortonworks Sandbox: http://hortonworks.com/products/hortonworks-sandbox/ (Get Started on Hadoop with Hortonworks Sandbox) • Hadoop Tutorials: http://info.hortonworks.com/On-demand-Tutorials_Sign-Up-Page.html (On-Demand Hadoop Tutorials Delivered to Your Inbox) • Enterprise Data Lake: http://hortonworks.com/blog/enterprise-hadoop-journey-data-lake/ (Enterprise Hadoop and the journey to Data Lake)
  • 42. Outline 1. Drivers for an MDA 2. What’s an MDA 3. Hadoop’s role in an MDA 4. Use Cases related to an MDA 5. Learn More 6. Q&A
  • 43. 6. Q&A… Thank you!
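
Slide 8 contrasts schema-on-write with schema-on-read. The distinction can be illustrated with a minimal sketch in plain Python; it does not use Hadoop, Hive or any HDP API, and the clickstream records, field names and helper functions below are hypothetical and do not come from the deck. Schema-on-write validates and converts records at load time, so records that do not fit are rejected; schema-on-read keeps the raw lines and applies a projection only at query time.

```python
import json

# Hypothetical raw events, stored exactly as they arrived (no schema enforced on load).
RAW_EVENTS = [
    '{"user": "u1", "page": "/home", "ms": 120}',
    '{"user": "u2", "page": "/cart", "ms": "340", "ref": "ad"}',
    '{"user": "u3", "url": "/home"}',
]


def schema_on_write(lines, required=("user", "page", "ms")):
    """Apply a fixed schema at load time; records that do not fit are rejected."""
    table, rejected = [], []
    for line in lines:
        rec = json.loads(line)
        if all(key in rec for key in required):
            # Only the declared columns are kept, converted to the declared types.
            table.append(
                {"user": str(rec["user"]), "page": str(rec["page"]), "ms": int(rec["ms"])}
            )
        else:
            rejected.append(line)
    return table, rejected


def schema_on_read(lines, projection):
    """Keep raw lines as stored; apply a projection (the 'schema') at query time."""
    for line in lines:
        rec = json.loads(line)
        yield {name: extract(rec) for name, extract in projection.items()}


# Two different illustrative "schemas" defined over the same stored data.
page_views = {
    "user": lambda r: r.get("user"),
    "page": lambda r: r.get("page") or r.get("url"),
}
latency = {
    "page": lambda r: r.get("page") or r.get("url"),
    "ms": lambda r: int(r["ms"]) if "ms" in r else None,
}

loaded, rejected = schema_on_write(RAW_EVENTS)
views = list(schema_on_read(RAW_EVENTS, page_views))
timings = list(schema_on_read(RAW_EVENTS, latency))
```

In this sketch the third record is rejected by the load-time schema but still contributes to both query-time views, which is the flexibility the slide attributes to storing data in its native format.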