Accelerating Success with Rapid Data Integration for the Modern Data Architecture
John Kreisa, Hortonworks
Lawrence Schwartz, Attunity
Speakers
Lawrence Schwartz, Attunity
John Kreisa, Hortonworks
Customer Momentum
•  230+ customers (as of Q3 2014)
Hortonworks Data Platform
•  Completely open multi-tenant platform for any app & any data
•  A centralized architecture of consistent enterprise services for resource management, security, operations, and governance
Partner for Customer Success
•  Open source community leadership with a focus on enterprise needs
•  Unrivaled world-class support
•  Founded in 2011
•  The original 24 architects, developers, and operators of Hadoop from Yahoo!
•  600+ employees
•  1000+ ecosystem partners
Hadoop for the Enterprise:
Implement a Modern Data Architecture with HDP
Traditional systems under pressure
Challenges
•  Constrains data to the application
•  Can’t manage new data
•  Costly to scale
[Chart: business value of traditional data (ERP, CRM, SCM) versus new data (clickstream, geolocation, web data, Internet of Things, docs and emails, server logs), with laggards and industry leaders marked; data volume grows from 2.8 zettabytes in 2012 to 40 zettabytes in 2020.]
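The chart's two data points (2.8 ZB in 2012, 40 ZB in 2020) imply a compound annual growth rate. A quick worked check, assuming smooth compound growth over the eight years between the two figures:

```python
def cagr(start, end, years):
    """Compound annual growth rate between two values `years` apart."""
    return (end / start) ** (1 / years) - 1

# Zettabytes, taken from the chart: 2.8 ZB (2012) and 40 ZB (2020).
rate = cagr(2.8, 40.0, 2020 - 2012)
print(f"{rate:.1%}")  # 39.4%
```

In other words, the projection amounts to data volume growing by roughly 39% every year.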
Hadoop emerged as the foundation of a new data architecture
Apache Hadoop is an open source data platform for managing large volumes of high-velocity, high-variety data.
•  Built by Yahoo! to be the heartbeat of its ad & search business
•  Donated to the Apache Software Foundation in 2005, with rapid adoption by large web properties & early adopter enterprises
Hadoop Advantages
•  Manages new data paradigm
•  Handles data at scale
•  Cost effective
•  Open source
[Diagram: the original Hadoop stack, with applications running on MapReduce (batch processing) over HDFS (storage).]
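MapReduce was Hadoop's original batch-processing layer. The following rough illustration of the programming model runs in a single Python process rather than on a Hadoop cluster, and Hadoop's native MapReduce API is Java. A word count splits into a map phase that emits key/value pairs, a shuffle that groups the pairs by key, and a reduce phase that aggregates each group:

```python
from collections import defaultdict

def map_phase(line):
    # Emit (word, 1) for every word in one input line.
    for word in line.lower().split():
        yield word, 1

def shuffle(pairs):
    # Group all emitted values by key, as the framework does between phases.
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups

def reduce_phase(key, values):
    # Aggregate every value emitted for one key.
    return key, sum(values)

def word_count(lines):
    pairs = (pair for line in lines for pair in map_phase(line))
    return dict(reduce_phase(k, v) for k, v in shuffle(pairs).items())

print(word_count(["server logs", "web logs", "click streams"]))
```

On a real cluster, HDFS splits the input across nodes, many mappers and reducers run in parallel, and the framework performs the shuffle over the network.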
The Modern Data Architecture
[Diagram: sources (OLTP, ERP, and CRM systems; documents and emails; web logs and click streams; social networks; machine-generated data; sensor data; geolocation data) feed the data systems: repositories (RDBMS, EDW, MPP) and HDP (YARN with data access and data management, plus governance & integration, security, and operations). These serve applications (data marts, business analytics, visualization & dashboards). Dev & data tools cover build & test, operational tools provision, manage & monitor, and the infrastructure runs on premise or in the cloud.]
Hadoop Driver: Cost Optimization
[Diagram: existing systems (ERP, CRM, SCM) and new sources (clickstream, web & social, geolocation, sensor & machine, server logs, unstructured data) flow into the enterprise data warehouse, which holds hot data on MPP and in-memory systems, and into HDP 2.2, which handles ELT plus cold data, deeper archive, and new sources. Both feed analytics (data marts, business analytics, visualization & dashboards).]
Archive Data off EDW
Move rarely used data to Hadoop as an active archive, and store more data for longer.
Offload costly ETL
Free your EDW to perform high-value functions like analytics and operations, not ETL.
Enrich the value of your EDW
Use Hadoop to refine new data sources, such as web and machine data, for new analytical context.
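The "archive data off EDW" pattern comes down to a placement rule: keep recently used data in the warehouse and move the rest to Hadoop as an active archive. The following minimal sketch assumes a hypothetical `last_accessed` date per table partition and an illustrative 90-day threshold. Neither value comes from the slides.

```python
from dataclasses import dataclass
from datetime import date, timedelta

@dataclass
class Partition:
    name: str
    last_accessed: date  # hypothetical usage metadata for a warehouse partition

def plan_tiering(partitions, today, hot_days=90):
    """Split partitions into those kept in the EDW and those archived to Hadoop."""
    cutoff = today - timedelta(days=hot_days)
    keep = [p.name for p in partitions if p.last_accessed >= cutoff]
    archive = [p.name for p in partitions if p.last_accessed < cutoff]
    return keep, archive

parts = [
    Partition("sales_2014_q3", date(2014, 9, 30)),
    Partition("sales_2012_q1", date(2013, 1, 15)),
]
keep, archive = plan_tiering(parts, today=date(2014, 10, 15))
print("EDW:", keep)
print("Hadoop archive:", archive)
```

Because the archive stays queryable in Hadoop, the threshold can be tuned to the warehouse's capacity rather than to how long the data must be retained.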
The Modern Data Architecture & Attunity
[Diagram: the modern data architecture with a Data Integration layer added. Sources (OLTP, ERP, and CRM systems; documents and emails; web logs and click streams; social networks; machine-generated data; sensor data; geolocation data) feed the data systems: repositories (RDBMS, EDW, MPP) and HDP (YARN with data access and data management, plus governance & integration, security, and operations). These serve applications (data marts, business analytics, visualization & dashboards). Dev & data tools cover build & test, operational tools provision, manage & monitor, and the infrastructure runs on premise or in the cloud.]
Attunity Corporate Overview
Overview
§  Exchange (Ticker): NASDAQ (ATTU)
§  Headquarters: Burlington, MA
§  Customers: > 2000 in 60 countries
Making Any Data Available Anytime, Anywhere
We Move the Data that Moves Our Customers’ Business to Where the Data Needs to Be
[Diagram: data from ERP, CRM, POS, legacy systems, logs, sensors, and files moves to data warehouse, database, cloud, and Hadoop targets for analytics / BI, distribution / DR, and archiving / testing; global offices marked.]
To Use Data, You Must Move It!
Data Needs to Be Moved to Be Useful
» Up to 80% of the work that data scientists put into big data projects is spent on data integration and resolving data quality issues.
Source: “For Big-Data Scientists, ‘Janitor Work’ Is Key Hurdle to Insights,” by Steve Lohr, New York Times, August 17, 2014
Data Integration Remains a Major Challenge
1.  Long rollout
2.  Lots of personnel
3.  Mixed systems
4.  Hard to maintain
5.  Not real-time
Turning Data Into Value
More Data
Less Time
Less Cost
The Attunity Solution for Big Data
•  Fully automated, end-to-end; no scripting
•  Fast, high-performance integration
•  Optimized for a broad range of platforms
•  Single-pane-of-glass monitoring
•  Real-time change data capture
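Change data capture (CDC) ships only the inserts, updates, and deletes made at the source after an initial bulk load, instead of re-copying whole tables. The following minimal sketch covers the apply side and assumes a hypothetical event format of ordered `(operation, key, row)` tuples. It illustrates the idea only and is not Attunity's implementation.

```python
def apply_changes(target, events):
    """Apply ordered CDC events to a target table held as {primary_key: row}."""
    for op, key, row in events:
        if op in ("insert", "update"):
            target[key] = row          # upsert the latest image of the row
        elif op == "delete":
            target.pop(key, None)      # tolerate deletes of rows already gone
        else:
            raise ValueError(f"unknown operation: {op}")
    return target

# Full (bulk) load first, then only the changes since that load.
table = {1: {"name": "Ann", "plan": "basic"}, 2: {"name": "Bo", "plan": "pro"}}
changes = [
    ("update", 1, {"name": "Ann", "plan": "pro"}),
    ("delete", 2, None),
    ("insert", 3, {"name": "Cy", "plan": "basic"}),
]
print(apply_changes(table, changes))
```

Order matters: the events must be applied in the sequence the source committed them, which is why CDC tools typically read changes from the database's transaction log.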
Attunity’s Big Solutions for Big Data
Information availability solutions that deliver competitive advantage
•  Business Data (Oracle, SQL Server, Teradata, etc.)
•  Machine and File Data (logs, sensors, files, etc.)
•  Application Data (SAP, Salesforce, etc.)
•  Cloud Data (AWS RDS, Redshift, etc.)
Attunity Offerings
BUSINESS DATA: Attunity Replicate and Maestro
»  High-performance data replication software to accelerate and reduce the costs of distributing, sharing, and ensuring the availability of data
APPLICATION DATA: Attunity Gold Client
»  Software for SAP that reduces storage requirements, improves the quality and availability of test data, restores development integrity, and helps ensure data security
MACHINE AND FILE: Attunity RepliWeb, Replicate, and Maestro
»  Attunity Replicate, RepliWeb, and Maestro offer highly scalable replication and synchronization for unstructured files, machine data, and Hadoop
CLOUD DATA: Attunity CloudBeam
»  Attunity CloudBeam is a SaaS platform offering services for uploading and synchronizing Big Data to, from, and between cloud environments
‘Sqooping’ Big Data – Loading Data the Hard Way
»  Apache Sqoop: a great tool, but not enough
»  Designed for transferring bulk data between Hadoop and databases
»  Not capable of change data capture (CDC)
»  Doesn't optimize network traffic
»  Script-based interface that imports data one table at a time
»  Limited number of standard database connectors
[Screenshot: Sqoop command-line interface]
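To make the "one table at a time" point concrete, the following sketch assembles `sqoop import` invocations from Python. The flags used (`--connect`, `--username`, `-P`, `--table`, `--target-dir`, `--num-mappers`, and `--incremental append` with `--check-column`/`--last-value`) are standard Sqoop 1 import options. The JDBC URL, user, table names, and HDFS paths are hypothetical. Sqoop's incremental append mode picks up new rows only, which is not the same as log-based CDC.

```python
import shlex

def sqoop_import_command(jdbc_url, username, table, target_dir,
                         mappers=4, check_column=None, last_value=None):
    """Build the argument list for one `sqoop import` run (one table per call)."""
    cmd = [
        "sqoop", "import",
        "--connect", jdbc_url,
        "--username", username,
        "-P",                       # prompt for the password interactively
        "--table", table,
        "--target-dir", target_dir,
        "--num-mappers", str(mappers),
    ]
    if check_column is not None:
        # Incremental append mode: import only rows whose check column is
        # greater than last_value. Updates and deletes are not captured.
        cmd += ["--incremental", "append",
                "--check-column", check_column,
                "--last-value", str(last_value)]
    return cmd

# Hypothetical source database and tables; each table needs its own job.
for table in ["orders", "customers"]:
    print(shlex.join(sqoop_import_command(
        "jdbc:mysql://db.example.com/sales", "etl_user",
        table, f"/data/raw/{table}")))
```

`shlex.join` requires Python 3.8 or later. Every additional table means another command to schedule and maintain, which is the scripting burden the slide refers to.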
Attunity Replicate Architecture
»  Advanced Monitoring and Control
»  Click-to-Replicate Design
»  Fast Loading and Real-Time CDC
»  Broadest Platform Support
»  Non-intrusive Architecture
Move Any Data, Any Time, Anywhere.
Use Case: Cable Provider
Modern Data Architecture with Hadoop
The Journey to the Data Lake
[Diagram components: databases, data feed sources, and CSV files; bulk load and change data; Click-2-Replicate design ("Drag. Drop. Done."); data refresh and data append; ODS; data lake; business units (finance, support, marketing, sales, engineering).]
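The use-case diagrams distinguish data refresh from data append. The following sketch assumes one plausible reading of those terms: a refresh replaces the target's contents with the latest source snapshot, while an append adds new records and keeps what is already there.

```python
def data_refresh(target, snapshot):
    # Replace everything: the target mirrors the latest source snapshot.
    target[:] = list(snapshot)
    return target

def data_append(target, new_records):
    # Keep existing records and add the new ones (e.g. an event or log history).
    target.extend(new_records)
    return target

current = ["r1", "r2"]
print(data_refresh(list(current), ["r1", "r3"]))  # ['r1', 'r3']
print(data_append(list(current), ["r3"]))         # ['r1', 'r2', 'r3']
```

Refresh suits reference tables where only the current state matters. Append suits logs and feeds where history is the point of loading the data into a lake.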
Use Case: Managed Health Care – Creating a Golden Data Set
[Diagram components: databases, data feed sources, and CSV files; bulk load and change data; Click-2-Replicate design ("Drag. Drop. Done."); data refresh and data append; ODS; ETL into a staging area where business transformation rules are applied; ad-hoc analytics, BI reporting, and visualization & analytics.]
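A golden data set consolidates records about the same entity from several sources into one trusted version by applying business rules in the staging area. The following minimal sketch assumes a hypothetical rule set: match records on a shared `member_id` and, field by field, take the most recently updated non-empty value. Real rule sets are defined by the business.

```python
from collections import defaultdict

def golden_records(records):
    """Merge source records per member_id; newer non-empty values win."""
    by_member = defaultdict(list)
    for rec in records:
        by_member[rec["member_id"]].append(rec)

    golden = {}
    for member_id, recs in by_member.items():
        merged = {}
        # Oldest first, so later (newer) values overwrite earlier ones.
        for rec in sorted(recs, key=lambda r: r["updated"]):
            for field, value in rec.items():
                if value not in (None, ""):
                    merged[field] = value
        golden[member_id] = merged
    return golden

records = [
    {"member_id": "M1", "updated": "2014-01-05", "phone": "555-0100", "address": ""},
    {"member_id": "M1", "updated": "2014-06-20", "phone": None, "address": "1 Main St"},
]
print(golden_records(records)["M1"])
```

ISO-formatted date strings sort chronologically, which is why the sketch can order records by comparing the `updated` strings directly.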
  
Use Case: Financial Services Institution – Fraud Detection
[Diagram components: data feed sources; bulk load and change data; data refresh and data append; ODS (PostgreSQL); CDC with Attunity Maestro; ETL into a staging area where business transformation rules are applied; EDW/data mart; ad-hoc analytics, BI reporting, and visualization & analytics.]
Use Case: Sales Management Software
Data Consolidation
[Diagram components: Attunity Maestro with Maestro nodes at headquarters (HQ), a regional data center, and for data from SaaS customers; Replicate servers in California, New York, HQ, and further locations (…); Customers 1 through 5 (…); data lake.]
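The consolidation use case gathers the same kinds of tables from many customer sites into one data lake. A common way to keep consolidated rows distinguishable is to tag each one with its origin. The following minimal sketch uses hypothetical site names and fields.

```python
def consolidate(feeds):
    """Union per-site record lists into one list, tagging each row with its source."""
    lake = []
    for site, rows in feeds.items():
        for row in rows:
            lake.append({**row, "source_site": site})  # copy; don't mutate input
    return lake

feeds = {
    "california": [{"customer": "Customer 1", "deals": 12}],
    "new_york": [{"customer": "Customer 2", "deals": 7},
                 {"customer": "Customer 3", "deals": 3}],
}
lake = consolidate(feeds)
print(len(lake), sorted({r["source_site"] for r in lake}))
```

Keeping the source tag lets downstream queries filter or audit by site even after all rows share one store.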
Who’s Our Lucky Winner?
Next Steps
Download the Hortonworks Attunity Paper, “The Modern Data Architecture and Automating Data Transfer”
Hortonworks.com/partner/Attunity/
Learn Hadoop – Download the Sandbox
Hortonworks.com/sandbox/
Learn More about Attunity & Hortonworks
Attunity.com/hortonworks
Hortonworks.com/partner/Attunity/
Thank You!
HDP delivers a completely open data platform
Hortonworks Data Platform provides Hadoop for the Enterprise: a centralized architecture of core enterprise services for any application and any data.
Completely Open
•  HDP incorporates every element required of an enterprise data platform: data storage, data access, governance, security, and operations
Hortonworks Data Platform 2.2
YARN: Data Operating System (Cluster Resource Management)
HDFS (Hadoop Distributed File System)
GOVERNANCE & INTEGRATION: Apache Falcon, Apache Sqoop, Apache Flume, Apache Kafka
BATCH, INTERACTIVE & REAL-TIME DATA ACCESS: Apache Pig, Apache Hive, Cascading, Apache HBase, Apache Accumulo, Apache Solr, Apache Spark, Apache Storm
SECURITY: Apache Ranger, Apache Knox, Apache Falcon
OPERATIONS: Apache Ambari, Apache ZooKeeper, Apache Oozie

Webinar - Accelerating Hadoop Success with Rapid Data Integration for the Modern Data Architecture
