Data is increasing exponentially in both type and volume, creating opportunities for businesses. Watch this video and learn from three Big Data experts: John Kreisa, VP Strategic Marketing at Hortonworks; Imad Birouty, Director of Technical Product Marketing at Teradata; and John Haddad, Senior Director of Product Marketing at Informatica.
Multiple systems, including a flexible data repository, are needed to exploit the variety and volume of data sources. Learn more about:
- Apache Hadoop 2 and YARN
- Data Lakes
- Intelligent data management layers needed to manage metadata and usage patterns as well as track consumption across these data platforms.
14. Analysts Recommend: Shift from a Single Platform to an Ecosystem
“We will abandon the old models based on the desire to implement for high-value analytic applications.”
The "Logical" Data Warehouse
15. Unified Data Architecture
[Diagram: the Teradata Unified Data Architecture. Sources (ERP, SCM, CRM, images, audio and video, machine logs, text, web and social) feed three platforms — the data platform, the data warehouse, and the discovery platform — which move, manage, and provide access to data. Analytic tools and apps (marketing applications, business intelligence, data mining, math and stats, languages) serve the users: customers, partners, business analysts, data scientists, marketing executives, operational systems, frontline workers, and engineers. Capabilities called out across the platforms: fast loading, filtering and processing, online archival; business intelligence, predictive analytics, operational intelligence; data discovery, path/graph/time-series analysis, pattern detection.]
16. [Unified Data Architecture diagram repeated from slide 15.]
17. Data Lake Overview
• The single source of raw, historical, and real-time operational data
• The ability to cost-effectively explore data sets of unknown, under-appreciated, or unrecognized value
• The reduction of LOB-specific big data environments, which reduces costs and analytical discrepancies
• The co-location of data sets to enable light, on-the-fly integration
18. Approaches to Data Integration
Schema on write (data warehouse):
• Well-understood data
• Relational integrity
• Storage efficiency
Schema on read (data lake):
• Dynamic data
• Reduced coordination
• Human-readable
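The contrast between the two approaches can be sketched in a few lines. This is a hypothetical illustration, not vendor code: the column names, the sample records, and both helper functions are invented for the sake of the example.

```python
import json

# Two raw records; the first is missing a column the warehouse requires.
records = [
    '{"vin": "VIN001", "battery_temp_c": 41.5}',
    '{"vin": "VIN002", "battery_temp_c": 38.0, "air_temp_c": 22.1}',
]

# Schema on write (data warehouse style): validate before storing,
# so every stored row is well understood and relationally consistent.
WAREHOUSE_COLUMNS = ("vin", "battery_temp_c", "air_temp_c")

def write_to_warehouse(raw):
    row = json.loads(raw)
    missing = [c for c in WAREHOUSE_COLUMNS if c not in row]
    if missing:
        raise ValueError(f"rejected: missing columns {missing}")
    return tuple(row[c] for c in WAREHOUSE_COLUMNS)

# Schema on read (data lake style): loading is just an append of the raw
# text; a schema is only imposed when the data is queried.
lake = list(records)

def read_from_lake(schema):
    for raw in lake:
        row = json.loads(raw)
        yield tuple(row.get(c) for c in schema)  # missing fields become None
```

The trade-off shows up directly: the warehouse path rejects the first record outright, while the lake path accepts everything and simply yields `None` for fields that a given read-time schema asks for but the record never had.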
19. The “Capture Everything” Approach
Classic method — “Capture only what’s needed” (structured and repeatable analysis):
• Business determines what questions to ask
• IT structures the data to answer those questions
Big Data method — “Capture in case it’s needed” (multi-structured and iterative analysis):
• IT delivers a platform for storing, refining, and analyzing all data sources
• Business explores the data for questions worth answering
20. Automobile Sensor Data Use Case
Value comes from combining business data with detail data — for example, determining which cars to recall for a bad battery lot:
> Business data (VINs, service records, warranty data, DTC descriptions) held in the Teradata data warehouse
> Raw, multi-structured battery temperature sensor data held in the Hadoop data lake
> A query combines the two data sets
> The result determines which cars to repair
[Chart: battery temperature vs. air temperature]
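The combined query can be sketched with an in-memory join. This is a toy sketch only: the table layouts, VINs, lot numbers, and the 60 °C threshold are all invented for illustration, and SQLite stands in for the warehouse/lake federation.

```python
import sqlite3

con = sqlite3.connect(":memory:")
cur = con.cursor()

# Business data as it might sit in the warehouse: VINs and battery lots.
cur.execute("CREATE TABLE production (vin TEXT, battery_lot TEXT)")
cur.executemany("INSERT INTO production VALUES (?, ?)",
                [("VIN001", "LOT-7"), ("VIN002", "LOT-9"), ("VIN003", "LOT-7")])

# Detail data as it might be surfaced from the lake: per-VIN sensor peaks.
cur.execute("CREATE TABLE battery_temps (vin TEXT, peak_temp_c REAL)")
cur.executemany("INSERT INTO battery_temps VALUES (?, ?)",
                [("VIN001", 71.0), ("VIN002", 40.0), ("VIN003", 39.5)])

# The combined query: cars from the suspect lot that also ran hot.
cur.execute("""
    SELECT p.vin
    FROM production p
    JOIN battery_temps t ON t.vin = p.vin
    WHERE p.battery_lot = 'LOT-7' AND t.peak_temp_c > 60
""")
recall = [row[0] for row in cur.fetchall()]
print(recall)  # ['VIN001']
```

Only VIN001 is both from the bad lot and showing abnormal temperatures, so the recall can be scoped to the cars that actually need repair rather than the whole lot.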
21. Customer Value Based on Social Influence Use Case
[Diagram: data flows from Hadoop through the Teradata Aster database to the Teradata database.]
• Determine customer sentiment
• Determine each customer’s sphere of influence
• Determine high-value customers based on history
• Determine customer value based on social influence
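One way to picture the last step — folding social influence into a history-based customer value — is a simple scoring sketch. Every name, number, and weight below is invented; the point is only the shape of the calculation, not any Teradata or Aster API.

```python
# Historical spend per customer (what the warehouse already knows).
history_value = {"alice": 1200.0, "bob": 300.0, "carol": 150.0}

# Sphere of influence: who each customer reaches (e.g. mined from social data).
influences = {"alice": ["bob"], "bob": [], "carol": ["alice", "bob", "dave", "erin"]}

# Sentiment toward the brand, from -1 (negative) to 1 (positive).
sentiment = {"alice": 0.9, "bob": -0.2, "carol": 0.7}

def adjusted_value(name, influence_weight=50.0):
    # Boost (or discount) historical value by the customer's reach,
    # scaled by how positively they talk about the brand.
    reach = len(influences.get(name, []))
    return history_value[name] + influence_weight * reach * sentiment[name]
```

With these made-up numbers, "carol" — a low spender with a wide, positive sphere of influence — nearly doubles in adjusted value, which is exactly the kind of customer a history-only view would undervalue.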
22. Data Optimization for the Modern Data Architecture
John Haddad, Senior Director, Product Marketing, Informatica
23. The Big Data Journey
Big Data business initiatives progress from IT-driven to business-driven:
• Data warehouse optimization — optimize infrastructure for performance, cost, and scalability
• Managed data lake — a single place to manage the supply and demand of data
• Real-time customer analytics — real-time, proactive customer engagement
24. Proactive Customer Engagement
[Diagram: streaming web logs and clickstream data, social data and signals (via a social data connector), and FIX/SWIFT market data flow into big data integration and analytics — real-time event processing, integration and quality, master data management (customer/product master), and visualization — spanning the data platform, discovery platform, and data warehouse. Insights reach the customer through a customer portal and smartphone, and reach financial advisors.]
25. Proactive Patient/Member Engagement
[Diagram: the same architecture applied to healthcare — streaming web logs and clickstream data, social data and signals from healthcare and patient forums, and RFID/patient-monitoring feeds flow into big data integration and analytics (real-time event processing, integration and quality, master data management with a member/provider master, visualization) across the data platform, discovery platform, and data warehouse, engaging patients/members via smartphone and reaching care providers.]
26. Unified Data Architecture: The Intelligent Data Platform
[Diagram: the Informatica Intelligent Data Platform layered over the data platform, discovery platform, and data warehouse:]
• Role-based data management tools
• Infrastructure services
• Data intelligence — metadata meets machine learning
• Data infrastructure — the Vibe™ Virtual Data Machine, with new, industry-leading data lake infrastructure
27. Data Lake Architecture
Informatica developers are now Hadoop developers: a visual development environment, backed by enterprise repositories and MDM, sits on top of the stack.
[Diagram of the data lake pipeline:]
• Source data: databases, files, servers and mainframe, JMS queues, social, sensor data
• Load: batch, replicate, stream, archive
• Data refinement on Hadoop (SQL, Apache Hive, Apache MapReduce, Apache Tez, Apache YARN, over HDFS — the Hadoop Distributed File System — across nodes 1..N): profile, parse, ETL, cleanse, match
• Deliver: batch, services, events, topics — into the data warehouse
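The refinement steps named on the slide — parse, cleanse, match — can be sketched as a tiny pipeline. The function names, input format, and sample lines below are illustrative inventions, not Informatica or Hadoop APIs; in practice each stage would run as a distributed job.

```python
import re

# Raw multi-structured source lines, with inconsistent casing, stray
# whitespace, unit suffixes, and a duplicate reading.
raw_lines = [
    "VIN001 | 71.0C ",
    "vin002|40.0c",
    "VIN001 | 71.0C ",
]

def parse(line):
    # Split the delimited line into named fields.
    vin, temp = line.split("|")
    return {"vin": vin, "temp": temp}

def cleanse(rec):
    # Normalize casing/whitespace and strip non-numeric characters.
    return {"vin": rec["vin"].strip().upper(),
            "temp_c": float(re.sub(r"[^0-9.]", "", rec["temp"]))}

def match(records):
    # Deduplicate on the cleansed key, keeping the first occurrence.
    seen, out = set(), []
    for r in records:
        key = (r["vin"], r["temp_c"])
        if key not in seen:
            seen.add(key)
            out.append(r)
    return out

refined = match([cleanse(parse(line)) for line in raw_lines])
```

After refinement the three messy lines collapse to two clean, deduplicated records, ready for the deliver stage.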
28. How do you plan to staff your Big Data projects?
Hadoop hand-coders: 4 weeks. Informatica developers: 4 days — with 2x performance.
Choose tools that leverage existing skills so you can quickly staff Big Data projects.
29. How do you adopt and minimize the impact of new and rapidly changing technologies?
[Diagram: a single development environment deploys to Hadoop, the cloud, DI servers, and the data warehouse.]
Choose a platform and tools that minimize the need to rebuild your data pipeline as technologies change.
30. How long does it take you to deploy Big Data projects to production?
Time to deploy hinges on being available 24x7, scaling, maximizing reuse, performance, automatic deployment, and being easy to maintain and flexible to change.
Everything you build in the sandbox should be immediately deployable as enterprise-ready production.