HADOOP & THE DATA WAREHOUSE:
WHEN TO USE WHICH
Steve Wooledge – Teradata Labs
Jim Walker – Hortonworks
Topics
• Trends in enterprise data architectures
• The value of an integrated data warehouse
• The value of Hadoop
• Bringing it all together and next steps
Big Data Comes with BIG HEADACHES
“Even free software like Hadoop is causing companies to spend more money… Many CIOs believe data is inexpensive because storage has become inexpensive. But data is inherently messy—it can be wrong, it can be duplicative, and it can be irrelevant—which means it requires handling, which is where the real expenses come in.”
Source: The Wall Street Journal. “CIOs’ Big Problem with Big Data”. Aug 2012
“Through 2015, 85% of Fortune 500 organizations will be unable to exploit big data for competitive advantage.”
Source: Gartner. “Information Innovation: Innovation Key Initiative Overview”. April 2012
Organizations Face Several Obstacles with Big Data
Source: Big Analytics 2012 Survey, Teradata
• Difficulty managing multiple systems, new types of data
• Hard to find right skills; lack of supportability for new systems & “data scientists”
• Difficulty deploying and integrating new systems
• Difficulty providing accessibility to fast insights on big data
Shift from a Single Platform to an Ecosystem
“Big Data requirements are solved by
a range of platforms including
analytical databases, discovery
platforms, and NoSQL solutions
beyond Hadoop.”
“We will abandon the old models
based on the desire to implement for
high-value analytic applications.”
"Logical" Data Warehouse
Source: “Big Data Comes of Age”. EMA and 9sight Consulting. Nov 2012.
TERADATA UNIFIED DATA ARCHITECTURE
[Diagram: data sources (audio & video, images, text, web & social, machine logs, CRM, SCM, ERP) feed a capture | store | refine layer, a discovery platform, and an integrated data warehouse. These are reached through languages, math & stats, data mining, business intelligence, and applications by engineers, data scientists, business analysts, front-line workers, customers / partners, marketing, operational systems, and executives.]
The Value of The Data Warehouse
[Diagram: an integrated data warehouse with a data lab, shown alongside dual systems, data marts, an analytical archive, test/dev, and an independent data mart. It serves business analysts, knowledge workers, customers/partners, marketing, executives, front-line workers, and operational systems through data mining, business intelligence, and applications.]
Integrated Analytics
• Advanced Analytics
• Temporal
• OLAP
• Optimization
• Geospatial
• Big Data Integration
• Application Development
• Agile Analytics
• Data Exploration
Benefits
• Easy to consume data
• Rationalization of data from multiple sources into single enterprise view
• Clean, safe, secure data
• Cross-functional analysis
• Transform once, use many
• Fast response times
SQL Advantages with an MPP RDBMS
• Full ANSI SQL:
– The lingua franca of business users when accessing data
– Decades of standardization (stable, feature rich, portable)
• Mature 3rd party SQL-based tools that provide business users with self-service direct access to the data:
– BI Tools
– In-database statistical packages
– Analytic applications (CRM, SCM, MDM)
• Easily parallelized
– Scalable when manipulating large data sets
6/27/2013 9
ACID Advantages in an MPP RDBMS
• Guarantees database actions are
processed reliably
• Ensures 100% query result accuracy
• Supports updates and deletes
• Needed for applications that require
100% consistency
Atomicity - All of the pieces are
committed or none are committed.
Consistency - Creates a new and
valid state of data, or, if any failure
occurs, returns all data to its original
state.
Isolation - Processed and not yet
committed transactions must remain
isolated from any other
transactions.
Durability - Committed data is
saved such that in event of a failure
and system restart, the data is
available in its correct state.
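The atomicity and consistency definitions above can be sketched with Python's built-in sqlite3 module. This is only an illustration of the transactional behavior on a single-node embedded database, not an MPP RDBMS; the accounts table, the names, and the transfer helper are hypothetical.

```python
import sqlite3


def transfer(conn, src, dst, amount):
    """Move `amount` between two accounts as one all-or-nothing transaction."""
    try:
        # The connection context manager commits if the block succeeds
        # and rolls back every statement in it if an exception escapes.
        with conn:
            conn.execute(
                "UPDATE accounts SET balance = balance - ? WHERE name = ?",
                (amount, src),
            )
            conn.execute(
                "UPDATE accounts SET balance = balance + ? WHERE name = ?",
                (amount, dst),
            )
            (balance,) = conn.execute(
                "SELECT balance FROM accounts WHERE name = ?", (src,)
            ).fetchone()
            if balance < 0:
                # Consistency rule for this sketch: no negative balances.
                raise ValueError("insufficient funds")
        return True
    except ValueError:
        return False


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE accounts (name TEXT PRIMARY KEY, balance INTEGER NOT NULL)"
)
conn.executemany(
    "INSERT INTO accounts VALUES (?, ?)", [("alice", 100), ("bob", 50)]
)
conn.commit()

ok = transfer(conn, "alice", "bob", 30)       # both updates are committed
failed = transfer(conn, "alice", "bob", 500)  # both updates are rolled back
balances = dict(conn.execute("SELECT name, balance FROM accounts"))
```

The second transfer debits and credits inside the transaction, then fails the balance check, so neither update survives: the data returns to its prior valid state.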
Tight Vertical Integration
• End-to-end management of resources
• Efficient utilization of resources
• Engineered extremely well for known data
• Fine-grained parallelism and resource management
• Consistency of service level delivery
Best Practices Management:
• Workload functions
• Workload groups
• Exceptions
• Priorities
• Time periods
Low Latency Advantages of MPP RDBMS
Multi-temperature
storage with automated
distribution of data based
on access patterns:
• In-Memory
• Solid-State Drives
• Fast Hard Drives
• Fat Hard Drives
• Indexes
• Statistics
• Advanced partitioning
Cost Based Optimizer Advantages in an MPP RDBMS
• Best practices optimizer determines how
the query will be processed most
efficiently, with no “hints” or degrees of
parallelism necessary.
• In chess, you can look out a few moves
to decide your best next move, but you
can’t envision all move and countermove
sequences for the entire game:
• The Grand Master has the
knowledge, experience, and
intelligence to identify and use
the right strategy.
• With Hadoop, the user takes a
heavy role in optimizing the
execution of queries.
• With an MPP RDBMS, the
software is the optimizer.
Query Rewrite
• semantic optimization
• different types of vendor tools
Fast/Efficient Data Access
• Access path - Indexing
• Partitioning (CP & PPI)
• Advanced partitioning schemes
(range & case based, multilevel,
dynamic)
• IO Optimizations (efficient
scans/sync scan) scan optimization
Query Complexity
• Join costing & planning
• Aggregation
Many ways to process a complex query…
Granular Security Advantages in an MPP RDBMS
• Row level security
• Column level security
• An MPP RDBMS tightly integrates mature security features
• User-level security controls
• Increased user authentication options
• Support for security roles
• Enterprise directory integration
• Auditing and monitoring controls
• Encryption
MPP RDBMS Customer Examples
© Hortonworks Inc. 2012
By the year 2015, we believe half the world's
data will be processed by Apache Hadoop
Key Hadoop Features for the EDW
•Storage/Processing
•Metadata
The World of Data is Changing
Data Explosion
• 1 Zettabyte (ZB) = 1 Billion TBs
• 15x growth rate of machine generated data by 2020
Source: IDC
By 2015, organizations that build a modern information management system will outperform their peers financially by 20 percent.
– Gartner, Mark Beyer, “Information Management in the 21st Century”
Apache Hadoop: Center of Big Data Strategy
Open Source data management with scale-out storage & distributed processing
Storage: HDFS
• Distributed across “nodes”
• Natively redundant
• Name node tracks locations
• Self-Healing, High Bandwidth Clustered Storage
Processing: Map Reduce
• Splits a task across processors “near” the data & assembles results
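The split-and-assemble idea behind MapReduce can be sketched in plain Python as a word count. This is a single-process illustration of the map, shuffle, and reduce phases, not Hadoop's actual API; the function names and the sample splits are assumptions made for the example.

```python
from collections import defaultdict


def map_phase(split):
    """Emit (word, 1) pairs for one input split, as a mapper would."""
    for line in split:
        for word in line.lower().split():
            yield word, 1


def shuffle(pairs):
    """Group values by key, standing in for the framework's shuffle step."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(groups):
    """Sum the values for each key, as a reducer would."""
    return {key: sum(values) for key, values in groups.items()}


# Each inner list stands in for a block of data stored on a different node.
splits = [
    ["hadoop stores data", "hadoop processes data"],
    ["data warehouse"],
]

mapped = [pair for split in splits for pair in map_phase(split)]
counts = reduce_phase(shuffle(mapped))
```

In a real cluster each split would be mapped on the node that holds it, and only the grouped intermediate pairs would travel over the network to the reducers.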
Key Characteristics
• Scalable
– Efficiently store and process
petabytes of data
– Linear scale driven by additional
processing and storage
• Reliable
– Redundant storage
– Failover across nodes and racks
• Flexible
– Store all types of data in any format
– Apply schema on analysis and
sharing of the data
• Economical
– Use commodity hardware
– Open source software guards
against vendor lock-in
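The "apply schema on analysis" point above (schema on read) can be sketched as follows: raw records are stored untouched, and each consumer projects its own schema when it reads. The event fields, the schema dictionaries, and the read_with_schema helper are hypothetical names chosen for this illustration.

```python
import json

# Raw lines are stored as-is; nothing is validated at write time.
RAW_EVENTS = [
    '{"user": "u1", "ts": "2013-06-27T10:00:00", "page": "home"}',
    '{"user": "u2", "page": "search", "referrer": "ad"}',
    "not json at all",
]


def read_with_schema(lines, schema):
    """Project raw lines onto `schema`, a map of field -> (converter, default).

    Lines that cannot be parsed are skipped; missing fields get the default.
    """
    for line in lines:
        try:
            record = json.loads(line)
        except json.JSONDecodeError:
            continue
        if not isinstance(record, dict):
            continue
        yield {
            field: convert(record[field]) if field in record else default
            for field, (convert, default) in schema.items()
        }


# Two analyses read the same raw data through different schemas.
page_view_schema = {"user": (str, None), "page": (str, None)}
referrer_schema = {"user": (str, None), "referrer": (str, "direct")}

page_views = list(read_with_schema(RAW_EVENTS, page_view_schema))
referrers = list(read_with_schema(RAW_EVENTS, referrer_schema))
```

A schema-on-write system would instead reject or transform the malformed and incomplete records at load time, which is the trade-off the comparison slides in this deck describe.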
Apache HCatalog provides flexible metadata services across tools and external access
[Diagram: HCatalog sits over raw Hadoop data (inconsistent, unknown, tool specific access) and exposes table access, aligned metadata, and a REST API.]
Metadata Services
• Consistency of metadata and data models across tools (MapReduce, Pig, HBase and Hive)
• Accessibility: share data as tables in and out of HDFS
• Availability: enables flexible, thin-client access via REST API
Shared table and schema management opens the platform
An Open Apache Community
Fastest path to innovation is an open community
“How to” deliver an open source enterprise product
• Identify requirements
• Open community delivery
• Enterprise rigor
[Diagram: a Design & Develop, Test & Patch, Release cycle spanning Apache Hadoop, Apache Pig, Apache HCatalog, Apache HBase, Apache Hive, Apache Ambari, and other Apache projects.]
Big Data: It’s About Scale & Structure
Attribute | RDBMS, MPP, EDW | Hadoop, NoSQL
schema | Required on write | Required on read
speed | Reads are fast | Writes are fast
governance | Standards and structured | Loosely structured
processing | Limited, no data processing | Processing coupled with data
data types | Structured | Multi and unstructured
best fit use | Interactive OLAP Analytics; Complex ACID Transactions; Operational Data Store | Data Discovery; Processing unstructured data; Massive Storage/Processing
cost | Software License | Support only
resources | Known entity | Growing, complexities, wide
An Emerging Data Architecture
[Diagram: three layers. Data sources: traditional sources (RDBMS, OLTP, OLAP), new sources (web logs, email, sensor data, social media), mobile data, and OLTP/POS systems. Data systems: traditional repositories (RDBMS, EDW, MPP) alongside the Hortonworks Data Platform. Applications: business analytics, custom applications, enterprise applications. Operational tools (manage & monitor) and dev & data tools (build & test) sit beside the data systems.]
Interoperating With Your Tools
[Diagram: the same three layers (data sources, data systems, applications) with vendor tools filled in: Viewpoint under operational tools, Microsoft applications, and Hadoop alongside the Teradata Unified Data Architecture (capture | store | refine layer, discovery platform, integrated data warehouse). Data sources are traditional sources (RDBMS, OLTP, OLAP), new sources (web logs, email, sensor data, social media), mobile data, and OLTP/POS systems.]
By the year 2015, we believe half the world's
data will be processed by Apache Hadoop
Key Hadoop Features for the EDW
•Storage/Processing
•Metadata
•FAMILIARITY
Confidential and proprietary. Copyright © 2013 Teradata Corporation.
Teradata Unified Data Architecture
• Hadoop
- Collect ALL
interaction data
• Teradata Aster
- Discover customer
behavioral patterns
• Teradata
- Operationalize
Insights
The right technology on the right analytical problems using best of
breed technologies
Improved Customer Service and Retention
[Diagram: data sources (multi-structured raw data, call center voice records, check images, social feeds, clickstream data) enter a capture, store and refine layer, where Hadoop captures, stores and transforms social, images and call records. Call data moves to the discovery platform for path, pattern & graph analysis, which produces sentiment scores. The traditional data flow uses ETL tools to load the integrated DW, which holds dimensional data and analytic results and drives analysis + marketing automation (customer campaign).]
Teradata Workload-Specific Platforms
• Data Mart Appliance (670): scale up to 12TB. Workloads: test / development or smaller data marts.
• Extreme Data Appliance (1650): scale up to 186PB. Workloads: analytical archive, deep dive analytics.
• Data Warehouse Appliance (2700): scale up to 1.6PB. Workloads: strategic intelligence, decision support system, fast scan.
• Active Enterprise Data Warehouse (6700): scale up to 61PB. Workloads: strategic & operational intelligence, real time update, active workloads.
• Appliance for Hadoop: scale up to 10PB. Workloads: appliance for storing, capturing and refining data. Hortonworks HDP 1.1.
• Aster Big Analytics Appliance: scale up to 5PB. Workloads: discovery platform for big data analytics with embedded SQL MapReduce for new data types & sources.
• SAS High Performance Analytics (700): scale up to 52TB. Workloads: dedicated appliance for SAS high-performance analytic model development.
Teradata Unified Data Architecture
• Hadoop
- Collect ALL interaction data
• Teradata Aster
- Discover customer behavioral patterns
• Teradata
- Operationalize insights
The right technology on the right analytical problems using best of breed technologies
Connectors between the platforms:
• SQL-H
• Aster-Teradata Connector
• Aster Connector for Hadoop
• Teradata Connector for Hadoop
Teradata SQL-H™
A Business User’s Bridge to Access Hadoop Data
Teradata SQL-H Gives Business
Users a Better Way to Access
Data Stored in Hadoop
• Trusted: Use existing tools/skills and
enable self-service BI with granular
security
• Allow standard ANSI SQL access to
Hadoop data
• Fast: Queries run on Teradata, data
accessed from Hadoop
• Efficient: Intelligent data access
leveraging the Hadoop HCatalog
[Diagram: Teradata SQL-H reaches through HCatalog into the Hadoop layer (HDFS, Pig, Hive, Hadoop MR); data is filtered as it is pulled from Hadoop into Teradata.]
The App Store of Big Data
Aster Discovery Portfolio: Accelerate Time to Insights
Some of the 80+ out-of-the-box analytical apps:
• Path Analysis: discover patterns in rows of sequential data
• Text Analysis: derive patterns and extract features in textual data
• Statistical Analysis: high-performance processing of common statistical calculations
• Segmentation: discover natural groupings of data points
• Marketing Analytics: analyze customer interactions to optimize marketing decisions
• Data Transformation: transform data for more advanced analysis
• Graph Analysis: graph analytics processing and visualization
• SQL-MapReduce Visualization: graphing and visualization tools linked to key functions of the MapReduce analytics library
Big Data Analytics & Discovery
Example Customers: Teradata Aster Big Analytics Appliance
XL Axiata
Discovering Deep Insights in Retail
Transforming Web Walks into DNA Sequences
Situation
Large retailer with 700M
visits/year, 2M customers / day
look at 1M products online
Problem
Increase ability of web content
owners to self-serve insights
Solution
Treat web walks like DNA
sequences of simple patterns.
Impact
• Data: loaded logs into Hortonworks
– Loaded 2 months of raw data in 1 hour, vs. 1 day on old system
– Can load a day’s log data in 60 sec
• Sessionize: creates a sequence for each visit, e.g., boils 20 customer clicks down to 1 line:
– <Home - Search - Look at Product - Add to Basket - Pay - Exit>
• Analyze: business analysts can now do path analysis
• Act:
– Segmentations by behavior can increase conversion rates by 5-10%.
– Web design changes can drive another 10-20% more visitors into the sales funnel
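The sessionize step above can be sketched in a few lines of Python: group raw clicks by visit, order them by time, and collapse each visit into one path line. This is a minimal illustration, not the retailer's pipeline; the (visit_id, timestamp, page) tuple layout and the sample clicks are assumptions.

```python
from collections import defaultdict


def sessionize(clicks):
    """Collapse (visit_id, timestamp, page) clicks into one path per visit."""
    visits = defaultdict(list)
    for visit_id, timestamp, page in clicks:
        visits[visit_id].append((timestamp, page))
    # Order each visit's clicks by timestamp and join the pages into one line.
    return {
        visit_id: " > ".join(page for _, page in sorted(events))
        for visit_id, events in visits.items()
    }


# Hypothetical raw log: clicks from two visits, interleaved and out of order.
clicks = [
    ("v1", 3, "Look at Product"),
    ("v1", 1, "Home"),
    ("v1", 2, "Search"),
    ("v2", 1, "Home"),
    ("v1", 4, "Add to Basket"),
    ("v2", 2, "Exit"),
    ("v1", 5, "Pay"),
    ("v1", 6, "Exit"),
]

paths = sessionize(clicks)
```

Once each visit is a single sequence, path analysis reduces to pattern matching over those sequences, which is the "web walks as DNA sequences" idea on this slide.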
Example: Online Checkout Flow Analysis
• Customers who have reached the checkout process follow an “ideal path”.
• deliveryslots > deliveryinformation > coupons > substitutions > paymentinfo > orderconfirmation
• Determine how and when (and ultimately, why) customers deviate from this path.
• Discover obstacles preventing purchase and optimize visitor flow through the web site.
• The Aster SQL-MapReduce Framework enables a variety of different path visualizations.
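Finding where customers leave the ideal path can be sketched in plain Python. This is not Aster SQL-MapReduce, only an illustration of the analysis; the step names come from the ideal path above, while the "help" page, the "exit" and "start" markers, and the sample sessions are hypothetical.

```python
from collections import Counter

IDEAL_PATH = [
    "deliveryslots",
    "deliveryinformation",
    "coupons",
    "substitutions",
    "paymentinfo",
    "orderconfirmation",
]


def first_deviation(path, ideal=IDEAL_PATH):
    """Return (last ideal step reached, what happened instead), or None.

    A session that ends early is reported as deviating to "exit".
    """
    for i, expected in enumerate(ideal):
        actual = path[i] if i < len(path) else "exit"
        if actual != expected:
            last_ok = ideal[i - 1] if i > 0 else "start"
            return last_ok, actual
    return None


# Hypothetical checkout sessions.
sessions = [
    list(IDEAL_PATH),                                        # completes checkout
    ["deliveryslots", "deliveryinformation", "coupons", "help"],
    ["deliveryslots", "deliveryinformation", "coupons"],     # abandons
    ["deliveryslots", "deliveryinformation", "coupons", "help", "paymentinfo"],
]

deviations = [first_deviation(s) for s in sessions]
drop_offs = Counter(d for d in deviations if d is not None)
worst = drop_offs.most_common(1)
```

Counting deviations by the last ideal step reached answers the "how and when" question; the "why" still needs an analyst to look at what those pages have in common.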
Teradata Portfolio for Hadoop
“Taking Hadoop from Silicon Valley to Main Street”
Most Trusted & Flexible Hadoop Platforms for Your Next-Generation
Unified Data Architecture™
1. Teradata Aster Big Analytics Appliance
2. Teradata Appliance for Hadoop
3. Teradata Commodity Offering for Hadoop (Dell)
4. Teradata Software-only for Hadoop (Hortonworks Data Platform)
Complete consulting and training capability
• Big Analytics Services – across the UDA
• Data Integration Optimization – ETL, ELT across the UDA
• Hadoop deployment & mentoring
• Teradata delivering Hortonworks training
• Hadoop Managed Services - operations & administration
Customer Support for Hadoop
• World-class Teradata customer support, backed by Hortonworks
What We Announced Today
Teradata Appliance for Hadoop
Value-Added Software Bringing Hadoop to Enterprise
Access: SQL-H™
Management: Viewpoint, TVI
Administration: Hadoop Builder,
Intelligent start/stop, DataNode
swap, deferred drive replace
High Availability : NameNode
HA, Master Machine Failover
Refining, Metadata,
Entity Resolution
Security & Data Access
HCatalog, Kerberos
Complete Consulting and Training Capability
Post-sale Services and Areas of Focus
• Teradata Analytic Architecture Services: services to scope, design, build, operate and maintain an optimal UDA approach for Teradata, Aster, and Hadoop
• Teradata DI Optimization: assess structured/non-structured data, discuss data loading techniques, determine best platform, optimize load scripts/processes
• Teradata Big Analytics: assess data value/cost of capture, identify source of “exhaust” data, create conceptual architecture, refine and enrich the data, implement initial analytics in Aster or best-fit tool
• Teradata Workshop for Hadoop: introduction workshop (across all of UDA)
• Teradata Data Staging for Hadoop: load data into landing area; set up data exploration/refining area; scope architecture and analytics; set up Hadoop repository; load sample data
• Teradata Platform for Hadoop: installation guidance and mentoring for Hadoop platform, D-I-Y after installation
• Teradata Managed Services for Hadoop: operations, management, administration, backup, security, process control for Hadoop
• Teradata Training Courses for Hadoop: two comprehensive, multi-day training offerings: 1) Administration of Apache Hadoop and 2) Developing Solutions Using Apache Hadoop
When to Use Which?
The best approach by workload and data type
Processing as a Function of Schema Requirements and Stage of Data Pipeline
Pipeline stages (columns, in order):
1. Low Cost Storage and Fast Loading
2. Data Pre-Processing, Refining, Cleansing
3. “Simple math at scale” (Score, filter, sort, avg., count...)
4. Joins, Unions, Aggregates
5. Analytics (Iterative and data mining)
6. Reporting
Platform by schema requirement, in stage order:
• Stable Schema: Teradata/Hadoop | Teradata | Teradata | Teradata | Teradata | Teradata
• Evolving Schema: Hadoop | Aster / Hadoop | Aster / Hadoop | Aster | Aster | Aster
• Format, No Schema: Hadoop | Hadoop | Hadoop | Aster | Aster | Aster
Cell annotations shown in the matrix: Aster (SQL + MapReduce Analytics); Aster (MapReduce Analytics)
Example workloads:
• Financial Analysis, Ad-Hoc/OLAP
• Enterprise-Wide BI and Reporting
• Spatial/Temporal
• Active Execution
• Interactive Data Discovery
• Web Clickstream, Set-Top Box Analysis
• CDRs, Sensor Logs, JSON
• Social Feeds, Text, Image Processing
• Audio/Video Storage and Refining
• Storage and Batch Transformations
Questions
and Answers
Thank You!

More Related Content

What's hot

E-Business Suite on Oracle Cloud
E-Business Suite on Oracle CloudE-Business Suite on Oracle Cloud
E-Business Suite on Oracle Cloud
Keith Kiattipong
 
From Data Warehouse to Lakehouse
From Data Warehouse to LakehouseFrom Data Warehouse to Lakehouse
From Data Warehouse to Lakehouse
Modern Data Stack France
 
Bighead: Airbnb’s End-to-End Machine Learning Platform with Krishna Puttaswa...
 Bighead: Airbnb’s End-to-End Machine Learning Platform with Krishna Puttaswa... Bighead: Airbnb’s End-to-End Machine Learning Platform with Krishna Puttaswa...
Bighead: Airbnb’s End-to-End Machine Learning Platform with Krishna Puttaswa...
Databricks
 
Data Lake Architecture – Modern Strategies & Approaches
Data Lake Architecture – Modern Strategies & ApproachesData Lake Architecture – Modern Strategies & Approaches
Data Lake Architecture – Modern Strategies & Approaches
DATAVERSITY
 
Data Architecture for Solutions.pdf
Data Architecture for Solutions.pdfData Architecture for Solutions.pdf
Data Architecture for Solutions.pdf
Alan McSweeney
 
Webinar Data Mesh - Part 3
Webinar Data Mesh - Part 3Webinar Data Mesh - Part 3
Webinar Data Mesh - Part 3
Jeffrey T. Pollock
 
Unleashing the Power of OpenAI GPT-3 in FME Data Integration Workflows
Unleashing the Power of OpenAI GPT-3 in FME Data Integration WorkflowsUnleashing the Power of OpenAI GPT-3 in FME Data Integration Workflows
Unleashing the Power of OpenAI GPT-3 in FME Data Integration Workflows
Safe Software
 
Oracle Cloud Infrastructure
Oracle Cloud InfrastructureOracle Cloud Infrastructure
Oracle Cloud Infrastructure
MarketingArrowECS_CZ
 
Data Architecture - The Foundation for Enterprise Architecture and Governance
Data Architecture - The Foundation for Enterprise Architecture and GovernanceData Architecture - The Foundation for Enterprise Architecture and Governance
Data Architecture - The Foundation for Enterprise Architecture and Governance
DATAVERSITY
 
Stash - Data FinOPS
Stash - Data FinOPSStash - Data FinOPS
Stash - Data FinOPS
Modern Data Stack France
 
Sample - Data Warehouse Requirements
Sample -  Data Warehouse RequirementsSample -  Data Warehouse Requirements
Sample - Data Warehouse Requirements
David Walker
 
Demystifying data engineering
Demystifying data engineeringDemystifying data engineering
Demystifying data engineering
Thang Bui (Bob)
 
Airflow at lyft for Airflow summit 2020 conference
Airflow at lyft for Airflow summit 2020 conferenceAirflow at lyft for Airflow summit 2020 conference
Airflow at lyft for Airflow summit 2020 conference
Tao Feng
 
Vue d'ensemble Dremio
Vue d'ensemble DremioVue d'ensemble Dremio
Vue d'ensemble Dremio
Modern Data Stack France
 
Data Mesh Part 4 Monolith to Mesh
Data Mesh Part 4 Monolith to MeshData Mesh Part 4 Monolith to Mesh
Data Mesh Part 4 Monolith to Mesh
Jeffrey T. Pollock
 
Building End-to-End Delta Pipelines on GCP
Building End-to-End Delta Pipelines on GCPBuilding End-to-End Delta Pipelines on GCP
Building End-to-End Delta Pipelines on GCP
Databricks
 
Self-service analytics @ Leaseplan Digital: from business intelligence to int...
Self-service analytics @ Leaseplan Digital: from business intelligence to int...Self-service analytics @ Leaseplan Digital: from business intelligence to int...
Self-service analytics @ Leaseplan Digital: from business intelligence to int...
webwinkelvakdag
 
The Importance of DataOps in a Multi-Cloud World
The Importance of DataOps in a Multi-Cloud WorldThe Importance of DataOps in a Multi-Cloud World
The Importance of DataOps in a Multi-Cloud World
DATAVERSITY
 
Data Engineering Basics
Data Engineering BasicsData Engineering Basics
Data Engineering Basics
Catherine Kimani
 
Streamline Data Governance with Egeria: The Industry's First Open Metadata St...
Streamline Data Governance with Egeria: The Industry's First Open Metadata St...Streamline Data Governance with Egeria: The Industry's First Open Metadata St...
Streamline Data Governance with Egeria: The Industry's First Open Metadata St...
DataWorks Summit
 

What's hot (20)

E-Business Suite on Oracle Cloud
E-Business Suite on Oracle CloudE-Business Suite on Oracle Cloud
E-Business Suite on Oracle Cloud
 
From Data Warehouse to Lakehouse
From Data Warehouse to LakehouseFrom Data Warehouse to Lakehouse
From Data Warehouse to Lakehouse
 
Bighead: Airbnb’s End-to-End Machine Learning Platform with Krishna Puttaswa...
 Bighead: Airbnb’s End-to-End Machine Learning Platform with Krishna Puttaswa... Bighead: Airbnb’s End-to-End Machine Learning Platform with Krishna Puttaswa...
Bighead: Airbnb’s End-to-End Machine Learning Platform with Krishna Puttaswa...
 
Data Lake Architecture – Modern Strategies & Approaches
Data Lake Architecture – Modern Strategies & ApproachesData Lake Architecture – Modern Strategies & Approaches
Data Lake Architecture – Modern Strategies & Approaches
 
Data Architecture for Solutions.pdf
Data Architecture for Solutions.pdfData Architecture for Solutions.pdf
Data Architecture for Solutions.pdf
 
Webinar Data Mesh - Part 3
Webinar Data Mesh - Part 3Webinar Data Mesh - Part 3
Webinar Data Mesh - Part 3
 
Unleashing the Power of OpenAI GPT-3 in FME Data Integration Workflows
Unleashing the Power of OpenAI GPT-3 in FME Data Integration WorkflowsUnleashing the Power of OpenAI GPT-3 in FME Data Integration Workflows
Unleashing the Power of OpenAI GPT-3 in FME Data Integration Workflows
 
Oracle Cloud Infrastructure
Oracle Cloud InfrastructureOracle Cloud Infrastructure
Oracle Cloud Infrastructure
 
Data Architecture - The Foundation for Enterprise Architecture and Governance
Data Architecture - The Foundation for Enterprise Architecture and GovernanceData Architecture - The Foundation for Enterprise Architecture and Governance
Data Architecture - The Foundation for Enterprise Architecture and Governance
 
Stash - Data FinOPS
Stash - Data FinOPSStash - Data FinOPS
Stash - Data FinOPS
 
Sample - Data Warehouse Requirements
Sample -  Data Warehouse RequirementsSample -  Data Warehouse Requirements
Sample - Data Warehouse Requirements
 
Demystifying data engineering
Demystifying data engineeringDemystifying data engineering
Demystifying data engineering
 
Airflow at lyft for Airflow summit 2020 conference
Airflow at lyft for Airflow summit 2020 conferenceAirflow at lyft for Airflow summit 2020 conference
Airflow at lyft for Airflow summit 2020 conference
 
Vue d'ensemble Dremio
Vue d'ensemble DremioVue d'ensemble Dremio
Vue d'ensemble Dremio
 
Data Mesh Part 4 Monolith to Mesh
Data Mesh Part 4 Monolith to MeshData Mesh Part 4 Monolith to Mesh
Data Mesh Part 4 Monolith to Mesh
 
Building End-to-End Delta Pipelines on GCP
Building End-to-End Delta Pipelines on GCPBuilding End-to-End Delta Pipelines on GCP
Building End-to-End Delta Pipelines on GCP
 
Self-service analytics @ Leaseplan Digital: from business intelligence to int...
Self-service analytics @ Leaseplan Digital: from business intelligence to int...Self-service analytics @ Leaseplan Digital: from business intelligence to int...
Self-service analytics @ Leaseplan Digital: from business intelligence to int...
 
The Importance of DataOps in a Multi-Cloud World
The Importance of DataOps in a Multi-Cloud WorldThe Importance of DataOps in a Multi-Cloud World
The Importance of DataOps in a Multi-Cloud World
 
Data Engineering Basics
Data Engineering BasicsData Engineering Basics
Data Engineering Basics
 
Streamline Data Governance with Egeria: The Industry's First Open Metadata St...
Streamline Data Governance with Egeria: The Industry's First Open Metadata St...Streamline Data Governance with Egeria: The Industry's First Open Metadata St...
Streamline Data Governance with Egeria: The Industry's First Open Metadata St...
 

Viewers also liked

Hadoop and Enterprise Data Warehouse
Hadoop and Enterprise Data WarehouseHadoop and Enterprise Data Warehouse
Hadoop and Enterprise Data WarehouseDataWorks Summit
 
Best Practices for the Hadoop Data Warehouse: EDW 101 for Hadoop Professionals
Best Practices for the Hadoop Data Warehouse: EDW 101 for Hadoop ProfessionalsBest Practices for the Hadoop Data Warehouse: EDW 101 for Hadoop Professionals
Best Practices for the Hadoop Data Warehouse: EDW 101 for Hadoop Professionals
Cloudera, Inc.
 
Hadoop and Your Data Warehouse
Hadoop and Your Data WarehouseHadoop and Your Data Warehouse
Hadoop and Your Data Warehouse
Caserta
 
"Hadoop and Data Warehouse (DWH) – Friends, Enemies or Profiteers? What about...
"Hadoop and Data Warehouse (DWH) – Friends, Enemies or Profiteers? What about..."Hadoop and Data Warehouse (DWH) – Friends, Enemies or Profiteers? What about...
"Hadoop and Data Warehouse (DWH) – Friends, Enemies or Profiteers? What about...
Kai Wähner
 
Teradata Intelligent Memory
Teradata Intelligent MemoryTeradata Intelligent Memory
Teradata Intelligent Memory
inside-BigData.com
 
ETL big data with apache hadoop
ETL big data with apache hadoopETL big data with apache hadoop
ETL big data with apache hadoop
Maulik Thaker
 
Teradata Investor Presentation
Teradata Investor Presentation Teradata Investor Presentation
Teradata Investor Presentation teradata2014
 
Hadoop World 2011: Extending Enterprise Data Warehouse with Hadoop - Jonathan...
Hadoop World 2011: Extending Enterprise Data Warehouse with Hadoop - Jonathan...Hadoop World 2011: Extending Enterprise Data Warehouse with Hadoop - Jonathan...
Hadoop World 2011: Extending Enterprise Data Warehouse with Hadoop - Jonathan...
Cloudera, Inc.
 
Maximizing Business Value: Optimizing Technology Investment
Maximizing Business Value: Optimizing Technology InvestmentMaximizing Business Value: Optimizing Technology Investment
Maximizing Business Value: Optimizing Technology Investment
Teradata
 
[RakutenTechConf2013] [B-3_2] DWH/Hadoop in Rakuten Ichiba
[RakutenTechConf2013] [B-3_2] DWH/Hadoop in Rakuten Ichiba[RakutenTechConf2013] [B-3_2] DWH/Hadoop in Rakuten Ichiba
[RakutenTechConf2013] [B-3_2] DWH/Hadoop in Rakuten Ichiba
Rakuten Group, Inc.
 
BSI Teradata: The Shocking Case of Home Electronics Planet
PCI PIN Basics Webinar from the Controlcase TeamPCI PIN Basics Webinar from the Controlcase Team
PCI PIN Basics Webinar from the Controlcase Team
 
Essentials of Automations: Optimizing FME Workflows with Parameters
Essentials of Automations: Optimizing FME Workflows with ParametersEssentials of Automations: Optimizing FME Workflows with Parameters
Essentials of Automations: Optimizing FME Workflows with Parameters
 
To Graph or Not to Graph Knowledge Graph Architectures and LLMs
To Graph or Not to Graph Knowledge Graph Architectures and LLMsTo Graph or Not to Graph Knowledge Graph Architectures and LLMs
To Graph or Not to Graph Knowledge Graph Architectures and LLMs
 

Hadoop and the Data Warehouse: When to Use Which

  • 1. HADOOP & THE DATA WAREHOUSE: WHEN TO USE WHICH. Steve Wooledge – Teradata Labs; Jim Walker – Hortonworks
  • 2. Topics • Trends in enterprise data architectures • The value of an integrated data warehouse • The value of Hadoop • Bringing it all together and next steps
  • 3. Big Data Comes with BIG HEADACHES. “Even free software like Hadoop is causing companies to spend more money… Many CIOs believe data is inexpensive because storage has become inexpensive. But data is inherently messy (it can be wrong, it can be duplicative, and it can be irrelevant), which means it requires handling, which is where the real expenses come in.” Source: The Wall Street Journal. “CIOs’ Big Problem with Big Data”. Aug 2012. “Through 2015, 85% of Fortune 500 organizations will be unable to exploit big data for competitive advantage.” Source: Gartner. “Information Innovation: Innovation Key Initiative Overview”. April 2012
  • 4. Organizations Face Several Obstacles with Big Data • Difficulty managing multiple systems, new types of data • Hard to find right skills; lack of supportability for new systems & “data scientists” • Difficulty deploying and integrating new systems • Difficulty providing accessibility to fast insights on big data. Source: Big Analytics 2012 Survey, Teradata
  • 5. Shift from a Single Platform to an Ecosystem “Big Data requirements are solved by a range of platforms including analytical databases, discovery platforms, and NoSQL solutions beyond Hadoop.” “We will abandon the old models based on the desire to implement for high-value analytic applications.” "Logical" Data Warehouse Source: “Big Data Comes of Age”. EMA and 9sight Consulting. Nov 2012.
  • 6. TERADATA UNIFIED DATA ARCHITECTURE (diagram). Data sources: audio & video, images, text, web & social, machine logs, CRM, SCM, ERP. Platforms: Discovery Platform; Capture | Store | Refine; Integrated Data Warehouse. Tools: languages, math & stats, data mining, business intelligence, applications. Users: engineers, data scientists, business analysts, front-line workers, customers/partners, marketing, operational systems, executives
  • 7. Topics • Trends in enterprise data architectures • The value of an integrated data warehouse • The value of Hadoop • Bringing it all together and next steps
  • 8. The Value of The Data Warehouse (diagram). Platforms: Integrated Data Warehouse, dual systems, data marts, independent data mart, analytical archive, test/dev, data lab. Tools: data mining, business intelligence, applications. Users: business analysts, knowledge workers, customers/partners, marketing, executives, front-line workers, operational systems. Capabilities: integrated analytics, advanced analytics, temporal, OLAP, optimization, geospatial, big data integration, application development, agile analytics, data exploration. Benefits • Easy to consume data • Rationalization of data from multiple sources into single enterprise view • Clean, safe, secure data • Cross-functional analysis • Transform once, use many • Fast response times
  • 9. SQL Advantages with an MPP RDBMS • Full ANSI SQL: • The lingua franca of business users when accessing data • Decades of standardization (stable, feature rich, portable) • Mature 3rd Party SQL based tools that provide business users with self service direct access to the data • BI Tools • In-database statistical packages • Analytic applications (CRM, SCM, MDM) • Easily parallelized • Scalable when manipulating large data sets 6/27/2013 9
  • 10. ACID Advantages in an MPP RDBMS • Guarantees database actions are processed reliably • Ensures 100% query result accuracy • Supports updates and deletes • Needed for applications that require 100% consistency • Atomicity: all of the pieces are committed or none are committed • Consistency: creates a new and valid state of data, or, if any failure occurs, returns all data to its original state • Isolation: processed and not yet committed transactions must remain isolated from any other transactions • Durability: committed data is saved such that in the event of a failure and system restart, the data is available in its correct state
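The atomicity and consistency properties on this slide can be illustrated with a minimal sketch. It uses Python's built-in `sqlite3` module as a single-node stand-in for an MPP RDBMS, purely for illustration; the `accounts` table, the account names and the `transfer` helper are hypothetical and do not come from the deck.

```python
import sqlite3


def transfer(conn, src, dst, amount):
    """Move `amount` from `src` to `dst`; both updates commit or neither does."""
    try:
        # The connection context manager commits on success and rolls back
        # if an exception escapes the block (atomicity).
        with conn:
            conn.execute(
                "UPDATE accounts SET balance = balance - ? WHERE name = ?",
                (amount, src),
            )
            conn.execute(
                "UPDATE accounts SET balance = balance + ? WHERE name = ?",
                (amount, dst),
            )
            (balance,) = conn.execute(
                "SELECT balance FROM accounts WHERE name = ?", (src,)
            ).fetchone()
            if balance < 0:
                # Refuse to leave the data in an invalid state (consistency).
                raise ValueError("insufficient funds")
        return True
    except ValueError:
        return False


def balances(conn):
    """Return the current balances as a {name: balance} dict."""
    return dict(conn.execute("SELECT name, balance FROM accounts ORDER BY name"))


# Hypothetical sample data for the sketch.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE accounts (name TEXT PRIMARY KEY, balance INTEGER NOT NULL)")
conn.executemany("INSERT INTO accounts VALUES (?, ?)", [("alice", 100), ("bob", 50)])
conn.commit()
```

Calling `transfer(conn, "alice", "bob", 500)` fails the balance check, so both updates are rolled back and `balances(conn)` is unchanged.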
  • 11. Tight Vertical Integration • End-to-end management of resources • Efficient utilization of resources • Engineered extremely well for known data • Fine-grained parallelism and resource management • Consistency of service level delivery Best Practices Management: • Workload functions • Workload groups • Exceptions • Priorities • Time periods
  • 12. Low Latency Advantages of MPP RDBMS • Multi-temperature storage with automated distribution of data based on access patterns: In-Memory, Solid-State Drives, Fast Hard Drives, Fat Hard Drives • Indexes • Statistics • Advanced partitioning
  • 13. Cost Based Optimizer Advantages in an MPP RDBMS • Best practices optimizer determines how the query will be processed most efficiently, with no “hints” or degrees of parallelism necessary. • In chess, you can look out a few moves to decide your best next move, but you can’t envision all move and countermove sequences for the entire game: the Grand Master has the knowledge, experience, and intelligence to identify and use the right strategy. • With Hadoop, the user takes a heavy role in optimizing the execution of queries. • With an MPP RDBMS, the software is the optimizer. Many ways to process a complex query: Query Rewrite • Semantic optimization • Different types of vendor tools. Fast/Efficient Data Access • Access path: indexing • Partitioning (CP & PPI) • Advanced partitioning schemes (range & case based, multilevel, dynamic) • IO optimizations (efficient scans/sync scan), scan optimization. Query Complexity • Join costing & planning • Aggregation
  • 14. Granular Security Advantages in an MPP RDBMS • Row level security • Column level security • An MPP RDBMS tightly integrates mature security features • User-level security controls • Increased user authentication options • Support for security roles • Enterprise directory integration • Auditing and monitoring controls • Encryption
  • 15. MPP RDBMS Customer Examples
  • 16. Topics • Trends in enterprise data architectures • The value of an integrated data warehouse • The value of Hadoop • Bringing it all together and next steps
  • 17. © Hortonworks Inc. 2012. By the year 2015, we believe half the world's data will be processed by Apache Hadoop. Key Hadoop Features for the EDW • Storage/Processing • Metadata
  • 18. Data Explosion: The World of Data is Changing. By 2015, organizations that build a modern information management system will outperform their peers financially by 20 percent. – Gartner, Mark Beyer, “Information Management in the 21st Century”. 1 Zettabyte (ZB) = 1 billion TBs. 15x growth rate of machine-generated data by 2020. Source: IDC
  • 19. Apache Hadoop: Center of Big Data Strategy. Open source data management with scale-out storage & distributed processing. Storage: HDFS • Distributed across “nodes” • Natively redundant • Name node tracks locations. Processing: MapReduce • Splits a task across processors “near” the data & assembles results • Self-healing, high-bandwidth clustered storage. Key Characteristics • Scalable: efficiently store and process petabytes of data; linear scale driven by additional processing and storage • Reliable: redundant storage; failover across nodes and racks • Flexible: store all types of data in any format; apply schema on analysis and sharing of the data • Economical: use commodity hardware; open source software guards against vendor lock-in
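The MapReduce description on this slide ("splits a task across processors near the data & assembles results") can be sketched in a few lines. This is a single-process illustration in plain Python, not the Hadoop API; the function names and the word-count task are assumptions chosen for the example.

```python
from collections import defaultdict


def map_phase(split):
    """Map: emit a (word, 1) pair for every word in one input split."""
    for line in split:
        for word in line.lower().split():
            yield word, 1


def shuffle(pairs):
    """Shuffle: group all emitted values by key."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(groups):
    """Reduce: combine the values of each key into one result."""
    return {key: sum(values) for key, values in groups.items()}


def word_count(splits):
    """Run map over every split, then shuffle and reduce the combined output.

    On a real cluster each split would be mapped on a node that holds that
    block of data; here the splits are simply processed in a loop.
    """
    pairs = []
    for split in splits:
        pairs.extend(map_phase(split))
    return reduce_phase(shuffle(pairs))
```

For example, `word_count([["hadoop stores data"], ["Hadoop processes data"]])` counts `hadoop` and `data` twice each, with the per-split work being independent until the shuffle step.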
  • 20. Apache HCatalog provides flexible metadata services across tools and external access. Diagram labels: HCatalog (table access, aligned metadata, REST API); raw Hadoop data (inconsistent, unknown, tool-specific access). Metadata Services • Consistency of metadata and data models across tools (MapReduce, Pig, HBase and Hive) • Accessibility: share data as tables in and out of HDFS • Availability: enables flexible, thin-client access via REST API. Shared table and schema management opens the platform
  • 21. An Open Apache Community: “how to” deliver an open source enterprise product • Identify requirements • Open community delivery • Enterprise rigor. Cycle: Design & Develop, Test & Patch, Release. Projects: Apache Hadoop, Apache Pig, Apache HCatalog, Apache HBase, Apache Hive, Apache Ambari, other Apache projects. Fastest path to innovation is an open community
  • 22. Big Data: It’s About Scale & Structure. RDBMS, MPP and EDW (listed first) compared with Hadoop and NoSQL (listed second): • Schema: required on write vs. required on read • Speed: reads are fast vs. writes are fast • Governance: standards and structured vs. loosely structured • Processing: limited, no data processing vs. processing coupled with data • Data types: structured vs. multi and unstructured • Best fit use: interactive OLAP analytics, complex ACID transactions, operational data store vs. data discovery, processing unstructured data, massive storage/processing • Cost: software license vs. support only • Resources: known entity vs. growing, complexities, wide
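The difference between a schema required on write and a schema required on read can be made concrete with a small sketch. It is plain Python with a hypothetical two-field schema (`user`, `amount`), not code for any particular database or Hadoop tool.

```python
import json

# Hypothetical schema used only for this illustration.
SCHEMA = {"user": str, "amount": float}


def write_with_schema(table, record):
    """Schema on write: validate before storing, so reads need no checks."""
    for field, expected_type in SCHEMA.items():
        if not isinstance(record.get(field), expected_type):
            raise ValueError(f"field {field!r} must be {expected_type.__name__}")
    table.append({field: record[field] for field in SCHEMA})


def read_with_schema(raw_lines):
    """Schema on read: raw lines were stored as-is; structure is applied now."""
    for line in raw_lines:
        try:
            record = json.loads(line)
            yield {"user": str(record["user"]), "amount": float(record["amount"])}
        except (ValueError, KeyError, TypeError):
            # Malformed or incomplete lines are skipped at query time.
            continue
```

The write path pays the validation cost once, at load time; the read path accepts any input quickly and pays for parsing (and for handling bad records) on every query.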
  • 23. An Emerging Data Architecture (diagram). Applications: business analytics, custom applications, enterprise applications. Data systems: traditional repos (RDBMS, EDW, MPP); Hortonworks Data Platform. Data sources: traditional sources (RDBMS, OLTP, OLAP); new sources (web logs, email, sensor data, social media); mobile data; OLTP, POS systems. Operational tools: manage & monitor. Dev & data tools: build & test
  • 24. Interoperating With Your Tools (diagram). Labels: applications, data systems, dev & data tools, operational tools, Viewpoint, Microsoft applications, Hadoop. Data sources: traditional sources (RDBMS, OLTP, OLAP); new sources (web logs, email, sensor data, social media); mobile data; OLTP, POS systems
  • 25. TERADATA UNIFIED DATA ARCHITECTURE (diagram). Data sources: audio & video, images, text, web & social, machine logs, CRM, SCM, ERP. Platforms: Discovery Platform; Capture | Store | Refine; Integrated Data Warehouse. Tools: languages, math & stats, data mining, business intelligence, applications. Users: engineers, data scientists, business analysts, front-line workers, customers/partners, marketing, operational systems, executives
  • 26. By the year 2015, we believe half the world's data will be processed by Apache Hadoop. Key Hadoop Features for the EDW • Storage/Processing • Metadata
  • 27. By the year 2015, we believe half the world's data will be processed by Apache Hadoop. Key Hadoop Features for the EDW • Storage/Processing • Metadata • FAMILIARITY
  • 28. Organizations Face Several Obstacles with Big Data • Difficulty managing multiple systems, new types of data • Hard to find right skills; lack of supportability for new systems & “data scientists” • Difficulty deploying and integrating new systems • Difficulty providing accessibility to fast insights on big data. Source: Big Analytics 2012 Survey, Teradata
  • 29. Topics • Trends in enterprise data architectures • The value of an integrated data warehouse • The value of Hadoop • Bringing it all together and next steps
  • 30. Confidential and proprietary. Copyright © 2013 Teradata Corporation. Teradata Unified Data Architecture • Hadoop - Collect ALL interaction data • Teradata Aster - Discover customer behavioral patterns • Teradata - Operationalize insights. The right technology on the right analytical problems using best-of-breed technologies
  • 31. Improved Customer Service and Retention. Hadoop captures, stores and transforms social, images and call records. Diagram labels: data sources (multi-structured raw data, call center voice records, check images, social feeds, clickstream data); traditional data flow (ETL tools); Capture, Store and Refine layer (Hadoop, call data); Discovery Platform (path, pattern & graph analysis; sentiment scores); Integrated DW (dimensional data, analytic results); analysis + marketing automation (customer campaign)
  • 32. Teradata Workload-Specific Platforms • Data Mart Appliance (670): up to 12TB; test/development or smaller data marts • Extreme Data Appliance (1650): up to 186PB; analytical archive, deep dive analytics • Data Warehouse Appliance (2700): up to 1.6PB; strategic intelligence, decision support system, fast scan • Active Enterprise Data Warehouse (6700): up to 61PB; strategic & operational intelligence, real time update, active workloads • Appliance for Hadoop: up to 10PB; appliance for storing, capturing and refining data, Hortonworks HDP 1.1 • Aster Big Analytics Appliance: up to 5PB; discovery platform for big data analytics with embedded SQL MapReduce for new data types & sources • SAS High Performance Analytics (700): up to 52TB; dedicated appliance for SAS high-performance analytic model development
  • 33. Teradata Unified Data Architecture • Hadoop - Collect ALL interaction data • Teradata Aster - Discover customer behavioral patterns • Teradata - Operationalize insights. The right technology on the right analytical problems using best-of-breed technologies. Connectors: SQL-H; Aster-Teradata Connector; Aster Connector for Hadoop; Teradata Connector for Hadoop
  • 34. Teradata SQL-H™: A Business User’s Bridge to Access Hadoop Data. Teradata SQL-H gives business users a better way to access data stored in Hadoop • Trusted: use existing tools/skills and enable self-service BI with granular security • Allow standard ANSI SQL access to Hadoop data • Fast: queries run on Teradata, data accessed from Hadoop • Efficient: intelligent data access leveraging the Hadoop HCatalog. Diagram labels: Hadoop layer (HDFS, Pig, Hive, Hadoop MR); Teradata: SQL-H; HCatalog; data; data filtering
  • 35. The App Store of Big Data. Aster Discovery Portfolio: Accelerate Time to Insights. Some of the 80+ out-of-the-box analytical apps: • Path analysis: discover patterns in rows of sequential data • Text analysis: derive patterns and extract features in textual data • Statistical analysis: high-performance processing of common statistical calculations • Segmentation: discover natural groupings of data points • Marketing analytics: analyze customer interactions to optimize marketing decisions • Data transformation: transform data for more advanced analysis • Graph analysis: graph analytics processing and visualization • SQL-MapReduce visualization: graphing and visualization tools linked to key functions of the MapReduce analytics library
  • 36. Big Data Analytics & Discovery Example Customers: Teradata Aster Big Analytics Appliance. XL Axiata
  • 37. Discovering Deep Insights in Retail: Transforming Web Walks into DNA Sequences. Situation: large retailer with 700M visits/year; 2M customers/day look at 1M products online. Problem: increase ability of web content owners to self-serve insights. Solution: treat web walks like DNA sequences of simple patterns. Impact • Data: loaded logs into Hortonworks; loaded 2 months of raw data in 1 hour, vs. 1 day on old system; can load a day’s log data in 60 sec • Sessionize: creates a sequence for each visit, e.g., boils 20 customer clicks down to 1 line: <Home - Search - Look at Product - Add to Basket - Pay - Exit> • Analyze: business analysts can now do path analysis • Act: segmentations by behavior can increase conversion rates by 5-10%; web design changes can drive another 10-20% more visitors into the sales funnel
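The "Sessionize" step on this slide, which boils a customer's clicks down to one line per visit, can be sketched as follows. This is plain Python rather than the Aster or Hadoop implementation used in the case study; the `(customer_id, timestamp_seconds, page)` record layout and the 30-minute inactivity gap are assumptions made for the example.

```python
from itertools import groupby
from operator import itemgetter

# Assumed inactivity gap that ends a visit; the deck does not state one.
SESSION_GAP_SECONDS = 30 * 60


def sessionize(clicks, gap=SESSION_GAP_SECONDS):
    """Collapse click records into one path string per visit.

    `clicks` is an iterable of (customer_id, timestamp_seconds, page) tuples.
    Returns a list of (customer_id, "page > page > ...") tuples.
    """
    sessions = []
    ordered = sorted(clicks, key=itemgetter(0, 1))  # by customer, then time
    for customer, events in groupby(ordered, key=itemgetter(0)):
        path, last_ts = [], None
        for _, ts, page in events:
            # A long pause closes the current visit and starts a new one.
            if last_ts is not None and ts - last_ts > gap:
                sessions.append((customer, " > ".join(path)))
                path = []
            path.append(page)
            last_ts = ts
        sessions.append((customer, " > ".join(path)))
    return sessions
```

Each output row is the kind of single-line sequence that path analysis can then match patterns against.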
  • 38. Example: Online Checkout Flow Analysis • Customers who have reached the checkout process follow an “ideal path”. • deliveryslots > deliveryinformation > coupons > substitutions > paymentinfo > orderconfirmation • Determine how and when (and ultimately, why) customers deviate from this path. • Discover obstacles preventing purchase and optimize visitor flow through the web site. • The Aster SQL-MapReduce Framework enables a variety of different path visualizations.
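Finding how and when customers deviate from the ideal checkout path can be sketched as a comparison of each visit against that path. The page names come from the slide; the functions themselves are hypothetical plain Python and are not the Aster SQL-MapReduce path functions.

```python
from collections import Counter

# The "ideal path" listed on the slide.
IDEAL_PATH = [
    "deliveryslots",
    "deliveryinformation",
    "coupons",
    "substitutions",
    "paymentinfo",
    "orderconfirmation",
]


def first_deviation(visited, ideal=IDEAL_PATH):
    """Return (step_index, expected_page, actual_page) for the first deviation.

    `actual_page` is None when the visitor dropped off before that step.
    Returns None when the visit follows the ideal path to completion.
    """
    for index, expected in enumerate(ideal):
        actual = visited[index] if index < len(visited) else None
        if actual != expected:
            return index, expected, actual
    return None


def deviation_counts(visits, ideal=IDEAL_PATH):
    """Count, per expected page, how many visits first deviated at that step."""
    counts = Counter()
    for visited in visits:
        deviation = first_deviation(visited, ideal)
        if deviation is not None:
            counts[deviation[1]] += 1
    return counts
```

Aggregating the first deviation per visit gives a simple view of which checkout step loses the most customers, which is the "how and when" part of the question; the "why" still needs analyst interpretation.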
  • 39. What We Announced Today: Teradata Portfolio for Hadoop, “Taking Hadoop from Silicon Valley to Main Street”. Most trusted & flexible Hadoop platforms for your next-generation Unified Data Architecture™: 1. Teradata Aster Big Analytics Appliance 2. Teradata Appliance for Hadoop 3. Teradata Commodity Offering for Hadoop (Dell) 4. Teradata Software-only for Hadoop (Hortonworks Data Platform). Complete consulting and training capability • Big Analytics Services: across the UDA • Data Integration Optimization: ETL, ELT across the UDA • Hadoop deployment & mentoring • Teradata delivering Hortonworks training • Hadoop Managed Services: operations & administration. Customer support for Hadoop • World-class Teradata customer support, backed by Hortonworks
  • 40. Teradata Appliance for Hadoop: Value-Added Software Bringing Hadoop to Enterprise • Access: SQL-H™ • Management: Viewpoint, TVI • Administration: Hadoop Builder, intelligent start/stop, DataNode swap, deferred drive replace • High Availability: NameNode HA, master machine failover • Refining, metadata, entity resolution • Security & data access • HCatalog • Kerberos
  • 41. Complete Consulting and Training Capability. Post-sale services and areas of focus: • Teradata Analytic Architecture Services: services to scope, design, build, operate and maintain an optimal UDA approach for Teradata, Aster, and Hadoop • Teradata DI Optimization: assess structured/non-structured data, discuss data loading techniques, determine best platform, optimize load scripts/processes • Teradata Big Analytics: assess data value/cost of capture, identify source of “exhaust” data, create conceptual architecture, refine and enrich the data, implement initial analytics in Aster or best-fit tool • Teradata Workshop for Hadoop: introduction workshop (across all of UDA) • Teradata Data Staging for Hadoop: load data into landing area; set up data exploration/refining area; scope architecture and analytics; set up Hadoop repository; load sample data • Teradata Platform for Hadoop: installation guidance and mentoring for Hadoop platform, D-I-Y after installation • Teradata Managed Services for Hadoop: operations, management, administration, backup, security, process control for Hadoop • Teradata Training Courses for Hadoop: two comprehensive, multi-day training offerings: 1) Administration of Apache Hadoop and 2) Developing Solutions Using Apache Hadoop
  • 42. When to Use Which? The best approach by workload and data type: processing as a function of schema requirements and stage of data pipeline. Pipeline stages (columns): (1) Low-cost storage and fast loading; (2) Data pre-processing, refining, cleansing; (3) “Simple math at scale” (score, filter, sort, avg., count...); (4) Joins, unions, aggregates; (5) Analytics (iterative and data mining); (6) Reporting. Best fit per stage, in column order: • Stable schema: Teradata/Hadoop, Teradata, Teradata, Teradata, Teradata, Teradata • Evolving schema: Hadoop, Aster/Hadoop, Aster/Hadoop, Aster, Aster, Aster • Format, no schema: Hadoop, Hadoop, Hadoop, Aster, Aster, Aster. Callouts: Aster (SQL + MapReduce Analytics); Aster (MapReduce Analytics). Example workloads: Financial Analysis, Ad-Hoc/OLAP; Enterprise-Wide BI and Reporting; Spatial/Temporal; Active Execution; Interactive Data Discovery; Web Clickstream, Set-Top Box Analysis; CDRs, Sensor Logs, JSON; Social Feeds, Text, Image Processing; Audio/Video Storage and Refining; Storage and Batch Transformations
  • 43. When to Use Which? The best approach by workload and data type: processing as a function of schema requirements and stage of data pipeline. Pipeline stages (columns): (1) Low-cost storage and fast loading; (2) Data pre-processing, refining, cleansing; (3) “Simple math at scale” (score, filter, sort, avg., count...); (4) Joins, unions, aggregates; (5) Analytics (iterative and data mining); (6) Reporting. Best fit per stage, in column order: • Stable schema: Teradata/Hadoop, Teradata, Teradata, Teradata, Teradata, Teradata • Evolving schema: Hadoop, Aster/Hadoop, Aster/Hadoop, Aster, Aster, Aster • Format, no schema: Hadoop, Hadoop, Hadoop, Aster, Aster, Aster. Callouts: Aster (SQL + MapReduce Analytics); Aster (MapReduce Analytics)