Colin White
President, BI Research
December 5, 2012
Managing Data Warehouse Growth
in the New Era of Big Data
Sponsor
Speakers
Vineet Goel
Product Manager,
IBM InfoSphere Optim
Colin White
President,
BI Research
Colin White
President, BI Research
TDWI/IBM Webinar, December 2012
Managing Data Warehouse Growth
in the New Era of Big Data
The Evolution of Digital Data
Copyright © BI Research, 2012
First OLTP systems
First commercial RDBMSs
Early decision support products
Early data warehousing
Big data & advanced analytics (2012)
Increasing Data Volumes
• Sabre: 84,000 txs/day
• Sabre: 60,000 txs/sec, 25 TB EDW
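To put the two Sabre figures on the same scale, the arithmetic below converts the per-second rate to a per-day rate. It assumes, purely for illustration, that the 60,000 txs/sec rate is sustained around the clock; the slide does not say whether it is a peak or an average, so the result is a rough upper bound rather than a measured value.

```python
SECONDS_PER_DAY = 24 * 60 * 60  # 86,400

early_txs_per_day = 84_000  # figure quoted on the slide
later_txs_per_sec = 60_000  # figure quoted on the slide

# Assumption: the per-second rate holds for a full day.
later_txs_per_day = later_txs_per_sec * SECONDS_PER_DAY
growth_factor = later_txs_per_day / early_txs_per_day

print(f"{later_txs_per_day:,} txs/day at the later rate")
print(f"roughly {growth_factor:,.0f} times the early daily volume")
```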
Data Growth – Choose an Analyst!
Source: IDC
Data Growth and Big Data
• Big data technologies apply to all types of digital data, not just multi-structured data
• “Big” is a relative term and is different for each organization and application
• What you do with big data and how you use it for business benefit should be the main consideration – business analytics play a key role here
The Value of Data: IBM 2012 Study
The Changing World of Business Analytics
Advanced Analytics
• Improved analytic tools and techniques for
statistical and predictive analytics
• New tools for exploring and visualizing new
varieties of data
• Operational intelligence with embedded BI
services and BI automation
Big Data Management
• Analytic relational database systems that
offer improved price/performance and
libraries of analytic functions
• Non-relational systems such as Hadoop for
handling new types of data
• Stream processing/CEP systems for analyzing
in-motion data
• In-memory computing for high performance
Advanced Analytics Example: SC Digest 2012
Advanced Analytics in Supply Chain, Dr. Michael Watson, Supply Chain Digest, November 2012
1. Descriptive analytics – using historical data to describe
the business. This is usually associated with Business
Intelligence (BI) or visibility systems. In supply chain, you
use descriptive analytics to better understand your historical
demand patterns, to understand how product flows through
your supply chain, and to understand when a shipment
might be late.
2. Predictive analytics – using data to predict trends and
patterns. This is commonly associated with statistics. In the
supply chain, you use predictive analytics to forecast future
demand or to forecast the price of fuel.
3. Prescriptive analytics – using data to suggest the optimal
solution. This is commonly associated with optimization. In
the supply chain, you use prescriptive analytics to set your
inventory levels, schedule your plants, or route your trucks.
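The three categories can be illustrated with a small sketch on one toy dataset. Everything in it is an assumption made for illustration: the demand history, the cost parameters, and the three equally likely demand scenarios are hypothetical, and the least-squares trend line and cost search are simple stand-ins for the statistical and optimization methods the article refers to.

```python
from statistics import mean

# Hypothetical monthly demand history in units (illustrative, not from the article).
demand = [100, 110, 120, 130, 140, 150]

# 1. Descriptive: summarize what has already happened.
average_demand = mean(demand)

# 2. Predictive: fit a least-squares trend line and extrapolate one month ahead.
months = list(range(len(demand)))
month_mean = mean(months)
slope = (
    sum((m - month_mean) * (d - average_demand) for m, d in zip(months, demand))
    / sum((m - month_mean) ** 2 for m in months)
)
intercept = average_demand - slope * month_mean
forecast = intercept + slope * len(demand)

# 3. Prescriptive: pick the stock level with the lowest expected cost.
HOLDING_COST = 1.0   # assumed cost per unit of unsold stock
SHORTAGE_COST = 4.0  # assumed cost per unit of unmet demand
scenarios = [forecast - 10, forecast, forecast + 10]  # assumed equally likely


def expected_cost(stock: int) -> float:
    """Average overstock plus shortage cost across the demand scenarios."""
    costs = [
        HOLDING_COST * max(stock - d, 0) + SHORTAGE_COST * max(d - stock, 0)
        for d in scenarios
    ]
    return sum(costs) / len(costs)


best_stock = min(range(140, 181), key=expected_cost)

print(f"descriptive: average demand = {average_demand}")
print(f"predictive: next-month forecast = {forecast:.0f}")
print(f"prescriptive: stock level = {best_stock}")
```

Because a shortage is assumed to cost four times as much as overstock, the prescriptive step settles on a stock level above the forecast itself.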
What Then is Big Data?
Represents analytic and data management
solutions that could not previously be
supported because of:
• Technology limitations – poor performance,
inadequate analytic capabilities, etc.
• High hardware and software costs
• Incomplete or limited data for generating the
required solutions
Set of overlapping technologies that enable
customers to deploy analytic systems
optimized to suit specific business needs and
workloads
Optimization may involve improving
performance, reducing costs, enabling new
types of data to be analyzed, etc.
Big Data and Data Life Cycle Management
Data Management and Analytic Performance
• Capacity planning
• Managing data warehouse growth
• Analytic performance management and
optimization
• Service level agreements
Data Governance
• Security: user access, encryption,
masking, etc.
• Quality: governed/ungoverned data
• Backup and recovery
• Archiving and retention: historical
analysis, compliance
The Impact of Big Data on the Data Life Cycle
Need fast time to value to quickly gain business benefits from big data
• Impractical to use traditional EDW approach for all analytic solutions
• Extend existing data warehousing environment to support big data and
accommodate data growth
Need high performance solutions for supporting big data analytic workloads
• One-size-fits-all data management is no longer viable
• Match technologies and costs to business needs and analytic workloads
Need improved data governance to handle big data
• No longer practical to rigidly control and govern all forms of data – implement
different levels of governance based on security, compliance and quality needs
• Determine data archiving and retention policies based on the possible future need to
analyze historical data and data compliance requirements
The New Extended Data Warehouse
IMPROVE EXTEND
The Quest for Performance: Optimized Systems
Optimized Systems: Hardware Directions
Faster processors
Multi-core processors
Intelligent hardware
64-bit memory spaces
Large-capacity disk drives
Fast hard-disk and solid-state drives
Hybrid storage configurations
Scale-up/out parallel processing configurations
Lower-cost hardware (blades, clusters)
Reduced power and cooling requirements
Packaged hardware/software appliances
Managing Data Warehouse Growth: Storage Options
Large-capacity hard-disk drives (HDD)
• More economical, less reliable, slower
performance, e.g., SATA drives in white-box H/W
• Often “short-stroked” to improve performance
High-performance hard-disk drives (HDD)
• More expensive, more reliable, better
performance, e.g., enterprise SAS drives
Solid-state drives (SSDs)
• High and consistent performance
• Better reliability and more energy efficient
• Distinguish between commodity SSDs and
enterprise SSDs
Dynamic RAM (DRAM)
• Best performance - eliminates I/O overheads
• Used by in-memory computing systems
Managing Data Warehouse Growth: Tiered Storage
Hybrid (tiered) storage
• Employs a combination of different storage devices and options
• Data location is based on availability, usage, and performance requirements
Some DBMSs support system-enabled data migration
• Table/partition vs block-level migration
• Administrator-driven vs automatic migration
• Consider relationship to workload management and data archiving & retention strategies
Storage tiers:
• Hot data: performance tier (high-performance RAM & SSDs)
• Warm data: performance & capacity tier (high-performance HDDs)
• Cold data: capacity & archive tier (high-capacity HDDs)
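A minimal sketch of how hot, warm, and cold data might be assigned to the three tiers from usage statistics. The thresholds, the `PartitionStats` record, and the partition names are all hypothetical; a DBMS that supports automatic migration would apply its own rules, which the slide does not describe.

```python
from dataclasses import dataclass
from datetime import date


@dataclass
class PartitionStats:
    """Usage statistics for one table partition (hypothetical record)."""
    name: str
    last_accessed: date
    reads_per_day: float


# Assumed thresholds; real values depend on workload and service levels.
HOT_MAX_IDLE_DAYS = 30
WARM_MAX_IDLE_DAYS = 365
HOT_MIN_READS_PER_DAY = 100


def choose_tier(p: PartitionStats, today: date) -> str:
    """Map a partition to a storage tier based on recency and read rate."""
    idle_days = (today - p.last_accessed).days
    if idle_days <= HOT_MAX_IDLE_DAYS and p.reads_per_day >= HOT_MIN_READS_PER_DAY:
        return "performance"           # hot data: RAM & SSDs
    if idle_days <= WARM_MAX_IDLE_DAYS:
        return "performance_capacity"  # warm data: high-performance HDDs
    return "capacity_archive"          # cold data: high-capacity HDDs


today = date(2012, 12, 5)
partitions = [
    PartitionStats("sales_2012_q4", date(2012, 12, 1), 500.0),
    PartitionStats("sales_2012_q1", date(2012, 6, 1), 5.0),
    PartitionStats("sales_2009", date(2010, 1, 15), 0.1),
]
placement = {p.name: choose_tier(p, today) for p in partitions}
for name, tier in placement.items():
    print(f"{name}: {tier}")
```

Keeping the rule in one function makes it easy to rerun the placement on a schedule, which is one way to approximate administrator-driven migration.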
Managing DW Growth: Archiving/Retention - 1
The price/performance of new software
and hardware technologies enables
more data to be kept online longer
Data warehouse data is likely to be
managed on multiple systems for
performance, cost, or business reasons
Policies are required to manage the
archiving (for possible future analysis)
and retention (for compliance) of data
warehouse data
These policies must be coordinated
with corporate strategy especially in the
area of compliance
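One way to express such a policy is as a small record with separate clocks for online use, archived analysis, and compliance retention. The sketch below is an assumption-laden illustration: the `Policy` fields, the year values, and the action names are hypothetical and not taken from the presentation.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Policy:
    """Hypothetical archiving and retention policy for one class of data."""
    online_years: int      # keep in the warehouse for active analysis
    analysis_years: int    # keep archived for possible future analysis
    compliance_years: int  # minimum retention required for compliance


def action_for(age_years: float, policy: Policy) -> str:
    """Decide what to do with data of a given age under a policy."""
    if age_years < policy.online_years:
        return "keep_online"
    # Data stays archived until BOTH the analysis and compliance clocks expire.
    if age_years < max(policy.analysis_years, policy.compliance_years):
        return "archive"
    return "review_for_disposal"


sales_policy = Policy(online_years=2, analysis_years=5, compliance_years=7)
for age in (1, 3, 6, 8):
    print(age, action_for(age, sales_policy))
```

Taking the maximum of the two periods reflects the slide's point that archiving (for analysis) and retention (for compliance) are distinct needs that one policy has to satisfy together.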
Managing DW Growth: Archiving/Retention - 2
Operational, analytical and
collaborative computing are
becoming inextricably linked and
this should be considered when
developing retention policies
Data security, quality, archiving and
retention are related, but may need
different levels of governance
Data archiving and retention
solutions may be home grown, use
best-of-breed products, or employ
an enterprise-level software
approach
Summary
Organizations are experiencing huge data growth
Big data involves all digital data, but “big” is different
for each organization and application
Big data is a set of overlapping technologies that
can provide huge business benefits but create
some significant governance issues
Big data involves the blending of governed and
ungoverned data
It is impossible to govern all data, so organizations
need to implement flexible policies that:
• Allow easy but secure access to data
• Provide elastic capacity and good performance
• Efficiently manage data archiving and retention
© 2012 IBM Corporation
Information Management
Managing Data Warehouse Growth
IBM InfoSphere Optim, Information Management
IBM Big Data Strategy
IBM Big Data Platform
• Systems Management
• Application Development
• Visualization & Discovery
• Accelerators
• Hadoop System
• Stream Computing
• Data Warehouse
• Information Integration & Governance
New analytic applications drive the
requirements for a big data platform
• Integrate and manage the full variety,
velocity and volume of data
• Apply advanced analytics to information
in its native form
• Visualize all available data for ad-hoc
analysis
Need for trusted data drives requirements
for an information integration and
governance platform
• Ensure the highest quality information
• Master data into a single view
• Secure and Protect sensitive data to
minimize risk and exposure
• Govern data throughout its lifecycle
Analytic Applications
• BI / Reporting
• Exploration / Visualization
• Functional App
• Industry App
• Predictive Analytics
• Content Analytics
Governing Data Throughout its Lifecycle
Understand & Define
• Discover where data resides
• Classify & define data and relationships
• Define policies
Develop & Test
• Develop database structures & code
• Create & refresh test data
• Validate test results
Optimize & Archive
• Enhance performance
• Manage data growth
• Report & retrieve archived data
Consolidate & Retire
• Move only the needed information
• Integrate into single data source
• Enable compliance with retention & e-discovery
Information Governance Core Disciplines: Lifecycle Management
Data Warehouse Challenges Related to Data Growth
Increasing Costs and Time to Market
• Exponential data growth drives up the infrastructure and operational costs of keeping up with data volumes.
Poor Data Warehouse Performance
• Slow-performing business intelligence (BI) & analytics solutions due to unchecked data growth.
Managing Risk & Compliance
• The “keep everything” strategy makes it harder to support complex data retention and disposal compliance requirements.
Technical Challenges with High Volume Data Growth
• No policies and processes to manage multi-temperature data cost-effectively
• Transaction & data volume growth drive up infrastructure costs
• Multiple instances of production data – backup, DR, training, test – can compound data growth
• Large production instances affect cost & operations of backups, DR, non-prod
• Archival processes are complex, requiring full business context for e-discovery
What is Data Archiving?
Manage data growth and improve
performance by intelligently
archiving historical data
Universal Access to Application Data (ODBC/JDBC): Report Writer, Application
Benefits
• Reduce hardware, storage and maintenance costs
• Improve application performance
• Support data retention or disposal requirements
Requirements
• Archive, manage and retain application data according to business policies
• Maintain Referential Integrity
• Support tiered archiving to secondary database or file system
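The referential-integrity requirement can be sketched with Python's built-in `sqlite3` module: parent rows and their child rows are copied to archive tables and removed from production in a single transaction, so neither side is left with orphans. The schema, table names, and cutoff date are hypothetical, and this is not how InfoSphere Optim is implemented or invoked; it only illustrates the idea.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # make SQLite enforce the REFERENCES clauses
conn.executescript(
    """
    CREATE TABLE orders (id INTEGER PRIMARY KEY, order_date TEXT NOT NULL);
    CREATE TABLE order_items (
        id INTEGER PRIMARY KEY,
        order_id INTEGER NOT NULL REFERENCES orders(id),
        sku TEXT NOT NULL
    );
    CREATE TABLE orders_archive (id INTEGER PRIMARY KEY, order_date TEXT NOT NULL);
    CREATE TABLE order_items_archive (
        id INTEGER PRIMARY KEY,
        order_id INTEGER NOT NULL REFERENCES orders_archive(id),
        sku TEXT NOT NULL
    );
    INSERT INTO orders VALUES (1, '2008-03-01'), (2, '2012-11-20');
    INSERT INTO order_items VALUES (10, 1, 'A'), (11, 1, 'B'), (12, 2, 'C');
    """
)

OLD_ORDER_IDS = "SELECT id FROM orders WHERE order_date < ?"


def archive_orders_before(conn: sqlite3.Connection, cutoff: str) -> int:
    """Move orders older than cutoff, with their items, into the archive tables."""
    with conn:  # one transaction: commit on success, roll back on any error
        # Copy parents before children so the archive's foreign keys hold.
        conn.execute(
            "INSERT INTO orders_archive SELECT * FROM orders WHERE order_date < ?",
            (cutoff,),
        )
        conn.execute(
            f"INSERT INTO order_items_archive "
            f"SELECT * FROM order_items WHERE order_id IN ({OLD_ORDER_IDS})",
            (cutoff,),
        )
        # Delete children before parents so production's foreign keys hold.
        conn.execute(
            f"DELETE FROM order_items WHERE order_id IN ({OLD_ORDER_IDS})",
            (cutoff,),
        )
        deleted = conn.execute(
            "DELETE FROM orders WHERE order_date < ?", (cutoff,)
        ).rowcount
    return deleted


moved = archive_orders_before(conn, "2012-01-01")
print(f"archived {moved} order(s)")
```

The ordering of the four statements matters: with foreign keys enforced, archiving a child before its parent, or deleting a parent before its children, would fail and roll the whole transaction back.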
Tiered Storage for Information Lifecycle Management
Production Database and Archive Reporting Database
• Archive and Restore, driven by Archive Definitions
Non DBMS Retention Platform (Compressed Archives)
• ATA File Server, EMC Centera, IBM RS550, HDS
Offline Retention Platform (Compressed Archives)
• CD, Tape, Optical
Data age tiers
• Current Data: 1-2 years
• Active Historical: 3-4 years
• Online Archive: 5-6 years
• Offline Archive: 7+ years
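The age bands on this slide can be read as a lookup from data age to lifecycle tier. The sketch below assumes age is counted in whole years starting at 1 (data in its first year of life is year 1); that convention, the function name, and the tier identifiers are assumptions, since the slide gives only the bands.

```python
from bisect import bisect_left

# Upper bound (inclusive) of each band, in years, as shown on the slide.
UPPER_BOUNDS = [2, 4, 6]
TIERS = ["current_data", "active_historical", "online_archive", "offline_archive"]


def tier_for(year_of_life: int) -> str:
    """Return the lifecycle tier for data in its Nth year of life (N >= 1)."""
    if year_of_life < 1:
        raise ValueError("year_of_life must be 1 or greater")
    # bisect_left finds the first band whose upper bound is >= year_of_life.
    return TIERS[bisect_left(UPPER_BOUNDS, year_of_life)]


for year in range(1, 9):
    print(year, tier_for(year))
```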
Benefits of Archiving for the Data Warehouse
Reduce Costs
Reduce TCO by leveraging a tiered-storage archiving strategy.
- Reclaim space & defer hardware upgrades.
- Reduce operational and storage costs.
Improve Performance
Increase performance by archiving out dormant data.
- Faster analytical processing, ETL and reporting.
- Faster backups, restores and maintenance tasks.
Minimize Risk
Support data retention or legal hold requirements.
- Capture the precise sets of data required for your audit or e-discovery.
- Support defensible disposal of data.
About IBM InfoSphere Optim Solutions
Single, scalable information lifecycle management solution provides a central point to deploy policies to
extract, archive, subset, and protect application data records from creation to deletion
• Manage Data Growth
• Data Masking
• Test Data Management
• Application Retirement
• Discover
• Partner-delivered Solutions
Learn more
• Webpage: InfoSphere Optim
• Solution Sheet: InfoSphere Optim Solutions for the data warehouse
• Demo: InfoSphere Optim archiving
• Whitepaper: Control Application Data Before it Controls Your Business
www.ibm.com/optim
Questions?
Contacting Speakers
• If you have further questions or comments:
Colin White, BI Research
cwhite@bi-research.com
Vineet Goel
vineetg@us.ibm.com

Managing Data Warehouse Growth in the New Era of Big Data

  • 1. Colin White, President, BI Research. December 5, 2012. Managing Data Warehouse Growth in the New Era of Big Data
  • 3. Speakers: Vineet Goel, Product Manager, IBM InfoSphere Optim; Colin White, President, BI Research
  • 4. Colin White, President, BI Research. TDWI/IBM Webinar, December 2012. Managing Data Warehouse Growth in the New Era of Big Data
  • 5. The Evolution of Digital Data. Copyright © BI Research, 2012
      • Timeline of increasing data volumes up to 2012: first OLTP systems; first commercial RDBMSs; early decision support products; early data warehousing; big data & advanced analytics
      • Data points shown: Sabre, 84,000 txs/day; Sabre, 60,000 txs/sec; 25 TB EDW
  • 6. Data Growth – Choose an Analyst! Source: IDC
  • 7. Data Growth and Big Data
      • Big data technologies apply to all types of digital data, not just multi-structured data
      • “Big” is a relative term and is different for each organization and application
      • What you do with big data and how you use it for business benefit should be the main consideration – business analytics play a key role here
  • 8. The Value of Data: IBM 2012 Study
  • 9. The Value of Data: IBM 2012 Study
  • 10. The Changing World of Business Analytics
      • Advanced Analytics
          • Improved analytic tools and techniques for statistical and predictive analytics
          • New tools for exploring and visualizing new varieties of data
          • Operational intelligence with embedded BI services and BI automation
      • Big Data Management
          • Analytic relational database systems that offer improved price/performance and libraries of analytic functions
          • Non-relational systems such as Hadoop for handling new types of data
          • Stream processing/CEP systems for analyzing in-motion data
          • In-memory computing for high performance
  • 11. Advanced Analytics Example: SC Digest 2012. Source: Advanced Analytics in Supply Chain, Dr. Michael Watson, Supply Chain Digest, November 2012
  • 12. Advanced Analytics Example: SC Digest 2012. Source: Advanced Analytics in Supply Chain, Dr. Michael Watson, Supply Chain Digest, November 2012
      1. Descriptive analytics – using historical data to describe the business. This is usually associated with Business Intelligence (BI) or visibility systems. In supply chain, you use descriptive analytics to better understand your historical demand patterns, to understand how product flows through your supply chain, and to understand when a shipment might be late.
      2. Predictive analytics – using data to predict trends and patterns. This is commonly associated with statistics. In the supply chain, you use predictive analytics to forecast future demand or to forecast the price of fuel.
      3. Prescriptive analytics – using data to suggest the optimal solution. This is commonly associated with optimization. In the supply chain, you use prescriptive analytics to set your inventory levels, schedule your plants, or route your trucks.
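The three categories on slide 12 can be illustrated with a toy supply-chain calculation. This is only a sketch under stated assumptions: the demand figures, the three-week moving-average forecast, the 20% safety stock, and the on-hand inventory are all invented for illustration and do not come from the webinar or the Supply Chain Digest article. A simple ordering rule stands in for the optimization that prescriptive analytics would normally involve.

```python
from statistics import mean

# Hypothetical weekly demand history in units (illustrative numbers only).
demand = [100, 120, 110, 130, 125, 135]

# Descriptive: summarize what has already happened.
average_demand = mean(demand)

# Predictive: naive forecast, the mean of the last three weeks
# (an assumed method, chosen only because it is easy to follow).
forecast = mean(demand[-3:])

# Prescriptive: suggest an action. Here a simple rule orders enough to
# cover the forecast plus an assumed 20% safety stock, minus stock on hand.
SAFETY_FACTOR = 1.2   # assumption
on_hand = 50          # assumption
order_quantity = max(0, round(forecast * SAFETY_FACTOR) - on_hand)

print(f"average={average_demand} forecast={forecast} order={order_quantity}")
```

The point of the sketch is the progression: the same data first describes the past, then estimates the future, then drives a decision.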
  • 13. What Then is Big Data?
      • Represents analytic and data management solutions that could not previously be supported because of:
          • Technology limitations – poor performance, inadequate analytic capabilities, etc.
          • High hardware and software costs
          • Incomplete or limited data for generating the required solutions
      • Set of overlapping technologies that enable customers to deploy analytic systems optimized to suit specific business needs and workloads
      • Optimization may involve improving performance, reducing costs, enabling new types of data to be analyzed, etc.
  • 14. Big Data and Data Life Cycle Management
      • Data Management and Analytic Performance
          • Capacity planning
          • Managing data warehouse growth
          • Analytic performance management and optimization
          • Service level agreements
      • Data Governance
          • Security: user access, encryption, masking, etc.
          • Quality: governed/ungoverned data
          • Backup and recovery
          • Archiving and retention: historical analysis, compliance
  • 15. The Impact of Big Data on the Data Life Cycle
      • Need fast time to value to quickly gain business benefits from big data
          • Impractical to use traditional EDW approach for all analytic solutions
          • Extend existing data warehousing environment to support big data and accommodate data growth
      • Need high performance solutions for supporting big data analytic workloads
          • One-size-fits-all data management is no longer viable
          • Match technologies and costs to business needs and analytic workloads
      • Need improved data governance to handle big data
          • No longer practical to rigidly control and govern all forms of data – implement different levels of governance based on security, compliance and quality needs
          • Determine data archiving and retention policies based on the possible future need to analyze historical data and data compliance requirements
  • 16. The New Extended Data Warehouse (diagram labels: IMPROVE, EXTEND)
  • 17. The Quest for Performance: Optimized Systems
  • 18. Optimized Systems: Hardware Directions
      • Faster processors
      • Multi-core processors
      • Intelligent hardware
      • 64-bit memory spaces
      • Large-capacity disk drives
      • Fast hard-disk and solid-state drives
      • Hybrid storage configurations
      • Scale-up/out parallel processing configurations
      • Lower-cost hardware (blades, clusters)
      • Reduced power and cooling requirements
      • Packaged hardware/software appliances
  • 19. Managing Data Warehouse Growth: Storage Options
      • Large-capacity hard-disk drives (HDD)
          • More economical, less reliable, slower performance, e.g., SATA drives in white-box H/W
          • Often “short-stroked” to improve performance
      • High-performance hard-disk drives (HDD)
          • More expensive, more reliable, better performance, e.g., enterprise SAS drives
      • Solid-state drives (SSDs)
          • High and consistent performance
          • Better reliability and more energy efficient
          • Distinguish between commodity SSDs and enterprise SSDs
      • Dynamic RAM (DRAM)
          • Best performance - eliminates I/O overheads
          • Used by in-memory computing systems
  • 20. Managing Data Warehouse Growth: Tiered Storage
      • Hybrid (tiered) storage
          • Employs a combination of different storage devices and options
          • Data location is based on availability, usage, and performance requirements
      • Some DBMSs support system-enabled data migration
          • Table/partition vs block-level migration
          • Administrator-driven vs automatic migration
          • Consider relationship to workload management and data archiving & retention strategies
      • Diagram labels: hot data, warm data, cold data; performance tier, performance & capacity tier, capacity & archive tier; high-performance RAM & SSDs, high-performance HDDs, high-capacity HDDs
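Usage-based placement, as described on slide 20, might be sketched as follows. Everything here is an assumption made for illustration: the `Partition` type, the `reads_per_day` metric, and both thresholds are hypothetical, and the pairing of hot, warm, and cold data with specific devices is read off the slide's diagram labels rather than stated in the slide text.

```python
from dataclasses import dataclass


@dataclass
class Partition:
    """A table partition with a hypothetical usage metric."""
    name: str
    reads_per_day: float


# Assumed thresholds; a real system would tune these per workload.
HOT_MIN_READS = 1000
WARM_MIN_READS = 10


def assign_tier(partition: Partition) -> str:
    """Pick a storage tier from how often the partition is read."""
    if partition.reads_per_day >= HOT_MIN_READS:
        return "performance tier (RAM & SSDs)"            # hot data
    if partition.reads_per_day >= WARM_MIN_READS:
        return "performance & capacity tier (fast HDDs)"  # warm data
    return "capacity & archive tier (high-capacity HDDs)"  # cold data


partitions = [
    Partition("sales_2012_q4", 5000),
    Partition("sales_2011", 40),
    Partition("sales_2006", 0.5),
]
placement = {p.name: assign_tier(p) for p in partitions}
```

An administrator-driven migration would run a rule like this on a schedule and move partitions by hand; an automatic one would apply it continuously inside the DBMS.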
  • 21. Managing DW Growth: Archiving/Retention - 1
      • The price/performance of new software and hardware technologies enables more data to be kept online longer
      • Data warehouse data is likely to be managed on multiple systems for performance, cost, or business reasons
      • Policies are required to manage the archiving (for possible future analysis) and retention (for compliance) of data warehouse data
      • These policies must be coordinated with corporate strategy, especially in the area of compliance
  • 22. Managing DW Growth: Archiving/Retention - 2
      • Operational, analytical and collaborative computing are becoming inextricably linked and this should be considered when developing retention policies
      • Data security, quality, archiving and retention are related, but may need different levels of governance
      • Data archiving and retention solutions may be home grown, use best-of-breed products, or employ an enterprise-level software approach
  • 23. Summary
      • Organizations are experiencing huge data growth
      • Big data involves all digital data, but “big” is different for each organization and application
      • Big data is a set of overlapping technologies that can provide huge business benefits but create some significant governance issues
      • Big data involves the blending of governed and ungoverned data
      • It is impossible to govern all data, so organizations need to implement flexible policies that:
          • Allow easy but secure access to data
          • Provide elastic capacity and good performance
          • Efficiently manage data archiving and retention
  • 24. © 2012 IBM Corporation, Information Management. Managing Data Warehouse Growth. IBM InfoSphere Optim, Information Management
  • 25. IBM Big Data Strategy
      • New analytic applications drive the requirements for a big data platform
          • Integrate and manage the full variety, velocity and volume of data
          • Apply advanced analytics to information in its native form
          • Visualize all available data for ad-hoc analysis
      • Need for trusted data drives requirements for an information integration and governance platform
          • Ensure the highest quality information
          • Master data into a single view
          • Secure and protect sensitive data to minimize risk and exposure
          • Govern data throughout its lifecycle
      • Diagram labels, IBM Big Data Platform: Systems Management; Application Development; Visualization & Discovery; Accelerators; Hadoop System; Stream Computing; Data Warehouse; Information Integration & Governance
      • Diagram labels, Analytic Applications: BI / Reporting; Exploration / Visualization; Functional App; Industry App; Predictive Analytics; Content Analytics
  • 26. Governing Data Throughout its Lifecycle (Information Governance Core Disciplines: Lifecycle Management)
      • Phases: Understand & Define; Develop & Test; Optimize & Archive; Consolidate & Retire
      • Activities: Discover where data resides; Classify & define data and relationships; Define policies; Develop database structures & code; Create & refresh test data; Validate test results; Enhance performance; Manage data growth; Report & retrieve archived data; Move only the needed information; Integrate into single data source; Enable compliance with retention & e-discovery
  • 27. Data Warehouse Challenges Related to Data Growth
      • Increasing Costs and Time to Market: the impact of exponential data growth on infrastructure and operational costs to keep up with data volumes
      • Poor Data Warehouse Performance: slow-performing business intelligence (BI) & analytics solutions due to unchecked data growth
      • Managing Risk & Compliance: the “keep everything” strategy makes it harder to support complex data retention and disposal compliance
  • 28. Technical Challenges with High Volume Data Growth
      • No policies and processes to manage multi-temperature data cost-effectively
      • Transaction & data volume growth drive up infrastructure costs
      • Multiple instances of production data – backup, DR, training, test – can compound data growth
      • Large production instances affect cost & operations of backups, DR, non-prod
      • Archival processes are complex, requiring full business context for e-discovery
  • 29. What is Data Archiving?
      • Manage data growth and improve performance by intelligently archiving historical data
      • Requirements and benefits:
          • Reduce hardware, storage and maintenance costs
          • Improve application performance
          • Support data retention or disposal requirements
          • Archive, manage and retain application data according to business policies
          • Maintain referential integrity
          • Support tiered archiving to secondary database or file system
      • Diagram labels: Universal Access to Application Data (ODBC/JDBC); Report Writer; Application
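The requirement on slide 29 to maintain referential integrity while archiving can be illustrated with a generic sketch. It uses Python's built-in `sqlite3` module and is not how InfoSphere Optim works internally; the table names, columns, sample rows, and cutoff date are all hypothetical. The idea shown is that a parent row and its child rows move to the archive together, inside one transaction.

```python
import sqlite3


def archive_orders_before(conn: sqlite3.Connection, cutoff_date: str) -> None:
    """Move orders older than cutoff_date, plus their items, to archive tables."""
    old_orders = "SELECT id FROM orders WHERE order_date < ?"
    with conn:  # one transaction: commit on success, roll back on error
        # Copy children and parents first, so nothing is lost if a step fails.
        conn.execute(
            "INSERT INTO archive_order_items "
            f"SELECT * FROM order_items WHERE order_id IN ({old_orders})",
            (cutoff_date,),
        )
        conn.execute(
            "INSERT INTO archive_orders "
            "SELECT * FROM orders WHERE order_date < ?",
            (cutoff_date,),
        )
        # Delete children before parents so no item is left without its order.
        conn.execute(
            f"DELETE FROM order_items WHERE order_id IN ({old_orders})",
            (cutoff_date,),
        )
        conn.execute("DELETE FROM orders WHERE order_date < ?", (cutoff_date,))


# Hypothetical schema and data, for demonstration only.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE orders (id INTEGER PRIMARY KEY, order_date TEXT);
CREATE TABLE order_items (
    id INTEGER PRIMARY KEY,
    order_id INTEGER REFERENCES orders(id),
    sku TEXT
);
CREATE TABLE archive_orders (id INTEGER PRIMARY KEY, order_date TEXT);
CREATE TABLE archive_order_items (id INTEGER PRIMARY KEY, order_id INTEGER, sku TEXT);
INSERT INTO orders VALUES (1, '2005-03-01'), (2, '2012-11-15');
INSERT INTO order_items VALUES (10, 1, 'A'), (11, 1, 'B'), (12, 2, 'C');
""")
archive_orders_before(conn, "2010-01-01")
```

Dates are stored as ISO 8601 strings here so that plain string comparison orders them correctly; a production schema would more likely use a native date type.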
  • 30. Tiered Storage for Information Lifecycle Management
      • Data age tiers: current data, 1-2 years; active historical, 3-4 years; online archive, 5-6 years; offline archive, 7+ years
      • Non-DBMS retention platform (compressed archives): ATA file server, EMC Centera, IBM RS550, HDS
      • Offline retention platform (compressed archives): CD, tape, optical
      • Other diagram labels: Production Database; Archive Definitions; Archive; Restore; Archive; Reporting Database; Compressed Archives
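The age bands on slide 30 translate directly into a lookup. The sketch below assumes age is measured in years since the data was created and that each band runs up to the start of the next one (so 2.5 years still counts as current data); the slide does not say how the boundaries are handled, and the function name is hypothetical.

```python
def retention_tier(age_years: float) -> str:
    """Map the age of a record to the tier named on slide 30."""
    if age_years < 0:
        raise ValueError("age_years must be non-negative")
    if age_years < 3:
        return "current data"        # slide: 1-2 years
    if age_years < 5:
        return "active historical"   # slide: 3-4 years
    if age_years < 7:
        return "online archive"      # slide: 5-6 years
    return "offline archive"         # slide: 7+ years


tiers = [retention_tier(age) for age in (1, 3, 5.5, 7)]
```

In practice the bands would come from the organization's archiving and retention policy rather than being hard-coded.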
  • 31. Benefits of Archiving for the Data Warehouse
      • Reduce Costs: reduce TCO by leveraging a tiered-storage archiving strategy
          • Reclaim space & defer hardware upgrades
          • Reduce operational and storage costs
      • Improve Performance: increase performance by archiving out dormant data
          • Faster analytical processing, ETL and reporting
          • Faster backup, restores and maintenance tasks
      • Minimize Risk: support data retention or legal hold requirements
          • Capture the precise sets of data required based on your audit or e-discovery
          • Support defensible disposal of data
  • 32. About IBM InfoSphere Optim Solutions
      • Single, scalable, information lifecycle management solution provides a central point to deploy policies to extract, archive, subset, and protect application data records from creation to deletion
      • Solution areas shown: Discover; Test Data Management; Data Masking; Manage Data Growth; Application Retirement; Partner-delivered Solutions
  • 33. Learn more: www.ibm.com/optim
      • Webpage: InfoSphere Optim
      • Solution Sheet: InfoSphere Optim Solutions for the data warehouse
      • Demo: InfoSphere Optim archiving
      • Whitepaper: Control Application Data Before it Controls Your Business
  • 36. Contacting Speakers
      • If you have further questions or comments:
          • Colin White, BI Research: cwhite@bi-research.com
          • Vineet Goel: vineetg@us.ibm.com