DAMA NY CHAPTER PRESENTATION
“Big Data” &
“The Cloud”
Extreme Performance Data Warehousing
Inside Of The Cloud
Robert J. Abate, CBIP, CDMP
Solutions Principal, EIM & Analytics Practice
EMC Consulting
January 19th, 2012
© Copyright 2012 EMC Corporation. All rights reserved.
AGENDA
• Background & Definitions
• The Challenge
• Architectural Solutions To Big Data
• It’s A Brave New World
• Example Case Studies
• Open Discussion…
Background & Definitions
“Big data will represent a hugely disruptive force during
the next five years – enabling levels of insight – that are
currently unachievable through any other means”
Gartner, May 2011
We Are Awash In Data
• In the information age, every organization is in the “data”
business
• Data is growing exponentially, so are the challenges
• Complexity is causing insight to be lost
Source: IDC Digital Universe White Paper,
Sponsored by EMC, May 2009
Pictorial Representation Of Information
Big Data: More Than Just About Volume
• Consider: Master Data, Fidelity, Complexity, Validity, Perishability, Linking Data
• Structured Transactional Data: POS transactions, call detail records, credit card transactions, shipping updates, purchase orders, payments, shipments, account transactions
• Unstructured Data: Web logs, newsfeeds, social media, geo-location, mobile, consumer comments, claims, doctor’s notes, clinical studies, images, video, audio
• Device-generated Data: RFID sensors, smart meters, smart grids, GPS spatial, micro-payments
[Figure: dimensions of big data (Velocity, Volume, Variety, Complexity) and data types (Video, Transactional Data, Industry-specific, Web traffic, Text, Social, Sensor/Smart Grid, Images, Audio, Documents, Location-based)]
The Typical BI/DW Environment Today…
Big Data’s Potential For Actionable Insight
Today’s Situation
• Less than 10% of the enterprise’s data
• “Rear-view” mirror reporting, dashboards and analysis
– Weeks, months, or even quarters old
• Incomplete, inaccurate, and disjointed data
• Architectures and methods that take 6 to 18 months to exploit
Big Data Ramifications
• Vast majority of available sources and external data
• Forward looking or “Windshield-view” predictions with recommendations
– Real-time, near real-time
• Correlated, high confidence, governed data
• Vastly accelerated time to market
Time Really is Money!
“THE TIME VALUE CURVE”
© 2007 - Dr. Richard Hackathorn, Bolder Technology, Inc., All Rights Reserved. Used with Permission.
[Figure: “The Time Value Curve”. Vertical axis: Value; horizontal axis: Time. Points along the curve: Business Event, Data Ready For Analysis, Information Delivered, Action Taken. Intervals between them: Capture Latency, Analysis Latency, Decision Latency; together they make up the Action Time. Other labels: Value Lost, Data Lifecycle.]
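A minimal sketch of the arithmetic behind the curve, assuming (as the chart labels suggest) that action time is the sum of capture, analysis and decision latency. The exponential decay and the `half_life_hours` parameter are illustrative assumptions only; the original chart does not give a formula, and all names here are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Latencies:
    """Delays between a business event and the action taken, in hours."""
    capture_hours: float   # business event -> data ready for analysis
    analysis_hours: float  # data ready for analysis -> information delivered
    decision_hours: float  # information delivered -> action taken

    def action_time(self) -> float:
        # Total elapsed time from the business event to the action.
        return self.capture_hours + self.analysis_hours + self.decision_hours


def remaining_value(initial_value: float, action_time_hours: float,
                    half_life_hours: float) -> float:
    """Value left when action is taken, under an ASSUMED exponential decay."""
    return initial_value * 0.5 ** (action_time_hours / half_life_hours)


if __name__ == "__main__":
    scenarios = {
        "nightly batch": Latencies(24, 8, 16),
        "near real-time": Latencies(1, 1, 4),
    }
    for name, lat in scenarios.items():
        value = remaining_value(100.0, lat.action_time(), half_life_hours=48.0)
        print(f"{name}: action time {lat.action_time()} h, value left {value:.1f}")
```

Shrinking any of the three latencies shortens the action time, which is the point the slide makes: less elapsed time means less value lost.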
Data Is Coming At Us Faster
In a recent TDWI survey of 450 CIOs:
– 17% have a real-time data warehouse
– 90% plan on having a real-time warehouse
– 75% will replace to get to a real-time solution
“REAL TIME IS RAPIDLY BECOMING A NECESSARY FOUNDATION TO A DATA SOLUTION, AND WITHOUT ARCHITECTURE THERE IS CHAOS!”
Data Is Coming From All Directions
Data is now commonly entering the enterprise from external sources
– Government (Census, Revenues, …)
– Nielsen, NPD Group (Sales)
– Bloomberg, NYSE (Financial Position)
– Experian, TransUnion, Equifax (Credit Reporting)
– Google Maps, MapInfo (Geospatial, …)
– Radian 6, Biz360, … (Client Trend Data)
– Etc.
Need For Data Trust
• Compliance with laws
– Revenue Canada, Sarbanes-Oxley [SOX], Basel II, HIPAA, etc.
• Lack of confidence in the data
– Reports utilizing the same data do not report the same totals or computations
• Data not defined and readily available
– Multiple sources of data have to be rationalized at each project start-up, thereby wasting valuable time & $ on every project
• Data timeliness
– Manual process to collect, analyze and provide results
• Data integrity
– Unknown filters, varying calculations/computations, fields used for data not indicative of field names, data passed along from one person to another to another…
Summation Of Challenges We Are Observing
• Business mandate to obtain more value out of the data (get answers)
• Variety of sources, amounts, types and granularity of data that customers want to integrate is growing exponentially
• Need to shrink the latency between the business event and the data availability for analysis and decision-making
• Advancing agility of information is key
• Need for Data trust and Compliance with regulations
The Challenge Of Big Data
“Old” Journey To Information Maturity [EIM]
Stages: Data Chaos, Defined Data, Master Data, Integrated Information, Data Analytics, Business Optimization
PROCESSES: Data Discovery, Data Governance, Data Integration, Data Mining
TOOLS: Data Discover, Metadata, ETL Suite, BI / DW / OLAP
Data Chaos
• Same type of data means different things in different systems
• Ex: AT&T is the same as AT&T Inc
Defined Data
• Define common meanings.
• Ex: Determine the sources, types, and properties of grouped (i.e.: customer) records
Master Data
• Publish and Subscribe to master data
• Ex: Single view of customer across all information systems
Integrated Information
• Bring metadata together with information for reporting (BI) and warehousing (drilling and hierarchies).
Data Analytics
• Analyzing the data.
• Looking for trends and correlations
Predictive Information
• Using the analyzed data to optimize operations
• Wiki Type Sharing Of Self-Provisioned Environments
• Atomic Data Analytics
The Information Issue Is…
Too many organizations are not using information to its full advantage:
– 1 in 3 business leaders frequently make critical decisions without the information they need
– 1 in 2 business leaders do not have access to the information across their organization needed to do their jobs
– 3 in 4 business leaders say more predictive information would drive better decisions
Source: IBM Institute for Business Value, March 2009
Information Trust & Business Alignment
• Harris Interactive recently polled 23,000 U.S. employees and found:
– Only 37% said they have a clear understanding of what their organization is trying to achieve and why
– Only one in five was enthusiastic about their team and the organization’s / corporation’s goals
– Only one in five said they have a clear “line of sight” between their tasks and their team and organization’s goals
– Only 15% felt that their organization fully enables them to execute key goals
– Only 20% fully trusted the organization they work for
Viewed Using A Seasonal Analogy…
• If a football team had these players on the field:
– Only 4 of the 11 players on the field would know which goal is theirs
– Only 6 of the 11 would care
– Only 3 of the 11 would know what position they play and what they are supposed to do
– 9 players out of 11 would, in some way, be competing against their own team rather than the opponent
Perceived Complicated Landscape
• BI/DW is perceived as not “enabling” the business
– Inhibitor to corporate progress: IT systems cannot be changed fast enough to meet market demands, seize opportunity or comply with a new requirement.
– Weak alignment between IT and business strategy, marked by an intractable language barrier.
– Business not always sure what Information or Dimensions they want or need. How can IT provide without requirements?
– BI/DW is not known as the source of innovations
• The complexity of systems has caused BI/DW to be reactive rather than proactive
– Siloed solutions, databases and applications with trapped business rules
– Multiple sources of information and no single “truth”
– No “Architectural Blueprints” to the enterprise…
The Business Intelligence Maturity Model
Advancing The Maturity Of Information…
The big data impacts to both business and IT are significant; early adopters will fundamentally change their industries
Business Expectations
• More agile, more real-time, more accurate decision-making
• Predict and spot changes in dynamic and volatile markets
• Deeper understanding of customer preferences and behavior
• Greater fidelity in risk assessment and compliance enforcement
IT Ramifications
• Enhanced user experience that delivers insights to any device
• Operationalization of data scientists and analytic insights
• Tools and processes for data quality, governance, and security
• Cloud for self-service, collaboration, agility, and cost reduction
“Big data poses a major opportunity for CIOs to drive added value for the business, by deriving insights and identifying patterns from the huge amounts of data available”
“Through 2015, organizations integrating high value, diverse new information sources and types into a coherent information management infrastructure will outperform industry peers financially by more than 20%”
Source: Gartner
“The New Value Integrator,” Insights from the Global Chief Financial Officer Study
July 2011
Architectural Solutions For Big Data
Big Data Requires Change…
• Consider: 100 GB would store the entire US Census DB “basic” information set for every living human being on the planet:
– Age, Sex, Income, Ethnicity, Language, Religion, Housing Status, Location into a 128-bit set
– That equates to about 6.75 billion rows of about 10 columns
• Consider the Large Hadron Collider at CERN
– Expected to produce 150,000 times as much raw data each year
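The sizing claim can be checked with a few lines of arithmetic. This is a sketch under stated assumptions: one 128-bit record per person, a world population of roughly 6.75 billion, decimal units (1 GB = 10^9 bytes), and the Large Hadron Collider multiplier read as relative to this census-style data set. The constant and function names are illustrative.

```python
WORLD_POPULATION = 6_750_000_000  # approximate number of rows, one per person
BITS_PER_PERSON = 128             # packed "basic" census record from the slide
LHC_MULTIPLIER = 150_000          # "150,000 times as much raw data each year"


def dataset_bytes(rows: int, bits_per_row: int) -> int:
    """Raw size of a fixed-width data set, ignoring indexes and overhead."""
    return rows * bits_per_row // 8


census_bytes = dataset_bytes(WORLD_POPULATION, BITS_PER_PERSON)
census_gb = census_bytes / 10**9
lhc_pb_per_year = census_bytes * LHC_MULTIPLIER / 10**15

if __name__ == "__main__":
    print(f"Census-style data set: {census_gb:.0f} GB")
    print(f"LHC at 150,000x:       {lhc_pb_per_year:.1f} PB per year")
```

At 16 bytes per person the whole planet fits in about 108 GB, which fits on a single commodity disk; the same multiplier applied to the collider lands in the petabyte range.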
The Big Change In Technologies
• Consider that relational technologies were invented to get data in and organized, not designed nor organized to get it out
– RDBMSs were designed for efficient transaction processing on large data sets
▪ Adding, Updating
▪ Searching for & retrieving small amounts of data
[2] Source: ACM Website “The Pathologies of Big Data”, Adam Jacobs, 7/6/09
Data Warehouses Were An Answer
• DW was classically designed as a “copy of transaction data specifically structured for query and analysis”
– General approach is bulk ETL into a DB designed for queries
• Big data changes the answer
– “Traditional RDBMS-based dimensional modeling and cube-based OLAP turns out to be too slow or too limited to support asking the really interesting questions of warehoused data”[2]
“To achieve acceptable performance for highly order-dependent queries on truly large data, one must be willing to consider abandoning the purely relational database model”[2]
[2] Source: ACM Website “The Pathologies of Big Data”, Adam Jacobs, 7/6/09
Voluminous Data Sets…
What makes large data sets large is repeated observations over time/space
– A web log has millions of visits over a handful of pages
– A retailer has 10K products and millions of customers, but billions of transactions
– Hi-res scientific data such as fMRI: 1K GB per view
– Large datasets → spatial or temporal dimensions
Cardinality (distinct observations) is usually small with regard to the total number of observations
Technology Solutions Appeared…
Let’s Talk Technical Solutions…
• Sequential and/or Distributed File-Based Solutions
– Oracle Exadata, Hadoop, etc.
• Columnar (compression) / Multi-Level Tables
– Solves challenge of retrieving entire row
– ParAccel, Vertica, Sybase, etc.
• Distributed MPP
– Teradata, Greenplum, etc.
• Polymorphic
– Combination of Columnar & MPP
Finding Answers Sequentially With OLTP
• Random access is slower than sequential
• The advantage gained by doing all data access in sequential order is often 4x – 10x
– Many orders of magnitude!
[2] Source: ACM Website “The Pathologies of Big Data”, Adam Jacobs, 7/6/09
Distributed File: Partitioning With OLTP
Partitioning can solve challenges of data growth, but true distributed processing utilizing MPP is best (author’s opinion)
Distributed File: Partitioning Viewed
Q: What was the total transaction (sales) amount for May 20 and May 21, 2009?
Select sum(sales_amount)
From SALES
Where sales_date >= to_date('05/20/2009','MM/DD/YYYY')
And sales_date < to_date('05/22/2009','MM/DD/YYYY');
[Figure: Sales Table partitioned by day (5/17, 5/18, 5/19, 5/20, 5/21, 5/22). Only the 2 relevant partitions are read.]
Source: Extreme Performance With Oracle Data Warehousing
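To make the pruning idea concrete, here is a small in-memory sketch, not the way any particular database implements it. It assumes one partition per calendar day and a half-open date range [May 20, May 22), so exactly the two daily partitions named in the question are scanned. The `SALES` data and the function names are invented for illustration.

```python
from datetime import date


def prune_partitions(partitions, start, end):
    """Return the partition keys (days) that overlap the range [start, end)."""
    return [day for day in sorted(partitions) if start <= day < end]


def total_sales(partitions, start, end):
    """Sum sales amounts while touching only the relevant daily partitions."""
    scanned = prune_partitions(partitions, start, end)
    total = sum(sum(partitions[day]) for day in scanned)
    return total, scanned


# Hypothetical sales table: one list of sale amounts per daily partition.
SALES = {
    date(2009, 5, 17): [120, 80],
    date(2009, 5, 18): [200],
    date(2009, 5, 19): [50, 50, 25],
    date(2009, 5, 20): [300, 100],
    date(2009, 5, 21): [75, 25],
    date(2009, 5, 22): [500],
}

if __name__ == "__main__":
    total, scanned = total_sales(SALES, date(2009, 5, 20), date(2009, 5, 22))
    print("partitions scanned:", [d.isoformat() for d in scanned])
    print("total sales:", total)
```

The other four partitions are never read, which is where the saving comes from as the table grows.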
Distributed File: Open Source (Hadoop)
• Apache Hadoop is a software framework that supports data-intensive distributed applications under a free license.
– It enables applications to work with thousands of nodes and petabytes of data
– Hadoop was inspired by Google's MapReduce and Google File System (GFS) papers.
• Hadoop is a top-level Apache project being built and used by a global community of contributors using the Java programming language.
– Yahoo! has been the largest contributor to the project, and uses Hadoop extensively across its businesses.
Source: Wikipedia “Hadoop”
Distributed File: Hash-Based Distribution
• In a hash-based data distribution, the data is distributed across multiple platforms for parallelism of queries…
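A minimal sketch of hash-based distribution, assuming rows are assigned to a segment by hashing a distribution key and taking the result modulo the number of segments. Real MPP products choose their own hash functions and placement rules; `zlib.crc32` is used here only because it is deterministic, and the row layout is hypothetical.

```python
import zlib


def segment_for(key: str, num_segments: int) -> int:
    """Map a distribution key to a segment number in [0, num_segments)."""
    return zlib.crc32(key.encode("utf-8")) % num_segments


def distribute(rows, key_field, num_segments):
    """Place each row on the segment chosen by the hash of its key."""
    segments = [[] for _ in range(num_segments)]
    for row in rows:
        segments[segment_for(str(row[key_field]), num_segments)].append(row)
    return segments


def parallel_sum(segments, value_field):
    """Each segment sums its own rows; the partial sums are then combined."""
    partials = [sum(row[value_field] for row in seg) for seg in segments]
    return sum(partials), partials


if __name__ == "__main__":
    rows = [{"customer_id": f"C{i}", "amount": i * 10} for i in range(1, 9)]
    segments = distribute(rows, "customer_id", num_segments=4)
    total, partials = parallel_sum(segments, "amount")
    print("rows per segment:", [len(s) for s in segments])
    print("partial sums:", partials, "total:", total)
```

Because the same key always hashes to the same segment, each platform can work on its share of the rows independently and only the small partial results need to be combined.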
Columnar: Storage
• In a table with, say, 256 columns, a lookup will retrieve all the data in the row (disk bound)
• Columnar storage reduces this I/O bandwidth by storing column data using compression
– State (50 combinations stored)
– Master (compressed) table has pointers to State
Source: Vertica Website
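The State example can be sketched as dictionary encoding: each distinct value is stored once and the column itself holds only small integer pointers. This illustrates the general technique under that assumption and is not a description of Vertica's internal storage format; the function names are hypothetical.

```python
def dictionary_encode(values):
    """Return (dictionary, codes): distinct values once, plus integer pointers."""
    dictionary = []  # distinct values in first-seen order
    index = {}       # value -> position in dictionary
    codes = []       # one small integer per row
    for value in values:
        if value not in index:
            index[value] = len(dictionary)
            dictionary.append(value)
        codes.append(index[value])
    return dictionary, codes


def decode(dictionary, codes):
    """Rebuild the original column from the dictionary and the pointers."""
    return [dictionary[code] for code in codes]


def count_where(dictionary, codes, value):
    """Count matching rows by scanning only the encoded column."""
    if value not in dictionary:
        return 0
    code = dictionary.index(value)
    return sum(1 for c in codes if c == code)


if __name__ == "__main__":
    states = ["NY", "NJ", "NY", "CA", "NJ", "NY"]
    dictionary, codes = dictionary_encode(states)
    print(dictionary, codes)
    print("rows in NY:", count_where(dictionary, codes, "NY"))
```

A query that filters on State reads one narrow, compressed column rather than all 256 columns of every row.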
Columnar: Multi-Level Table Partitioning
• In multi-level table partitioning, data distribution occurs across multiple platforms in segmented tables for distribution of columnar queries
• This reduces the amount of work performed by each platform
MPP Shared Nothing Architectures
• Extreme scalability
• Elastic Expansion & Self-Healing Fault-Tolerance
• Unified Analytics
Source: “Greenplum Database 4.0: Critical Mass Innovation”,
White Paper, August 2010
The “Ideal” – MPP Shared Nothing
Poly-Morphic Storage
Tabular, Columnar,
NoSQL, etc.
It’s A Brave New World
From the Old Stack to a New Ecosystem: Drivers for Change
• Many new data sources (organic growth, data services, M&A)
– Impractical to add new data sources because of tightly coupled pipeline
• More unstructured data, including social media
– Lack of access to unstructured data; need analytics and classifiers that operate on it
• Less up-front data integration
– Can’t assume data is pre-integrated; have to be able to locate and to query federated sources of data and content
• More need to track and leverage metadata
– Metadata is fragmented, jailed and inconsistent; need agile, community approach
• Need for flexible, agile data structures
– Current structures are too rigid, and too close to the sources or the business reports
• More emphasis on dynamic views for purpose
– Need dynamic planning, creation and structuring of views that support analytics
• Information governance and management in a federated, regulated world
– Need flexible policy expression and enforcement, not just at point of access
An Information Platform with New DNA
To Promote Agility, Business Value and Community
1. Coordinated ingestion of diverse information, changes, events
2. Metadata driven processing and management
3. Nuanced optimization – on demand, multi-source, matching information needs
4. Broader reach of query – contextual search, federation, materialization
5. Freedom from imposed information structure – roll your own structure!
6. Navigation through information – contextual, faceted, multi-dimensional
7. Visualization of information – heat, clouds, clusters, flows
8. New data paths engendered by patterned consumption of entities
9. Reasoning about data set location, derivation, freshness, and obligations
10. User empowerment – collaboration and talent development
Businesses Want Integrated, Timely Information for Purpose
Area Revolution
Latency “Microbatch is the new Batch”
Enrichment “Tagging is the new Transformation”
Query “Query is the new ETL”
Federation “Query Director is the new Query Optimizer”
Source “Purposeful View is the new Master”
Some Of The Newer Trends In Big Data
• Powerful Analytics
– What if, What will happen next, …
– Self-service analytics?
▪ Build your own sandbox of data…
• Data Cloud Surrounded Warehouse
– Data Virtualization
▪ Abstracting the data from the systems, it complements existing data warehouses
– Many times the size of structured warehouse
– Provides for rapid analytic iterations
When You Link Structured &
Unstructured Information You Get…
Powerful Analytical Engines
What is the best price to sell my product?
How Do I Do This?
How Do I Do This #2?
How Do I Do This #3?
Visualize The Information…
Analytics: A Picture Is Worth A 1,000 Words
Data Virtualization Example
Data Virtualization In Practice
Enterprise Big Data Cloud
The Future Of Data Warehousing?
The “Ideal” Abate Enterprise Data Cloud
• Truly Virtualized Data Environment
• Extreme Scale, Elastic Expansion
• Automated Metadata Discovery, Classification & Tagging
• Linearly Scalable
– Add 1x and get 2x performance
• Self-Service Provisioning
• Single Point Of Management
– Resource utilization optimization
• Secure, Unified Data Access – Single Point of Entry
– Portal based sharing of data sandboxes (wiki-type)
• Reduce TCO By Eliminating Excessive Licensing Fees
– Use of open source community to improve solution
Example Case Studies
Telecom Provider Learns A Lesson…
BIG DATA ANALYTICS USE CASE
Before investing millions of dollars in infrastructure, a provider learned where to invest so that the money would pay off…
Challenge
– 100TB Traditional EDW, Single Source Of Truth
– Operational Reporting & Financial Consolidation
– Heavy Governance And Control
– Unable To Support Critical Business Initiatives
– Customer Loyalty And Churn The #1 Business Initiative From The CEO
Enterprise Data Cloud Architecture-Based Solution
– Extracted Data From EDW & Other Sources
– Generated Social Graph From Call Detail And Subscriber Data
– Within 2 Weeks Found “Connected” Subscribers 7X More Likely To Churn Than Average Users
– Now Deploying 1PB Production
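The social-graph step can be illustrated with a toy example. This is a sketch under stated assumptions, not the provider's actual method: call detail records are reduced to (caller, callee) pairs, the graph is undirected, and a subscriber counts as “connected” when at least one direct contact has already churned. All names and data are hypothetical.

```python
from collections import defaultdict


def build_call_graph(call_records):
    """Build an undirected graph: subscriber -> set of subscribers they called or were called by."""
    graph = defaultdict(set)
    for caller, callee in call_records:
        if caller != callee:  # ignore self-calls
            graph[caller].add(callee)
            graph[callee].add(caller)
    return graph


def connected_to_churners(graph, churned):
    """Active subscribers with at least one direct contact who has churned."""
    return {
        subscriber
        for subscriber, contacts in graph.items()
        if subscriber not in churned and contacts & churned
    }


if __name__ == "__main__":
    calls = [("a", "b"), ("b", "c"), ("d", "e"), ("e", "e")]
    churned = {"a"}
    graph = build_call_graph(calls)
    print("at-risk subscribers:", sorted(connected_to_churners(graph, churned)))
```

In practice the list of connected subscribers would then be compared against observed churn rates, which is where a figure such as the 7X above would come from.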
Drive Multi-channel Campaign Optimization
BIG DATA ANALYTICS USE CASE
Retailer increases in-flight multi-channel effectiveness with customer and product insights
[Figure: Likelihood Of Conversion (LOW to HIGH) across Legacy System, Advanced Analytics and Big Data Analytics]
• Monitor cross-channel product sales effectiveness
• Integrate customer behavioral data with social media sentiment data to yield new market, product and campaign insights
Innovate With Big Data Analytics
BIG DATA ANALYTICS USE CASE
Big Data Analytics Accelerate Health Care 2.0 for Evidence-based Care Provider
[Figure: Quality of Care (LOW to HIGH) across Legacy System, BI Reporting, Advanced Analytics and Big Data Analytics. TRADITIONAL DATA LEVERAGED: Treatment Pathways on Summary Data. BIG DATA LEVERAGED: Treatment Pathways on All the Data.]
• Delivering 10 Years Of Data In Seconds
• Associative Rule Mining and User Clustering Improves Pathways
• External Data Sources Enable Personalized Medicine
Open Exchange Of Ideas…
Speaker Contact Information:
Robert J. Abate, CBIP, CDMP
robert.abate@emc.com
(201) 745-7680
Credits To Quoted Authors
• Adam Jacobs is senior software engineer at 1010data Inc., where, among other roles, he leads the continuing development of Tenbase, the company’s ultra-high-performance analytical database engine. He has more than 10 years of experience with distributed processing of big datasets, starting in his earlier career as a computational neuroscientist at Weill Medical College of Cornell University (where he holds the position of Visiting Fellow) and at UCLA. He holds a Ph.D. in neuroscience from UC Berkeley and a B.A. in linguistics from Columbia University. (QUOTED FROM: “The Pathologies of Big Data”, 7/6/09)
• Bill Schmarzo has over two decades of experience in data warehousing, BI and analytic applications (Metaphor Computers, 1984). Bill authored the Business Benefits Analysis methodology that links an organization’s strategic business initiatives with their supporting data and analytic requirements, and co-authored with Ralph Kimball a series of articles on analytic applications. Bill has served on The Data Warehouse Institute faculty as the head of the analytic applications curriculum. Bill was VP of Analytics at Yahoo where he was responsible for the development of Yahoo’s Advertiser and Web Site analytics products, including the delivery of “actionable insights” through a holistic user experience. For Business Objects, Bill oversaw the Analytic Applications business unit including the development, marketing and sales of Business Objects’ industry-leading analytic applications.
• Donald Sutton has over 20 years experience in Data Architecture, Analysis, Modeling, ETL, Implementation and Integration in the areas of Data Entry (OLTP) or ERP and 3rd Party COTS Applications, Operational Data Store (ODS), Master Data Store (MDS), Data Warehouse (DW) and Data Marts (DM) while providing Business Intelligence (BI) from multiple sources above. Passionate and motivated about sound design of data structures in all different data layers and the representation and transformation of data with the accounting and governance of data throughout all data layers while providing Business Intelligence (BI) and analytics with Key Performance Indicators (KPI) along with business modeling in translating business requirements to data requirements. (QUOTED FROM: Current Warehousing Environment & Analytics Visualizations)
DAMA Big Data & The Cloud 2012-01-19

DAMA Big Data & The Cloud 2012-01-19

  • 1.
    DAMA NY CHAPTERPRESENTATION “Big Data” & “The Cloud” Extreme Performance Data Warehousing Inside Of The Cloud Robert J. Abate, CBIP, CDMP Solutions Principal, EIM & Analytics Practice EMC ConsultingC Co su t g January 19th, 2012 1© Copyright 2012 EMC Corporation. All rights reserved.
  • 2.
    DAMA NY CHAPTERPRESENTATION Big Data & The Cloud • Background & Definitions AGENDA • Background & Definitions • The Challenge A hit t l S l ti T Bi D t• Architectural Solutions To Big Data • It’s A Brave New World • Example Case Studies • Open Discussion… 2© Copyright 2012 EMC Corporation. All rights reserved.
  • 3.
  • 4.
    “Big data willrepresent a hugely disruptive force during the next five years – enabling levels of insight – that are currently unachievable through any other means” 4© Copyright 2012 EMC Corporation. All rights reserved. currently unachievable through any other means” Gartner May 2011
  • 5.
    We Are AwashIn Data • In the information age, every organization is in the “data” business • Data is growing exponentially, so are the challenges • Complexity is causing insight to be lost Source: IDC Digital Universe White Paper, Sponsored by EMC, May 2009 5© Copyright 2012 EMC Corporation. All rights reserved. Spo so ed by C, ay 009
  • 6.
    Pictorial Representation OfInformation 6© Copyright 2012 EMC Corporation. All rights reserved.
  • 7.
    Big Data: MoreThan Just About Volume i l • Consider: Master Data, Fidelity, Complexity, Validity, Perishability, Linking Data Velocity Volume Video Transactional DataIndustry- specific Web traffic • Structured Transactional Data: POS transactions, call detail records, credit card transactions, shipping updates purchase orders Text Social shipping updates, purchase orders, payments, shipments, account transactions • Unstructured Data: Web logs, Variety Complexity Sensor/ newsfeeds, social media, geo- location, mobile, consumer comments, claims, doctor’s notes, clinical studies, images, video, Smart Grid Images Audio Documents location- based audio • Device-generated Data: RFID sensors, smart meters, smart grids GPS spatial micro-payments 7© Copyright 2012 EMC Corporation. All rights reserved. grids, GPS spatial, micro payments
  • 8.
    The Typical BI/DWEnvironment Today… 8© Copyright 2012 EMC Corporation. All rights reserved.
  • 9.
    Big Data’s PotentialFor Actionable Insight Today’s Situation Big Data Ramifications  Vast majority of available Less than 10% of the  Forward looking or “Wi d hi ld i ” Vast majority of available sources and external data  “Rear-view” mirror i d hb d d Less than 10% of the enterprise’s data “Windshield-view” predictions with recommendations Re l time ne e l time reporting, dashboards and analysis – Weeks, months, or even quarters old  Correlated, high confidence, governed data – Real-time near real-time  Incomplete, inaccurate, and disjointed data quarters old  Vastly accelerated time to market governed data  Architectures and methods that take 6 to 18 months to exploit j 9© Copyright 2012 EMC Corporation. All rights reserved. exploit
  • 10.
    “Th Ti Vl C ”“Th Ti V l C ” Time Really is Money! “THE TIME VALUE CURVE” © 2007 - Dr. Richard Hackathorn, Bolder Technology, Inc., All Rights Reserved. Used with Permission. “The Time Value Curve”“The Time Value Curve” Value ostost Business EventBusiness Event CaptureCapture ValueLoValueLo Latency Analysis Latency Data Ready For Analysis Information Delivered Latency Analysis Latency Data Ready For Analysis Information Delivered A ti TiA ti Ti Action TakenTaken Decision Latency Decision Latency Data Lifecycle Action TimeAction Time Time 10© Copyright 2012 EMC Corporation. All rights reserved. Lifecycle
  • 11.
    Data Is ComingAt Us Faster In a recent TDWI survey of 450 CIO’s 17% have a real time data warehouse– 17% have a real time data warehouse – 90% plan on having a real time warehouse % ill l l i– 75% will replace to get to a real-time solution “REAL TIME IS A RAPIDLY BECOMING A NECESSARY FOUNDATION TO AA NECESSARY FOUNDATION TO A DATA SOLUTION AND WITHOUT ARCHITECTURE THERE IS CHAOS!” 11© Copyright 2012 EMC Corporation. All rights reserved.
  • 12.
    Data Is ComingFrom All Directions Data is now commonly entering into the enterprise from external sourcesp – Government (Census, Revenues, …) – Neilson, NPD Group (Sales), p ( ) – Bloomberg, NYSE (Financial Position) – Experian, TransUnion, Equifax (CreditExperian, TransUnion, Equifax (Credit Reporting) – Google Maps, MapInfo (Geospatial, …) – Radian 6, Biz360, … (Client Trend Data) – Etc. 12© Copyright 2012 EMC Corporation. All rights reserved.
  • 13.
    Need For Data Trust
    • Compliance with laws
      – Revenue Canada, Sarbanes-Oxley [SOX], Basel II, HIPAA, etc.
    • Lack of confidence in the data
      – Reports utilizing the same data do not report the same totals or computations
    • Data not defined and readily available
      – Multiple sources of data have to be rationalized at each project start-up, thereby wasting valuable time & $ on every project
    • Data timeliness
      – Manual process to collect, analyze and provide results
    • Data integrity
      – Unknown filters, varying calculations/computations, fields used for data not indicative of field names, data passed along from one person to another to another…
  • 14.
    Summation Of Challenges We Are Observing
    • Business mandate to obtain more value out of the data (get answers)
    • Variety of sources, amounts, types and granularity of data that customers want to integrate is growing exponentially
    • Need to shrink the latency between the business event and the data availability for analysis and decision-making
    • Advancing agility of information is key
    • Need for Data trust and Compliance with regulations
  • 15.
    The Challenge Of Big Data
  • 16.
    “Old” Journey To Information Maturity [EIM]
    Stages: Data Chaos → Defined Data → Master Data → Integrated Information → Data Analytics → Business Optimization
    • Data Chaos
      – Same type of data means different things in different systems
      – Ex: AT&T is the same as AT&T Inc
    • Defined Data
      – Define common meanings
      – Ex: Determine the sources, types, and properties of grouped (i.e.: customer) records
    • Master Data
      – Publish and Subscribe to master data
      – Ex: Single view of customer across all information systems
    • Integrated Information
      – Bring metadata together with information for reporting (BI) and warehousing (drilling and hierarchies)
    • Data Analytics
      – Analyzing the data
      – Looking for trends and correlations
    • Predictive Information
      – Using the analyzed data to optimize operations
      – Wiki Type Sharing Of Self-Provisioned Environments
      – Atomic Data Analytics
    PROCESSES: Data Discovery, Data Governance, Data Integration, Data Mining
    TOOLS: Data Discovery, Metadata, ETL Suite, BI / DW / OLAP
  • 17.
    The Information Issue Is…
    Too many organizations are not using information to its full advantage:
    – 1 in 3 business leaders frequently make critical decisions without the information they need
    – 1 in 2 business leaders do not have access to the information across their organization needed to do their jobs
    – 3 in 4 business leaders say more predictive information would drive better decisions
    Source: IBM Institute for Business Value, March 2009
  • 18.
    Information Trust & Business Alignment
    Harris Interactive recently polled 23,000 U.S. employees and found:
    – Only 37% said they have a clear understanding of what their organization is trying to achieve and why
    – Only one in five was enthusiastic about their team and the organization’s / corporation’s goals
    – Only one in five said they have a clear “line of sight” between their tasks and their team and organization’s goals
    – Only 15% felt that their organization fully enables them to execute key goals
    – Only 20% fully trusted the organization they work for
  • 19.
    Viewed Using A Seasonal Analogy…
    If a football team had these players on the field:
    – Only 4 of the 11 players on the field would know which goal is theirs
    – Only 6 of the 11 would care
    – Only 3 of the 11 would know what position they play and what they are supposed to do
    – 9 players out of 11 would, in some way, be competing against their own team rather than the opponent
  • 20.
    Perceived Complicated Landscape
    • BI/DW is perceived as not “enabling” the business
      – Inhibitor to corporate progress: IT systems cannot be changed fast enough to meet market demands, seize opportunity or comply with a new requirement
      – Weak alignment between IT and business strategy: marked by an intractable language barrier
      – Business not always sure what Information or Dimensions they want or need: how can IT provide without requirements?
      – BI/DW is not known as the source of innovations
    • The complexity of systems has caused BI/DW to be reactive rather than proactive
      – Silo’d solutions, db’s and applications with trapped business rules
      – Multiple sources of information and no single “truth”
      – No “Architectural Blueprints” to the enterprise…
  • 21.
    The Business Intelligence Maturity Model
  • 22.
    Advancing The Maturity Of Information…
  • 23.
    The big data impacts to both business and IT are significant; early adopters will fundamentally change their industries
    Business Expectations:
    • More agile, more real-time, more accurate decision-making
    • Predict and spot changes in dynamic and volatile markets
    • Deeper understanding of customer preferences and behavior
    • Greater fidelity in risk assessment and compliance enforcement
    IT Ramifications:
    • Enhanced user experience that delivers insights to any device
    • Operationalization of data scientists and analytic insights
    • Tools and processes for data quality, governance, and security
    • Cloud for self-service, collaboration, agility, and cost reduction
    “Big data poses a major opportunity for CIOs to drive added value for the business, by deriving insights and identifying patterns from the huge amounts of data available”
    “Through 2015, organizations integrating high value, diverse new information sources and types into a coherent information management infrastructure will outperform industry peers financially by more than 20%”
    Source: Gartner, “The New Value Integrator, Insights from the Global Chief Financial Officer Study”, July 2011
  • 24.
    Architectural Solutions For Big Data
  • 25.
    Big Data Requires Change…
    • Consider: 100 GB would store the US Census DB “basic” information set for every living human being on the planet:
      – Age, Sex, Income, Ethnicity, Language, Religion, Housing Status, Location packed into a 128 bit set
      – That equates to about 6.75 billion rows of about 10 columns
    • Consider the Large Hadron Collider at CERN
      – Expected to produce 150,000 times as much raw data each year
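The 100 GB figure can be checked with back-of-the-envelope arithmetic. This is a minimal sketch: the 128 bits per person comes from the slide, while the population figure of roughly 6.75 billion is an assumption (approximately the world population around 2009), and the function name is illustrative.

```python
# Assumption: roughly the world population around 2009 (not stated on the slide).
WORLD_POPULATION = 6_750_000_000
# From the slide: each person's "basic" census attributes packed into 128 bits.
BITS_PER_PERSON = 128


def dataset_size_bytes(rows: int, bits_per_row: int) -> int:
    """Size in bytes of a table with fixed-width rows and no storage overhead."""
    return rows * bits_per_row // 8


size_bytes = dataset_size_bytes(WORLD_POPULATION, BITS_PER_PERSON)
size_gb = size_bytes / 10**9    # decimal gigabytes
size_gib = size_bytes / 2**30   # binary gibibytes

print(f"{size_bytes:,} bytes = {size_gb:.1f} GB = {size_gib:.1f} GiB")
```

Under these assumptions the whole table is about 108 decimal gigabytes, or just over 100 GiB, small enough to fit on a single commodity disk.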
  • 26.
    The Big Change In Technologies
    • Consider that relational technologies were invented to get data in and organized, not designed nor organized to get it out
      – RDBMS’s were designed for efficient transaction processing on large data sets
        ▪ Adding, Updating
        ▪ Searching for & retrieving small amounts of data
    [2] Source: ACM Website “The Pathologies of Big Data”, Adam Jacobs, 7/6/09
  • 27.
    Data Warehouses Were An Answer
    • DW was classically designed as a “copy of transaction data specifically structured for query and analysis”
      – General approach is bulk ETL into a DB designed for queries
    • Big data changes the answer
      – “Traditional RDBMS-based dimensional modeling and cube-based OLAP turns out to be too slow or too limited to support asking the really interesting questions of warehoused data” [2]
    “To achieve acceptable performance for highly order-dependent queries on truly large data, one must be willing to consider abandoning the purely relational database model” [2]
    [2] Source: ACM Website “The Pathologies of Big Data”, Adam Jacobs, 7/6/09
  • 28.
    Voluminous Data Sets… Whatmakes large data sets are repeated observations over time/spacerepeated observations over time/space – Web log has M’s visits over handful pages Retailer has 10K products M custs but B trans– Retailer has 10K products, M custs, but B trans – Hi-Res Scientific like fMRI 1K GB per view L d t t  S ti l T l di ’– Large datasets  Spatial or Temporal dim’s Cardinalities (distinct observations) is usually small with regard to total # of observations 28© Copyright 2012 EMC Corporation. All rights reserved.
  • 29.
    Technology Solutions Appeared…
  • 30.
    Let’s Talk Technical Solutions…
    • Sequential and/or Distributed File-Based Solutions
      – Oracle Exadata, Hadoop, etc.
    • Columnar (compression) / Multi-Level Tables
      – Solves challenge of retrieving entire row
      – ParAccel, Vertica, Sybase, etc.
    • Distributed MPP
      – Teradata, Greenplum, etc.
    • Polymorphic
      – Combination of Columnar & MPP
  • 31.
    Finding Answers Sequentially With OLTP
    • Random access is slower than sequential
    • The advantage gained by doing all data access in sequential order is often 4x – 10x
      – Many orders of magnitude!
    [2] Source: ACM Website “The Pathologies of Big Data”, Adam Jacobs, 7/6/09
  • 32.
    Distributed File: Partitioning With OLTP
    Partitioning can solve challenges of data growth, but true distributed processing utilizing MPP is best (author’s opinion)
  • 33.
    Distributed File: Partitioning Viewed
    Q: What was the total transactions (sales) amount for May 20 and May 21 2009?
    Select sum(sales_amount)
    From SALES
    Where sales_date >= to_date('05/20/2009','MM/DD/YYYY')
    And sales_date < to_date('05/22/2009','MM/DD/YYYY');
    Sales Table partitions: 5/17, 5/18, 5/19, 5/20, 5/21, 5/22
    Only the 2 relevant partitions (5/20 and 5/21) are read
    Source: Extreme Performance With Oracle Data Warehousing
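The idea behind partition pruning can be sketched in a few lines of Python. This is a toy in-memory model, not Oracle's implementation: the class, method and attribute names are hypothetical, and the sales amounts are made up. It only illustrates that a date-range query touches the partitions inside the range and skips the rest.

```python
from collections import defaultdict
from datetime import date, timedelta


class PartitionedSales:
    """Toy sales table partitioned by day (hypothetical, for illustration only)."""

    def __init__(self):
        self.partitions = defaultdict(list)  # sales_date -> list of sales amounts
        self.partitions_read = 0             # partitions touched by the last query

    def insert(self, sales_date: date, amount: float) -> None:
        self.partitions[sales_date].append(amount)

    def total_between(self, start: date, end_exclusive: date) -> float:
        """Sum sales for start <= sales_date < end_exclusive, reading only those days."""
        self.partitions_read = 0
        total = 0.0
        day = start
        while day < end_exclusive:
            if day in self.partitions:   # membership test does not create a partition
                total += sum(self.partitions[day])
                self.partitions_read += 1
            day += timedelta(days=1)
        return total


sales = PartitionedSales()
for d in range(17, 23):                       # partitions 5/17 through 5/22
    sales.insert(date(2009, 5, d), 100.0)
    sales.insert(date(2009, 5, d), float(d))  # made-up second sale per day

total = sales.total_between(date(2009, 5, 20), date(2009, 5, 22))
print(total, sales.partitions_read)
```

The half-open range (start inclusive, end exclusive) is a deliberate choice in this sketch: it covers all of May 20 and May 21 without touching the May 22 partition.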
  • 34.
    Distributed File: Open Source (Hadoop)
    • Apache Hadoop is a software framework that supports data-intensive distributed applications under a free license.
      – It enables applications to work with thousands of nodes and petabytes of data
      – Hadoop was inspired by Google’s MapReduce and Google File System (GFS) papers.
    • Hadoop is a top-level Apache project being built and used by a global community of contributors using the Java programming language.
      – Yahoo! has been the largest contributor to the project, and uses Hadoop extensively across its businesses.
    Source: Wikipedia “Hadoop”
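The MapReduce model that inspired Hadoop can be illustrated without a cluster. The sketch below is plain single-process Python and does not use the Hadoop API; the function names and the log lines are hypothetical. It shows the three conceptual steps: map each record to key/value pairs, group the pairs by key, and reduce each group to one result.

```python
from collections import defaultdict
from itertools import chain


def map_phase(record: str):
    """Emit one (page, 1) pair per log line of the form '<client> <page>'."""
    page = record.split()[1]
    yield page, 1


def shuffle(pairs):
    """Group values by key, standing in for the framework's shuffle/sort step."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(key, values):
    """Collapse all values for one key into a single count."""
    return key, sum(values)


def run_job(records):
    mapped = chain.from_iterable(map_phase(r) for r in records)
    grouped = shuffle(mapped)
    return dict(reduce_phase(k, v) for k, v in grouped.items())


# Made-up web log lines.
log_lines = ["10.0.0.1 /home", "10.0.0.2 /cart", "10.0.0.3 /home"]
page_counts = run_job(log_lines)
print(page_counts)
```

In a real framework the map and reduce calls run on many nodes in parallel and the grouping step moves data between them; here everything runs in one process purely to show the data flow.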
  • 35.
    Distributed File: Hash-Based Distribution
    • In a hash-based data distribution, the data is distributed across multiple platforms for parallelism of queries…
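A minimal sketch of hash-based distribution follows, assuming a CRC32 hash of a distribution key taken modulo the number of segments. Real MPP databases use their own hash functions and placement rules; the function names, the key field and the sample rows here are all hypothetical.

```python
import zlib


def segment_for(key: str, num_segments: int) -> int:
    """Pick a segment from a stable hash of the distribution key."""
    return zlib.crc32(key.encode("utf-8")) % num_segments


def distribute(rows, key_field: str, num_segments: int):
    """Place every row on exactly one segment according to its key."""
    segments = [[] for _ in range(num_segments)]
    for row in rows:
        segments[segment_for(str(row[key_field]), num_segments)].append(row)
    return segments


def parallel_sum(segments, value_field: str) -> int:
    """Each segment computes a partial sum; the partial sums are then combined."""
    partials = [sum(row[value_field] for row in segment) for segment in segments]
    return sum(partials)


# Made-up rows: 100 sales spread over 10 customers.
rows = [{"customer_id": f"c{i % 10}", "amount": i} for i in range(100)]
segments = distribute(rows, "customer_id", num_segments=4)
total = parallel_sum(segments, "amount")
print([len(s) for s in segments], total)
```

Because the segment depends only on the key, all rows for one customer land on the same segment, so per-customer aggregates can be computed locally and only the small partial results need to be combined.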
  • 36.
    Columnar: Storage
    • In a table with, say, 256 columns, a lookup will retrieve all the data in the row (disk bound)
    • Columnar storage reduces this I/O bandwidth by storing column data using compression
      – State (50 combinations stored)
      – Master (compressed) table has pointers to State
    Source: Vertica Website
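The State example describes what is commonly called dictionary encoding: each distinct value is stored once and the column itself holds small integer pointers. The sketch below illustrates that general technique in plain Python; it is not Vertica's storage format, and the function names and sample data are hypothetical.

```python
def dictionary_encode(values):
    """Return (dictionary, codes): distinct values in first-seen order, plus one code per row."""
    dictionary = []
    index = {}
    codes = []
    for value in values:
        if value not in index:
            index[value] = len(dictionary)
            dictionary.append(value)
        codes.append(index[value])
    return dictionary, codes


def dictionary_decode(dictionary, codes):
    """Rebuild the original column from the dictionary and the codes."""
    return [dictionary[code] for code in codes]


# Made-up State column for six rows.
states = ["NY", "NJ", "NY", "CA", "NY", "NJ"]
state_dictionary, state_codes = dictionary_encode(states)
print(state_dictionary, state_codes)
```

With at most 50 US states the dictionary stays tiny no matter how many rows the table has, and a query that needs only the State column reads only the code list rather than all 256 columns of every row.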
  • 37.
    Columnar: Multi-Level Table Partitioning
    • In multi-level table partitioning, data distribution occurs across multiple platforms in segmented tables for distribution of columnar queries
    • This reduces the amount of work performed by each platform
  • 38.
    MPP Shared Nothing Architectures
    • Extreme scalability
    • Elastic Expansion & Self-Healing Fault-Tolerance
    • Unified Analytics
    Source: “Greenplum Database 4.0: Critical Mass Innovation”, White Paper, August 2010
  • 39.
    MPP Shared Nothing Architectures
    Source: “Greenplum Database 4.0: Critical Mass Innovation”, White Paper, August 2010
  • 40.
    The “Ideal” – MPP Shared Nothing
    Poly-Morphic Storage: Tabular, Columnar, NoSQL, etc.
  • 41.
    It’s A Brave New World
  • 42.
    From the Old Stack to a New Ecosystem: Drivers for Change
    • Many new data sources (organic growth, data services, M&A)
      – Impractical to add new data sources because of tightly coupled pipeline
    • More unstructured data, including social media
      – Lack of access to unstructured data; need analytics and classifiers that operate on it
    • Less up front data integration
      – Can’t assume data is pre-integrated – have to be able to locate and to query federated sources of data and content
    • More need to track and leverage metadata
      – Metadata is fragmented, jailed and inconsistent – need agile, community approach
    • Need for flexible, agile data structures
      – Current structures are too rigid, and too close to the sources or the business reports
    • More emphasis on dynamic views for purpose
      – Need dynamic planning, creation and structuring of views that support analytics
    • Information governance and management in a federated, regulated world
      – Need flexible policy expression and enforcement, not just at point of access
  • 43.
    An Information Platform with New DNA
    To Promote Agility, Business Value and Community
    1. Coordinated ingestion of diverse information, changes, events
    2. Metadata driven processing and management
    3. Nuanced optimization – on demand, multi-source, matching information needs
    4. Broader reach of query – contextual search, federation, materialization
    5. Freedom from imposed information structure – roll your own structure!
    6. Navigation through information – contextual, faceted, multi-dimensional
    7. Visualization of information – heat, clouds, clusters, flows
    8. New data paths engendered by patterned consumption of entities
    9. Reasoning about data set location, derivation, freshness, and obligations
    10. User empowerment – collaboration and talent development
  • 44.
    Businesses Want Integrated, Timely Information for Purpose
    Area          Revolution
    Latency       “Microbatch is the new Batch”
    Enrichment    “Tagging is the new Transformation”
    Query         “Query is the new ETL”
    Federation    “Query Director is the new Query Optimizer”
    Source        “Purposeful View is the new Master”
  • 45.
    Some Of The Newer Trends In Big Data
    • Powerful Analytics
      – What if, What will happen next, …
      – Self-service analytics?
        ▪ Build your own sandbox of data…
    • Data Cloud Surrounded Warehouse
      – Data Virtualization
        ▪ Abstracting the data from the systems, it complements existing data warehouses
      – Many times the size of structured warehouse
      – Provides for rapid analytic iterations
  • 46.
    When You Link Structured & Unstructured Information You Get…
  • 47.
    Powerful Analytical Engines
    What is the best price to sell my product?
  • 48.
    How Do I Do This?
  • 49.
    How Do I Do This #2?
  • 50.
    How Do I Do This #3?
  • 51.
    Visualize The Information…
  • 52.
    Analytics: A Picture Is Worth A 1,000 Words
  • 53.
    Data Virtualization Example
  • 54.
    Data Virtualization In Practice
  • 55.
    Enterprise Big Data Cloud
  • 56.
    The Future Of Data Warehousing?
    The “Ideal” Abate Enterprise Data Cloud
    • Truly Virtualized Data Environment
    • Extreme Scale, Elastic Expansion
    • Automated Metadata Discovery, Classification & Tagging
    • Linearly Scalable
      – Add 1x and get 2x performance
    • Self-Service Provisioning
    • Single Point Of Management
      – Resource utilization optimization
    • Secure, Unified Data Access
      – Single Point of Entry
      – Portal based sharing of data sandboxes (wiki-type)
    • Reduce TCO By Eliminating Excessive Licensing Fees
      – Use of open source community to improve solution
  • 57.
    Example Case Studies
  • 58.
    BIG DATA ANALYTICS USE CASE
    Telecom Provider Learns A Lesson…
    Before investing millions of dollars in infrastructure, a provider learned where to invest so that it would pay off…
    Challenge
    – 100TB Traditional EDW, Single Source Of Truth
    – Operational Reporting & Financial Consolidation
    – Heavy Governance And Control
    – Unable To Support Critical Business Initiatives
    – Customer Loyalty And Churn The #1 Business Initiative From The CEO
    Enterprise Data Cloud Architecture-Based Solution
    – Extracted Data From EDW & Other Sources
    – Generated Social Graph From Call Detail And Subscriber Data
    – Within 2 Weeks Found “Connected” Subscribers 7X More Likely To Churn Than Average Users
    – Now Deploying 1PB Production
  • 59.
    BIG DATA ANALYTICS USE CASE
    Drive Multi-channel Campaign Optimization
    Retailer increases in-flight multi-channel effectiveness with customer and product insights
    [Figure: Likelihood Of Conversion (LOW to HIGH) rising from Legacy System to Advanced Analytics to Big Data Analytics.]
    • Monitor cross-channel product sales effectiveness
    • Integrate customer behavioral data with social media sentiment data to yield new market, product and campaign insights
  • 60.
    BIG DATA ANALYTICS USE CASE
    Innovate With Big Data Analytics
    Big Data Analytics Accelerate Health Care 2.0 for Evidence-based Care Provider
    [Figure: Quality of Care (LOW to HIGH) rising from Legacy System to BI Reporting to Advanced Analytics to Big Data Analytics.]
    TRADITIONAL DATA LEVERAGED
    • Treatment Pathways on Summary Data
    BIG DATA LEVERAGED
    • Treatment Pathways on All the Data
    • Delivering 10 Years Of Data In Seconds
    • Associative Rule Mining and User Clustering Improves Pathways
    • External Data Sources Enable Personalized Medicine
  • 61.
    Open Exchange Of Ideas…
    Speaker Contact Information:
    Robert J. Abate, CBIP, CDMP
    robert.abate@emc.com
    (201) 745-7680
  • 62.
    Credits To Quoted Authors
    • Adam Jacobs is senior software engineer at 1010data Inc., where, among other roles, he leads the continuing development of Tenbase, the company’s ultra-high-performance analytical database engine. He has more than 10 years of experience with distributed processing of big datasets, starting in his earlier career as a computational neuroscientist at Weill Medical College of Cornell University (where he holds the position of Visiting Fellow) and at UCLA. He holds a Ph.D. in neuroscience from UC Berkeley and a B.A. in linguistics from Columbia University. (QUOTED FROM: “The Pathologies of Big Data”, 7/6/09)
    • Bill Schmarzo has over two decades of experience in data warehousing, BI and analytic applications (Metaphor Computers, 1984). Bill authored the Business Benefits Analysis methodology that links an organization’s strategic business initiatives with their supporting data and analytic requirements, and co-authored with Ralph Kimball a series of articles on analytic applications. Bill has served on The Data Warehouse Institute faculty as the head of the analytic applications curriculum. Bill was VP of Analytics at Yahoo where he was responsible for the development of Yahoo’s Advertiser and Web Site analytics products, including the delivery of “actionable insights” through a holistic user experience. For Business Objects, Bill oversaw the Analytic Applications business unit including the development, marketing and sales of Business Objects’ industry-leading analytic applications.
    • Donald Sutton has over 20 years of experience in Data Architecture, Analysis, Modeling, ETL, Implementation and Integration in the areas of Data Entry (OLTP) or ERP and 3rd Party COTS Applications, Operational Data Store (ODS), Master Data Store (MDS), Data Warehouse (DW) and Data Marts (DM) while providing Business Intelligence (BI) from the multiple sources above. He is passionate and motivated about sound design of data structures in all different data layers and the representation and transformation of data with the accounting and governance of data throughout all data layers, while providing Business Intelligence (BI) and analytics with Key Performance Indicators (KPI) along with business modeling in translating business requirements to data requirements. (QUOTED FROM: Current Warehousing Environment & Analytics Visualizations)