IBM Big Data: The Marriage of Hadoop and Data Warehousing
- 1. June 2012
IBM Big Data
The Marriage of Hadoop and Data Warehousing
James Kobielus
Senior Program Director, Product Marketing, Big Data, IBM
© 2012 IBM Corporation
- 2. Hadoop and DW are fast being joined into a new platform paradigm: the Hadoop DW
- 3. Agenda
§ Big Data: 3 Vs and myriad use cases
§ Big Data: diverse workloads
§ Big Data: emergence of the Hadoop DW
- 5. Scalability Imperative: 3 Vs Drive Big Data Everywhere
Information from Everywhere, Radical Flexibility, Extreme Scalability
§ Volume: 12 terabytes of Tweets created daily
§ Velocity: 5 million trade events per second
§ Variety: 100s of video feeds from surveillance cameras
- 7. More Mission-Critical Apps Ride on Big Data Platforms
Advanced Analytic Applications
§ Integrate and manage the full variety, velocity and volume of data
§ Apply advanced analytics to information in its native form
§ Visualize all available data for ad-hoc analysis and discovery
§ Development environment for building new analytic applications
§ Integrate and deploy applications with enterprise-grade availability, manageability, security, and performance
Big Data Platform: Process and analyze any type of data
• Analyze data in motion
• MapReduce / NoSQL
• Machine Learning
• Text Analytics
• Text Search
• Data Discovery
• Visualization and exploration
• Scalability
• Hardware acceleration
• Stream computing
Accelerators
- 8. Big Data: Business Crucible for Practical Data Science
§ Business and IT identify information sources available
§ IT delivers a platform that enables creative exploration of all available data and content
§ Business determines what questions to ask by exploring the data and relationships
§ New insights drive integration to traditional technology
- 9. Big Data Initiatives: Fueled by Practical Data Science
Analyze a Variety of Information
Novel analytics on a broad set of mixed
information that could not be analyzed before
Analyze Information in Motion
Streaming data analysis
Large volume data bursts and ad-hoc analysis
Analyze Extreme Volumes of Information
Cost-efficiently process and analyze PBs of
information
Manage & analyze high volumes of structured,
relational data
Discover and Experiment
Ad-hoc analytics, data discovery and
experimentation
Manage and Plan
Enforce data structure, integrity and control to
ensure consistency for repeatable queries
- 10. Big Data: Marriage of Established & Emerging Approaches
Established Approach: structured, analytical, logical
§ Platform: DW
§ Traditional sources: Transaction Data, Internal App Data, Mainframe Data, OLTP System Data, ERP data
§ Character: Structured, Repeatable, Linear
§ Examples: Monthly sales reports, Profitability analysis, Customer surveys
Emerging Approaches: creative, holistic thought, intuition
§ Platform: Hadoop, etc.
§ New sources: Web Logs, Social Data, Text Data: emails, Sensor data: images, RFID
§ Character: Unstructured, Exploratory, Iterative
§ Examples: Brand sentiment, Product strategy, Maximum asset utilization
Enterprise Integration joins the two
- 11. Agenda
§ Big Data: 3 Vs and myriad use cases
§ Big Data: diverse workloads
§ Big Data: emergence of the Hadoop DW
- 12. Continuous Social Media Monitoring and Analytics
Data Set
• 1.1B tweets
• 5.7M blog and forum posts
• 3.5M relevant messages
• 97K referencing Product A
• 18K referencing Product B
Information extracted
• Buzz and sentiment
• Gender, Location and Occupation
• Fans
• Intent to purchase
• Specific attributes of products
- 13. Content mining, natural language processing, & classification
§ How it works
– Parses text and detects meaning with extractors
– Understands the context in which the text is analyzed
– Hundreds of pre-built extractors for names, addresses, phone numbers, organizations, URL, Datetime, etc.
§ Accuracy
– Highly accurate in deriving meaning from complex text
§ Performance
– AQL language optimized for MapReduce
Unstructured text (document, email, etc)
Football World Cup 2010, one team distinguished themselves well, losing to the eventual champions 1-0 in the Final. Early in the second half, Netherlands’ striker, Arjen Robben, had a breakaway, but the keeper for Spain, Iker Casillas made the save. Winger Andres Iniesta scored for Spain for the win.
Classification and Insight
World Cup 2010 Highlights
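The extractor idea on this slide can be illustrated with a minimal sketch in plain Python. This is not AQL and not one of the pre-built extractors the slide mentions; the `EXTRACTORS` table, its patterns, and the `extract` function are hypothetical and tuned only to the World Cup sample text.

```python
import re

# Hypothetical, simplified extractors written as regular expressions.
# They stand in for the idea of an "extractor"; they are not AQL.
EXTRACTORS = {
    "score": re.compile(r"\b\d+-\d+\b"),
    "year": re.compile(r"\b(?:19|20)\d{2}\b"),
    # A role word followed by a two-word capitalized name.
    "person": re.compile(
        r"\b(?:striker|keeper for \w+|Winger),? ([A-Z][a-z]+ [A-Z][a-z]+)"
    ),
}


def extract(text):
    """Return {extractor name: list of matched strings} for the text."""
    results = {}
    for name, pattern in EXTRACTORS.items():
        matches = []
        for m in pattern.finditer(text):
            # Use the capture group when the pattern defines one.
            matches.append(m.group(1) if pattern.groups else m.group(0))
        results[name] = matches
    return results


SAMPLE = (
    "Football World Cup 2010, one team distinguished themselves well, "
    "losing to the eventual champions 1-0 in the Final. Early in the second "
    "half, Netherlands' striker, Arjen Robben, had a breakaway, but the "
    "keeper for Spain, Iker Casillas made the save. Winger Andres Iniesta "
    "scored for Spain for the win."
)

found = extract(SAMPLE)
```

A rule set like this breaks as soon as the wording changes, which is why the slide stresses context-aware extractors instead of bare patterns.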
- 15. Statistical Analysis, Predictive Modeling, & Machine Learning
Enables machine learning (ML) on massive datasets
§ R and Matlab-like syntax for smooth adoption
§ Optimizations to generate low-level execution plans
§ Out-of-box and write-your-own analytic algorithms, e.g. Regression, Clustering, Classification, Pattern Mining, Ranking, etc.
§ Scale to massively parallel clusters from 10s to 1000s of machines and from Terabytes to Petabytes
What are people talking about in social media about a product?
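One common way a regression scales across many machines is to have each data partition emit small summary statistics that are then summed. The sketch below shows that pattern for a one-variable least-squares fit in plain Python. It is an illustration under stated assumptions, not the R/Matlab-like language or the optimizer the slide refers to; `map_partition`, `combine`, `fit_line`, and the toy data are all hypothetical.

```python
from functools import reduce


def map_partition(rows):
    """Map step: reduce one partition of (x, y) pairs to sufficient statistics."""
    n = sx = sy = sxx = sxy = 0.0
    for x, y in rows:
        n += 1
        sx += x
        sy += y
        sxx += x * x
        sxy += x * y
    return (n, sx, sy, sxx, sxy)


def combine(a, b):
    """Reduce step: element-wise sum of two partial statistics tuples."""
    return tuple(u + v for u, v in zip(a, b))


def fit_line(partitions):
    """Least-squares slope and intercept from per-partition statistics."""
    n, sx, sy, sxx, sxy = reduce(combine, map(map_partition, partitions))
    slope = (n * sxy - sx * sy) / (n * sxx - sx * sx)
    intercept = (sy - slope * sx) / n
    return slope, intercept


# Toy data lying on y = 2x + 1, split across two "machines".
partitions = [[(0, 1), (1, 3)], [(2, 5), (3, 7)]]
slope, intercept = fit_line(partitions)
```

Because only five numbers leave each partition, the reduce step costs the same whether a partition holds four rows or four billion.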
- 18. Intent and Sentiment Analysis
Online flow: Data-in-motion analysis
§ Data Sources → Stream Computing and Analytics → Timely Decisions (Dashboard)
§ Stream stages: Data Ingest and Prep → Text Analytics: Timely Insights → Entity Analytics: Profile Resolution → Predictive Analytics: Action Determination
Offline flow: Data-at-rest analysis
§ Social Media and Enterprise Data → Hadoop System and Analytics → Reports
§ Hadoop stages: Text Analytics → Entity Analytics and Integration → Comprehensive Social Media Customer Profiles → Predictive Analytics → Customer Models
- 19. Agenda
§ Big Data: 3 Vs and myriad use cases
§ Big Data: diverse workloads
§ Big Data: emergence of the Hadoop DW
- 20. Big Data: DW & Hadoop are Married in Spirit
[Diagram: a DW made up of models, policies, metadata, DQ, MDM, ETL, hubs, marts, cubes, aggregates, databases, views, staging nodes, production tables, operational data stores, storage, memory, cache, and in-database analytics]
§ Cloud-facing architectures
§ Massively parallel processing
§ In-database analytics
§ Mixed workload management
§ Hybrid storage layers
- 21. Hadoop is Core of Next-Gen Big Data DW
§ Vendor-agnostic framework for
massively parallel processing of
advanced analytics against
polystructured information
§ Leverages extensible framework for
building advanced analytics and data
management functions
§ Evolving rapidly in new directions
§ Being commercialized and adopted
rapidly in enterprises
§ Vibrant open-source community and
industry
- 22. Hadoop, DW, and other Databases Co-Exist in Big Data Ecosystem
§ Big Data staging, ETL, and preprocessing tier: Hadoop & NoSQL
§ Big Data SVOT and governance tier: DW RDBMS
§ Big Data access and interaction tier: In-memory, Columnar, OLAP
- 24. Single Version of Big Data: Where Hadoop DW Will Excel
360-degree consumer profiles, social media based
Timely Insights
• Intent to see a movie title, buy a product
• Current Location
Life Events
• Life-changing events: relocation, having a baby, getting married, getting divorced, buying a house
Products Interests
• Personal preferences of product and services
• Product purchase history
Personal Attributes
• Identifiers: name, address, age, gender
• Interests: sports, pets, cuisine…
• Life Cycle Status: marital, parental
Relationships
• Personal relationships: family, friends and roommates…
• Business relationships: co-workers and work/interests network…
Monetizable intent to see a movie
– Kinda feel like going to movies tonight… Any recommendations? @Texas Angelika Texas
– I don't think anyone understands how much I like watching movies. My 3rd trip to the threatre in 3 days.
Monetizable intent to buy
– I need a new digital camera for my food pictures, and recommendations around 300?
– What should I buy?? A mini laptop with Windows 7 OR a Apply MacBook!??!
Location announcements
– I'm at Starbucks Parque Tezontle http://4sq.com/fYReSj
Life Events
– College: Off to Standard for my MBA! Bbye chicago!
– Looks like we'll be moving to New Orleans sooner than I thought.
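To make the profile categories on this slide concrete, here is a toy rule-based tagger in Python. It is only a sketch: the `RULES` list, its labels, and its patterns are hypothetical and were chosen to match the sample posts on the slide. It does not represent the text analytics the deck describes.

```python
import re

# Hypothetical keyword rules, chosen only to match the sample posts.
RULES = [
    ("intent_to_see_movie", re.compile(r"\bgoing to (?:the )?movies\b", re.I)),
    ("intent_to_buy", re.compile(r"\bi need a new\b|\bwhat should i buy\b", re.I)),
    ("life_event", re.compile(r"\bmoving to\b|\boff to \w+ for my\b", re.I)),
    ("location", re.compile(r"\bi'?m at\b", re.I)),
]


def tag(post):
    """Return the labels of every rule whose pattern occurs in the post."""
    return [label for label, pattern in RULES if pattern.search(post)]


posts = [
    "Kinda feel like going to movies tonight",
    "I need a new digital camera for my food pictures",
    "Looks like we'll be moving to New Orleans sooner than I thought.",
    "I'm at Starbucks Parque Tezontle",
]
labels = [tag(p) for p in posts]
```

Keyword rules miss paraphrases and sarcasm, so a sketch like this only shows the shape of the output: one or more profile labels per post.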
- 25. Hadoop DW Integration: What to Look For
§ Hadoop distro functional depth
§ EDW HDFS connector
§ Software, appliance, and cloud form factors for Hadoop offerings
§ Pluggable storage layer for Hadoop offerings
§ Bundled data management and analytics offerings integrated with Hadoop solutions
§ Modeling, management, acceleration, and optimization tools
§ Real-time/low-latency capabilities integrated into Hadoop offerings
§ Robust availability, security, and workload management tools integrated with Hadoop offerings
§ And many more, focused on EDW-grade robustness, scalability, and flexibility!
[Diagram: the DW component diagram from slide 20]
- 26. Consider Big Data Platform Accelerators
§ Telecommunications: CDR streaming analytics; Deep Network Analytics
§ Retail Customer Intelligence: Customer Behavior and Lifetime Value Analysis
§ Finance: Streaming options trading; Insurance and banking DW models
§ Social Media Analytics: Sentiment Analytics; Intent to purchase
§ Public transportation: Real-time monitoring and routing optimization
§ Data mining: Streaming statistical analysis
Also included
• Over 100 sample applications
• User Defined Toolkits
• Standard Toolkits
• Industry Data Models: Banking, Insurance, Telco, Healthcare, Retail
- 27. How Will You Do MDM on Your Hadoop DW?
(A1) Unstructured Entity Integration (on BigInsights)
– Complex analytics to populate master data set
– Text Analytics: Rule language (AQL) for extracting entities, events, relationships from text and html documents
– Entity Integration: Rule language (HIL) to express & customize the integration, cleansing, and aggregation of the master entities
(A2) Entity Repository (on MDM)
– BigInsights Bridge: Generation of the MDM model for public master entities, from the BigInsights model; and bulk-loading of master entities
– Query-based Application Development: Supports the generation of custom queries for individual applications
[Diagram: External public data sources (e.g., SEC/FDIC, Twitter, Blogs, Facebook) feed A1, Text Analytics and Entity Integration on BigInsights with Extensions, which loads A2, relational tables with master entities on InfoSphere MDM. External data subscriptions (e.g., Acxiom) also feed MDM. Data services and DaaS serve MDM Applications and Views. Queries are generated by tooling based on the entity model.]
Query on the entity model:
    select cik, Officers, Directors
    from Company
    where name = Citigroup
Generated query:
    SELECT *
    FROM
      (SELECT t2.CIK as CIK, t2.NAME as NAME, t2.IS_FORMER_OFFICER as IS_FORMER_OFFICER,
              t2.IS_IMPORTANT_OFFICER as IS_IMPORTANT_OFFICER, t2.POSITION_NAME as POSITION_NAME,
              tp.EARLIEST_DATE as EARLIEST_DATE, tp.IS_EARLIEST_EXACT as IS_EARLIEST_EXACT,
              tp.LATEST_DATE as LATEST_DATE, tp.IS_LATEST_EXACT as IS_LATEST_EXACT
       FROM
         (SELECT t1.CIK as CIK, t1.NAME as NAME, t1.IS_FORMER_OFFICER as IS_FORMER_OFFICER,
                 t1.IS_IMPORTANT_OFFICER as IS_IMPORTANT_OFFICER, p.NAME as POSITION_NAME,
                 p.POSITIONSPK_ID as POSITIONSPK_ID
          FROM
            (SELECT o.CIK as CIK, o.NAME as NAME, o.IS_FORMER_OFFICER as IS_FORMER_OFFICER,
                    o.IS_IMPORTANT_OFFICER as IS_IMPORTANT_OFFICER, o.OFFICERSPK_ID as OFFICERSPK_ID
             FROM DB2ADMIN.OFFICERS o
             WHERE o.OFFICER_OF = 567830643756635868
            ) as t1
            left outer join DB2ADMIN.POSITIONS p on t1.OFFICERSPK_ID = p.POSITIONOF
         ) as t2
         left outer join D2ADMIN.RANGEOFKNOWNDATES tp
           on t2.POSITIONSPK_ID = tp.RANGE_OF_KNOWN_DATES_FOR_POS )
    UNION
    // ( OUTER UNION)
    …
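The generated query on this slide follows a recognizable pattern: filter the officers of one company in a subquery, then left outer join each related table so officers without a matching row still appear. The sketch below reproduces only that pattern, using Python's built-in `sqlite3` module. The two tables, their columns, and the rows are hypothetical stand-ins; this is not the DB2 schema or the MDM tooling from the slide.

```python
import sqlite3

# In-memory database with hypothetical, heavily simplified tables.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE officers  (officerspk_id INTEGER, cik INTEGER,
                            name TEXT, officer_of INTEGER);
    CREATE TABLE positions (positionspk_id INTEGER, name TEXT,
                            positionof INTEGER);
    INSERT INTO officers  VALUES (1, 100, 'A. Example', 42),
                                 (2, 100, 'B. Sample', 42);
    INSERT INTO positions VALUES (10, 'CEO', 1);
""")

# Inner subquery selects the officers of one company; the left outer join
# keeps officers that have no position row (position_name is then NULL).
rows = conn.execute("""
    SELECT t1.cik, t1.name, p.name AS position_name
    FROM (SELECT o.cik, o.name, o.officerspk_id
          FROM officers o
          WHERE o.officer_of = ?) AS t1
    LEFT OUTER JOIN positions p ON t1.officerspk_id = p.positionof
    ORDER BY t1.name
""", (42,)).fetchall()

for row in rows:
    print(row)
```

The second officer comes back with `None` for the position, which is the behavior the nested left outer joins in the slide's query are there to preserve.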
- 28. IBM Big Data Platform
New analytic applications drive the requirements for a big data platform
• Integrate and manage the full variety, velocity and volume of data
• Apply advanced analytics to information in its native form
• Visualize all available data for ad-hoc analysis
• Development environment for building new analytic applications
• Workload optimization and scheduling
• Security and Governance
[Diagram, top to bottom]
Analytic Applications: BI / Reporting, Exploration / Visualization, Functional App, Industry App, Predictive Analytics, Content Analytics
IBM Big Data Platform: Visualization & Discovery, Application Development, Systems Management
Accelerators
Hadoop System, Stream Computing, Data Warehouse
Information Integration & Governance