Most people see data as a storage issue: persistence and serialization, and the associated technologies (Oracle, MySQL, SQL Server, and now Hadoop, Riak, HBase). But I don’t see data as a persistence problem, a simple matter of “where do you put it?” I see data differently: there’s more data in the network than in databases, and its value comes from how and where the data is used, not how it is stored. I had the opportunity to attend a number of training sessions held by Chris Date, where we learned about RI, deadlocks,… But one thing he said to me really stuck: the “data” is in the transaction log, not the database. The database only contains a current snapshot; what is happening is in the transaction log.
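The point about the transaction log can be sketched in a few lines: the "database" is just a fold over the log, while the history lives only in the log itself. This is a minimal illustration, not how any particular database engine works; the `LogEntry` type, the account names, and the amounts are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class LogEntry:
    seq: int    # position of the transaction in the log (hypothetical field)
    key: str    # hypothetical account identifier
    delta: int  # change applied by this transaction


def snapshot(log):
    """Replay the log in order and return the current state: key -> balance."""
    state = {}
    for entry in sorted(log, key=lambda e: e.seq):
        state[entry.key] = state.get(entry.key, 0) + entry.delta
    return state


# Hypothetical log: the full story of what happened.
log = [
    LogEntry(1, "sally", +100),
    LogEntry(2, "sally", -30),
    LogEntry(3, "bob", +50),
    LogEntry(4, "sally", -20),
]

# The snapshot only says where things stand now.
current = snapshot(log)

# The log still knows how Sally got there.
history_for_sally = [e.delta for e in log if e.key == "sally"]
```

The snapshot can always be rebuilt from the log, but the log cannot be rebuilt from the snapshot.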
New Analytical Architectures for Big Data
New Analytical Architectures: Why Classic Data Warehousing Approaches Miss the Mark with Big Data. March 21, 2013. Casey Kiernan • firstname.lastname@example.org • Blog: www.the-data-platform.com
“The Future is Data” “Hadoop is the kernel of a new Distributed Data OS” - Doug Cutting
Data has Changed > Analytics has Changed. Transactional > Trailing Indicators; Communities > Reach/Influence; Personal > Interactive. Can the Data Warehouse Architecture adapt?
My Mountain Bike as a Data Platform. Data Collection: Heart Rate, Altitude, Temperature, Cadence / RPM. Speed / Trip Miles, Time. Guidance / Performance: Rate of Climb, Calories Burned, Miles Obtained, Total Climbed, Elapsed Time; Current, Average, Max Values. Data Architecture: on a Local Wireless Network (ANT+ Protocol)
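The "Current, Average, Max Values" on the bike computer can be kept without storing every reading. The sketch below is a generic running-statistics accumulator. It does not use the ANT+ protocol or any real device API, and the heart-rate readings are made up for illustration.

```python
class RunningStats:
    """Track current, average, and max of a stream of sensor readings."""

    def __init__(self):
        self.count = 0
        self.total = 0.0
        self.current = None
        self.maximum = None

    def add(self, value):
        # Update all three views of the stream in constant time.
        self.count += 1
        self.total += value
        self.current = value
        if self.maximum is None or value > self.maximum:
            self.maximum = value

    @property
    def average(self):
        # None until at least one reading has arrived.
        return self.total / self.count if self.count else None


# Hypothetical heart-rate readings in beats per minute.
heart_rate = RunningStats()
for bpm in [120, 135, 150, 141]:
    heart_rate.add(bpm)
```

One accumulator per sensor (heart rate, cadence, altitude, ...) is enough to drive the display during a ride.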
“Personal” Ride Analytics …is this a Data Warehouse?
New Data: Behaviors (individual actions) > Content > Time
New Data: More is Better… Meaningful Guidance, Massive Data
“Business” Analytics - Classic “DW”. Stack: BUSINESS INTELLIGENCE, OLAP / DATA WAREHOUSE, OLTP / TRANSACTIONS, DATA. Answers the question: What are our most profitable products?
Classic “Business” Analytics: Good for Reporting, Forecasting. What did Happen? What will Happen? Operational Reporting, Tactical Trending, Strategic. Time-frame axis: Months, Weeks, Weeks, Months, Years. Descriptive/Trending Analytics
New “Personal” Analytics. Stack: SELF-SERVICE, GUIDANCE, BEHAVIOURS, DATA. Answers the question: Show me a good movie to watch!
“Personal” Analytics: “Right Now” is a very important time-frame! What is Happening RIGHT NOW! What did Happen? What will Happen? Operational Reporting, Tactical Trending, Strategic. Time-frame axis: Months, Weeks, Weeks, Months, Years. Predictive/Prescriptive Analytics
“Business” Analytical Architecture: Classic “DW” Data Flow - Uni-Directional, Latent,… Ordering App, Financial App > Staging > Data Warehouse > OLAP / Reports > Business Analyst. Staging: OLTP to OLAP Mapping, Master Data. Data Warehouse: Facts & Dimensions. OLAP / Reports: Facts/Dimensions, Business Metrics, KPI, YTD Reporting. What are our most Profitable Products?
“Personal” Analytical Architecture: “New” Data Flow - Iterative, Specialized, Extensible, plug & play Analytics, near real-time [Some components are open-source]. Components: Application / UX, Data, Analytics, Data Analysts. Analytical Capabilities: Scoring/Ranking, Recommendations, Natural Language Processing, Relevancy, Classification, Optimization, Collaborative Filtering, Personalization, Digital Attribution,… What movie should I watch tonight?
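To make "Collaborative Filtering" concrete for the "What movie should I watch tonight?" question, here is a deliberately tiny user-based sketch: score each unseen movie by how much the people who liked it overlap with your own likes. The `recommend` function, the user names, and the like-sets are hypothetical; production recommenders (for example those built with Mahout) use far richer similarity measures.

```python
from collections import Counter


def recommend(likes, user, top_n=3):
    """Rank movies the user has not seen by overlap with like-minded users.

    likes: dict mapping user name -> set of liked movie titles.
    """
    seen = likes[user]
    scores = Counter()
    for other, liked in likes.items():
        if other == user:
            continue
        overlap = len(seen & liked)  # crude similarity: shared likes
        if overlap == 0:
            continue  # ignore users with nothing in common
        for movie in liked - seen:
            scores[movie] += overlap
    # Highest score first; ties broken alphabetically for stable output.
    ranked = sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))
    return [movie for movie, _ in ranked[:top_n]]


# Hypothetical behaviour data.
likes = {
    "sally": {"Alien", "Blade Runner"},
    "bob": {"Alien", "Blade Runner", "The Matrix"},
    "carol": {"Alien", "Moon"},
    "dave": {"Notting Hill"},
}

tonight = recommend(likes, "sally")
```

Here Bob shares two likes with Sally and Carol shares one, so Bob's extra pick outranks Carol's; Dave has nothing in common and contributes nothing.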
“Personal Analytics” Data Architecture: “New” Data Flow – Detailed View of Components. End-User Experience: Browser, Tablet, Mobile,… Self-Service Application. App Persistence: Personalization, Preferences, State. Published Analytics: Personalized Recommendations. Persistence/Analytics: “State” Persistence, “Read” Performance. Analytics Engines: Pluggable. Social Signals: RSS/Facebook/… Mass Data Storage: Behaviors / “Write” Performance. Data Scientists
Let’s get personal… SALLY LIKES TACOS. HOW DO WE MODEL THIS DATA?
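One possible answer, in line with the graph / triple structures that come up later in the deck, is to store the statement as a subject-predicate-object triple. The sketch below is a plain in-memory list, not a real triple store; every fact other than "Sally likes tacos" is hypothetical and added only so the queries have something to find.

```python
from collections import namedtuple

Triple = namedtuple("Triple", ["subject", "predicate", "object"])

store = [
    Triple("Sally", "likes", "tacos"),          # the fact from the slide
    Triple("Sally", "lives_in", "Austin"),      # hypothetical
    Triple("Bob", "likes", "tacos"),            # hypothetical
    Triple("Bob", "likes", "mountain biking"),  # hypothetical
]


def match(triples, subject=None, predicate=None, obj=None):
    """Return triples matching the given parts; None acts as a wildcard."""
    return [
        t for t in triples
        if (subject is None or t.subject == subject)
        and (predicate is None or t.predicate == predicate)
        and (obj is None or t.object == obj)
    ]


# Who likes tacos?
taco_fans = [t.subject for t in match(store, predicate="likes", obj="tacos")]

# What do we know about Sally?
about_sally = match(store, subject="Sally")
```

New kinds of facts need no schema change: a new predicate is just another row, which is what makes this shape convenient for observations and signals.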
How important is Social? Install ghostery.com: it shows you who is actively watching you surf the web! Lots of people!!!
Signals – The Core of New Data: Mixture of Proprietary and Public Data. Social, Personal, Content, Time
The New “Analytical Application” Architecture: “New” Data Flow – Specialized Technology Choices. End-User Experience: Browser, Tablet, Mobile,… Self-Service Application. App Persistence: Personalization, Preferences, State. Published Analytics: Personalized Recommendations. Persistence/Analytics: Cassandra, Riak,… HBase. Analytics: R, Mahout, Pig. Mass Data Storage: Hadoop. Data-Center or Cloud. Specialization of Data Technologies
Servicing Multiple Analytical Systems: Using Shared Analytical Mass Storage. Self-Service Application A and Self-Service Application B, each with Published Analytics Persistence (Riak, Cassandra, HBase, MySQL). Pluggable Analytics Engines. Mass Data Storage: Behaviors / “Write” Performance, Hadoop / AWS. Data Scientists
Integrating the Architectures: “Personal” Analytics Stack + Classic “DW” Stack. Only Financial Events ($$$) cross the threshold (and are recorded into) the Data Warehouse. App, App, App > Staging > Data Warehouse > OLAP / Reports > Business Analyst; OLTP to OLAP Mapping. “Local” Events stay Local (they are analyzed locally). Not all DATA Belongs in the Data Warehouse!
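The "threshold" rule can be expressed as a simple router: financial events are forwarded to the warehouse staging area, and everything else stays with the application's own analytics. The event shapes and the set of types treated as financial below are assumptions for illustration; a real system would define them from its own transaction model.

```python
# Hypothetical event types that carry money and therefore cross the threshold.
FINANCIAL_TYPES = {"purchase", "refund"}


def route(events):
    """Split events into (warehouse_staging, local_analytics) lists."""
    warehouse_staging, local_analytics = [], []
    for event in events:
        if event["type"] in FINANCIAL_TYPES:
            warehouse_staging.append(event)
        else:
            local_analytics.append(event)  # "Local" events stay local
    return warehouse_staging, local_analytics


# Hypothetical stream from one application.
events = [
    {"type": "page_view", "user": "sally", "item": "tacos"},
    {"type": "purchase", "user": "sally", "amount": 12.50},
    {"type": "rating", "user": "bob", "item": "tacos", "stars": 5},
    {"type": "refund", "user": "bob", "amount": 4.00},
]

to_warehouse, stays_local = route(events)
```

Only two of the four events reach the warehouse; the page view and the rating are analyzed where they were produced.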
Classic DW vs. the New Analytics: The Shift from “Business” Analytics to “Personal” Analytics
Aspect | Classic DW | New Analytics
Scope | Enterprise | Application
Analytics | Trailing: OLAP | Predictive: Machine Learning, Sentiment Analysis, Recommendations, Personalization, Natural Language Processing, Classification, Clustering, Optimization, Collaborative Filtering, Digital Attribution,…
Actionable? | Loosely Coupled | Tightly Coupled, Analytics Embedded in Application
Data Structures | Facts/Dimensions (Requires a DW) | Semantic Data, Graph / Triples, Observations, Direct Signals
Knowledge Expert | Business Analyst | Data Scientist
Technology Stack | Vendor Driven ($$$) | Open-Source
Architecture | Scale-Up | Scale-Out (or in the Cloud)
New Signals + New Analytics = New Scenarios. Data Signals: Social, Location, Personal, Behaviors, Transactions, Content, Time. New Analytics: Recommendations, Natural Language Processing, Relevancy, Classification, Optimization, Collaborative Filtering, Digital Attribution,… New Scenarios: Customer Engagement, Customer Loyalty / Attrition / Retention, Fraud, Risk Analysis, Intent, Customer Personalization