www.globalbigdataconference.com
Twitter : @bigdataconf
“Taming the Data Lake”
Intended for Knowledge Sharing only
Disclaimer:
Participation in this summit is purely in a personal capacity and does not represent Visa in any form or manner. The talk is based on learnings from work across industries and firms. Care has been taken to ensure that no proprietary or work-related information of any firm is used in any material.
Director, Insights at Visa, Inc.
Enables decision making at the executive, product and marketing levels via actionable insights derived from data.
RAMKUMAR RAVICHANDRAN
Data Warehouse Architect at Visa, Inc.
Architects a data shop in Hadoop to get a 360-degree view of interactions; technology interface for the data stakeholder community.
BHARATHIRAJA CHANDRASEKHARAN
Quick recap of what it is
Data Lakes – the concept
AS THEY ARE ENVISIONED TODAY…
Source: http://www.tangerine.co.th/tag/how-do-data-lake-work/
DOES IT RING A BELL?
*Satirical, only meant to wake you up; not indicative of anyone or anything. Any similarity is purely coincidental!
& DOES THIS TOO?
SO WHAT DO WE HEAR FROM OUR USERS?
We often hear these statements in the context of data lakes…
Success criteria were engineering-specific: storage/scalability cost savings, etc.
Expensive change management
Complex for end users to deal with
Analytical performance issues
Data governance, lineage and management complexities
“Although the cost of storage went down, the actual cost of utilizing the data has shot up.”
Taking a step back
DATA REALLY HAS GOTTEN BIG – VOLUME, VARIETY, VELOCITY & VERACITY
Each of these data sources is critical to all or several functions…
…and is consumed as reports, analytical deep-dive insights, forward-looking projections, etc.
DATA SOURCES – TYPES OF INSIGHTS
• TRANSACTION DATA: Are overall txns going up or down? Where are the txns happening? etc.
• CLICK STREAM DATA (MOBILE & WEB): How are consumers interacting with the website/app (drop-offs, clicks, time spent, etc.)?
• SENTIMENT/SOCIAL DATA: Social media, NPS surveys and media mentions help gauge true consumer reactions.
• SERVER LOGS DATA: How are consumers reacting to various functions on the front end?
• LOCATION DATA: Are consumers using the product in-store or on the move?
• PROMOTIONS DATA: How are consumers reacting to various marketing campaigns?
• INDUSTRY DATA: Benchmarking against industry performance.
EVERYONE NEEDS DATA…
• BI: How are we doing today?
• ANALYTICS: Where will we be tomorrow? What if we do this? What can we do?
• A/B TESTING: Did the initiative work?
• USER RESEARCH: How do customers feel about us?
• STRATEGY: Where should we invest?
…AND DISTRIBUTED DATA SYSTEMS HAD THEIR OWN ISSUES
Inconsistent (and/or conflicting) definitions of data and numbers
Varying granularities
Multiple methodologies
Different BUs = different KPIs, or the same KPIs with different priorities
Lack of visibility/understanding outside of the BUs
“Slow & inefficient, non-scalable, difficulties rolling up, trust issues, cascading mistakes”
AND IT THEN JUST HAPPENED…
TRANSACTION DATA
CLICK STREAM DATA (MOBILE & WEB)
SENTIMENT DATA
DATA SOURCES
SERVER LOGS DATA
LOCATION DATA
CAMPAIGN DATA
INDUSTRY DATA
Source: http://www.adamadiouf.com/2013/03/22/bigdata-vs-enterprise-data-warehouse/
As if all prayers were answered, Hadoop arrived in a big way and, poof, all problems seemed to disappear…
A stroke of luck, or was it?
WE FOCUSED ON OUR SPOUSE BUT FORGOT THE IN-LAWS…
Maturity phases of Analytics Practice (increasing value addition):
• Inform: Reports on KPIs with high-level drilldowns
• Act: Deep dives via Business Analytics
• Predict: Identify causal relationships via Advanced Analytics
• Optimize: Experiments via A/B Testing to verify which option works
• Mine: Machine Learning
Share of data consumers across the phases (from the slide figure): 5%, 50%, 15%, 20%, 10%
We focused on the 20% of data consumers (reports) and assumed that the other 80% would either love it or at least figure it out…
HIGH DEVELOPMENT/MODIFICATION COSTS
Rigid structure and the scale of operations make change difficult…
Data Modeling/Schema
ETL; Metadata
Raw Data
NOT ONLY IS THE AUDIENCE CHANGING…
Stakeholder Needs
Reports, Insights & Drilldowns
Datamart Documentation
Executives
- Reports
- High level drilldown
- Unified summary
- “On the go*”
Marketing & PR
- Campaign performance
- Infographics
- Deep dives
- Testing
Sales / RM
- Sales performance
- Prospecting
- Competitive
- Infographics
Product
- Product performance
- Deep dive
- Mining
- Testing
- Research
Technology / AE / Operations
- Platform performance
- Deep dive
- Forecasting
- Real time alerting
FP & A
- Consolidated initiative readouts (E2E)
- Deduping
- Drill downs
- Forecasting
…BUT ALSO THE NEEDS ARE EVER CHANGING
“In mail”: Recommendations with supporting graphs, tables, etc.
“Story Deck”: Full deck with the pitch and supporting arguments, numbers, graphs, charts
“On-the-go”: Mobile app, on the cloud, subscriptions; reports, dashboards, infographics
Algorithm/Model: Ready to be deployed
How to decide? Customer needs; turnaround speed; one-time use vs. reuse; deployment on the front end; strategic doc; quick read/research doc
Getting to the point – what do we propose?
WE BRING TO YOU THE SCALABLE METRICS MODEL (SMM)…
• EDW – Strengths: ACID, fast, stable. Weaknesses: rigid, cost, resourcing.
• Hadoop data lake – Strengths: cost, flexibility, scalability. Weaknesses: performance, reliability.
• Aggregated Cubes – Strengths: performance, easy to understand. Weakness: reporting only.
• Scalable Metrics Model (Pre-Aggregated Metrics + Primary-Foreign Keys)
An attempt to bring together the best of the most-used models…
TACTICAL DETAILS: WHERE DO WE START?
An illustrative example from Retail domain…
• Define granularity & associated info: determined by core objectives, e.g., a customer-level table for the Customer Engagement team
• Define foreign keys & common dimensions: for extensibility
• Define metrics: KPIs as required
• Identify value-add metrics: recommendations, forecasting, etc.
CUSTOMER
•Primary Key: Customer id
•Foreign Keys: Sign Up Partner,
Promotion Id, First Txn id
•Customer Level Info: Email, Phone Number, Geo, etc.
•Metrics:
•Lifetime Spend, Txns
•Behavioral Bucket
•RFM Bucket
•Recommended Action items:
•Next Best Product
•CLV
•Target Offers
•Call Center Agent Reco
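The RFM Bucket above is an example of a metric computed at the defined (customer) granularity. Below is a minimal sketch of that logic, assuming an in-memory list of transactions. In the model itself this would be a Hive/Impala/Pig query. The Txn class and the 1-3 scoring cutoffs (RECENCY_DAYS, FREQUENCY_TXNS, MONETARY_SPEND) are hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass
from datetime import date

# Hypothetical cutoffs for 1-3 scores; real thresholds depend on the business.
RECENCY_DAYS = (30, 90)            # days since last txn (fewer is better)
FREQUENCY_TXNS = (5, 50)           # number of txns
MONETARY_SPEND = (500.0, 2000.0)   # lifetime spend


@dataclass
class Txn:
    customer_id: int
    txn_date: date
    amount: float


def score(value: float, cutoffs: tuple, higher_is_better: bool = True) -> int:
    """Map a raw value to a score from 1 (worst) to 3 (best) using two cutoffs."""
    low, high = cutoffs
    raw = 1 if value < low else 2 if value < high else 3
    return raw if higher_is_better else 4 - raw


def rfm_buckets(txns: list[Txn], as_of: date) -> dict[int, str]:
    """Return {customer_id: 'R?F?M?'}, i.e. one metric at customer granularity."""
    last_seen: dict[int, date] = {}
    frequency: dict[int, int] = {}
    monetary: dict[int, float] = {}
    for t in txns:
        cid = t.customer_id
        last_seen[cid] = max(last_seen.get(cid, t.txn_date), t.txn_date)
        frequency[cid] = frequency.get(cid, 0) + 1
        monetary[cid] = monetary.get(cid, 0.0) + t.amount
    buckets = {}
    for cid, seen in last_seen.items():
        r = score((as_of - seen).days, RECENCY_DAYS, higher_is_better=False)
        f = score(frequency[cid], FREQUENCY_TXNS)
        m = score(monetary[cid], MONETARY_SPEND)
        buckets[cid] = f"R{r}F{f}M{m}"
    return buckets


SAMPLE_TXNS = [
    Txn(11234, date(2015, 1, 10), 3000.0),
    Txn(11234, date(2015, 2, 20), 400.0),
    Txn(555, date(2014, 6, 1), 20.0),
]
print(rfm_buckets(SAMPLE_TXNS, as_of=date(2015, 3, 1)))
```

Each metric (RFM, CLV, Next Best Product, …) would get its own such computation, which keeps metrics independent of one another.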
TACTICAL DETAILS: DATA MODEL
An illustrative example from Retail domain…
Column | What it contains | Sample data
id | Customer_id | 11234
Dimensions | Name, Email, Address, etc. | {"name":"John", "Email":"john@email.com", "Address":"123 nowhereblvd"}
foreign_keys | signup_partner_id, promotion_id | {"signup_partner_id":"666YYY", "promotion":"YAH123"}
metrics | Lifetime Spend, Txns, Behavioral Bucket, RFM Bucket, Recommended Action items (Next Best Product, CLV, Target Offers, Call Center Agent Reco) | {"Lifetime Spend":"3400", "Txns":"150", "Behavioural Bucket":"repeat user", "RFM Bucket":"", "recommended Product id":"PRD789", "CLV":"??", "Target Offer":"OFF789", "CallCenterAgentReco":"1234"}
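To make the row layout concrete, here is a minimal Python sketch of one row. It assumes every map value is stored as a string, as in the sample data; this is comparable to a Hive MAP<STRING,STRING> column. The SmmRow class and its methods are hypothetical illustrations and do not come from the talk.

```python
import json
from dataclasses import dataclass, field


@dataclass
class SmmRow:
    """One row of a Scalable Metrics Model table: a primary key plus map columns.

    All map values are strings, mirroring the sample data above.
    """
    id: int
    dimensions: dict[str, str] = field(default_factory=dict)
    foreign_keys: dict[str, str] = field(default_factory=dict)
    metrics: dict[str, str] = field(default_factory=dict)

    def set_metric(self, name: str, value: str) -> None:
        # A new metric is just a new map entry; the table schema is unchanged.
        self.metrics[name] = value

    def to_json(self) -> str:
        return json.dumps(
            {"id": self.id, "dimensions": self.dimensions,
             "foreign_keys": self.foreign_keys, "metrics": self.metrics},
            sort_keys=True,
        )


row = SmmRow(
    id=11234,
    dimensions={"name": "John", "Email": "john@email.com", "Address": "123 nowhereblvd"},
    foreign_keys={"signup_partner_id": "666YYY", "promotion": "YAH123"},
    metrics={"Lifetime Spend": "3400", "Txns": "150"},
)
row.set_metric("Behavioural Bucket", "repeat user")
print(row.to_json())
```

The design choice this illustrates is that new metrics do not require a schema change. The cost, noted later in the deck, is that map types are harder for end users to query directly.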
TACTICAL DETAILS: ETL FRAMEWORK
An illustrative example from Retail domain…
STEP I: QUERIES
•Write separate queries/code to get metrics at the defined granularity
•Put those queries into the framework
STEP II: FRAMEWORK RUNS
•The framework runs each of these queries and populates the respective keys
STEP III: IMPLEMENT MODULARITY
•Adding a new metric is just adding a new query/code for that metric alone
•Changing the existing logic for a metric impacts that metric alone
STEP IV: USER INTERFACE
•Create physical Impala tables for interactive querying
•Create views for abstraction and end-user access
•Exporting data to reporting tools like Tableau/QlikView brings a high level of analysis capability to this model.
ETL framework
• Divide and conquer
– Write separate queries/code to get metrics at the defined granularity
– Put those queries into the framework
• The framework runs each of these queries and populates the respective keys
• Modularity
– Adding a new metric is just adding a new query/code for that metric alone
– Changing the existing logic for a metric impacts that metric alone
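The divide-and-conquer framework can be sketched as a registry of independent metric functions that a runner executes in turn, merging each result into the row for its key. This is a minimal Python sketch. Python functions stand in for the Hive/Impala/Pig queries, and the names REGISTRY, metric and run_framework are hypothetical.

```python
from typing import Callable

# Hypothetical stand-in: each "query" is a Python function returning {key: value}.
MetricFn = Callable[[list[dict]], dict[int, str]]
REGISTRY: dict[str, MetricFn] = {}


def metric(name: str) -> Callable[[MetricFn], MetricFn]:
    """Register one metric; adding a metric means adding one function."""
    def register(fn: MetricFn) -> MetricFn:
        REGISTRY[name] = fn
        return fn
    return register


@metric("Txns")
def txn_count(txns: list[dict]) -> dict[int, str]:
    counts: dict[int, int] = {}
    for t in txns:
        counts[t["customer_id"]] = counts.get(t["customer_id"], 0) + 1
    return {cid: str(n) for cid, n in counts.items()}


@metric("Lifetime Spend")
def lifetime_spend(txns: list[dict]) -> dict[int, str]:
    spend: dict[int, float] = {}
    for t in txns:
        spend[t["customer_id"]] = spend.get(t["customer_id"], 0.0) + t["amount"]
    return {cid: f"{total:.0f}" for cid, total in spend.items()}


def run_framework(txns: list[dict]) -> dict[int, dict[str, str]]:
    """Run every registered metric and populate metrics[name] under each key."""
    table: dict[int, dict[str, str]] = {}
    for name, fn in REGISTRY.items():
        try:
            results = fn(txns)
        except Exception as exc:  # a broken metric must not block the others
            print(f"metric {name!r} failed: {exc}")
            continue
        for cid, value in results.items():
            table.setdefault(cid, {})[name] = value
    return table


SAMPLE_TXNS = [
    {"customer_id": 11234, "amount": 3000.0},
    {"customer_id": 11234, "amount": 400.0},
    {"customer_id": 555, "amount": 20.0},
]
print(run_framework(SAMPLE_TXNS))
```

Editing lifetime_spend changes only the "Lifetime Spend" entries. The try/except keeps one failing metric from blocking the rest of the run, which is one way to get the "impacts that metric alone" property.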
Reporting and presentation
• Map data types are hard for users to access
• Three options
– Create physical Impala tables for interactive querying
– Create views for abstraction and end-user access
– Reporting layer (like Tableau): brings a different level of accessibility and analysis capability to this model
  • Faster (if data is cached)
  • Create report-level calculations
  • Data blending
  • Using metrics as a dimension – like customer buckets on transaction size
  • Visualization
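Because map columns are awkward for end users, a view typically projects individual map keys as ordinary columns. In HiveQL, a map value is read as `metrics['Txns']`. The Python sketch below mimics such a view and then uses metrics as a dimension by bucketing customers on average transaction size. The sample rows, the functions flatten and avg_txn_bucket, and the bucket boundaries are hypothetical.

```python
from collections import Counter

# Rows shaped like SMM rows, with metrics stored in a map column (values as strings).
ROWS = [
    {"id": 11234, "metrics": {"Lifetime Spend": "3400", "Txns": "150"}},
    {"id": 555, "metrics": {"Lifetime Spend": "250", "Txns": "2"}},
    {"id": 777, "metrics": {"Lifetime Spend": "900"}},  # "Txns" not populated yet
]


def flatten(rows: list[dict], metric_names: list[str]) -> list[dict]:
    """Expose selected map keys as ordinary columns (None when a key is missing)."""
    return [{"id": r["id"], **{m: r["metrics"].get(m) for m in metric_names}} for r in rows]


def avg_txn_bucket(row: dict) -> str:
    """Use metrics as a dimension: bucket a customer by average transaction size."""
    spend, txns = row["Lifetime Spend"], row["Txns"]
    if spend is None or txns is None or int(txns) == 0:
        return "unknown"
    avg = float(spend) / int(txns)
    # Hypothetical bucket boundaries, for illustration only.
    return "small" if avg < 25 else "medium" if avg < 100 else "large"


flat = flatten(ROWS, ["Lifetime Spend", "Txns"])
print(Counter(avg_txn_bucket(r) for r in flat))
```

A metric that has not been computed yet surfaces as a missing (None) column value rather than a schema error. This is one reason the map-plus-view combination suits metrics that are added incrementally.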
DATA BUS EXTENSIBILITY
CUSTOMER
•Primary Key: Customer id
•Foreign Keys: Sign Up Partner,
Promotion Id, First Txn id
•Customer Level Info: Email, Phone Number, Geo, etc.
•Metrics:
•Lifetime Spend, Txns
•Behavioral Bucket
•RFM Bucket
•Recommended Action items:
•Next Best Product
•CLV
•Target Offers
•Call Center Agent Reco
SELLERS
•Primary Key: Seller id
•Foreign Keys: Product id,
Operating Channel
•Seller Level Info: Name, Operating Region, Annual Sales
•Metrics:
•Lifetime Sales, Txns
•Performance Bucket
•Special Category Flag
•Recommended Action items:
•Next Best Product
•Next Co-Marketing
•RM action
TXNS
•Primary Key: Txn id
•Foreign Keys: Custid, Sellerid, Channel
•Txn Level Info: Amt, Type, Date
•Flags:
•Buyer/Seller Type
•Deviation Metrics
•Fraud/Good
•Agent Verification
•Next Best Offer
CLICKSTREAM
PROMOTIONS
PARTNERS
PRODUCTS
SENTIMENT
LOGS
3rd PARTY
ETC…
Common Dimensions or Foreign Keys
Business requirements -> Design -> Data Model -> ETL Framework -> Reporting Layer -> Use and learn
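The data bus works because every table carries common dimensions or foreign keys. Below is a minimal sketch, assuming TXNS rows carry hypothetical cust_id and seller_id keys that match "Custid, Sellerid" on the TXNS card. The same roll-up code produces customer-level and seller-level metrics just by switching the foreign key. The TXNS rows and roll_up are illustrative only.

```python
from collections import defaultdict

# Hypothetical TXNS rows; foreign keys point at CUSTOMER and SELLERS rows.
TXNS = [
    {"id": "T1", "foreign_keys": {"cust_id": "11234", "seller_id": "S1"}, "metrics": {"amt": "120.0"}},
    {"id": "T2", "foreign_keys": {"cust_id": "11234", "seller_id": "S2"}, "metrics": {"amt": "80.0"}},
    {"id": "T3", "foreign_keys": {"cust_id": "555", "seller_id": "S1"}, "metrics": {"amt": "40.0"}},
]


def roll_up(rows: list[dict], fk: str, amount_key: str = "amt") -> dict[str, tuple[int, float]]:
    """Aggregate a child table onto whichever table the foreign key `fk` points to.

    Returns {parent_key: (txn_count, total_amount)}.
    """
    counts: defaultdict[str, int] = defaultdict(int)
    totals: defaultdict[str, float] = defaultdict(float)
    for r in rows:
        key = r["foreign_keys"][fk]
        counts[key] += 1
        totals[key] += float(r["metrics"][amount_key])
    return {k: (counts[k], totals[k]) for k in counts}


by_customer = roll_up(TXNS, "cust_id")  # candidate input for CUSTOMER "Lifetime Spend, Txns"
by_seller = roll_up(TXNS, "seller_id")  # same code feeds SELLERS "Lifetime Sales, Txns"
print(by_customer, by_seller)
```

Adding a new subject area (for example PROMOTIONS) only requires that it share a key with an existing table. Existing tables and metrics are untouched.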
THE SALIENT FEATURES
• Fit for a wide variety of solution sets & audiences: an optimal data model supporting all three needs (Reporting, Analytics & Data Mining).
• Best of all worlds: the Scalable Metrics Model is a hybrid approach, combining
• ACID strengths: performance, stability and reliability of RDBMS.
• Non-ACID strengths: scalability, flexibility and versatility of Hadoop.
• Needs-optimized model: the highest premium is placed on the needs of the user; changes are easy to incorporate as they come along (view-like). The refresh cycle is easy, and changed logic is picked up in the next run.
• Data Governance & Lineage: a modular approach breaks complex problems down into smaller items and integrates them into the bigger scheme of things, which eases data governance and lineage.
• Extensibility:
• Caching: easy integration with buffering technologies to optimize performance.
• Visualization: easier integration with visualization tools like Tableau.
• Coding Interface: additional drilldowns, analyses and data analysis via Hive/SAS/R.
● MODULAR ● EXTENDABLE ● UPDATABLE ● SCALABLE
FOUR DIMENSIONS OF SUCCESSFUL EXECUTION
PEOPLE
• Business Analysts: Details on business needs such as timing (immediate/near/medium/long term), priority (critical/urgent/important/good to have), frequency (regular/once in a while/rare), real-time requirements, delivery & users.
• Technical Architects: Understand the raw data structure, flow mechanisms & pipelines, security/legal/storage/resourcing constraints, and feasibility assessments.
PROCESS
• Matching & Gap Analysis: Is the technology available to handle all business needs (possible/not enough RoI/deferred)? Contingency, resourcing & budgeting.
• Project Planning: Milestone-based delivery, deep stakeholder involvement in development & validation, communications management
• Execution: Efficient schema-on-read, aggregates, tight metadata, reporting/analytics layer, tables/partitions/file types/compression
TECH
• Pig: ETL
• Hive/Impala: Schema & table creation
• Java/Streaming:
• SAS/Python/R: Statistical Modeling
CULTURE
• Customer Needs Focused
• Need for a smart vision, sound planning and able change management
• Outcome Focused Organization (common business goal)
WHY DO WE THINK THE TIME IS NOW?
Evolution in the value prop of Analysts: what/where/how much -> what can happen -> what should we do?
Audience has broadened (from a numbers middleman -> front-line managers)
Luxury of time has evaporated
Nature of questions has drastically changed (expectation of being able to connect the dots in the “Data Lake” world)
Overselling potential before getting “there”
KPI of Analytics has changed from Turn-Around-Time (TAT) to Time-to-Action (TTA)
Putting it all together
SWOT ANALYSIS OF SMM
STRENGTHS
• Needs-sensitive model
• Reduced cost of development, modification & refresh
• Easy for analysts/end users to understand and play with
• Data Governance & Lineage: break down bigger problems into smaller, manageable ones
OPPORTUNITIES
• Integration with front-end tools that can simplify UX
• Tools that buffer the backend data to ensure speedy delivery
WEAKNESSES & THREATS
• A good vision of future analytical requirements is paramount
• Full refresh every time it runs
• Maximum granularity needs to be fixed upfront
• Learning curve on coding language/syntax
• Non-normalized data model
• Not for real-time insights delivery
• No Slowly Changing Dimensions
THE FIVE COMMANDMENTS
• “Know” that it caters to the most frequent needs, not all needs.
• “Must have” as clear and far-reaching an analytics vision (and view of needs) as possible, plus an outcome-focused approach.
• “Ensure” deeper stakeholder involvement in development. A test-and-learn approach is a must; be ready to modify if needed.
• “Develop” modularity in delivery.
• “Prepare” for ever-increasing dependence from Analytics and other stakeholders.
Appendix
THANK YOU!
Would love to hear from you on any of the following forums…
https://twitter.com/decisions_2_0
http://www.slideshare.net/RamkumarRavichandran
https://www.youtube.com/channel/UCODSVC0WQws607clv0k8mQA/videos
http://www.odbms.org/2015/01/ramkumar-ravichandran-visa/
https://www.linkedin.com/pub/ramkumar-ravichandran/10/545/67a
https://www.linkedin.com/in/dataisbig
http://bigdatadw.blogspot.com/
BHARATHIRAJA CHANDRASEKHARAN
RAMKUMAR RAVICHANDRAN
RESEARCH/LEARNING RESOURCES
• Alternative approach by Martin Fowler:
http://martinfowler.com/bliki/DataLake.html
• Teradata/Hortonworks Data Lake Whitepaper:
http://hortonworks.com/wp-content/uploads/2014/05/TeradataHortonworks_Datalake_White-Paper_20140410.pdf
• EMC Data Lake:
https://www.youtube.com/watch?v=o2fs02h_LEo

Delhi Call Girls Punjabi Bagh 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip Callshivangimorya083
 
Low Rate Call Girls Bhilai Anika 8250192130 Independent Escort Service Bhilai
Low Rate Call Girls Bhilai Anika 8250192130 Independent Escort Service BhilaiLow Rate Call Girls Bhilai Anika 8250192130 Independent Escort Service Bhilai
Low Rate Call Girls Bhilai Anika 8250192130 Independent Escort Service BhilaiSuhani Kapoor
 
dokumen.tips_chapter-4-transient-heat-conduction-mehmet-kanoglu.ppt
dokumen.tips_chapter-4-transient-heat-conduction-mehmet-kanoglu.pptdokumen.tips_chapter-4-transient-heat-conduction-mehmet-kanoglu.ppt
dokumen.tips_chapter-4-transient-heat-conduction-mehmet-kanoglu.pptSonatrach
 
Brighton SEO | April 2024 | Data Storytelling
Brighton SEO | April 2024 | Data StorytellingBrighton SEO | April 2024 | Data Storytelling
Brighton SEO | April 2024 | Data StorytellingNeil Barnes
 
VIP Call Girls in Amravati Aarohi 8250192130 Independent Escort Service Amravati
VIP Call Girls in Amravati Aarohi 8250192130 Independent Escort Service AmravatiVIP Call Girls in Amravati Aarohi 8250192130 Independent Escort Service Amravati
VIP Call Girls in Amravati Aarohi 8250192130 Independent Escort Service AmravatiSuhani Kapoor
 
VIP High Class Call Girls Jamshedpur Anushka 8250192130 Independent Escort Se...
VIP High Class Call Girls Jamshedpur Anushka 8250192130 Independent Escort Se...VIP High Class Call Girls Jamshedpur Anushka 8250192130 Independent Escort Se...
VIP High Class Call Girls Jamshedpur Anushka 8250192130 Independent Escort Se...Suhani Kapoor
 

Recently uploaded (20)

04242024_CCC TUG_Joins and Relationships
04242024_CCC TUG_Joins and Relationships04242024_CCC TUG_Joins and Relationships
04242024_CCC TUG_Joins and Relationships
 
Best VIP Call Girls Noida Sector 39 Call Me: 8448380779
Best VIP Call Girls Noida Sector 39 Call Me: 8448380779Best VIP Call Girls Noida Sector 39 Call Me: 8448380779
Best VIP Call Girls Noida Sector 39 Call Me: 8448380779
 
100-Concepts-of-AI by Anupama Kate .pptx
100-Concepts-of-AI by Anupama Kate .pptx100-Concepts-of-AI by Anupama Kate .pptx
100-Concepts-of-AI by Anupama Kate .pptx
 
Full night 🥵 Call Girls Delhi New Friends Colony {9711199171} Sanya Reddy ✌️o...
Full night 🥵 Call Girls Delhi New Friends Colony {9711199171} Sanya Reddy ✌️o...Full night 🥵 Call Girls Delhi New Friends Colony {9711199171} Sanya Reddy ✌️o...
Full night 🥵 Call Girls Delhi New Friends Colony {9711199171} Sanya Reddy ✌️o...
 
E-Commerce Order PredictionShraddha Kamble.pptx
E-Commerce Order PredictionShraddha Kamble.pptxE-Commerce Order PredictionShraddha Kamble.pptx
E-Commerce Order PredictionShraddha Kamble.pptx
 
Beautiful Sapna Vip Call Girls Hauz Khas 9711199012 Call /Whatsapps
Beautiful Sapna Vip  Call Girls Hauz Khas 9711199012 Call /WhatsappsBeautiful Sapna Vip  Call Girls Hauz Khas 9711199012 Call /Whatsapps
Beautiful Sapna Vip Call Girls Hauz Khas 9711199012 Call /Whatsapps
 
代办国外大学文凭《原版美国UCLA文凭证书》加州大学洛杉矶分校毕业证制作成绩单修改
代办国外大学文凭《原版美国UCLA文凭证书》加州大学洛杉矶分校毕业证制作成绩单修改代办国外大学文凭《原版美国UCLA文凭证书》加州大学洛杉矶分校毕业证制作成绩单修改
代办国外大学文凭《原版美国UCLA文凭证书》加州大学洛杉矶分校毕业证制作成绩单修改
 
PKS-TGC-1084-630 - Stage 1 Proposal.pptx
PKS-TGC-1084-630 - Stage 1 Proposal.pptxPKS-TGC-1084-630 - Stage 1 Proposal.pptx
PKS-TGC-1084-630 - Stage 1 Proposal.pptx
 
Industrialised data - the key to AI success.pdf
Industrialised data - the key to AI success.pdfIndustrialised data - the key to AI success.pdf
Industrialised data - the key to AI success.pdf
 
(PARI) Call Girls Wanowrie ( 7001035870 ) HI-Fi Pune Escorts Service
(PARI) Call Girls Wanowrie ( 7001035870 ) HI-Fi Pune Escorts Service(PARI) Call Girls Wanowrie ( 7001035870 ) HI-Fi Pune Escorts Service
(PARI) Call Girls Wanowrie ( 7001035870 ) HI-Fi Pune Escorts Service
 
B2 Creative Industry Response Evaluation.docx
B2 Creative Industry Response Evaluation.docxB2 Creative Industry Response Evaluation.docx
B2 Creative Industry Response Evaluation.docx
 
VIP High Profile Call Girls Amravati Aarushi 8250192130 Independent Escort Se...
VIP High Profile Call Girls Amravati Aarushi 8250192130 Independent Escort Se...VIP High Profile Call Girls Amravati Aarushi 8250192130 Independent Escort Se...
VIP High Profile Call Girls Amravati Aarushi 8250192130 Independent Escort Se...
 
꧁❤ Greater Noida Call Girls Delhi ❤꧂ 9711199171 ☎️ Hard And Sexy Vip Call
꧁❤ Greater Noida Call Girls Delhi ❤꧂ 9711199171 ☎️ Hard And Sexy Vip Call꧁❤ Greater Noida Call Girls Delhi ❤꧂ 9711199171 ☎️ Hard And Sexy Vip Call
꧁❤ Greater Noida Call Girls Delhi ❤꧂ 9711199171 ☎️ Hard And Sexy Vip Call
 
{Pooja: 9892124323 } Call Girl in Mumbai | Jas Kaur Rate 4500 Free Hotel Del...
{Pooja:  9892124323 } Call Girl in Mumbai | Jas Kaur Rate 4500 Free Hotel Del...{Pooja:  9892124323 } Call Girl in Mumbai | Jas Kaur Rate 4500 Free Hotel Del...
{Pooja: 9892124323 } Call Girl in Mumbai | Jas Kaur Rate 4500 Free Hotel Del...
 
Delhi Call Girls Punjabi Bagh 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip Call
Delhi Call Girls Punjabi Bagh 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip CallDelhi Call Girls Punjabi Bagh 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip Call
Delhi Call Girls Punjabi Bagh 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip Call
 
Low Rate Call Girls Bhilai Anika 8250192130 Independent Escort Service Bhilai
Low Rate Call Girls Bhilai Anika 8250192130 Independent Escort Service BhilaiLow Rate Call Girls Bhilai Anika 8250192130 Independent Escort Service Bhilai
Low Rate Call Girls Bhilai Anika 8250192130 Independent Escort Service Bhilai
 
dokumen.tips_chapter-4-transient-heat-conduction-mehmet-kanoglu.ppt
dokumen.tips_chapter-4-transient-heat-conduction-mehmet-kanoglu.pptdokumen.tips_chapter-4-transient-heat-conduction-mehmet-kanoglu.ppt
dokumen.tips_chapter-4-transient-heat-conduction-mehmet-kanoglu.ppt
 
Brighton SEO | April 2024 | Data Storytelling
Brighton SEO | April 2024 | Data StorytellingBrighton SEO | April 2024 | Data Storytelling
Brighton SEO | April 2024 | Data Storytelling
 
VIP Call Girls in Amravati Aarohi 8250192130 Independent Escort Service Amravati
VIP Call Girls in Amravati Aarohi 8250192130 Independent Escort Service AmravatiVIP Call Girls in Amravati Aarohi 8250192130 Independent Escort Service Amravati
VIP Call Girls in Amravati Aarohi 8250192130 Independent Escort Service Amravati
 
VIP High Class Call Girls Jamshedpur Anushka 8250192130 Independent Escort Se...
VIP High Class Call Girls Jamshedpur Anushka 8250192130 Independent Escort Se...VIP High Class Call Girls Jamshedpur Anushka 8250192130 Independent Escort Se...
VIP High Class Call Girls Jamshedpur Anushka 8250192130 Independent Escort Se...
 

Taming data lake - scalable metrics model

  • 2. “Taming the Data Lake”
  • 3. Intended for Knowledge Sharing only
    Disclaimer: Participation in this summit is purely on a personal basis and does not represent Visa in any form or manner. The talk is based on learnings from work across industries and firms. Care has been taken to ensure that no proprietary or work-related information of any firm is used in any material.
    RAMKUMAR RAVICHANDRAN, Director, Insights at Visa, Inc.: Enables decision making at the executive, product, and marketing levels via actionable insights derived from data.
    BHARATHIRAJA CHANDRASEKHARAN, Data Warehouse Architect at Visa, Inc.: Architects a data shop in Hadoop to get a 360-degree view of the interaction; technology interface for the data stakeholder community.
  • 4. Quick recap of what it is: Data Lakes – the concept
  • 5. AS THEY ARE ENVISIONED TODAY… Source: http://www.tangerine.co.th/tag/how-do-data-lake-work/
  • 6. DOES IT RING A BELL? *Only satirical, to wake you up; not indicative of anyone or anything. Any similarity is purely coincidental!
  • 7. & DOES THIS TOO?
  • 8. SO WHAT DO WE HEAR FROM OUR USERS?
    We often hear these statements in the context of data lakes…
    – Success criteria were engineering-specific: storage/scalability cost savings, etc.
    – Expensive change management
    – Complex for end users to deal with
    – Analytical performance issues
    – Data governance, lineage and management complexities
    “Although the cost of storage went down, the actual cost of utilizing the data has shot up”
  • 9. Taking a step back
  • 10. DATA REALLY HAS GOTTEN BIG: VOLUME, VARIETY, VELOCITY & VERACITY
    Each data source is critical across all or multiple functions, and is consumed as reports, analytical deep-dive insights, forward-looking projections, etc.
    Data sources and the types of insights they provide:
    – Transaction data: Are overall transactions going up or down? Where are they happening?
    – Clickstream data (mobile & web): How are consumers interacting with the website/app (drop-offs, clicks, time spent, etc.)?
    – Sentiment/social data: Social media, NPS surveys and media mentions help gauge true consumer reactions.
    – Server logs data: How are consumers interacting with various functions on the front end?
    – Location data: Are consumers using the product in-store or on the move?
    – Promotions data: How are consumers reacting to various marketing campaigns?
    – Industry data: Benchmarking against industry performance.
  • 11. EVERYONE NEEDS DATA…
    – BI: How are we doing today?
    – Analytics: Where will we be tomorrow? What if we do this? What can we do?
    – A/B testing: Did the initiative work?
    – User research: How do customers feel about us?
    – Strategy: Where should we invest?
  • 12. …AND DISTRIBUTED DATA SYSTEMS HAD THEIR OWN ISSUES
    – Inconsistent (and/or conflicting) definitions of data and numbers
    – Varying granularities
    – Multiple methodologies
    – Different BUs = different KPIs, or the same KPIs with different priorities
    – Lack of visibility/understanding outside of the BUs
    “Slow & inefficient, non-scalable, difficulties rolling up, trust issues, cascading mistakes”
  • 13. AND THEN IT JUST HAPPENED…
    Data sources: transaction data, clickstream data (mobile & web), sentiment data, server logs data, location data, campaign data, industry data
    Source: http://www.adamadiouf.com/2013/03/22/bigdata-vs-enterprise-data-warehouse/
    As if all prayers were answered, Hadoop arrived in a big way and, poof, all problems seemed to disappear…
  • 14. A stroke of luck, or was it?
  • 15. WE FOCUSED ON OUR SPOUSE BUT FORGOT THE IN-LAWS…
    Maturity phases of the analytics practice vs. value addition:
    – Inform: reports on KPIs with high-level drilldowns
    – Act: deep dives via business analytics
    – Predict: identify causal relationships via advanced analytics
    – Optimize: experiments to verify which option works via A/B testing
    – Mine: machine learning
    The focus was on the 20% of data consumers (reports), on the assumption that the other 80% would either love it or at least figure it out…
    Chart segment shares: 5%, 50%, 15%, 20%, 10%
  • 16. HIGH DEVELOPMENT/MODIFICATION COSTS
    Rigid structure and scale of operations make dynamism difficult…
    Layers shown: Data Modeling/Schema; ETL & Metadata; Raw Data
  • 17. NOT ONLY IS THE AUDIENCE CHANGING…
    Stakeholders and their needs (delivery formats shown: Reports, Insights & Drilldowns; Datamart; Documentation):
    – Executives: reports, high-level drilldowns, unified summary, “on the go*”
    – Marketing & PR: campaign performance, infographics, deep dives, testing
    – Sales / RM: sales performance, prospecting, competitive, infographics
    – Product: product performance, deep dives, mining, testing, research
    – Technology / AE / Operations: platform performance, deep dives, forecasting, real-time alerting
    – FP & A: consolidated initiative readouts (E2E), deduping, drilldowns, forecasting
  • 18. …BUT ALSO THE NEEDS ARE EVER CHANGING
    – “In mail”: recommendations with supporting graphs, tables, etc.
    – “Story deck”: full deck with the pitch and supporting arguments, numbers, graphs, charts
    – “On-the-go”: mobile app, on the cloud, subscriptions; reports, dashboards, infographics
    – Algorithm/model: ready to be deployed
    How to decide? Customer needs; turnaround speed; one-time use vs. reuse; deployment on the front end; strategic doc; quick read/research doc
  • 19. Getting to the point: what do we propose?
  • 20. WE BRING TO YOU THE SCALABLE METRICS MODEL (SMM)…
    An attempt to bring together the best of the most-used models:
    – EDW: strengths ACID, fast, stable; weaknesses rigid, cost, resourcing
    – Data lake (Hadoop): strengths cost, flexibility, scalability; weaknesses performance, reliability
    – Aggregated cubes: strengths performance, easy to understand; weakness reporting only
    – Scalable Metrics Model: pre-aggregated metrics + primary/foreign keys
  • 21. TACTICAL DETAILS: WHERE DO WE START? An illustrative example from the retail domain…
    – Define the granularity & associated info: determined by core objectives, e.g., a customer-level table for the Customer Engagement team
    – Define foreign keys & common dimensions: for extensibility
    – Define metrics: KPIs as required
    – Identify value-add metrics: recommendations, forecasting, etc.
    CUSTOMER
    – Primary key: Customer id
    – Foreign keys: Sign Up Partner, Promotion Id, First Txn id
    – Customer-level info: Email, Phone, Number, Geo, etc.
    – Metrics: Lifetime Spend, Txns; Behavioral Bucket; RFM Bucket
    – Recommended action items: Next Best Product; CLV; Target Offers; Call Center Agent Reco
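A minimal Python sketch of these first steps: fix the granularity (one row per customer id) and aggregate raw transactions into metrics at that level. The transaction rows, the `CustomerMetrics` class and the `bucket_for` rule are hypothetical illustrations; the talk does not say how its behavioral bucket is computed.

```python
from dataclasses import dataclass

# Hypothetical raw transactions: (txn_id, customer_id, amount)
TXNS = [
    ("T1", "11234", 1200.0),
    ("T2", "11234", 2200.0),
    ("T3", "55501", 80.0),
]


@dataclass
class CustomerMetrics:
    customer_id: str             # primary key: fixes the table's granularity
    lifetime_spend: float = 0.0
    txns: int = 0
    behavioral_bucket: str = ""


def bucket_for(txn_count: int) -> str:
    """Hypothetical bucketing rule; not taken from the talk."""
    return "repeat user" if txn_count > 1 else "one-time user"


def build_customer_metrics(rows):
    """Aggregate raw rows into one CustomerMetrics per customer_id."""
    by_customer = {}
    for _txn_id, customer_id, amount in rows:
        record = by_customer.setdefault(customer_id, CustomerMetrics(customer_id))
        record.lifetime_spend += amount
        record.txns += 1
    for record in by_customer.values():
        record.behavioral_bucket = bucket_for(record.txns)
    return by_customer


metrics = build_customer_metrics(TXNS)
print(metrics["11234"])
```

Keeping the grain explicit in the primary key is what lets later metrics (and other entity tables) attach to the same row without ambiguity.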
  • 22. TACTICAL DETAILS: DATA MODEL An illustrative example from the retail domain…
    Column | What it contains | Sample data
    id | Customer_id | 11234
    dimensions | Name, Email, Address, etc. | {"name":"John", "Email":"john@email.com", "Address":"123 nowhereblvd"}
    foreign_keys | signup_partner_id, promotion_id | {"signup_partner_id":"666YYY", "promotion":"YAH123"}
    metrics | Lifetime Spend, Txns; Behavioral Bucket; RFM Bucket; Recommended action items: Next Best Product, CLV, Target Offers, Call Center Agent Reco | {"Lifetime Spend":"3400", "Txns":"150", "Behavioural Bucket":"repeat user", "RFM Bucket":"", "recommended Product id":"PRD789", "CLV":"??", "Target Offer":"OFF789", "CallCenterAgentReco":"1234"}
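One way to picture this row, assuming each of the three map columns behaves like a string-to-string map (for example a Hive `MAP<STRING,STRING>`; the talk shows sample values but not the column types). `make_smm_row` is a hypothetical helper. Because a new metric is just a new key in the `metrics` map, adding one leaves the table's shape unchanged.

```python
import json


def make_smm_row(key, dimensions, foreign_keys, metrics):
    """Build one row: a primary key plus three string-to-string map columns.

    Values are stringified, mirroring the sample row where numbers
    appear quoted (e.g. "Lifetime Spend":"3400").
    """
    def as_map(d):
        return {str(k): str(v) for k, v in d.items()}

    return {
        "id": key,
        "dimensions": as_map(dimensions),
        "foreign_keys": as_map(foreign_keys),
        "metrics": as_map(metrics),
    }


row = make_smm_row(
    11234,
    {"name": "John", "Email": "john@email.com"},
    {"signup_partner_id": "666YYY", "promotion": "YAH123"},
    {"Lifetime Spend": 3400, "Txns": 150, "Behavioural Bucket": "repeat user"},
)

# Adding a metric is just adding a key; the row's columns do not change.
row["metrics"]["Target Offer"] = "OFF789"
print(json.dumps(row))
```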
  • 23. TACTICAL DETAILS: ETL FRAMEWORK An illustrative example from the retail domain…
    STEP I: QUERIES
    – Write separate queries/code to get metrics at the defined granularity
    – Put those queries into the framework
    STEP II: FRAMEWORK RUNS
    – The framework runs each of these queries and populates the respective keys
    STEP III: IMPLEMENT MODULARITY
    – Adding a new metric is just adding a new query/code for that metric alone
    – Changing the existing logic for a metric impacts that metric alone
    STEP IV: USER INTERFACE
    – Create physical Impala tables for interactive querying
    – Create views for abstraction and end-user access
    – Exporting data to reporting tools like Tableau/QlikView brings a high level of analysis capability to this model
  • 24. ETL framework
    • Divide and conquer
      – Write separate queries/code to get metrics at the defined granularity
      – Put those queries into the framework
    • The framework runs each of these queries and populates the respective keys
    • Modularity
      – Adding a new metric is just adding a new query/code for that metric alone
      – Changing the existing logic for a metric impacts that metric alone
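The framework idea can be sketched with Python's built-in `sqlite3` standing in for Hive/Impala. The `txns` table, the `METRIC_QUERIES` registry and `run_framework` are hypothetical; what the sketch shows is that each metric lives in its own query, and the framework merges every query's `(key, value)` output into a per-key metrics map, rerunning all queries on each refresh.

```python
import sqlite3

# Hypothetical source table standing in for raw transaction data in the lake.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE txns (txn_id TEXT, customer_id TEXT, amount REAL)")
conn.executemany(
    "INSERT INTO txns VALUES (?, ?, ?)",
    [("T1", "11234", 1200.0), ("T2", "11234", 2200.0), ("T3", "55501", 80.0)],
)

# One query per metric, each returning (key, value) at customer granularity.
METRIC_QUERIES = {
    "Lifetime Spend": "SELECT customer_id, SUM(amount) FROM txns GROUP BY customer_id",
    "Txns": "SELECT customer_id, COUNT(*) FROM txns GROUP BY customer_id",
}


def run_framework(connection, metric_queries):
    """Run every registered query and merge results into per-key metric maps."""
    table = {}
    for metric_name, sql in metric_queries.items():
        for key, value in connection.execute(sql):
            table.setdefault(key, {})[metric_name] = value
    return table


metrics = run_framework(conn, METRIC_QUERIES)

# Modularity: registering a new metric touches no other query.
METRIC_QUERIES["Largest Txn"] = (
    "SELECT customer_id, MAX(amount) FROM txns GROUP BY customer_id"
)
metrics = run_framework(conn, METRIC_QUERIES)
print(metrics["11234"])
```

Because every run rebuilds the map from all queries, this mirrors the full-refresh behavior the deck lists among SMM's weaknesses.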
  • 25. Reporting and presentation
    • Map data types are hard for users to access
    • Three options
      – Create physical Impala tables for interactive querying
      – Create views for abstraction and end-user access
      – Reporting layer (like Tableau): brings a different level of accessibility and analysis capability to this model
        • Faster (if data is cached)
        • Report-level calculations
        • Data blending
        • Using metrics as a dimension, like customer buckets on transaction size
        • Visualization
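A sketch of what a view over the map column does for end users: project chosen map keys into fixed, flat columns, then derive a bucket so a metric can be used as a dimension. The rows, `flatten`, `spend_bucket` and its 1000 threshold are hypothetical; this version buckets on lifetime spend, whereas the slide's example buckets customers on transaction size.

```python
# Hypothetical SMM rows with a map-typed metrics column.
ROWS = [
    {"id": "11234", "metrics": {"Lifetime Spend": "3400", "Txns": "150"}},
    {"id": "55501", "metrics": {"Lifetime Spend": "80"}},  # "Txns" not populated
]


def flatten(rows, metric_names):
    """Project selected map keys into fixed columns, like a view would.

    Missing keys become None so every output row has the same columns.
    """
    return [
        {"id": r["id"], **{name: r["metrics"].get(name) for name in metric_names}}
        for r in rows
    ]


def spend_bucket(spend):
    """Hypothetical threshold turning a metric into a dimension for slicing."""
    if spend is None:
        return "unknown"
    return "high" if float(spend) >= 1000 else "low"


flat = flatten(ROWS, ["Lifetime Spend", "Txns"])
for record in flat:
    record["spend_bucket"] = spend_bucket(record["Lifetime Spend"])
print(flat)
```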
  • 26. DATA BUS EXTENSIBILITY
    Entity tables connected through common dimensions or foreign keys:
    CUSTOMER
    – Primary key: Customer id
    – Foreign keys: Sign Up Partner, Promotion Id, First Txn id
    – Customer-level info: Email, Phone, Number, Geo, etc.
    – Metrics: Lifetime Spend, Txns; Behavioral Bucket; RFM Bucket
    – Recommended action items: Next Best Product; CLV; Target Offers; Call Center Agent Reco
    SELLERS
    – Primary key: Seller id
    – Foreign keys: Product id, Operating Channel
    – Seller-level info: Name, Operating Region, Annual Sales
    – Metrics: Lifetime Sales, Txns; Performance Bucket; Special Category Flag
    – Recommended action items: Next Best Product; Next Co-Marketing; RM action
    TXNS
    – Primary key: Txn id
    – Foreign keys: Custid, Sellerid, Channel
    – Txn-level info: Amt, Type, Date
    – Flags: Buyer/Seller Type; Deviation Metrics; Fraud/Good; Agent Verification; Next Best Offer
    Further entities on the bus: CLICKSTREAM, PROMOTIONS, PARTNERS, PRODUCTS, SENTIMENT, LOGS, 3rd PARTY, ETC…
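The data bus can be sketched as a left join from the transaction table to the customer and seller tables through the shared keys. The table contents, the key names `customer_id` and `seller_id`, and `enrich_txns` are hypothetical stand-ins for the slide's Custid/Sellerid foreign keys.

```python
# Hypothetical entity tables, each keyed by its primary key.
CUSTOMERS = {"11234": {"dimensions": {"name": "John"}}}
SELLERS = {"S9": {"dimensions": {"name": "Acme"}}}
TXNS = {
    "T1": {"foreign_keys": {"customer_id": "11234", "seller_id": "S9"}, "amount": 1200.0},
    "T2": {"foreign_keys": {"customer_id": "99999", "seller_id": "S9"}, "amount": 50.0},
}


def enrich_txns(txns, customers, sellers):
    """Join each transaction to its customer and seller via foreign keys.

    Unmatched keys yield None (a left join), so gaps stay visible.
    """
    out = []
    for txn_id, txn in txns.items():
        fk = txn["foreign_keys"]
        cust = customers.get(fk["customer_id"])
        seller = sellers.get(fk["seller_id"])
        out.append({
            "txn_id": txn_id,
            "amount": txn["amount"],
            "customer_name": cust["dimensions"]["name"] if cust else None,
            "seller_name": seller["dimensions"]["name"] if seller else None,
        })
    return out


enriched = enrich_txns(TXNS, CUSTOMERS, SELLERS)
print(enriched)
```

Adding another entity (say, promotions) would mean one more keyed table and one more lookup, without reshaping the existing tables.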
  • 28. THE SALIENT FEATURES
    • Fit for a wide variety of solution sets & audiences: an optimal data model to support all three needs (reporting, analytics & data mining).
    • Best of all worlds: the Scalable Metrics Model is a hybrid approach.
      – ACID strengths: performance, stability and reliability of an RDBMS.
      – Non-ACID strengths: scalability, flexibility and versatility of Hadoop.
    • Needs-optimized model: the highest premium is placed on the needs of the user, so changes are easy to incorporate as they come along (view-like). The refresh cycle is easy, and changed logic is picked up in the next run.
    • Data governance & lineage: a modular approach breaks complex problems down into smaller items and integrates them into the bigger scheme of things, which makes data governance and lineage easier.
    • Extensibility:
      – Caching: easy integration with buffering technologies to optimize performance.
      – Visualization: easier integration with visualization tools like Tableau.
      – Coding interface: additional drilldowns and analyses via HIVE/SAS/R.
    ● MODULAR ● EXTENDABLE ● UPDATABLE ● SCALABLE
  • 29. FOUR DIMENSIONS OF SUCCESSFUL EXECUTION
    PEOPLE
    • Business analysts: details on business needs, such as timing (immediate/near/medium/long term), priority (critical/urgent/important/good to have), frequency (regular/once in a while/rare), real-time requirements, delivery & users.
    • Technical architects: understand the raw data structure, flow mechanisms & pipelines, security/legal/storage/resourcing constraints, and feasibility assessments.
    PROCESS
    • Matching & gap analysis: is the technology available to handle all business needs (possible/not enough ROI/deferred)? Contingency, resourcing & budgeting.
    • Project planning: milestone-based delivery, deep stakeholder involvement in development & validation, communications management.
    • Execution: efficient schema-on-read, aggregates, tight metadata, a reporting/analytics layer, tables/partitions/file types/compression.
    TECH
    • PIG: ETL
    • HIVE/Impala: schema & table creation
    • Java/Streaming
    • SAS/Python/R: statistical modeling
    CULTURE
    • Customer-needs focused
    • Need for a smart vision, sound planning and capable change management
    • Outcome-focused organization (common business goal)
  • 30. WHY DO WE THINK THE TIME IS NOW?
    • The value proposition of analysts has evolved: what/where/how much -> what can happen -> what should we do?
    • The audience has broadened (from a numbers middleman to front-line managers).
    • The luxury of time has evaporated.
    • The nature of questions has drastically changed (the expectation of being able to connect the dots in a “Data Lake” world).
    • Potential was oversold before getting “there”.
    The KPI of analytics has changed from Turn-Around-Time (TAT) to Time-to-Action (TTA).
  • 31. Putting it all together
  • 32. SWOT ANALYSIS OF SMM
    STRENGTHS
    • Needs-sensitive model
    • Reduced cost of development, modification & refresh
    • Easy for analysts/end users to understand and play with
    • Data governance & lineage: breaks bigger problems down into smaller, manageable ones
    OPPORTUNITIES
    • Integration with front-end tools that can simplify UX
    • Tools that buffer the backend data to ensure speedy delivery
    WEAKNESSES / THREATS
    • A good vision of future analytical requirements is paramount
    • Full refresh every time it runs
    • Maximum granularity needs to be fixed up front
    • Learning curve on coding language/syntax
    • Non-normalized data model
    • Not for real-time insights delivery
    • No slowly changing dimensions
  • 33. THE FIVE COMMANDMENTS
    • “Know” that it caters to the most frequent needs, not all needs.
    • “Must have” as clear and far-sighted an analytics vision as possible, and an outcome-focused approach.
    • “Ensure” deeper stakeholder involvement in the development. A test-and-learn approach is a must; be ready to modify if needed.
    • “Develop” modularity in delivery.
    • “Prepare” for ever-increasing dependencies from analytics and other stakeholders.
  • 34. Appendix
  • 35. THANK YOU! Would love to hear from you on any of the following forums…
    https://twitter.com/decisions_2_0
    http://www.slideshare.net/RamkumarRavichandran
    https://www.youtube.com/channel/UCODSVC0WQws607clv0k8mQA/videos
    http://www.odbms.org/2015/01/ramkumar-ravichandran-visa/
    https://www.linkedin.com/pub/ramkumar-ravichandran/10/545/67a
    https://www.linkedin.com/in/dataisbig
    http://bigdatadw.blogspot.com/
    BHARATHIRAJA CHANDRASEKHARAN, RAMKUMAR RAVICHANDRAN
  • 36. RESEARCH/LEARNING RESOURCES
    • Alternative approach by Martin Fowler: http://martinfowler.com/bliki/DataLake.html
    • Teradata/Hortonworks Data Lake Whitepaper: http://hortonworks.com/wp-content/uploads/2014/05/TeradataHortonworks_Datalake_White-Paper_20140410.pdf
    • EMC Data Lake: https://www.youtube.com/watch?v=o2fs02h_LEo