Customer Value Analysis
of Big Data products
Vikas Sardana
Indian Institute of Management, Bangalore
Agenda
• Background – evolution of data, challenges,
products and vendors
• Top Big Data Use cases
• Case Analysis: Customer Value model for Big
Data analytics use case for a mobile advertising
network
• Conclusion
What is Big Data
• “Big data refers to datasets whose size is beyond
the ability of typical database software tools to
capture, store, manage, and analyze.”
• “Big data is high-volume, high-velocity and high-
variety information assets that demand cost-
effective, innovative forms of information
processing for enhanced insight and decision
making.”
Source:
Source:
Some Sources of Big data
• Web and social media
• Machine generated data – Radio Frequency
Identification, Global Positioning Systems,
Phone apps etc.
• Biometric data
• Human interactions (email, mobile phones,
voice mails, call centers)
Big Data Challenges
• Acquisition
• Storage
• Processing and Analysis
Big Data products
• Hadoop platform and tools
• NoSQL databases
Consumption Models
Sourcing: Open Source (Build) | Open Source (Buy support) | Proprietary (Buy)
Hosting: On Premise | Externally hosted (Cloud)
Trade-offs
• Building requires in-house expertise
• On Premise leads to capital expenditure while cloud leads to
operational expenses
Prominent Vendors
• Cloudera
• MapR
• Hortonworks
• IBM
• Amazon
Top Big Data customer use cases
• Predictive analytics
Building classification and prediction systems e.g. predicting
the buying preferences of customers.
• Revenue optimization
Pricing in real time based on several factors such as
demand, cost, competition e.g. dynamic pricing. This is
popular in various verticals esp. airline industry.
• Revenue generation
Activities to create revenue streams e.g. segmentation and
targeting.
Top Big Data customer use cases ( Cont. …)
• Maximizing human and physical resources
• Scientific research in new areas
• Fraud detection
Detect potential fraud patterns in transactions
• Security and crime prevention
Gartner’s hype cycle for Big Data - 2012
“Big data has gone into Peak of inflated expectations
and is likely to plateau in 2 – 5 years”
Is there value for customers … ? - Motivation for this study
Source: Gartner
Rogers' ACCORD model for diffusion of innovation
Dimension | Measure | Justification
Relative Advantage | High (Favorable) | Big Data products have solved many new problems and are far ahead of traditional data management products.
Compatibility | High (Favorable) | Most Big Data products use commodity hardware and popular programming languages and hence are highly compatible with the current IT ecosystem.
Complexity | High (Unfavorable) | With a different paradigm of parallelism and a range of solutions, users need to understand new ways of processing and storing data. However, it requires simpler programming skills from engineers.
Observability | Moderate | Although Big Data has been popularized, it is background IT infrastructure. Nevertheless, because of the problems it has solved, it has been a topic of discussion in various forums.
Risk | High (Unfavorable) | It requires considerable investment of resources and energy and is still in its initial years.
Case Analysis
Big Data Analytics use case for a Mobile
Advertising Network
Research Methodology
• Primary research with the buying center
• Interviews with business stakeholders and domain experts to
understand business requirements and business metrics
• Interviews with analytics technology experts to understand system
level requirements
• Interviews with hardware procurement and planning experts to
understand costs and sizing methodologies
• Secondary research
• Research and analyst reports on Big Data
• User manuals of the products for Big Data management
• Books, articles and blogs on Big Data technologies and products
• Blogs and websites of prominent mobile Ad networks
Advertising Network overview
Two sided network with advertisers buying ad space
on one side and ad publishers selling the space on
the other.
Image source: www.altitudedigital.com
Ad Serving and Click Flows
Image source: www.inmobi.com
Pricing Model
• Cost Per Click (CPC) – Outcome based pricing,
advertiser is charged only when the ad is clicked.
• Ad network revenue – 50% of the revenue generated
from advertisers is retained by the ad network
and the remaining 50% goes to the publisher.
Business Goals and Metrics for Ad network
Business Goal | Metrics
Revenue optimization for publishers and self | Maximize Click Through Rate (CTR)
Help advertisers with campaign planning | Accuracy of CTR prediction
Help advertisers with campaign optimization through ongoing improvements | Accuracy and timeliness of real time reports
Help advertisers with campaign analysis later | Ability and accuracy for canned and ad hoc reporting
Business continuity | Availability of reports on a sustained basis
Business Problem
The ad network has set up its data analytics systems to achieve its business goals but isn’t faring very
well on its performance metrics
Functions of data analytics systems
* This is a high level functionality detail to highlight the hardware requirements though the actual
technical steps are different to process data for real time than those for batch reports.
The insights from the various analytics and reporting mechanisms help in
placing ads effectively and improving their effectiveness.
Challenges in data analytics
• Accessing the huge volume of data from the ad
servers
• Preparing huge data for analytics
• Analyzing the data at a large scale and providing
timely insights
Steps for analytics and suitable products
Step | Suitable Big Data offering | Other suitable alternatives
Data collection of logs and feeds at a massive scale (8 billion collection events per day). Challenges: burst bandwidth, latency, backlog, operability. Technical metrics: throughput, latency, data loss and reliability, linear scalability | Distributed log collectors, e.g. Scribe (Facebook), Flume (Cloudera), Kafka (LinkedIn) | Log files transferred through network tools such as FTP and rsync
Storing the collected data. Technical metrics: throughput, reliability, high availability, durability | HDFS, S3, NoSQL stores | Files, databases
Processing of data, ETL functions. Technical metrics: throughput, high availability | HDFS, Hadoop MapReduce, EMR on Amazon | Home grown solutions using scripting languages such as Perl
Steps for analytics and suitable products (Cont.)
Step | Suitable Big Data offering | Other suitable alternatives
BI reporting. Technical metrics: query latency, data freshness | NoSQL columnar stores, warehouses | Traditional row based data warehouses
Ad hoc reporting based on historical data. Technical metrics: throughput, latency | Hadoop MapReduce, Cloudera Impala, Hortonworks Stinger, Apache Drill (modeled on Google's Dremel), Greenplum, Netezza, Teradata | Relational databases
Predictive analytics. Technical metrics: throughput, latency | R, Hadoop MapReduce | Home grown solutions run on Massively Parallel Processing systems with expensive, specialized hardware
IT Systems architecture using traditional data
management products
IT Systems architecture using Big Data products
Choice of Big data product deployment
Sourcing: Open Source (Build) | Open Source (Buy support) | Proprietary (Buy)
Hosting: On Premise | Externally hosted (Cloud)
Decision criteria: Intellectual property
Strong technology and intellectual property are key
success factors for a mobile ad network and can help it
develop a competitive advantage
Typical case facts about data generated by Ad
Network
• Monthly Ad impressions served: 100 billion
• Events received per day: 10 billion events
(An event is triggered at various stages of serving an ad.
Some example events: Ad Request and Ad Impression events,
User Click events, User Ad Interaction events,
Conversion/Acquisition events, and Monetization events)
• Average size of data received per event: 1 KB
• Data received per day: 10 terabytes
(10 billion events X 1 KB of data per event)
Source: https://hasgeek.tv/fifthelephant/2012-2/68-the-elephant-that-flew-big-data-analytics-inmobi
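The daily volume in the case facts can be reproduced with a short sketch. It assumes 1 KB means 1,000 bytes and 1 TB means 10^12 bytes (the slides do not say which convention is used), and the function name is only illustrative. It also derives the average link rate needed to move that volume in a day.

```python
EVENTS_PER_DAY = 10_000_000_000   # 10 billion events per day (case facts)
BYTES_PER_EVENT = 1_000           # 1 KB per event, assumed to be 1,000 bytes


def daily_volume_tb(events_per_day: int, bytes_per_event: int) -> float:
    """Return the daily data volume in decimal terabytes (10**12 bytes)."""
    return events_per_day * bytes_per_event / 10**12


daily_tb = daily_volume_tb(EVENTS_PER_DAY, BYTES_PER_EVENT)

# Average network rate needed to ship one day's data within one day.
avg_gbit_per_s = daily_tb * 10**12 * 8 / (24 * 60 * 60) / 10**9

print(f"{daily_tb:.0f} TB/day, about {avg_gbit_per_s:.2f} Gbit/s on average")
```

The average rate comes out below 1 Gbit/s, which fits the later remark that a 10 gigabit/sec WAN has enough raw capacity and that the difficulties lie elsewhere.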
Stage 1: Data Collection
• Traditional solution: rsync and FTP are the popular tools used to move
these logs.
With Wide Area Network capacity of up to 10 gigabit/sec available, it is
possible to send 10 terabytes of data per day from the machines that produce logs
to those that consume them, but there are challenges:
o Weak WAN links lead to backlogs on the producer machines.
o Consumer systems being down leads to data choking and delays in event delivery.
o Duplicate data transfers consume bandwidth unnecessarily.
• Big Data solution:
• Distributed Log Collectors – Few examples:
o Apache Flume (Initially built by Cloudera)
o Scribe (Facebook)
o Kafka (LinkedIn)
Technical benefits of using distributed log
collectors
• Ability to work with distributed producers over
WAN, with consumers sitting in local or remote
datacenters.
• Producers are decoupled from consumers, so
consumers can process at their own pace.
• Efficient: no duplicate data transfers, uses
compression
• Reliable and linearly scalable
Apache Flume Hardware requirements
Image source: http://flume.apache.org
No. of agents required
Tier 1 agents
• Ratio of 1:16 for the outer tier (one agent per 16 ad servers, 100 ad servers in total)
Number of tier 1 agents = 100/16 ≈ 7 (rounded up)
Tier 2 agents
• Ratio of 1:4 for the inner tier, since more data is
pushed into Tier 2 from Tier 1
Number of tier 2 agents = 7/4 ≈ 2 (rounded up)
Total agents required = 9
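The agent count above is a ceiling division applied tier by tier. A minimal sketch, assuming 100 ad servers (the figure used in the storage sizing) and using illustrative names:

```python
import math

AD_SERVERS = 100     # number of ad servers feeding the collectors
TIER1_FAN_IN = 16    # ad servers handled by one tier 1 agent (1:16)
TIER2_FAN_IN = 4     # tier 1 agents handled by one tier 2 agent (1:4)


def agents_needed(upstream_nodes: int, fan_in: int) -> int:
    """Round up so that every upstream node is assigned to an agent."""
    return math.ceil(upstream_nodes / fan_in)


tier1_agents = agents_needed(AD_SERVERS, TIER1_FAN_IN)
tier2_agents = agents_needed(tier1_agents, TIER2_FAN_IN)
total_agents = tier1_agents + tier2_agents

print(tier1_agents, tier2_agents, total_agents)  # 7 2 9
```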
Physical storage requirements
Calculating the size of physical storage (hard drive) required
• Ad server data – 10 terabyte/day
• No. of ad servers = 100
• Data per sec. from each ad server = 10^12/(24*60*60*100) ≈ 115 KB
• Data to be collected in two hours at this rate = 115 x 60 x 60 x 2 = 828 MB.
(Assume expected resolution time for downstream failures is two hours)
• Increase by safety margin factor say 1.5 = 828 MB x 1.5 = 1,242 MB
• Required File Channel Capacity = 1.2 GB
The physical storage capacity requirement is around 1.2 GB.
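The buffer sizing can be written as a small function. This sketch takes the per-server rate of 115 KB per second quoted on the slide as given rather than re-deriving it, and treats 1 MB as 1,000 KB. The function and parameter names are illustrative and are not Flume configuration keys.

```python
RATE_KB_PER_S = 115    # per-server data rate quoted on the slide
OUTAGE_HOURS = 2       # expected resolution time for downstream failures
SAFETY_FACTOR = 1.5    # safety margin applied to the backlog


def file_channel_capacity_mb(rate_kb_per_s: float,
                             outage_hours: float,
                             safety_factor: float) -> float:
    """Size of the backlog that must be buffered during an outage, in MB."""
    backlog_kb = rate_kb_per_s * outage_hours * 60 * 60
    return backlog_kb / 1000 * safety_factor


capacity_mb = file_channel_capacity_mb(RATE_KB_PER_S, OUTAGE_HOURS, SAFETY_FACTOR)
print(f"{capacity_mb:,.0f} MB")  # 1,242 MB, i.e. about 1.2 GB
```

Because the result scales linearly with the input rate, a different per-server rate changes the required capacity by the same factor.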
CPU Requirements
Multiple sources and sinks can be defined on a given agent based on the event batch size.
The larger the batch size, the greater the risk of duplication, hence the batch size is limited to a maximum of
2500 events
Events per sec. = 10TB/(1KB*24*60*60) = 115
For Agent 1:
• Total exit batch size from 16 upstream servers = 16 x 115 = 1840
• No. of sinks to accommodate 1840 events = ceil(1840/2500) = 1
For Agent 2:
• Receiving a batch of 1840 events from each of four upstream agents
• No. of sinks = ceil(1840 * 4 / 2500) = 3
Cores = (Sources + Sinks) / 2
For Agent 1, Cores = 1
For Agent 2, Cores = 2
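The sink and core counts follow from the batch limit and the rule Cores = (Sources + Sinks) / 2. The sketch below takes the slide's 115 events per second per server as given and assumes one source per agent, which is what makes the core counts on the slide work out. The slide does not state the number of sources, so that value is an assumption.

```python
import math

EVENTS_PER_S_PER_SERVER = 115   # figure used on the slide, taken as given
MAX_BATCH_EVENTS = 2500         # batch size cap chosen to limit duplication
SOURCES_PER_AGENT = 1           # assumption: not stated on the slide


def sinks_needed(batch_events: int) -> int:
    """Number of sinks so that no sink handles more than the batch cap."""
    return math.ceil(batch_events / MAX_BATCH_EVENTS)


def cores_needed(sources: int, sinks: int) -> int:
    """Cores = (Sources + Sinks) / 2, rounded up to a whole core."""
    return math.ceil((sources + sinks) / 2)


tier1_batch = 16 * EVENTS_PER_S_PER_SERVER   # 16 upstream ad servers
tier1_sinks = sinks_needed(tier1_batch)
tier1_cores = cores_needed(SOURCES_PER_AGENT, tier1_sinks)

tier2_batch = 4 * tier1_batch                # 4 upstream tier 1 agents
tier2_sinks = sinks_needed(tier2_batch)
tier2_cores = cores_needed(SOURCES_PER_AGENT, tier2_sinks)

print(tier1_sinks, tier1_cores, tier2_sinks, tier2_cores)  # 1 1 3 2
```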
Apache Flume Total Hardware Requirements
7 single core machines, each $800
2 dual core machines, each $1000
Total Hardware cost
• $5600 + $2000 = $7600
Stage 2: Storing the collected data
Traditional solution: Network storage as a part of High Performance Computing
(HPC) Clusters
• Ten times more overhead than commodity hard drives due to communication
requirements within the cluster
• Ten times costlier than commodity hardware due to specialized features such
as redundant storage, high availability etc.
Big Data solution: Hadoop Distributed File System (HDFS)
• Low storage cost per byte as compared to other alternatives such as Storage
Area Network
• Tuned to deliver data fast for MapReduce workloads, up to 2 gigabytes per
second.
• Data reliability is the primary use case and it has been used by various
organizations
• Uses commodity hardware – less initial and maintenance cost.
• Shares cost with the compute layer since it is built into the Hadoop core.
• Linearly scalable in terms of performance and cost even at very high volume.
Storage Requirements and costs
Traditional Solution: HPC Network storage
• Network storage used with HPC costs $100,000 for 100 TB of data
• For the ad network’s current requirement of 14 Petabytes, cost = $14 M
• On moving away from this architecture, this hardware would have a salvage
value of 60%.
Big Data solution
• 10TB per day is 30TB physical space (3x
replication factor) with a 30% overhead
for MR jobs' local space (10 * 3 * 1.30) =
39TB physical space per day
• 1.65 hosts per day's worth of data.
• For a 1 year retention, storage required =
39 Terabytes X 365 = 14 Petabytes
• ~600 hosts
• 600 hosts X $5000 per host = $3,000,000
Commodity hardware server configuration:
Chipset: 4 X 6-core Intel Xeon 3 GHz
Memory: 32GB
Operating System: Red Hat Enterprise Linux 5
Network: 2 Gbps (Bonded Network Interface Card)
Disk Space: 2TB X 12 JBOD (Just a Bunch of Disks)
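The HDFS sizing can be checked with a short sketch built from the figures above: 10 TB per day, 3x replication, 30% MapReduce overhead, one year of retention, and 12 x 2 TB of disk per host. Variable names are illustrative. The exact host count comes out slightly below the slide's rounded figure of about 600, so the cost line uses the slide's rounding.

```python
import math

DAILY_TB = 10              # raw data received per day
REPLICATION = 3            # HDFS replication factor
MR_OVERHEAD_PCT = 30       # extra local space for MapReduce jobs, in percent
RETENTION_DAYS = 365       # one year of retention
TB_PER_HOST = 12 * 2       # 12 disks of 2 TB each (JBOD)
COST_PER_HOST_USD = 5_000
HOSTS_ROUNDED = 600        # the slide rounds the host count to about 600

physical_tb_per_day = DAILY_TB * REPLICATION * (100 + MR_OVERHEAD_PCT) / 100
hosts_per_day = physical_tb_per_day / TB_PER_HOST
yearly_tb = physical_tb_per_day * RETENTION_DAYS
exact_hosts = math.ceil(yearly_tb / TB_PER_HOST)
storage_cost_usd = HOSTS_ROUNDED * COST_PER_HOST_USD

print(f"{physical_tb_per_day:.0f} TB/day, {hosts_per_day:.3f} hosts/day")
print(f"{yearly_tb / 1000:.1f} PB/year, {exact_hosts} hosts before rounding")
print(f"${storage_cost_usd:,} at {HOSTS_ROUNDED} hosts")
```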
Stage 3: Data processing and preparation
Traditional solution: Scripts (e.g. using
Perl scripting language) on High
Performance Compute hardware
Big Data Solution: Hadoop MapReduce
Benefits of Hadoop MapReduce over Perl on HPC hardware
• Scalable to thousands of nodes, shared nothing
• Abstracts complexity of distributed programming
• Reduced human resource cost to 0.5X
• High availability, fault tolerance
• Abstracts cluster functions
• High performance, especially for one-time processing of unstructured
data.
Hardware costs for Data Preparation and Processing
Traditional Solution:
• 10 TB/day = 121 MB/sec.
• Average throughput per node for analytics workload = 1 MB/s
• Desired total throughput = 121 MB/s
• No. of nodes required ~ 120
• Cost = 120 nodes X $5,000 per node = $600,000
Big Data solution:
• 10 TB/day = 121 MB/sec.
• Average throughput per node for analytics workload = 10 MB/s
• Desired total throughput = 121 MB/s
• No. of nodes required ~ 12
• Cost = 12 nodes X $5,000 per node = $60,000
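The node estimate divides the required aggregate throughput by the throughput one node sustains. In this sketch 10 TB is taken as 10 x 1024 x 1024 MB, the convention that reproduces the slide's 121 MB/sec, and the costs use the slide's rounded node counts of 120 and 12. Names are illustrative.

```python
SECONDS_PER_DAY = 24 * 60 * 60
DAILY_MB = 10 * 1024 * 1024     # 10 TB expressed in MB (binary units)
COST_PER_NODE_USD = 5_000


def nodes_estimate(required_mb_per_s: float, per_node_mb_per_s: float) -> float:
    """Unrounded number of nodes needed to sustain the required throughput."""
    return required_mb_per_s / per_node_mb_per_s


required_mb_per_s = DAILY_MB / SECONDS_PER_DAY

traditional_nodes = nodes_estimate(required_mb_per_s, 1)    # 1 MB/s per node
big_data_nodes = nodes_estimate(required_mb_per_s, 10)      # 10 MB/s per node

# Costs at the rounded node counts used on the slide.
traditional_cost_usd = 120 * COST_PER_NODE_USD
big_data_cost_usd = 12 * COST_PER_NODE_USD

print(f"required throughput: {required_mb_per_s:.1f} MB/s")
print(f"traditional: {traditional_nodes:.1f} nodes, ${traditional_cost_usd:,}")
print(f"big data: {big_data_nodes:.1f} nodes, ${big_data_cost_usd:,}")
```

The tenfold difference in per-node throughput is the only driver of the tenfold difference in hardware cost here.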
Human Resource Cost for Data Preparation and
Processing
Traditional solution:
Complex skillset required to handle distributed computing complexity
Estimate: 50 person team @ $35,000 per person per year
Cost: $1,750,000
Big Data solution:
Simpler skillset required as complexities are abstracted from the programmers.
Estimate: 50% cost reduction
Cost: $875,000
Stage 4: Analytics – Reporting, Ad hoc and
predictive analytics
Traditional solution: Row based data warehouses with Structured Query
Language
Big Data solution: NoSQL column stores
No additional hardware costs and similar human resource costs
• Big Data solutions benefit because schemas can be modified at a later stage
to keep the reports up to date with new types of data.
• Optimized for columnar storage and access, which are the main tasks in
analytics
Quantification of immediate business benefits
S No. | Benefit | Description | Quantum
1 | Increase in ad revenue due to better CTR | Improved ads will help ad matching algorithms target the ads more accurately to the relevant users with the relevant publishers | Estimated CTR increase: 5%. Corresponding increase in publisher’s ad revenue: 5%. Corresponding increase in ad network’s revenue (50% of publisher’s ad revenue): 5%. Ad network’s increase in revenue (current rev. $100M): $5 M
2 | Increase in ad revenue by enabling advertisers to better plan campaigns | Better accuracy in predicting CTR will help advertisers in better campaign planning. This will help improve CTR, in turn increasing the revenue for publishers and the ad network | Estimated CTR increase: 5%. Corresponding increase in publisher’s ad revenue: 5%. Corresponding increase in ad network’s revenue (50% of publisher’s ad revenue): 5%. Ad network’s increase in revenue (current rev. $100M): $5 M
Quantification of immediate business benefits (Cont.)
S No. | Benefit | Description | Quantum
3 | Increase in ad revenue due to better campaign optimization | Timely and accurate real time reports will help advertisers do course correction, helping further with CTR improvement and leading to better ad revenue | Estimated CTR increase: 5%. Corresponding increase in publisher’s ad revenue: 5%. Corresponding increase in ad network’s revenue (50% of publisher’s ad revenue): 5%. Ad network’s increase in revenue (current rev. $100M): $5 M
4 | Increase in ad revenue due to better availability of reports | If the ad network provides better continuity to advertisers, they will be willing to pay a premium | Estimated premium payment: 2%. Corresponding increase in ad network’s revenue (50% of publisher’s ad revenue): 2%. Ad network’s increase in revenue (current rev. $100M): $2 M
Total increase in ad network’s revenue (1 + 2 + 3 + 4): $17 M
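The four benefit lines can be added up in a short sketch. It applies each percentage uplift to the ad network's current revenue of $100M, as the tables do. The dictionary keys are shorthand labels, not terms from the slides. The four items sum to $17M, which is the benefit figure carried into the customer value model.

```python
CURRENT_REVENUE_USD = 100_000_000   # ad network's current revenue

# Percentage uplift in the ad network's revenue for each benefit.
uplift_pct = {
    "better CTR": 5,
    "better campaign planning": 5,
    "better campaign optimization": 5,
    "better availability of reports": 2,
}

gains_usd = {name: CURRENT_REVENUE_USD * pct // 100
             for name, pct in uplift_pct.items()}
total_benefit_usd = sum(gains_usd.values())

for name, gain in gains_usd.items():
    print(f"{name}: ${gain:,}")
print(f"total: ${total_benefit_usd:,}")
```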
Value Element Mapping
Points of Parity
• Open source software is available and the company can customize and enhance it the way it wants.
• Support for the Java programming language, for which it is easy to hire people and further enhance the software thanks to the abundant talent pool.
Points of Difference
• Simpler skillset required of in-house IT experts in the case of Big Data products.
• Big Data products can handle all aspects of Big Data problems, unlike traditional data management products.
• Linearly scalable: Big Data products can work with cheaper hardware and scale linearly, making them a future proof investment.
Points of Contention
• Adoption uncertainty: Community support among developers to maintain and evolve the open source Big Data products is growing very fast because of the buzz, but it is unclear whether it will become as strong as that for traditional software.
• Stability of Big Data vendors: The commercial vendors are mostly newly formed companies, though founded by very accomplished people. They are fast gaining traction but it is unclear whether they will be able to sustain themselves in the long term. Moreover, since pure play Big Data firms are privately held, their growth and revenues are not clearly known.
Customer Value Model
Item | Big Data products | Traditional products (Next Best Alternative, NBA)
Benefits | $17M | Status quo with the existing systems
Cost other than price (capex + annual) in the first year | $7,600 (Data Collection) + $3M (Storage) + $60K (Processing) + $875,000 (Salaries) + $1.5M (Implementation and training) | Already incurred in the existing systems: $14M (Storage) + $600K (Processing) + $1,750,000 (Salaries)
Total Cost | $5,442,600 | Sunk cost
Value = Benefit - Cost | $11,557,400 | No additional value in the existing systems
Price | Free and Open Source | Free and Open Source
Delta(Price) | 0
Value in Use = Delta(Value) - Delta(Price) | $11,557,400
Effective value in use (for migration to Big Data products) = Value in Use + Salvage value of storage and processing + Salaries saved | $22,067,400
Ignoring the time value of money since the cash flows are considered over a
short period i.e. one year.
Framework reference: James C. Anderson, James A. Narus, DVR
Seshadri
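The figures in the customer value model can be reproduced with the sketch below. It is one reading of the slide, not the author's worksheet. The $22,067,400 total reconciles only if the 60% salvage value applies to the traditional storage and processing hardware together and "salaries saved" is the full traditional salary bill, so both are treated as assumptions here.

```python
# All figures in US dollars, taken from the customer value model.
BENEFITS = 17_000_000

big_data_costs = {
    "data collection": 7_600,
    "storage": 3_000_000,
    "processing": 60_000,
    "salaries": 875_000,
    "implementation and training": 1_500_000,
}

traditional_costs = {
    "storage": 14_000_000,
    "processing": 600_000,
    "salaries": 1_750_000,
}

SALVAGE_PCT = 60   # salvage value of the traditional hardware, in percent
DELTA_PRICE = 0    # both alternatives are free and open source

total_cost = sum(big_data_costs.values())
value = BENEFITS - total_cost
value_in_use = value - DELTA_PRICE   # the existing systems add no new value

# Assumption: salvage applies to storage and processing hardware combined.
salvage = (traditional_costs["storage"]
           + traditional_costs["processing"]) * SALVAGE_PCT // 100
# Assumption: the whole traditional salary bill counts as saved.
salaries_saved = traditional_costs["salaries"]

effective_value_in_use = value_in_use + salvage + salaries_saved

print(f"total cost: ${total_cost:,}")
print(f"value in use: ${value_in_use:,}")
print(f"effective value in use: ${effective_value_in_use:,}")
```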
Value placeholders (less tangible)
Positives
• The architecture of Big Data products is linearly scalable and hence future
proof: future data management requirements can be met at incremental cost
by buying commodity hardware.
• Customer satisfaction and hence low customer churn due to increased
control in their hands for managing their advertisements.
• Skillset required for in-house IT experts is simpler in case of big data
products and mostly based on popular Java technology.
Negatives
• Although the above Big Data products are backed by strong companies
and open source communities, these companies and communities are
not as strong as the ones for traditional products.
• The commercial vendors are mostly newly formed companies. They were
founded by very capable people and are fast gaining traction, but it is
unclear whether they will be able to sustain themselves in the long term.
Conclusion
• The above case study clearly builds a case for the
value proposition of Big Data products
• Similarly, big data products are being used
extensively across various industries and this value
model will help in building a concrete case for Big
Data products

More Related Content

What's hot

Rapid Response: Rebranding IT By Creating Transformation Business Value
Rapid Response: Rebranding IT By Creating Transformation Business ValueRapid Response: Rebranding IT By Creating Transformation Business Value
Rapid Response: Rebranding IT By Creating Transformation Business ValueSam Pakrashi
 
Lean Management by Bearing Point
Lean Management by Bearing PointLean Management by Bearing Point
Lean Management by Bearing Pointgiraudeau
 
Competitive Intelligence
Competitive IntelligenceCompetitive Intelligence
Competitive IntelligenceElijah Ezendu
 
Saratoga CRM Roadmap
Saratoga CRM RoadmapSaratoga CRM Roadmap
Saratoga CRM RoadmapAptean
 
Bmgt 518 term paper slides
Bmgt 518 term paper slidesBmgt 518 term paper slides
Bmgt 518 term paper slidessafyan83
 
E finance ppt. for bfi subject and global finance with e banking.
E finance ppt. for bfi subject and global finance with e banking.E finance ppt. for bfi subject and global finance with e banking.
E finance ppt. for bfi subject and global finance with e banking.Ramon Lapid
 
Integrating Marketing & BD into Everyones Job
Integrating Marketing & BD into Everyones JobIntegrating Marketing & BD into Everyones Job
Integrating Marketing & BD into Everyones JobDavid Blumentals
 
Multisourcing the new global trend
Multisourcing   the new global trendMultisourcing   the new global trend
Multisourcing the new global trendRam Garg
 
VY_FIMECC-S4Fleet_esite_valmis-low
VY_FIMECC-S4Fleet_esite_valmis-lowVY_FIMECC-S4Fleet_esite_valmis-low
VY_FIMECC-S4Fleet_esite_valmis-lowMathias Hasselblatt
 
GRA Retail Supply Chain Whitepaper - Perspectives on Strategic Investment
GRA Retail Supply Chain Whitepaper - Perspectives on Strategic InvestmentGRA Retail Supply Chain Whitepaper - Perspectives on Strategic Investment
GRA Retail Supply Chain Whitepaper - Perspectives on Strategic InvestmentRebecca Manjra
 
Peppersand rogersgrp whitepaper_ms_dynamicscrm_07_2011_a4_highres
Peppersand rogersgrp whitepaper_ms_dynamicscrm_07_2011_a4_highresPeppersand rogersgrp whitepaper_ms_dynamicscrm_07_2011_a4_highres
Peppersand rogersgrp whitepaper_ms_dynamicscrm_07_2011_a4_highresSuruli Kannan
 
Example Whitepaper
Example WhitepaperExample Whitepaper
Example Whitepapercsheppar
 
White Paper: How to bridge the gap between business, IT and networks – applyi...
White Paper: How to bridge the gap between business, IT and networks – applyi...White Paper: How to bridge the gap between business, IT and networks – applyi...
White Paper: How to bridge the gap between business, IT and networks – applyi...Ericsson
 
HCLT Brochure: Business Intelligence in Retail
HCLT Brochure: Business Intelligence in RetailHCLT Brochure: Business Intelligence in Retail
HCLT Brochure: Business Intelligence in RetailHCL Technologies
 
Creating a Capability-Led IT Organization
Creating a Capability-Led IT OrganizationCreating a Capability-Led IT Organization
Creating a Capability-Led IT OrganizationCognizant
 
Client X Enterprise Architecture
Client X Enterprise ArchitectureClient X Enterprise Architecture
Client X Enterprise ArchitectureClinton den Heyer
 
Insights To Accelerate Services Growth (Oco White Paper)
Insights To Accelerate Services Growth (Oco White Paper)Insights To Accelerate Services Growth (Oco White Paper)
Insights To Accelerate Services Growth (Oco White Paper)Jon Hansen
 

What's hot (18)

Rapid Response: Rebranding IT By Creating Transformation Business Value
Rapid Response: Rebranding IT By Creating Transformation Business ValueRapid Response: Rebranding IT By Creating Transformation Business Value
Rapid Response: Rebranding IT By Creating Transformation Business Value
 
Lean Management by Bearing Point
Lean Management by Bearing PointLean Management by Bearing Point
Lean Management by Bearing Point
 
Competitive Intelligence
Competitive IntelligenceCompetitive Intelligence
Competitive Intelligence
 
Saratoga CRM Roadmap
Saratoga CRM RoadmapSaratoga CRM Roadmap
Saratoga CRM Roadmap
 
Bmgt 518 term paper slides
Bmgt 518 term paper slidesBmgt 518 term paper slides
Bmgt 518 term paper slides
 
E finance ppt. for bfi subject and global finance with e banking.
E finance ppt. for bfi subject and global finance with e banking.E finance ppt. for bfi subject and global finance with e banking.
E finance ppt. for bfi subject and global finance with e banking.
 
Integrating Marketing & BD into Everyones Job
Integrating Marketing & BD into Everyones JobIntegrating Marketing & BD into Everyones Job
Integrating Marketing & BD into Everyones Job
 
Multisourcing the new global trend
Multisourcing   the new global trendMultisourcing   the new global trend
Multisourcing the new global trend
 
VY_FIMECC-S4Fleet_esite_valmis-low
VY_FIMECC-S4Fleet_esite_valmis-lowVY_FIMECC-S4Fleet_esite_valmis-low
VY_FIMECC-S4Fleet_esite_valmis-low
 
GRA Retail Supply Chain Whitepaper - Perspectives on Strategic Investment
GRA Retail Supply Chain Whitepaper - Perspectives on Strategic InvestmentGRA Retail Supply Chain Whitepaper - Perspectives on Strategic Investment
GRA Retail Supply Chain Whitepaper - Perspectives on Strategic Investment
 
Peppersand rogersgrp whitepaper_ms_dynamicscrm_07_2011_a4_highres
Peppersand rogersgrp whitepaper_ms_dynamicscrm_07_2011_a4_highresPeppersand rogersgrp whitepaper_ms_dynamicscrm_07_2011_a4_highres
Peppersand rogersgrp whitepaper_ms_dynamicscrm_07_2011_a4_highres
 
Example Whitepaper
Example WhitepaperExample Whitepaper
Example Whitepaper
 
White Paper: How to bridge the gap between business, IT and networks – applyi...
White Paper: How to bridge the gap between business, IT and networks – applyi...White Paper: How to bridge the gap between business, IT and networks – applyi...
White Paper: How to bridge the gap between business, IT and networks – applyi...
 
HCLT Brochure: Business Intelligence in Retail
HCLT Brochure: Business Intelligence in RetailHCLT Brochure: Business Intelligence in Retail
HCLT Brochure: Business Intelligence in Retail
 
Creating a Capability-Led IT Organization
Creating a Capability-Led IT OrganizationCreating a Capability-Led IT Organization
Creating a Capability-Led IT Organization
 
Client X Enterprise Architecture
Client X Enterprise ArchitectureClient X Enterprise Architecture
Client X Enterprise Architecture
 
Procurement challenges
Procurement challengesProcurement challenges
Procurement challenges
 
Insights To Accelerate Services Growth (Oco White Paper)
Insights To Accelerate Services Growth (Oco White Paper)Insights To Accelerate Services Growth (Oco White Paper)
Insights To Accelerate Services Growth (Oco White Paper)
 

Viewers also liked

Real-Time Big Data at In-Memory Speed, Using Storm
Real-Time Big Data at In-Memory Speed, Using StormReal-Time Big Data at In-Memory Speed, Using Storm
Real-Time Big Data at In-Memory Speed, Using StormNati Shalom
 
Stream processing using Apache Storm - Big Data Meetup Athens 2016
Stream processing using Apache Storm - Big Data Meetup Athens 2016Stream processing using Apache Storm - Big Data Meetup Athens 2016
Stream processing using Apache Storm - Big Data Meetup Athens 2016Adrianos Dadis
 
Service Blueprinting / Service Design Drinks Berlin
Service Blueprinting / Service Design Drinks BerlinService Blueprinting / Service Design Drinks Berlin
Service Blueprinting / Service Design Drinks BerlinService Design Berlin
 
Real Time Data Streaming using Kafka & Storm
Real Time Data Streaming using Kafka & StormReal Time Data Streaming using Kafka & Storm
Real Time Data Streaming using Kafka & StormRan Silberman
 
Standardization and customization
Standardization and customizationStandardization and customization
Standardization and customizationYASHADA, Pune
 
Project titles for mba research project
Project titles for mba research projectProject titles for mba research project
Project titles for mba research projectEzhil Arasan
 
Kapferer Brand identity Prism
Kapferer Brand identity PrismKapferer Brand identity Prism
Kapferer Brand identity PrismZeynep Çıkın
 
Branding & Brand Positioning
Branding & Brand PositioningBranding & Brand Positioning
Branding & Brand PositioningSamer Meqdad
 
Personal Branding | Stand Out From The Crowd
Personal Branding | Stand Out From The CrowdPersonal Branding | Stand Out From The Crowd
Personal Branding | Stand Out From The CrowdMoataz Yasser
 
Intro to Branding & Brand management - Elkottab
Intro to Branding & Brand management - ElkottabIntro to Branding & Brand management - Elkottab
Intro to Branding & Brand management - ElkottabMuhammad Omar
 
Kapferer Model Brand Identity Prism
Kapferer Model Brand Identity PrismKapferer Model Brand Identity Prism
Kapferer Model Brand Identity Prismnitin59
 

Viewers also liked (12)

Real-Time Big Data at In-Memory Speed, Using Storm
Real-Time Big Data at In-Memory Speed, Using StormReal-Time Big Data at In-Memory Speed, Using Storm
Real-Time Big Data at In-Memory Speed, Using Storm
 
Stream processing using Apache Storm - Big Data Meetup Athens 2016
Stream processing using Apache Storm - Big Data Meetup Athens 2016Stream processing using Apache Storm - Big Data Meetup Athens 2016
Stream processing using Apache Storm - Big Data Meetup Athens 2016
 
Brand strategy
Brand strategyBrand strategy
Brand strategy
 
Service Blueprinting / Service Design Drinks Berlin
Service Blueprinting / Service Design Drinks BerlinService Blueprinting / Service Design Drinks Berlin
Service Blueprinting / Service Design Drinks Berlin
 
Real Time Data Streaming using Kafka & Storm
Real Time Data Streaming using Kafka & StormReal Time Data Streaming using Kafka & Storm
Real Time Data Streaming using Kafka & Storm
 
Standardization and customization
Standardization and customizationStandardization and customization
Standardization and customization
 
Project titles for mba research project
Project titles for mba research projectProject titles for mba research project
Project titles for mba research project
 
Kapferer Brand identity Prism
Kapferer Brand identity PrismKapferer Brand identity Prism
Kapferer Brand identity Prism
 
Branding & Brand Positioning
Branding & Brand PositioningBranding & Brand Positioning
Branding & Brand Positioning
 
Personal Branding | Stand Out From The Crowd
Personal Branding | Stand Out From The CrowdPersonal Branding | Stand Out From The Crowd
Personal Branding | Stand Out From The Crowd
 
Intro to Branding & Brand management - Elkottab
Intro to Branding & Brand management - ElkottabIntro to Branding & Brand management - Elkottab
Intro to Branding & Brand management - Elkottab
 
Kapferer Model Brand Identity Prism
Kapferer Model Brand Identity PrismKapferer Model Brand Identity Prism
Kapferer Model Brand Identity Prism
 

Customer value analysis of big data products

  • 1. Customer Value Analysis of Big Data products Vikas Sardana Indian Institute of Management, Bangalore
  • 2. Agenda • Background – evolution of data, challenges, products and vendors • Top Big Data Use cases • Case Analysis: Customer Value model for Big Data analytics use case for a mobile advertising network • Conclusion
  • 3. What is Big Data • “Big data refers to datasets whose size is beyond the ability of typical database software tools to capture, store, manage, and analyze.” • “Big data is high-volume, high-velocity and high-variety information assets that demand cost-effective, innovative forms of information processing for enhanced insight and decision making.” Source: Source:
  • 4. Some Sources of Big data • Web and social media • Machine generated data – Radio Frequency Identification, Global Positioning Systems, Phone apps etc. • Biometric data • Human interactions (email, mobile phones, voice mails, call centers)
  • 5. Big Data Challenges • Acquisition • Storage • Processing and Analysis
  • 6. Big Data products • Hadoop platform and tools • NOSQL databases
  • 7. Consumption Models
      • Open Source (Build) / Open Source (Buy support) / Proprietary (Buy)
      • On Premise / Externally hosted (Cloud)
      Trade-offs
      • Building requires in-house expertise
      • On Premise leads to capital expenditure, while cloud leads to operational expenses
  • 8. Prominent Vendors • Cloudera • MapR • HortonWorks • IBM • Amazon
  • 9. Top Big Data customer use cases
      • Predictive analytics: building classification and prediction systems, e.g. predicting the buying preferences of customers.
      • Revenue optimization: pricing in real time based on factors such as demand, cost and competition, e.g. dynamic pricing. This is popular in various verticals, especially the airline industry.
      • Revenue generation: activities to create revenue streams, e.g. segmentation and targeting.
  • 10. Top Big Data customer use cases (Cont.)
      • Maximizing human and physical resources
      • Scientific research in new areas
      • Fraud detection: detecting potential fraud patterns in transactions
      • Security and crime prevention
  • 11. Gartner’s hype cycle for Big Data - 2012 “Big data has gone into Peak of inflated expectations and is likely to plateau in 2 – 5 years” Is there value for customers … ? - Motivation for this study Source: Gartner
  • 12. Rogers' ACCORD model for diffusion of innovation (Dimension: Measure. Justification)
      • Relative Advantage: High (Favorable). Big data products have solved many new problems and are far ahead of traditional data management products.
      • Compatibility: High (Favorable). Most Big Data products use commodity hardware and popular programming languages and hence are highly compatible with the current IT ecosystem.
      • Complexity: High (Unfavorable). With a different paradigm of parallelism and a range of solutions, users need to understand the new ways of processing and storing data. However, it requires simpler programming skills from engineers.
      • Observability: Moderate. Although Big Data has been popularized, it is background IT infrastructure. Nevertheless, because of the problems it has solved, it has been a topic of discussion in various forums.
      • Risk: High (Unfavorable). It requires considerable investment of resources and energy and is still in its initial years.
  • 13. Case Analysis Big Data Analytics use case for a Mobile Advertising Network
  • 14. Research Methodology
      • Primary research with the buying center
        o Interviews with business stakeholders and domain experts to understand business requirements and business metrics
        o Interviews with analytics technology experts to understand system level requirements
        o Interviews with hardware procurement and planning experts to understand costs and sizing methodologies
      • Secondary research
        o Research and analyst reports on Big Data
        o User manuals of the products for Big Data management
        o Books, articles and blogs on Big Data technologies and products
        o Blogs and websites of prominent mobile Ad networks
  • 15. Advertising Network overview Two sided network with advertisers buying ad space on one side and ad publishers selling the space on the other. Image source: www.altitudedigital.com
  • 16. Ad Serving and Click Flows Image source: www.inmobi.com
  • 17. Pricing Model
      • Cost Per Click (CPC): outcome-based pricing, where the advertiser is charged only when the ad is clicked.
      • Ad network revenue: 50% of the revenue generated from advertisers is retained by the ad network and the remaining 50% goes to the publisher.
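The CPC split can be illustrated with a few lines of arithmetic. This is only a minimal sketch, not the ad network's billing logic: the impression count, click-through rate and cost per click are hypothetical values chosen to show the calculation, and only the 50/50 split comes from the slide.

```python
def cpc_revenue_split(impressions, ctr, cost_per_click, network_share=0.5):
    """Return (advertiser_spend, network_revenue, publisher_revenue) under CPC pricing."""
    clicks = impressions * ctr            # advertisers are charged only for clicks
    spend = clicks * cost_per_click       # total amount billed to advertisers
    network = spend * network_share       # share kept by the ad network (50% per the slide)
    publisher = spend - network           # remainder goes to the publisher
    return spend, network, publisher


# Hypothetical inputs: 1M impressions, 1% CTR, $0.10 per click
spend, network, publisher = cpc_revenue_split(1_000_000, 0.01, 0.10)
print(f"spend=${spend:,.2f} network=${network:,.2f} publisher=${publisher:,.2f}")
```

Because revenue is proportional to clicks, a relative CTR increase at fixed impressions and CPC raises the publisher's and the network's revenue by the same percentage.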
  • 18. Business Goals and Metrics for Ad network (Business goal: Metric)
      • Revenue optimization for publishers and self: Maximize Click Through Rate (CTR)
      • Help advertisers with campaign planning: Accuracy of CTR prediction
      • Help advertisers with campaign optimization through ongoing improvements: Accuracy and timeliness of real time reports
      • Help advertisers with campaign analysis later: Ability and accuracy for canned and ad hoc reporting
      • Business continuity: Availability of reports on a sustained basis
      Business Problem: The ad network has set up its data analytics systems to achieve its business goals but isn’t faring very well on its performance metrics.
  • 19. Functions of data analytics systems * This is a high level functionality detail to highlight the hardware requirements though the actual technical steps are different to process data for real time than those for batch reports. The insights from various analytics and reporting mechanisms help in effective placements and effectiveness of ads.
  • 20. Challenges in data analytics • Accessing the huge volume of data from the ad servers • Preparing huge data for analytics • Analyzing the data at a large scale and providing timely insights
  • 21. Steps for analytics and suitable products
      • Step: Data collection of logs and feeds at a massive scale (8 billion collection events per day)
        o Challenges: burst bandwidth, latency, backlog, operability
        o Technical metrics: throughput, latency, data loss and reliability, linear scalability
        o Suitable Big Data offering: distributed log collectors, e.g. Scribe (Facebook), Flume (Cloudera), Kafka (LinkedIn)
        o Other suitable alternatives: log files transferred through network protocols such as FTP, rsync
      • Step: Storing the collected data
        o Technical metrics: throughput, reliability, high availability, durability
        o Suitable Big Data offering: HDFS, S3, NoSQL stores
        o Other suitable alternatives: files, databases
      • Step: Processing of data, ETL functions
        o Technical metrics: throughput, high availability
        o Suitable Big Data offering: HDFS, Hadoop MapReduce, EMR on Amazon
        o Other suitable alternatives: home-grown solutions using scripting languages such as Perl
  • 22. Steps for analytics and suitable products
      • Step: BI Reporting
        o Technical metrics: query latency, data freshness
        o Suitable Big Data offering: NoSQL columnar stores, warehouses
        o Other suitable alternatives: traditional row-based data warehouses
      • Step: Ad hoc reporting based on historical data
        o Technical metrics: throughput, latency
        o Suitable Big Data offering: Hadoop MapReduce, Cloudera Impala, Hortonworks Stinger, Apache Drill (the open source counterpart of Google's Dremel), Greenplum, Netezza, Teradata
        o Other suitable alternatives: relational databases
      • Step: Predictive Analytics
        o Technical metrics: throughput, latency
        o Suitable Big Data offering: R, Hadoop MapReduce
        o Other suitable alternatives: home-grown solutions run on Massively Parallel Processing systems using expensive, specialized hardware
  • 23. IT Systems architecture using traditional data management products IT Systems architecture using Big Data products
  • 24. Choice of Big data product deployment
      • Open Source (Build) / Open Source (Buy support) / Proprietary (Buy)
      • On Premise / Externally hosted (Cloud)
      Decision criteria: Intellectual property. Strong technology and intellectual property are key success factors for a mobile ad network and can help it develop a competitive advantage.
  • 25. Typical case facts about data generated by Ad Network
      • Monthly Ad impressions served: 100 billion
      • Events received per day: 10 billion events (An event is triggered at various stages of serving an ad. Some example events: Ad Request and Ad Impression events, User Click events, User Ad Interaction events, Conversion/Acquisition events, and Monetization events)
      • Average size of data received per event: 1 KB
      • Data received per day: 10 terabytes (10 billion events X 1 KB of data per event)
      Source: https://hasgeek.tv/fifthelephant/2012-2/68-the-elephant-that-flew-big-data-analytics-inmobi
  • 26. Stage 1: Data Collection
      • Traditional solution: rsync and FTP are the popular tools used to move these logs. With Wide Area Network capacity of up to 10 gigabit/sec available, it is easily possible to send 10 terabytes of data per day from the machines that produce logs to those that consume them, but the challenges are:
        o WAN links are usually weak, which leads to backlogs on the producer machine.
        o Consumer systems being down leads to data choking and delays in event delivery.
        o Duplicate data transfer consumes bandwidth unnecessarily.
      • Big Data solution: distributed log collectors. A few examples:
        o Apache Flume (initially built by Cloudera)
        o Scribe (Facebook)
        o Kafka (LinkedIn)
  • 27. Technical benefits of using distributed log collectors • Ability to work with distributed producers over WAN, with consumers sitting in local or remote datacenters. • Producers are decoupled from consumers, so consumers can process at their own pace. • Efficient: no duplicate data transfers, uses compression • Reliable and linearly scalable
  • 28. Apache Flume Hardware requirements Image source: http://flume.apache.org
  • 29. No. of agents required
      Tier 1 agents
      • Ratio of 1:16 for the outer tier
      • Number of tier 1 agents = 100/16 ~ 7
      Tier 2 agents
      • Ratio of 1:4 for the inner tier, since more data will be pushed into Tier 2 from Tier 1
      • Number of tier 2 agents = 7/4 ~ 2
      Total agents required = 9
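The agent count above can be reproduced with a short sketch. It assumes that the 100 in the slide's formula is the number of ad servers and that partial agents are rounded up; the function and parameter names are illustrative, not part of Flume.

```python
import math


def flume_agents(ad_servers=100, tier1_fan_in=16, tier2_fan_in=4):
    """Size a two-tier collection topology from the fan-in ratios on the slide."""
    tier1 = math.ceil(ad_servers / tier1_fan_in)   # 1 outer-tier agent per 16 ad servers
    tier2 = math.ceil(tier1 / tier2_fan_in)        # 1 inner-tier agent per 4 tier 1 agents
    return tier1, tier2, tier1 + tier2


tier1, tier2, total = flume_agents()
print(f"tier 1 agents: {tier1}, tier 2 agents: {tier2}, total: {total}")
```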
  • 30. Physical storage requirements
      Calculating the size of physical storage (hard drive) required
      • Ad server data: 10 terabytes/day
      • No. of ad servers = 100
      • Data per sec. from each ad server = 10^12/(24*60*60*100) = 115 KB
      • Data to be collected in two hours at this rate = 115 x 60 x 60 x 2 = 828 MB (assuming the expected resolution time for downstream failures is two hours)
      • Increase by a safety margin factor of, say, 1.5 = 828 MB x 1.5 = 1,242 MB
      • Required File Channel Capacity = 1.2 GB
      The physical storage capacity requirement is around 1.2 GB.
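The buffer sizing above follows a simple pattern: per-server data rate, multiplied by the outage window, multiplied by a safety factor. The sketch below is a hedged illustration of that pattern. The daily byte count is left as a parameter: the 10^12 used here is the value that reproduces the slide's 115 KB/s figure, and a different reading of 10 terabytes per day would scale the result proportionally. The function name is hypothetical.

```python
def file_channel_capacity(bytes_per_day, servers, outage_hours=2, safety_factor=1.5):
    """Return (per-server rate in bytes/s, required buffer capacity in bytes)."""
    seconds_per_day = 24 * 60 * 60
    rate = bytes_per_day / (seconds_per_day * servers)   # bytes/s produced by one ad server
    backlog = rate * outage_hours * 60 * 60              # data accumulated during the outage
    return rate, backlog * safety_factor                 # add headroom for safety


rate, capacity = file_channel_capacity(bytes_per_day=10**12, servers=100)
print(f"about {rate / 1e3:.0f} KB/s per server, about {capacity / 1e9:.2f} GB of file channel capacity")
```

The slide rounds the rate down to 115 KB/s before multiplying, which is why it arrives at roughly 1.2 GB rather than the unrounded value.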
  • 31. CPU Requirements
      Multiple sources and sinks can be defined on a given agent based on the event batch size. The larger the batch size, the greater the risk of duplication, hence the batch size is limited to a maximum of 2500 events.
      Events per sec. = 10TB/(1KB*24*60*60) = 115
      For Agent 1:
      • Total exit batch size from 16 upstream servers = 16 x 115 = 1840
      • No. of sinks to accommodate 1840 events = ceil(1840/2500) = 1
      For Agent 2:
      • Receiving a batch of 1840 events from each of four upstream agents
      • No. of sinks = ceil(1840 * 4 / 2500) = 3
      Cores = (Sources + Sinks) / 2
      For Agent 1, Cores = 1
      For Agent 2, Cores = 2
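The sink and core counts can be checked with a small sketch. It takes the slide's figure of 115 events per second per upstream server as given and assumes one source per agent, which is not stated on the slide but is the value that reproduces its core counts. The names below are illustrative and are not Flume configuration keys.

```python
import math

MAX_BATCH_EVENTS = 2500  # batch size cap from the slide


def sinks_and_cores(upstream_count, events_per_upstream, sources=1):
    """Return (batch size, sinks, cores) for one agent."""
    batch = upstream_count * events_per_upstream
    sinks = math.ceil(batch / MAX_BATCH_EVENTS)   # enough sinks to stay under the batch cap
    cores = math.ceil((sources + sinks) / 2)      # Cores = (Sources + Sinks) / 2, rounded up
    return batch, sinks, cores


agent1 = sinks_and_cores(upstream_count=16, events_per_upstream=115)
agent2 = sinks_and_cores(upstream_count=4, events_per_upstream=agent1[0])
print("Agent 1 (batch, sinks, cores):", agent1)
print("Agent 2 (batch, sinks, cores):", agent2)
```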
  • 32. Apache Flume Total Hardware Requirements 7 single core machines, each $800 2 dual core machines, each $1000 Total Hardware cost • $5600 + $2000 = $7600
  • 33. Stage 2: Storing the collected data
      Traditional solution: Network storage as a part of High Performance Computing (HPC) clusters
      • Ten times more overhead than commodity hard drives due to communication requirements within the cluster
      • Ten times costlier than commodity hardware due to specialized features such as redundant storage, high availability etc.
      Big Data solution: Hadoop Distributed File System (HDFS)
      • Low storage cost per byte compared to alternatives such as a Storage Area Network
      • Tuned to deliver data fast for MapReduce workloads, up to 2 gigabytes per second
      • Data reliability is the primary use case and it has been used by various organizations
      • Uses commodity hardware, so lower initial and maintenance cost
      • Shares cost with the compute layer since it is built into the Hadoop kernel
      • Linearly scalable in terms of performance and cost even at very high volume
  • 34. Storage Requirements and costs
      Traditional Solution: HPC Network storage
      • Network storage used with HPC costs $100,000 for 100 TB of data
      • For the ad network’s current requirement of 14 petabytes, cost = $14M
      • On moving away from this architecture, this hardware would have a salvage value of 60%
      Big Data solution
      • 10 TB per day is 30 TB of physical space (3x replication factor); with a 30% overhead for MR jobs' local space, (10 * 3 * 1.30) = 39 TB of physical space per day
      • 1.65 hosts per day's worth of data
      • For a 1 year retention, storage required = 39 terabytes X 365 = 14 petabytes
      • ~600 hosts
      • 600 hosts X $5000 per host = $3,000,000
      Commodity hardware server configuration:
      • Chipset: 4 X 6-core Intel Xeon 3GHz
      • Memory: 32GB
      • Operating System: Red Hat Enterprise Linux 5
      • Network: 2 Gbps (Bonded Network Interface Card)
      • Disk Space: 2TB X 12 JBOD (Just a Bunch of Disks)
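The HDFS sizing for the Big Data solution can be sketched as follows. This is an illustration under stated assumptions: each host is assumed to contribute its full 12 x 2 TB of disk to HDFS, and the function name is hypothetical. Unrounded, the calculation gives 594 hosts, which the slide rounds to roughly 600 (and the cost to $3M).

```python
import math


def hdfs_storage_plan(raw_tb_per_day=10, replication=3, mr_overhead=0.30,
                      retention_days=365, disks_per_host=12, tb_per_disk=2,
                      cost_per_host=5000):
    """Return (physical TB/day, total TB retained, hosts, hardware cost in $)."""
    physical_per_day = raw_tb_per_day * replication * (1 + mr_overhead)  # 10 * 3 * 1.30
    host_capacity = disks_per_host * tb_per_disk                         # 24 TB per host (assumed fully usable)
    total_tb = physical_per_day * retention_days                         # one year of retention
    hosts = math.ceil(total_tb / host_capacity)
    return physical_per_day, total_tb, hosts, hosts * cost_per_host


per_day, total_tb, hosts, cost = hdfs_storage_plan()
print(f"{per_day:.0f} TB/day physical, {total_tb / 1000:.1f} PB per year, {hosts} hosts, ${cost:,}")
```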
  • 35. Stage 3: Data processing and preparation Traditional solution: Scripts (e.g. using Perl scripting language) on High Performance Compute hardware Big Data Solution: Hadoop Mapreduce Benefits of Hadoop Mapreduce over Perl on HPC hardware • Scalable to thousands of nodes, shared nothing • Abstracts complexity of distributed programming • Reduced human resource cost to 0.5X • High availability, fault tolerance • Abstracts cluster functions • High performance esp. for unstructured data on one time processing.
  • 36. Hardware costs for Data Preparation and Processing
      Traditional Solution:
      • 10 TB/day = 121 MB/sec.
      • Average throughput (MB/s) per node for analytics workload = 1
      • Desired aggregate throughput (MB/s) = 121
      • No. of nodes required ~ 120
      • Cost = 120 nodes X $5000 per node = $600,000
      Big Data solution:
      • 10 TB/day = 121 MB/sec.
      • Average throughput (MB/s) per node for analytics workload = 10
      • Desired aggregate throughput (MB/s) = 121
      • No. of nodes required ~ 12
      • Cost = 12 nodes X $5000 per node = $60,000
  • 37. Human Resource Cost for Data Preparation and Processing
      Traditional solution: Complex skillset required to handle distributed computing complexity
      • Estimate: 50 person team @ $35,000 per person per year
      • Cost: $1,750,000
      Big Data solution: Simpler skillset required as complexities are abstracted from the programmers
      • Estimate: 50% cost reduction
      • Cost: $875,000
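The node counts for data preparation follow from dividing the required aggregate throughput by the per-node throughput. The sketch below assumes 1 TB is taken as 1024 x 1024 MB, which is the reading that reproduces the slide's 121 MB/s; the helper names are illustrative. The slide rounds the resulting node counts to about 120 and 12.

```python
SECONDS_PER_DAY = 24 * 60 * 60


def required_throughput_mb_s(tb_per_day=10):
    """Aggregate throughput needed to process one day's data within a day."""
    return tb_per_day * 1024 * 1024 / SECONDS_PER_DAY


def nodes_needed(throughput_mb_s, per_node_mb_s):
    """Unrounded node count for a given per-node throughput."""
    return throughput_mb_s / per_node_mb_s


def cluster_cost(nodes, cost_per_node=5000):
    return nodes * cost_per_node


throughput = required_throughput_mb_s()
print(f"required throughput: {throughput:.0f} MB/s")
print(f"traditional (1 MB/s per node): {nodes_needed(throughput, 1):.1f} nodes")
print(f"big data (10 MB/s per node): {nodes_needed(throughput, 10):.1f} nodes")
# Costs using the slide's rounded node counts
print(f"traditional: ${cluster_cost(120):,}  big data: ${cluster_cost(12):,}")
```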
  • 38. Stage 4: Analytics – Reporting, Ad hoc and predictive analytics Traditional solution: Row based data warehouses with Structured Query Language Big Data solution: NOSQL column stores No additional hardware costs and similar human resource costs • Big data solutions benefit as the schemas can be modified at a later stage to keep the reports up to date with new type of data. • Optimized for columnar storage and access which are main tasks in analytics
  • 39. Quantification of immediate business benefits
      • 1. Increase in ad revenue due to better CTR: Improved ads will help ad matching algorithms target the ads more accurately to the relevant users with the relevant publishers.
        o Estimated CTR increase: 5%
        o Corresponding increase in publisher’s ad revenue: 5%
        o Corresponding increase in ad network’s revenue (50% of publisher’s ad revenue): 5%
        o Ad network’s increase in revenue (current rev. $100M): $5M
      • 2. Increase in ad revenue by enabling advertisers to better plan campaigns: Better accuracy in predicting CTR will help advertisers with campaign planning. This will help improve CTR, in turn increasing the revenue for publishers and the ad network.
        o Estimated CTR increase: 5%
        o Corresponding increase in publisher’s ad revenue: 5%
        o Corresponding increase in ad network’s revenue (50% of publisher’s ad revenue): 5%
        o Ad network’s increase in revenue (current rev. $100M): $5M
  • 40. Quantification of immediate business benefits
      • 3. Increase in ad revenue due to better campaign optimization: Timely and accurate real time reports will help advertisers make course corrections, further improving CTR and leading to better ad revenue.
        o Estimated CTR increase: 5%
        o Corresponding increase in publisher’s ad revenue: 5%
        o Corresponding increase in ad network’s revenue (50% of publisher’s ad revenue): 5%
        o Ad network’s increase in revenue (current rev. $100M): $5M
      • 4. Increase in ad revenue due to better availability of reports: If the ad network provides better continuity to advertisers, they will be willing to pay a premium.
        o Estimated premium payment: 2%
        o Corresponding increase in ad network’s revenue (50% of publisher’s ad revenue): 2%
        o Ad network’s increase in revenue (current rev. $100M): $2M
      Total increase in ad network’s revenue (1 + 2 + 3 + 4): $17M
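Summing the four benefit items (5 + 5 + 5 + 2 percent of the current $100M revenue) gives $17M, the benefits figure used in the customer value model. A minimal sketch of that sum, with the item labels shortened for illustration:

```python
CURRENT_REVENUE = 100_000_000  # ad network's current revenue, from the slides

# Percentage uplift per benefit item, as estimated on the slides
benefit_pct = {
    "better CTR from improved ad matching": 5,
    "better campaign planning": 5,
    "better campaign optimization": 5,
    "premium for report availability": 2,
}

# Integer arithmetic keeps the dollar amounts exact
benefits = {name: CURRENT_REVENUE * pct // 100 for name, pct in benefit_pct.items()}
total_benefit = sum(benefits.values())

for name, amount in benefits.items():
    print(f"{name}: ${amount:,}")
print(f"total: ${total_benefit:,}")
```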
  • 41. Value Element Mapping Points of Parity • Open Source software available and the company can customize and enhance it the way they want. • Support for Java programming language, for which it is easy to hire people and further enhance the software due to abundantly available talent pool Points of Difference • Simpler skillset required for in-house IT experts in case of big data products. • Ability to handle all aspects of big data problems in Big data products unlike traditional data management products. • Linearly scalable - Big data products can work with cheaper hardware and are linearly scalable making them a future proof investment. Points of Contention • Adoption uncertainty Although there is community support among developers to maintain and evolve the Big data open source products which is growing very fast due to the buzz but it is unclear whether it will pick up as good as that in traditional software. • Stability of big data vendors The commercial vendors are mostly newly formed companies though founded by very accomplished people. They are fast gaining traction but it is unclear whether they will be able to sustain for long term. Moreover, since pure play Big Data firms are privately held, their growth and revenues are not clearly known.
  • 42. Customer Value Model
Columns compared: Big data products versus Traditional products (Next Best Alternative, NBA)
Benefits
• Big data products: $17M
• Traditional products: Status quo with the existing systems
Cost other than price (Capex + Annual) in the first year
• Big data products: $7,600 (Data collection) + $3M (Storage) + $60K (Processing) + $875,000 (Salaries) + $1.5M (Implementation and training)
• Traditional products (already incurred in the existing systems): $14M (Storage) + $600K (Processing) + $1,750,000 (Salaries)
Total Cost
• Big data products: $5,442,600
• Traditional products: Sunk cost
Value = Benefit - Cost
• Big data products: $11,557,400
• Traditional products: No additional value in the existing systems
Price
• Big data products: Free and Open Source
• Traditional products: Free and Open Source
Delta(Price): 0
Value in Use = Delta(Value) - Delta(Price): $11,557,400
Effective value in use (for migration to Big Data products) = Value in Use + Salvage value of storage and processing + Salaries saved: $22,067,400
Note: The time value of money is ignored since the cash flows are considered over a short period, i.e. one year.
Framework reference: James C. Anderson, James A. Narus, DVR Seshadri
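The value model on this slide can be reproduced with a short calculation. This is a sketch using only the figures stated on the slide; the variable names are hypothetical. The slide does not break out the salvage value or the salaries saved, so the sketch derives only their combined amount from the stated effective value in use rather than assuming a split.

```python
# First-year figures for the big data option, as stated on the slide.
benefits = 17_000_000
costs = {
    "data_collection": 7_600,
    "storage": 3_000_000,
    "processing": 60_000,
    "salaries": 875_000,
    "implementation_and_training": 1_500_000,
}

# Total cost other than price (capex + annual) in the first year.
total_cost = sum(costs.values())

# Value = Benefit - Cost.
value_big_data = benefits - total_cost

# The next best alternative (existing systems) adds no further value,
# and both options are free and open source, so both deltas are simple.
value_nba = 0
delta_value = value_big_data - value_nba
delta_price = 0

# Value in Use = Delta(Value) - Delta(Price).
value_in_use = delta_value - delta_price

# Effective value in use as stated on the slide; the difference is the
# combined salvage value and salaries saved (derived, not itemized).
effective_value_in_use = 22_067_400
implied_salvage_plus_salaries_saved = effective_value_in_use - value_in_use

print(total_cost, value_in_use, implied_salvage_plus_salaries_saved)
```

Keeping the cost items in a dictionary makes it easy to rerun the model with different estimates for any single line item.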
  • 43. Value placeholders (less tangible)
Positives
• The big data product architecture is linearly scalable and hence future-proof: future data management requirements can be met at incremental cost by adding commodity hardware.
• Higher customer satisfaction, and hence lower customer churn, because customers gain more control over managing their advertisements.
• The skill set required of in-house IT experts is simpler for big data products and is mostly based on popular Java technology.
Negatives
• Although the above big data products are backed by strong companies and open source communities, these companies and communities are not as strong as the ones behind traditional products.
• The commercial vendors are mostly newly formed companies. They were founded by very capable people and are gaining traction fast, but it is unclear whether they will be able to sustain themselves over the long term.
  • 44. Conclusion
• The above case study builds a case for the value proposition of Big Data products.
• Similarly, Big Data products are being used extensively across various industries, and this value model will help in building a concrete case for them.