SlideShare a Scribd company logo
1 of 34
Download to read offline
Evidence Based Big Data Benchmarking to Improve Business Performance
Virtual BenchLearning
Success Stories on Big Data & Analytics
Chiara Francalanci
Politecnico di Milano
May 28, 2020
Research objectives
5/28/20 DataBench Project - GA Nr 780966 2
• The main objective of our research in Databench is to evaluate the impact of big data
technologies (BDTs) on business performance in key use cases.
• A fundamental output of our work is the industrial and business performance
benchmarks of the use of advanced BDTs in representative use cases.
• We are also providing insights on how technical benchmarking can help to make
informed decisions and maximize benefits, as a input to the design of the Databench
Toolbox and as an aid to the definition of an interpretative framework of the
relationship between technical and business performance.
• We are writing a handbook describing the main industrial and business performance
benchmarks targeted at industrial users and European technology developers.
Summary of data collection and analysis
5/28/20 DataBench Project - GA Nr 780966 3
IDC’s 700+ survey
responses
(technical questions)
700+ use cases from
desk analysis
20+ case studies
Descriptive analysis
Statistical analysis
Clustering
Tagging (same data
schema as IDC’s survey)
Descriptive analysis
Clustering
Qualitative analysis
Interpretive framework
Insights from the survey – descriptive analytics
• Companies mainly analyze and store gigabytes and terabytes of data, while a small
number of companies (less than 10%) deal with petabytes and exabytes.
• Tables and structured data seem to play a prominent role, followed by structured-text
and graph data.
• Currently, descriptive and diagnostic analytics are the most popular types of analytics
among European companies.
• The batch processing approach is most common, and only 16% of companies are
pioneering the management and exploitation of real-time data.
• In the future, companies are planning to move to prescriptive and predictive analytics.
• These results highlight the emerging need to integrate heterogenous data to effectively
exploit all the information gathered by companies.
• The most adopted technical performance metric is data quality.
5/28/20 DataBench Project - GA Nr 780966 4
DataBench Project - GA Nr 780966 5
Desk analysis contribution
DataBench Project - GA Nr 780966 5
ü 703 use cases in total
ü 58 use cases per industry on average
Main sources of information: scientific literature, Web sites/white
papers from IT providers, public documentation from ICT 14-15 projects.
Comparing data from the survey with data from the desk analysis
provides mainstream vs. innovation insights.
A quali-quantitative analysis (tagging) can be found at:
http://131.175.15.19/databench/desk-analysis_17_2_2020.xlsx
RESEARCH
PAPERS
ICT
VENDORS
ICT 14-15
PROJECTS
5/28/20
Insights from the desk analysis
• Use cases from the desk analysis mainly deal with terabytes of data.
• Most use cases are mainly processing data in streaming, as well as iterative/in-memory
processing.
• The most widely used analytics type is by far predictive analytics, while prescriptive,
descriptive and diagnostic analytics are adopted in approximately the 30% of use cases.
• The most widely adopted performance metric seems to be the throughput.
• Data types are primarily tables and structured data, including structured legacy data,
graph and linked data and text and semi-structured data.
• Use cases store and process highly heterogenous data, thus stressing the growing need
and potential for data integration.
5/28/20 DataBench Project - GA Nr 780966 6
Most common use-cases by industry
28/05/2020 DataBench Project - GA Nr 780966 7
Agriculture Crops monitoring Equipment optimization Precision agriculture
Automotive Predictive maintenance Self driving Smart services
Financial Services Fraud detection Risk assessment Targeting
Healthcare Diagnostic Patient monitoring Preventive systems
Manufacturing Predictive maintenance Smart manufacturing
R&D optimization/
Smart design
Retail
Assortment optimization/
Intelligent fulfilment
Price optimization/
Promotions
Targeting
Telecommunication
Churn prediction/
Promotions
Network capacity
optimization
Targeting
Transport & logistics
Churn prediction/
Promotions
Fleet management
Network capacity
optimization
Utilities
Churn prediction/
Promotions
Network capacity
optimization
Personalized fares
Most frequent Business KPI by use-case (1)
28/05/2020 DataBench Project - GA Nr 780966 8
Agriculture
Crops monitoring:
Costs = -10%
Equipment optimization Precision agriculture
Automotive Predictive maintenance Self driving
Smart services:
Costs = -80%
Financial Services
Fraud detection:
Operational Ex. = -80%
Risk assessment
Targeting:
Marketing costs = -35%
TCO costs = -80%
Conversion rate = 10x
Healthcare Diagnostic Patient monitoring Preventive systems
Manufacturing
Predictive maintenance:
Maintenance costs = -30%
Smart manufacturing:
Utilities costs = -20%
Cust. retention = +110%
R&D optimization/
Smart design
Most frequent Business KPI by use-case (2)
28/05/2020 DataBench Project - GA Nr 780966 9
Retail
Assortment optimization/
Intelligent fulfilment
Price optimization/
Promotions:
Conversion rate = 50%
Cust. retention = +14%
Targeting:
Conversion rate = +85%
TCO costs = -15%
Telecommunication
Churn prediction/
Promotions
Network capacity
optimization
Targeting:
Conversion rate = +130%
Transport & logistics
Churn prediction/
Promotions
Fleet management
Network capacity
optimization:
TCO costs = -90%
Utilities
Churn prediction/
Promotions
Network capacity
optimization:
Costs = -20%
Cust. Expenses = -30%
Personalized fares:
Marketing costs = -50%
TCO costs = -50%
Case study analysis methodology
10
First interview Candidate provides documentation
(including blueprints of the IT
infrastructure, schemas of data
sources etc)
DataBench self-assessment toolDataBench industrial needs survey
Classification and selection of potential case
study candidates
Recruitment of the candidates 
Maturity evaluation
Analysis of technical challenges
Analysis of business benefits
On-site visit
Engagement
Use case analysis 
Second interview
Follow-up
5/28/20 DataBench Project - GA Nr 780966
The methodology was tested in the
Whirlpool pilot case study that allowed us
to improve the interview template to
serve important areas of in-depth
analysis:
- Analysis of technical challenges
- Analysis of business benefits
As a consequence of the pilot, we
introduced a second interview to collect
additional information and/or involve
other company profiles.
Interview template
Date / interviewer(s) / interviewee(s)
Company description
Case study description
Data characteristics Volume, velocity, variety and variability
Data sharing and exchange platform use
Data anonymization and privacy needs
Data processing and analytics characteristics Volatility, veracity, monetary value,
visualization, storage, processing, analytics and
machine learning/AI
Big Data specific challenges Short term, long term
Technical benchmark adoption Current, short term, long term
Relevant technical performance metrics
Expected benefits (including business KPIs) Current (measured), short term, long term
5/28/20 DataBench Project - GA Nr 780966 11
General insights (1/2)
• We have evidence of business KPIs for a subset of case studies, evidence is aligned with
results from the survey (business impact is in the 4-8% range).
• From the evidence that has been collected so far, an important lesson learnt is that most
companies believe that technical benchmarking requires highly specialized skills and a
considerable investment. We have found that very few companies have performed an
accurate and extensive benchmarking initiative. In this respect, using cloud solutions grants
them with an easier access to a broader set of technologies that they can experiment with.
• On the other hand, they acknowledge the variety and complexity of technical solutions for
big data and envision the following risks:
• The risk of realizing that they have chosen a technology that proves non scalable
over time, either technically or economically.
• The risk of relying on cloud technologies that might create a lock in and require a
considerable redesign of software to be migrated to other cloud technologies.
• The risk of discovering that cloud services are expensive, especially as a
consequence of scalability, and that technology costs are higher than business
benefits (edge vs. cloud decisions).
5/28/20 DataBench Project - GA Nr 780966 12
General insights (2/2)
• From a technical benchmarking perspective, it is important that
benchmarking is supported with tools that reduce complexity by guiding
users along predefined user journeys towards the identification and
execution of benchmarks.
• Results from previous benchmarking initiatives are also very useful.
• It is important to have cost estimates of individual technologies and end-
to-end solutions, on premises and in cloud to support edge vs. cloud
solutions.
• There exists a growing number of tools designed for a specific use case
(e.g. recommendation systems, markdown optimization systems, sensor
data normalization, etc.), which represent end-to-end off-the-shelf
solutions that are typically outside of the scope of benchmarking.
5/28/20 DataBench Project - GA Nr 780966 13
Business KPIs for case studies
• Intelligent fulfilment: +5% margin
• Recommendation systems: +3/4% margin
• Markdown: -20% promotional investment, +7% revenues, +5% margin
• Yield prediction in agriculture: +10% precision in yield prediction corresponding to +0.3%
profit increase from trading
• Rail transport quality of service: +10% logistic efficiency
• Manufacturing production quality: +5% reduction of quality issues
• New business models in manufacturing: cloud-based prediction maintenance service
5/28/20 14DataBench Project - GA Nr 780966
Yield prediction, Agriculture
5/28/20 15
• A company specialized on earth observation services has designed an innovative yield
prediction machine learning algorithm based on Sentinel and Landsat high-resolution
satellite information.
• The approach was tested with reference to the needs of a company operating in the
financial industry, providing predictions as a support to trading decisions.
• Machine learning algorithms have demonstrated roughly 10% more accurate, supporting
better investment decisions.
• The focus was the production of soy beans and corn in the US.
• The export market of soy beans and corn is aroung 21 billion dollar.
• The delta between projected and actual price was around 3.2%. This figure has been
reduced by 10%, with a corresponding gain from trading around 0.32% on traded
volumes.
DataBench Project - GA Nr 780966
Intelligent fulfilment, Retail
• Retail (grocery) industry
• Based on the idea of using machine learning to optimize assortment
selection and automated fulfilment at an individual shop level
• Complex AI system including machine learning (sales prediction)
• Piloted in one shop: 5% increase of margins (equivalent to roughly 5
million/year)
• Run on Spark in cloud on Amazon, 100 euro per run, per category, per
shop
• There are hundreds of categories, shops, and runs …
• Full deployment of project is currently on hold due to the economic
scalability of IT.
5/28/20 DataBench Project - GA Nr 780966 16
Online recommendation system
• Retail (grocery) industry
• Based on the idea of personalizing recommendations at an individual
customer level
• System designed to increase margins with up-sell and cross-sell
recommendations
• Measured 3-4% impact on margin
• Cross-sell tables too large to be deployed on a hardware appliance on
premises have been simplified to the client segment level
• Less personalized recommendations have been found to generate
1/10th of the clicks
5/28/20 DataBench Project - GA Nr 780966 17
A dangerous shift towards mass market
• Economic scalability issues cause business superficiality
• In turn, business superficiality drives a shift towards mass market
• For example, off-the-shelf recommendation systems suggest products
that are «most frequently purchased», that is mass market, shifting
customer behaviour towards purchasing choices that are less profitable
for the company (the opposite of up-selling)
• Simplifying recommendations by adopting an off-the-shelf
recommendation system is likely to result in a loss of competitiveness
5/28/20 DataBench Project - GA Nr 780966 18
Benchmarking can help reduce costs
Examples:
• Choice of the top performing DB can reduce costs by an order of magnitude
• AI benchmarking
• Choice of the most convenient machine in cloud
28/05/2020 19
Choice of optimal cloud instances
28/05/2020 DataBench Project - GA Nr 780966 20
AWS instance vCPU Memory # instances Price EMR Price Total cost
c5.9xlarge 36 72 2 $ 1.728 $ 0.270 $ 1 166 832.00
r5.xlarge 4 32 3 $ 0.252 $ 0.063 $ 275 940.00
- 76 %
• Intelligent fulfilment
• Targeting
Comparison between the cheapest optimal machine and the most expensive one for two different
use-cases.
AWS instance vCPU Memory # instances Price EMR Price Total cost
c5.4xlarge 16 32 45 $ 0.768 $ 0.170 $ 52 678.08
r5.24xlarge 96 768 2 $ 6.048 $ 0.270 $ 15 769.73
- 70 %
Is the cost issue general?
• Framing big data architecture by defining building blocks
• Design a per-use case architecture by combining building blocks
• Estimating data size and computation time
• Estimating cost in cloud
• We can tell where the cost issue is and what component should be benchmarked
• Prediction model/AI algorithm (e.g. intelligent fulfilment)
• Data preparation/query (e.g. targeting in telephone companies)
• Data management/database massive insert (IoT)
• Data storage (e.g. Satellite data)
5/28/20 21DataBench Project - GA Nr 780966
Integrated Architectural blueprint…
5/28/20 22DataBench Project - GA Nr 780966
…as a unified view of 27 use-case specific
blueprints
5/28/20 23DataBench Project - GA Nr 780966
Example: Intelligent fulfilment blueprint
28/05/2020 DataBench Project - GA Nr 780966 24
Intelligent fulfilment: sizing
28/05/2020 DataBench Project - GA Nr 780966 25
• Receipts per day: 200’000
• Average products per receipt: 7
• This results in 511’000’000 products sold per year in all shops
• Which means approximately 65 GB/year
• Suppose using 5 years of receipts to get better results, we would need at
least 325 GB of memory
Intelligent fulfilment: simulation
28/05/2020 DataBench Project - GA Nr 780966 26
• Four periods considered: weekdays with and without promotions,
weekends with and without promotions.
• Elapsed time for computing the optimal reorder point for each period:
around 60 minutes.
• Executions on Amazon AWS, prices refer to Amazon Ireland data centre.
• The following tables express the annual cost, by using respectively the
last one year and last five years datasets.
Intelligent fulfilment: costs (1 year)
28/05/2020 DataBench Project - GA Nr 780966 27
AWS instance vCPU Memory # instances Price EMR Price Total cost x 200 shops
c5.4xlarge 16 32 3 $ 0.768 $ 0.170 $ 4 108.44 $ 821 688.00
c5.9xlarge 36 72 2 $ 1.728 $ 0.270 $ 5 834.16 $ 1 166 832.00
c5.18xlarge 72 144 1 $ 3.456 $ 0.270 $ 5 439.96 $ 1 087 992.00
m5.2xlarge 8 32 3 $ 0.428 $ 0.096 $ 2 295.12 $ 459 024.00
m5.4xlarge 16 64 2 $ 0.856 $ 0.192 $ 3 060.16 $ 612 032.00
m5.12xlarge 48 192 1 $ 2.568 $ 0.270 $ 4 143.48 $ 828 696.00
r5.xlarge 4 32 3 $ 0.252 $ 0.063 $ 1 379.70 $ 275 940.00
r5.2xlarge 8 64 2 $ 0.504 $ 0.126 $ 1 839.60 $ 367 920.00
r5.4xlarge 16 128 1 $ 1.008 $ 0.252 $ 1 839.60 $ 367 920.00
• Cheapest combination of instances: $ 275 940.00
• Most expensive combination of instances: $ 1 166 832.00
x 4.2
Intelligent fulfilment: costs (5 years)
28/05/2020 DataBench Project - GA Nr 780966 28
AWS instance vCPU Memory # instances Price EMR Price Total cost x 200 shops
c5.4xlarge 16 32 15 $ 0.768 $ 0.170 $ 20 542.20 $ 4 108 440.00
c5.9xlarge 36 72 7 $ 1.728 $ 0.270 $ 20 419.56 $ 4 083 912.00
c5.18xlarge 72 144 4 $ 3.456 $ 0.270 $ 21 759.84 $ 4 351 968.00
m5.2xlarge 8 32 15 $ 0.428 $ 0.096 $ 11 475.60 $ 2 295 120.00
m5.4xlarge 16 64 8 $ 0.856 $ 0.192 $ 12 240.64 $ 2 448 128.00
m5.24xlarge 96 384 2 $ 5.136 $ 0.270 $ 15 785.52 $ 3 157 104.00
r5.xlarge 4 32 15 $ 0.252 $ 0.063 $ 6 898.50 $ 1 379 700.00
r5.2xlarge 8 64 8 $ 0.504 $ 0.126 $ 7 358.40 $ 1 471 680.00
r5.16xlarge 64 512 1 $ 4.032 $ 0.270 $ 6 280.92 $ 1 256 184.00
• Cheapest instances combination: $ 4 351 968.00
• Most expensive instances combination: $ 1 256 184.00
x 3.5
Targeting in telecom industry: schema
28/05/2020 DataBench Project - GA Nr 780966 29
Targeting: results
28/05/2020 DataBench Project - GA Nr 780966 30
• Targeting advantages are:
• Higher campaigns conversion rate
• Up-selling strategies to targeted customers
• Lower customer fatigue
• Targeting reduce the number of calls to customers, hence reducing the number
of call centre operators from 380 to 10.
• It results in cost reduction by over 90%. (From 15M€ to 400K€ yearly)
• This cost reduction enables new marketing models with higher redemption
rate.
Targeting: sizing
28/05/2020 DataBench Project - GA Nr 780966 31
• Number of customers: 5’000’000 (Big companies: 30M)
• Average SMS/MMS per day per customer: 10
• Average Calls per day per customer: 7
• Weekly processing of data to produce aggregated profiles
• 24 hours of processing: 12h for data preparation, 12h for targets
Targeting: costs
28/05/2020 DataBench Project - GA Nr 780966 32
AWS instance vCPU Memory # instances Price EMR Price Total cost
c5.4xlarge 16 32 45 $ 0.768 $ 0.170 $ 52 678.08
c5.18xlarge 72 144 10 $ 3.456 $ 0.270 $ 46 500.48
m5.2xlarge 8 32 45 $ 0.428 $ 0.096 $ 29 427.84
m5.24xlarge 96 384 4 $ 5.136 $ 0.270 $ 26 986.75
r5.xlarge 4 32 45 $ 0.252 $ 0.063 $ 17 690.40
r5.8xlarge 32 256 6 $ 2.016 $ 0.270 $ 17 117.57
r5.24xlarge 96 768 2 $ 6.048 $ 0.270 $ 15 769.73
• Cheapest instances combination: $ 15 760.73
• Most expensive instances combination: $ 52 678.08
x 3.3
On-going work
28/05/2020 DataBench Project - GA Nr 780966 33
• Mapping 300 market technologies on blueprint components
• Mapping 60 benchmarks on market technologies
• Organize Databench knowledge around this knowledge base (in
cooperation with Sintef)
Contacts
28/05/2020 DataBench Project - GA Nr 780966 34
chiara.francalanci@polimi.it
paolo.ravanelli@polimi.it
gianmarco.ruggiero@polimi.it
giulio.costa@polimi.it

More Related Content

What's hot

An innovative software framework and toolkit for process optimization deploye...
An innovative software framework and toolkit for process optimization deploye...An innovative software framework and toolkit for process optimization deploye...
An innovative software framework and toolkit for process optimization deploye...
Sudhendu Rai
 
I Minds2009 Kpi Ware
I Minds2009 Kpi WareI Minds2009 Kpi Ware
I Minds2009 Kpi Ware
imec.archive
 

What's hot (18)

Regulatory Reporting Dashboard
Regulatory Reporting DashboardRegulatory Reporting Dashboard
Regulatory Reporting Dashboard
 
Geographic Analytics - How HP Visualises its Supply Chain
Geographic Analytics - How HP Visualises its Supply ChainGeographic Analytics - How HP Visualises its Supply Chain
Geographic Analytics - How HP Visualises its Supply Chain
 
Building the DataBench Workflow and Architecture
Building the DataBench Workflow and ArchitectureBuilding the DataBench Workflow and Architecture
Building the DataBench Workflow and Architecture
 
Eu research
Eu researchEu research
Eu research
 
Analytics for Water utilities
Analytics for Water utilitiesAnalytics for Water utilities
Analytics for Water utilities
 
Big Data Value in Mobility and Logistics
Big Data Value in Mobility and Logistics Big Data Value in Mobility and Logistics
Big Data Value in Mobility and Logistics
 
UtiliPERFORM - Utility Operational Excellence - Indigo Advisory Group
UtiliPERFORM - Utility Operational Excellence - Indigo Advisory GroupUtiliPERFORM - Utility Operational Excellence - Indigo Advisory Group
UtiliPERFORM - Utility Operational Excellence - Indigo Advisory Group
 
Independent models validation and automation
Independent models validation and automationIndependent models validation and automation
Independent models validation and automation
 
Analytics in the Manufacturing industry
Analytics in the Manufacturing industryAnalytics in the Manufacturing industry
Analytics in the Manufacturing industry
 
Process Mining in Action: Self-service data science for business teams
Process Mining in Action: Self-service data science for business teamsProcess Mining in Action: Self-service data science for business teams
Process Mining in Action: Self-service data science for business teams
 
Claims
ClaimsClaims
Claims
 
markheatonfinalresume
markheatonfinalresumemarkheatonfinalresume
markheatonfinalresume
 
An innovative software framework and toolkit for process optimization deploye...
An innovative software framework and toolkit for process optimization deploye...An innovative software framework and toolkit for process optimization deploye...
An innovative software framework and toolkit for process optimization deploye...
 
IFRS 9 automation and loan pricing
IFRS 9 automation and loan pricingIFRS 9 automation and loan pricing
IFRS 9 automation and loan pricing
 
Solving the data management challenges of FRTB
Solving the data management challenges of FRTBSolving the data management challenges of FRTB
Solving the data management challenges of FRTB
 
I Minds2009 Kpi Ware
I Minds2009 Kpi WareI Minds2009 Kpi Ware
I Minds2009 Kpi Ware
 
IFRS 9 Impairment Analyzer - Brochure
IFRS 9 Impairment Analyzer - BrochureIFRS 9 Impairment Analyzer - Brochure
IFRS 9 Impairment Analyzer - Brochure
 
Bcbs 239 v4 30 oct
Bcbs 239 v4 30 octBcbs 239 v4 30 oct
Bcbs 239 v4 30 oct
 

Similar to Success Stories on Big Data & Analytics

STS. Smarter devices. Smarter test systems.
STS. Smarter devices. Smarter test systems.STS. Smarter devices. Smarter test systems.
STS. Smarter devices. Smarter test systems.
Hank Lydick
 
NI Automated Test Outlook 2016
NI Automated Test Outlook 2016NI Automated Test Outlook 2016
NI Automated Test Outlook 2016
Hank Lydick
 

Similar to Success Stories on Big Data & Analytics (20)

Building a Bridge between Technical and Business Benchmarking, Gabriella Catt...
Building a Bridge between Technical and Business Benchmarking, Gabriella Catt...Building a Bridge between Technical and Business Benchmarking, Gabriella Catt...
Building a Bridge between Technical and Business Benchmarking, Gabriella Catt...
 
Benchmarking for Big Data Applications with the DataBench Framework, Arne Ber...
Benchmarking for Big Data Applications with the DataBench Framework, Arne Ber...Benchmarking for Big Data Applications with the DataBench Framework, Arne Ber...
Benchmarking for Big Data Applications with the DataBench Framework, Arne Ber...
 
Relating Big Data Business and Technical Performance Indicators, Barbara Pern...
Relating Big Data Business and Technical Performance Indicators, Barbara Pern...Relating Big Data Business and Technical Performance Indicators, Barbara Pern...
Relating Big Data Business and Technical Performance Indicators, Barbara Pern...
 
Building a Bridge between Technical and Business Benchmarking, Gabriella Catt...
Building a Bridge between Technical and Business Benchmarking, Gabriella Catt...Building a Bridge between Technical and Business Benchmarking, Gabriella Catt...
Building a Bridge between Technical and Business Benchmarking, Gabriella Catt...
 
A Practical Approach for Power Utilities Seeking to Create Sustaining Busines...
A Practical Approach for Power Utilities Seeking to Create Sustaining Busines...A Practical Approach for Power Utilities Seeking to Create Sustaining Busines...
A Practical Approach for Power Utilities Seeking to Create Sustaining Busines...
 
DataBench in a Nutshell - The market: Assessing Industrial Needs, Richard Ste...
DataBench in a Nutshell - The market: Assessing Industrial Needs, Richard Ste...DataBench in a Nutshell - The market: Assessing Industrial Needs, Richard Ste...
DataBench in a Nutshell - The market: Assessing Industrial Needs, Richard Ste...
 
DataBench in a nutshell - The market: Assessing industrial needs, Richard Ste...
DataBench in a nutshell - The market: Assessing industrial needs, Richard Ste...DataBench in a nutshell - The market: Assessing industrial needs, Richard Ste...
DataBench in a nutshell - The market: Assessing industrial needs, Richard Ste...
 
Improving Business Performance Through Big Data Benchmarking, Todor Ivanov, B...
Improving Business Performance Through Big Data Benchmarking, Todor Ivanov, B...Improving Business Performance Through Big Data Benchmarking, Todor Ivanov, B...
Improving Business Performance Through Big Data Benchmarking, Todor Ivanov, B...
 
Pmac It Project Management 2010
Pmac It Project Management 2010Pmac It Project Management 2010
Pmac It Project Management 2010
 
STS. Smarter devices. Smarter test systems.
STS. Smarter devices. Smarter test systems.STS. Smarter devices. Smarter test systems.
STS. Smarter devices. Smarter test systems.
 
NI Automated Test Outlook 2016
NI Automated Test Outlook 2016NI Automated Test Outlook 2016
NI Automated Test Outlook 2016
 
Bitrock manufacturing
Bitrock manufacturing Bitrock manufacturing
Bitrock manufacturing
 
Big Data Benchmarking, Tomas Pariente Lobo, Open Expo Europe, 20/06/2019
Big Data Benchmarking, Tomas Pariente Lobo, Open Expo Europe, 20/06/2019Big Data Benchmarking, Tomas Pariente Lobo, Open Expo Europe, 20/06/2019
Big Data Benchmarking, Tomas Pariente Lobo, Open Expo Europe, 20/06/2019
 
Building the DataBench Workflow and Architecture, Todor Ivanov, Bench 2019 - ...
Building the DataBench Workflow and Architecture, Todor Ivanov, Bench 2019 - ...Building the DataBench Workflow and Architecture, Todor Ivanov, Bench 2019 - ...
Building the DataBench Workflow and Architecture, Todor Ivanov, Bench 2019 - ...
 
The IT Cost Reduction Journey
The IT Cost Reduction JourneyThe IT Cost Reduction Journey
The IT Cost Reduction Journey
 
Oil & Gas Analytic Solution
Oil & Gas Analytic SolutionOil & Gas Analytic Solution
Oil & Gas Analytic Solution
 
Big Data for Product Managers
Big Data for Product ManagersBig Data for Product Managers
Big Data for Product Managers
 
Pentaho Reporting Solution for a Leading Energy Company in US
Pentaho Reporting Solution for a Leading Energy Company in USPentaho Reporting Solution for a Leading Energy Company in US
Pentaho Reporting Solution for a Leading Energy Company in US
 
Telecom product cost models development approach
Telecom product cost models development approachTelecom product cost models development approach
Telecom product cost models development approach
 
IBM Maximo for Utilities
IBM Maximo for UtilitiesIBM Maximo for Utilities
IBM Maximo for Utilities
 

More from DataBench

More from DataBench (17)

Welcome to DataBench
Welcome to DataBenchWelcome to DataBench
Welcome to DataBench
 
Session 1 - The Current Landscape of Big Data Benchmarks
Session 1 - The Current Landscape of Big Data BenchmarksSession 1 - The Current Landscape of Big Data Benchmarks
Session 1 - The Current Landscape of Big Data Benchmarks
 
Session 2 - A Project Perspective on Big Data Architectural Pipelines and Ben...
Session 2 - A Project Perspective on Big Data Architectural Pipelines and Ben...Session 2 - A Project Perspective on Big Data Architectural Pipelines and Ben...
Session 2 - A Project Perspective on Big Data Architectural Pipelines and Ben...
 
Session 3 - The DataBench Framework: A compelling offering to measure the Imp...
Session 3 - The DataBench Framework: A compelling offering to measure the Imp...Session 3 - The DataBench Framework: A compelling offering to measure the Imp...
Session 3 - The DataBench Framework: A compelling offering to measure the Imp...
 
Session 4 - A practical journey on how to use the DataBench Toolbox
Session 4 - A practical journey on how to use the DataBench ToolboxSession 4 - A practical journey on how to use the DataBench Toolbox
Session 4 - A practical journey on how to use the DataBench Toolbox
 
CoreBigBench: Benchmarking Big Data Core Operations
CoreBigBench: Benchmarking Big Data Core OperationsCoreBigBench: Benchmarking Big Data Core Operations
CoreBigBench: Benchmarking Big Data Core Operations
 
DataBench Toolbox in a Nutshell
DataBench Toolbox in a NutshellDataBench Toolbox in a Nutshell
DataBench Toolbox in a Nutshell
 
DataBench Virtual BenchLearning "Success storie on Big Data & Analytics use c...
DataBench Virtual BenchLearning "Success storie on Big Data & Analytics use c...DataBench Virtual BenchLearning "Success storie on Big Data & Analytics use c...
DataBench Virtual BenchLearning "Success storie on Big Data & Analytics use c...
 
DataBench Virtual BenchLearning "Big Data - Benchmark your way to Excellent B...
DataBench Virtual BenchLearning "Big Data - Benchmark your way to Excellent B...DataBench Virtual BenchLearning "Big Data - Benchmark your way to Excellent B...
DataBench Virtual BenchLearning "Big Data - Benchmark your way to Excellent B...
 
DataBench Virtual BenchLearning "Big Data - Benchmark your way to Excellent B...
DataBench Virtual BenchLearning "Big Data - Benchmark your way to Excellent B...DataBench Virtual BenchLearning "Big Data - Benchmark your way to Excellent B...
DataBench Virtual BenchLearning "Big Data - Benchmark your way to Excellent B...
 
DataBench Toolbox Demo, Ivan Martinez, Tomas Pariente Lobo, BDV Meet-Up Riga,...
DataBench Toolbox Demo, Ivan Martinez, Tomas Pariente Lobo, BDV Meet-Up Riga,...DataBench Toolbox Demo, Ivan Martinez, Tomas Pariente Lobo, BDV Meet-Up Riga,...
DataBench Toolbox Demo, Ivan Martinez, Tomas Pariente Lobo, BDV Meet-Up Riga,...
 
DataBench session @ BDV Meet-Up Riga: The case of HOBBIT, 27/06/2019
DataBench session @ BDV Meet-Up Riga: The case of HOBBIT, 27/06/2019DataBench session @ BDV Meet-Up Riga: The case of HOBBIT, 27/06/2019
DataBench session @ BDV Meet-Up Riga: The case of HOBBIT, 27/06/2019
 
Impacts of data-driven AI in business sectors, Richard Stevens, ICT 2018, 05/...
Impacts of data-driven AI in business sectors, Richard Stevens, ICT 2018, 05/...Impacts of data-driven AI in business sectors, Richard Stevens, ICT 2018, 05/...
Impacts of data-driven AI in business sectors, Richard Stevens, ICT 2018, 05/...
 
Exploratory Analysis of Spark Structured Streaming, Todor Ivanov, Jason Taafe...
Exploratory Analysis of Spark Structured Streaming, Todor Ivanov, Jason Taafe...Exploratory Analysis of Spark Structured Streaming, Todor Ivanov, Jason Taafe...
Exploratory Analysis of Spark Structured Streaming, Todor Ivanov, Jason Taafe...
 
Big Data Technical Benchmarking, Arne Berre, BDVe Webinar series, 09/10/2018
Big Data Technical Benchmarking, Arne Berre, BDVe Webinar series, 09/10/2018 Big Data Technical Benchmarking, Arne Berre, BDVe Webinar series, 09/10/2018
Big Data Technical Benchmarking, Arne Berre, BDVe Webinar series, 09/10/2018
 
Adding Velocity to BigBench, Todor Ivanov, Patrick Bedué, Roberto Zicari, Ahm...
Adding Velocity to BigBench, Todor Ivanov, Patrick Bedué, Roberto Zicari, Ahm...Adding Velocity to BigBench, Todor Ivanov, Patrick Bedué, Roberto Zicari, Ahm...
Adding Velocity to BigBench, Todor Ivanov, Patrick Bedué, Roberto Zicari, Ahm...
 
DataBench - Project fiche
DataBench - Project ficheDataBench - Project fiche
DataBench - Project fiche
 

Recently uploaded

一比一原版加利福尼亚大学尔湾分校毕业证成绩单如何办理
一比一原版加利福尼亚大学尔湾分校毕业证成绩单如何办理一比一原版加利福尼亚大学尔湾分校毕业证成绩单如何办理
一比一原版加利福尼亚大学尔湾分校毕业证成绩单如何办理
pyhepag
 
一比一原版阿德莱德大学毕业证成绩单如何办理
一比一原版阿德莱德大学毕业证成绩单如何办理一比一原版阿德莱德大学毕业证成绩单如何办理
一比一原版阿德莱德大学毕业证成绩单如何办理
pyhepag
 
一比一原版西悉尼大学毕业证成绩单如何办理
一比一原版西悉尼大学毕业证成绩单如何办理一比一原版西悉尼大学毕业证成绩单如何办理
一比一原版西悉尼大学毕业证成绩单如何办理
pyhepag
 
一比一原版纽卡斯尔大学毕业证成绩单如何办理
一比一原版纽卡斯尔大学毕业证成绩单如何办理一比一原版纽卡斯尔大学毕业证成绩单如何办理
一比一原版纽卡斯尔大学毕业证成绩单如何办理
cyebo
 
一比一原版麦考瑞大学毕业证成绩单如何办理
一比一原版麦考瑞大学毕业证成绩单如何办理一比一原版麦考瑞大学毕业证成绩单如何办理
一比一原版麦考瑞大学毕业证成绩单如何办理
cyebo
 
Abortion pills in Dammam Saudi Arabia// +966572737505 // buy cytotec
Abortion pills in Dammam Saudi Arabia// +966572737505 // buy cytotecAbortion pills in Dammam Saudi Arabia// +966572737505 // buy cytotec
Abortion pills in Dammam Saudi Arabia// +966572737505 // buy cytotec
Abortion pills in Riyadh +966572737505 get cytotec
 
Fuzzy Sets decision making under information of uncertainty
Fuzzy Sets decision making under information of uncertaintyFuzzy Sets decision making under information of uncertainty
Fuzzy Sets decision making under information of uncertainty
RafigAliyev2
 

Recently uploaded (20)

一比一原版加利福尼亚大学尔湾分校毕业证成绩单如何办理
一比一原版加利福尼亚大学尔湾分校毕业证成绩单如何办理一比一原版加利福尼亚大学尔湾分校毕业证成绩单如何办理
一比一原版加利福尼亚大学尔湾分校毕业证成绩单如何办理
 
Supply chain analytics to combat the effects of Ukraine-Russia-conflict
Supply chain analytics to combat the effects of Ukraine-Russia-conflictSupply chain analytics to combat the effects of Ukraine-Russia-conflict
Supply chain analytics to combat the effects of Ukraine-Russia-conflict
 
Pre-ProductionImproveddsfjgndflghtgg.pptx
Pre-ProductionImproveddsfjgndflghtgg.pptxPre-ProductionImproveddsfjgndflghtgg.pptx
Pre-ProductionImproveddsfjgndflghtgg.pptx
 
一比一原版阿德莱德大学毕业证成绩单如何办理
一比一原版阿德莱德大学毕业证成绩单如何办理一比一原版阿德莱德大学毕业证成绩单如何办理
一比一原版阿德莱德大学毕业证成绩单如何办理
 
一比一原版西悉尼大学毕业证成绩单如何办理
一比一原版西悉尼大学毕业证成绩单如何办理一比一原版西悉尼大学毕业证成绩单如何办理
一比一原版西悉尼大学毕业证成绩单如何办理
 
一比一原版纽卡斯尔大学毕业证成绩单如何办理
一比一原版纽卡斯尔大学毕业证成绩单如何办理一比一原版纽卡斯尔大学毕业证成绩单如何办理
一比一原版纽卡斯尔大学毕业证成绩单如何办理
 
How I opened a fake bank account and didn't go to prison
How I opened a fake bank account and didn't go to prisonHow I opened a fake bank account and didn't go to prison
How I opened a fake bank account and didn't go to prison
 
一比一原版麦考瑞大学毕业证成绩单如何办理
一比一原版麦考瑞大学毕业证成绩单如何办理一比一原版麦考瑞大学毕业证成绩单如何办理
一比一原版麦考瑞大学毕业证成绩单如何办理
 
AI Imagen for data-storytelling Infographics.pdf
AI Imagen for data-storytelling Infographics.pdfAI Imagen for data-storytelling Infographics.pdf
AI Imagen for data-storytelling Infographics.pdf
 
Slip-and-fall Injuries: Top Workers' Comp Claims
Slip-and-fall Injuries: Top Workers' Comp ClaimsSlip-and-fall Injuries: Top Workers' Comp Claims
Slip-and-fall Injuries: Top Workers' Comp Claims
 
MALL CUSTOMER SEGMENTATION USING K-MEANS CLUSTERING.pptx
MALL CUSTOMER SEGMENTATION USING K-MEANS CLUSTERING.pptxMALL CUSTOMER SEGMENTATION USING K-MEANS CLUSTERING.pptx
MALL CUSTOMER SEGMENTATION USING K-MEANS CLUSTERING.pptx
 
How can I successfully sell my pi coins in Philippines?
How can I successfully sell my pi coins in Philippines?How can I successfully sell my pi coins in Philippines?
How can I successfully sell my pi coins in Philippines?
 
2024 Q1 Tableau User Group Leader Quarterly Call
2024 Q1 Tableau User Group Leader Quarterly Call2024 Q1 Tableau User Group Leader Quarterly Call
2024 Q1 Tableau User Group Leader Quarterly Call
 
Machine Learning for Accident Severity Prediction
Machine Learning for Accident Severity PredictionMachine Learning for Accident Severity Prediction
Machine Learning for Accident Severity Prediction
 
Abortion pills in Dammam Saudi Arabia// +966572737505 // buy cytotec
Abortion pills in Dammam Saudi Arabia// +966572737505 // buy cytotecAbortion pills in Dammam Saudi Arabia// +966572737505 // buy cytotec
Abortion pills in Dammam Saudi Arabia// +966572737505 // buy cytotec
 
2024 Q2 Orange County (CA) Tableau User Group Meeting
2024 Q2 Orange County (CA) Tableau User Group Meeting2024 Q2 Orange County (CA) Tableau User Group Meeting
2024 Q2 Orange County (CA) Tableau User Group Meeting
 
Easy and simple project file on mp online
Easy and simple project file on mp onlineEasy and simple project file on mp online
Easy and simple project file on mp online
 
Fuzzy Sets decision making under information of uncertainty
Fuzzy Sets decision making under information of uncertaintyFuzzy Sets decision making under information of uncertainty
Fuzzy Sets decision making under information of uncertainty
 
Artificial_General_Intelligence__storm_gen_article.pdf
Artificial_General_Intelligence__storm_gen_article.pdfArtificial_General_Intelligence__storm_gen_article.pdf
Artificial_General_Intelligence__storm_gen_article.pdf
 
Webinar One View, Multiple Systems No-Code Integration of Salesforce and ERPs
Webinar One View, Multiple Systems No-Code Integration of Salesforce and ERPsWebinar One View, Multiple Systems No-Code Integration of Salesforce and ERPs
Webinar One View, Multiple Systems No-Code Integration of Salesforce and ERPs
 

Success Stories on Big Data & Analytics

  • 1. Evidence Based Big Data Benchmarking to Improve Business Performance Virtual BenchLearning Success Stories on Big Data & Analytics Chiara Francalanci Politecnico di Milano May 28, 2020
  • 2. Research objectives 5/28/20 DataBench Project - GA Nr 780966 2 • The main objective of our research in Databench is to evaluate the impact of big data technologies (BDTs) on business performance in key use cases. • A fundamental output of our work is the industrial and business performance benchmarks of the use of advanced BDTs in representative use cases. • We are also providing insights on how technical benchmarking can help to make informed decisions and maximize benefits, as a input to the design of the Databench Toolbox and as an aid to the definition of an interpretative framework of the relationship between technical and business performance. • We are writing a handbook describing the main industrial and business performance benchmarks targeted at industrial users and European technology developers.
  • 3. Summary of data collection and analysis 5/28/20 DataBench Project - GA Nr 780966 3 IDC’s 700+ survey responses (technical questions) 700+ use cases from desk analysis 20+ case studies Descriptive analysis Statistical analysis Clustering Tagging (same data schema as IDC’s survey) Descriptive analysis Clustering Qualitative analysis Interpretive framework
  • 4. Insights from the survey – descriptive analytics • Companies mainly analyze and store gigabytes and terabytes of data, while a small number of companies (less than 10%) deal with petabytes and exabytes. • Tables and structured data seem to play a prominent role, followed by structured-text and graph data. • Currently, descriptive and diagnostic analytics are the most popular types of analytics among European companies. • The batch processing approach is most common, and only 16% of companies are pioneering the management and exploitation of real-time data. • In the future, companies are planning to move to prescriptive and predictive analytics. • These results highlight the emerging need to integrate heterogenous data to effectively exploit all the information gathered by companies. • The most adopted technical performance metric is data quality. 5/28/20 DataBench Project - GA Nr 780966 4
  • 5. DataBench Project - GA Nr 780966 5 Desk analysis contribution DataBench Project - GA Nr 780966 5 ü 703 use cases in total ü 58 use cases per industry on average Main sources of information: scientific literature, Web sites/white papers from IT providers, public documentation from ICT 14-15 projects. Comparing data from the survey with data from the desk analysis provides mainstream vs. innovation insights. A quali-quantitative analysis (tagging) can be found at: http://131.175.15.19/databench/desk-analysis_17_2_2020.xlsx RESEARCH PAPERS ICT VENDORS ICT 14-15 PROJECTS 5/28/20
  • 6. Insights from the desk analysis • Use cases from the desk analysis mainly deal with terabytes of data. • Most use cases are mainly processing data in streaming, as well as iterative/in-memory processing. • The most widely used analytics type is by far predictive analytics, while prescriptive, descriptive and diagnostic analytics are adopted in approximately the 30% of use cases. • The most widely adopted performance metric seems to be the throughput. • Data types are primarily tables and structured data, including structured legacy data, graph and linked data and text and semi-structured data. • Use cases store and process highly heterogenous data, thus stressing the growing need and potential for data integration. 5/28/20 DataBench Project - GA Nr 780966 6
  • 7. Most common use-cases by industry 28/05/2020 DataBench Project - GA Nr 780966 7 Agriculture Crops monitoring Equipment optimization Precision agriculture Automotive Predictive maintenance Self driving Smart services Financial Services Fraud detection Risk assessment Targeting Healthcare Diagnostic Patient monitoring Preventive systems Manufacturing Predictive maintenance Smart manufacturing R&D optimization/ Smart design Retail Assortment optimization/ Intelligent fulfilment Price optimization/ Promotions Targeting Telecommunication Churn prediction/ Promotions Network capacity optimization Targeting Transport & logistics Churn prediction/ Promotions Fleet management Network capacity optimization Utilities Churn prediction/ Promotions Network capacity optimization Personalized fares
  • 8. Most frequent Business KPI by use-case (1) 28/05/2020 DataBench Project - GA Nr 780966 8 Agriculture Crops monitoring: Costs = -10% Equipment optimization Precision agriculture Automotive Predictive maintenance Self driving Smart services: Costs = -80% Financial Services Fraud detection: Operational Ex. = -80% Risk assessment Targeting: Marketing costs = -35% TCO costs = -80% Conversion rate = 10x Healthcare Diagnostic Patient monitoring Preventive systems Manufacturing Predictive maintenance: Maintenance costs = -30% Smart manufacturing: Utilities costs = -20% Cust. retention = +110% R&D optimization/ Smart design
  • 9. Most frequent Business KPI by use-case (2) 28/05/2020 DataBench Project - GA Nr 780966 9 Retail Assortment optimization/ Intelligent fulfilment Price optimization/ Promotions: Conversion rate = 50% Cust. retention = +14% Targeting: Conversion rate = +85% TCO costs = -15% Telecommunication Churn prediction/ Promotions Network capacity optimization Targeting: Conversion rate = +130% Transport & logistics Churn prediction/ Promotions Fleet management Network capacity optimization: TCO costs = -90% Utilities Churn prediction/ Promotions Network capacity optimization: Costs = -20% Cust. Expenses = -30% Personalized fares: Marketing costs = -50% TCO costs = -50%
  • 10. Case study analysis methodology 10 First interview Candidate provides documentation (including blueprints of the IT infrastructure, schemas of data sources etc) DataBench self-assessment toolDataBench industrial needs survey Classification and selection of potential case study candidates Recruitment of the candidates  Maturity evaluation Analysis of technical challenges Analysis of business benefits On-site visit Engagement Use case analysis  Second interview Follow-up 5/28/20 DataBench Project - GA Nr 780966 The methodology was tested in the Whirlpool pilot case study that allowed us to improve the interview template to serve important areas of in-depth analysis: - Analysis of technical challenges - Analysis of business benefits As a consequence of the pilot, we introduced a second interview to collect additional information and/or involve other company profiles.
  • 11. Interview template Date / interviewer(s) / interviewee(s) Company description Case study description Data characteristics Volume, velocity, variety and variability Data sharing and exchange platform use Data anonymization and privacy needs Data processing and analytics characteristics Volatility, veracity, monetary value, visualization, storage, processing, analytics and machine learning/AI Big Data specific challenges Short term, long term Technical benchmark adoption Current, short term, long term Relevant technical performance metrics Expected benefits (including business KPIs) Current (measured), short term, long term 5/28/20 DataBench Project - GA Nr 780966 11
  • 12. General insights (1/2) • We have evidence of business KPIs for a subset of case studies, evidence is aligned with results from the survey (business impact is in the 4-8% range). • From the evidence that has been collected so far, an important lesson learnt is that most companies believe that technical benchmarking requires highly specialized skills and a considerable investment. We have found that very few companies have performed an accurate and extensive benchmarking initiative. In this respect, using cloud solutions grants them with an easier access to a broader set of technologies that they can experiment with. • On the other hand, they acknowledge the variety and complexity of technical solutions for big data and envision the following risks: • The risk of realizing that they have chosen a technology that proves non scalable over time, either technically or economically. • The risk of relying on cloud technologies that might create a lock in and require a considerable redesign of software to be migrated to other cloud technologies. • The risk of discovering that cloud services are expensive, especially as a consequence of scalability, and that technology costs are higher than business benefits (edge vs. cloud decisions). 5/28/20 DataBench Project - GA Nr 780966 12
  • 13. General insights (2/2) • From a technical benchmarking perspective, it is important that benchmarking is supported with tools that reduce complexity by guiding users along predefined user journeys towards the identification and execution of benchmarks. • Results from previous benchmarking initiatives are also very useful. • It is important to have cost estimates of individual technologies and end- to-end solutions, on premises and in cloud to support edge vs. cloud solutions. • There exists a growing number of tools designed for a specific use case (e.g. recommendation systems, markdown optimization systems, sensor data normalization, etc.), which represent end-to-end off-the-shelf solutions that are typically outside of the scope of benchmarking. 5/28/20 DataBench Project - GA Nr 780966 13
  • 14. Business KPIs for case studies • Intelligent fulfilment: +5% margin • Recommendation systems: +3/4% margin • Markdown: -20% promotional investment, +7% revenues, +5% margin • Yield prediction in agriculture: +10% precision in yield prediction corresponding to +0.3% profit increase from trading • Rail transport quality of service: +10% logistic efficiency • Manufacturing production quality: +5% reduction of quality issues • New business models in manufacturing: cloud-based prediction maintenance service 5/28/20 14DataBench Project - GA Nr 780966
  • 15. Yield prediction, Agriculture 5/28/20 15 • A company specialized on earth observation services has designed an innovative yield prediction machine learning algorithm based on Sentinel and Landsat high-resolution satellite information. • The approach was tested with reference to the needs of a company operating in the financial industry, providing predictions as a support to trading decisions. • Machine learning algorithms have demonstrated roughly 10% more accurate, supporting better investment decisions. • The focus was the production of soy beans and corn in the US. • The export market of soy beans and corn is aroung 21 billion dollar. • The delta between projected and actual price was around 3.2%. This figure has been reduced by 10%, with a corresponding gain from trading around 0.32% on traded volumes. DataBench Project - GA Nr 780966
  • 16. Intelligent fulfilment, Retail • Retail (grocery) industry • Based on the idea of using machine learning to optimize assortment selection and automated fulfilment at an individual shop level • Complex AI system including machine learning (sales prediction) • Piloted in one shop: 5% increase of margins (equivalent to roughly 5 million/year) • Run on Spark in cloud on Amazon, 100 euro per run, per category, per shop • There are hundreds of categories, shops, and runs … • Full deployment of project is currently on hold due to the economic scalability of IT. 5/28/20 DataBench Project - GA Nr 780966 16
  • 17. Online recommendation system • Retail (grocery) industry • Based on the idea of personalizing recommendations at an individual customer level • System designed to increase margins with up-sell and cross-sell recommendations • Measured 3-4% impact on margin • Cross-sell tables too large to be deployed on a hardware appliance on premises have been simplified to the client segment level • Less personalized recommendations have been found to generate 1/10th of the clicks 5/28/20 DataBench Project - GA Nr 780966 17
  • 18. A dangerous shift towards mass market • Economic scalability issues cause business superficiality • In turn, business superficiality drives a shift towards mass market • For example, off-the-shelf recommendation systems suggest products that are «most frequently purchased», that is mass market, shifting customer behaviour towards purchasing choices that are less profitable for the company (the opposite of up-selling) • Simplifying recommendations by adopting an off-the-shelf recommendation system is likely to result in a loss of competitiveness 5/28/20 DataBench Project - GA Nr 780966 18
  • 19. Benchmarking can help reduce costs Examples: • Choice of the top performing DB can reduce costs by an order of magnitude • AI benchmarking • Choice of the most convenient machine in cloud 28/05/2020 19
  • 20. Choice of optimal cloud instances 28/05/2020 DataBench Project - GA Nr 780966 20 AWS instance vCPU Memory # instances Price EMR Price Total cost c5.9xlarge 36 72 2 $ 1.728 $ 0.270 $ 1 166 832.00 r5.xlarge 4 32 3 $ 0.252 $ 0.063 $ 275 940.00 - 76 % • Intelligent fulfilment • Targeting Comparison between the cheapest optimal machine and the most expensive one for two different use-cases. AWS instance vCPU Memory # instances Price EMR Price Total cost c5.4xlarge 16 32 45 $ 0.768 $ 0.170 $ 52 678.08 r5.24xlarge 96 768 2 $ 6.048 $ 0.270 $ 15 769.73 - 70 %
  • 21. Is the cost issue general? • Framing big data architecture by defining building blocks • Design a per-use case architecture by combining building blocks • Estimating data size and computation time • Estimating cost in cloud • We can tell where the cost issue is and what component should be benchmarked • Prediction model/AI algorithm (e.g. intelligent fulfilment) • Data preparation/query (e.g. targeting in telephone companies) • Data management/database massive insert (IoT) • Data storage (e.g. Satellite data) 5/28/20 21DataBench Project - GA Nr 780966
  • 22. Integrated Architectural blueprint… 5/28/20 22DataBench Project - GA Nr 780966
  • 23. …as a unified view of 27 use-case specific blueprints 5/28/20 23DataBench Project - GA Nr 780966
  • 24. Example: Intelligent fulfilment blueprint 28/05/2020 DataBench Project - GA Nr 780966 24
  • 25. Intelligent fulfilment: sizing 28/05/2020 DataBench Project - GA Nr 780966 25 • Receipts per day: 200’000 • Average products per receipt: 7 • This results in 511’000’000 products sold per year in all shops • Which means approximately 65 GB/year • Suppose using 5 years of receipts to get better results, we would need at least 325 GB of memory
  • 26. Intelligent fulfilment: simulation 28/05/2020 DataBench Project - GA Nr 780966 26 • Four periods considered: weekdays with and without promotions, weekends with and without promotions. • Elapsed time for computing the optimal reorder point for each period: around 60 minutes. • Executions on Amazon AWS, prices refer to Amazon Ireland data centre. • The following tables express the annual cost, by using respectively the last one year and last five years datasets.
  • 27. Intelligent fulfilment: costs (1 year) 28/05/2020 DataBench Project - GA Nr 780966 27 AWS instance vCPU Memory # instances Price EMR Price Total cost x 200 shops c5.4xlarge 16 32 3 $ 0.768 $ 0.170 $ 4 108.44 $ 821 688.00 c5.9xlarge 36 72 2 $ 1.728 $ 0.270 $ 5 834.16 $ 1 166 832.00 c5.18xlarge 72 144 1 $ 3.456 $ 0.270 $ 5 439.96 $ 1 087 992.00 m5.2xlarge 8 32 3 $ 0.428 $ 0.096 $ 2 295.12 $ 459 024.00 m5.4xlarge 16 64 2 $ 0.856 $ 0.192 $ 3 060.16 $ 612 032.00 m5.12xlarge 48 192 1 $ 2.568 $ 0.270 $ 4 143.48 $ 828 696.00 r5.xlarge 4 32 3 $ 0.252 $ 0.063 $ 1 379.70 $ 275 940.00 r5.2xlarge 8 64 2 $ 0.504 $ 0.126 $ 1 839.60 $ 367 920.00 r5.4xlarge 16 128 1 $ 1.008 $ 0.252 $ 1 839.60 $ 367 920.00 • Cheapest combination of instances: $ 275 940.00 • Most expensive combination of instances: $ 1 166 832.00 x 4.2
  • 28. Intelligent fulfilment: costs (5 years) 28/05/2020 DataBench Project - GA Nr 780966 28 AWS instance vCPU Memory # instances Price EMR Price Total cost x 200 shops c5.4xlarge 16 32 15 $ 0.768 $ 0.170 $ 20 542.20 $ 4 108 440.00 c5.9xlarge 36 72 7 $ 1.728 $ 0.270 $ 20 419.56 $ 4 083 912.00 c5.18xlarge 72 144 4 $ 3.456 $ 0.270 $ 21 759.84 $ 4 351 968.00 m5.2xlarge 8 32 15 $ 0.428 $ 0.096 $ 11 475.60 $ 2 295 120.00 m5.4xlarge 16 64 8 $ 0.856 $ 0.192 $ 12 240.64 $ 2 448 128.00 m5.24xlarge 96 384 2 $ 5.136 $ 0.270 $ 15 785.52 $ 3 157 104.00 r5.xlarge 4 32 15 $ 0.252 $ 0.063 $ 6 898.50 $ 1 379 700.00 r5.2xlarge 8 64 8 $ 0.504 $ 0.126 $ 7 358.40 $ 1 471 680.00 r5.16xlarge 64 512 1 $ 4.032 $ 0.270 $ 6 280.92 $ 1 256 184.00 • Cheapest instances combination: $ 4 351 968.00 • Most expensive instances combination: $ 1 256 184.00 x 3.5
  • 29. Targeting in telecom industry: schema 28/05/2020 DataBench Project - GA Nr 780966 29
  • 30. Targeting: results 28/05/2020 DataBench Project - GA Nr 780966 30 • Targeting advantages are: • Higher campaigns conversion rate • Up-selling strategies to targeted customers • Lower customer fatigue • Targeting reduce the number of calls to customers, hence reducing the number of call centre operators from 380 to 10. • It results in cost reduction by over 90%. (From 15M€ to 400K€ yearly) • This cost reduction enables new marketing models with higher redemption rate.
  • 31. Targeting: sizing 28/05/2020 DataBench Project - GA Nr 780966 31 • Number of customers: 5’000’000 (Big companies: 30M) • Average SMS/MMS per day per customer: 10 • Average Calls per day per customer: 7 • Weekly processing of data to produce aggregated profiles • 24 hours of processing: 12h for data preparation, 12h for targets
  • 32. Targeting: costs 28/05/2020 DataBench Project - GA Nr 780966 32 AWS instance vCPU Memory # instances Price EMR Price Total cost c5.4xlarge 16 32 45 $ 0.768 $ 0.170 $ 52 678.08 c5.18xlarge 72 144 10 $ 3.456 $ 0.270 $ 46 500.48 m5.2xlarge 8 32 45 $ 0.428 $ 0.096 $ 29 427.84 m5.24xlarge 96 384 4 $ 5.136 $ 0.270 $ 26 986.75 r5.xlarge 4 32 45 $ 0.252 $ 0.063 $ 17 690.40 r5.8xlarge 32 256 6 $ 2.016 $ 0.270 $ 17 117.57 r5.24xlarge 96 768 2 $ 6.048 $ 0.270 $ 15 769.73 • Cheapest instances combination: $ 15 760.73 • Most expensive instances combination: $ 52 678.08 x 3.3
  • 33. On-going work 28/05/2020 DataBench Project - GA Nr 780966 33 • Mapping 300 market technologies on blueprint components • Mapping 60 benchmarks on market technologies • Organize Databench knowledge around this knowledge base (in cooperation with Sintef)
  • 34. Contacts 28/05/2020 DataBench Project - GA Nr 780966 34 chiara.francalanci@polimi.it paolo.ravanelli@polimi.it gianmarco.ruggiero@polimi.it giulio.costa@polimi.it