This presentation was given by Prof. Chiara Francalanci from Politecnico di Milano during the second Virtual BenchLearning organised by the H2020 DataBench project.
Success Stories on Big Data & Analytics
1. Evidence Based Big Data Benchmarking to Improve Business Performance
Virtual BenchLearning
Success Stories on Big Data & Analytics
Chiara Francalanci
Politecnico di Milano
May 28, 2020
2. Research objectives
• The main objective of our research in DataBench is to evaluate the impact of big data technologies (BDTs) on business performance in key use cases.
• A fundamental output of our work is a set of industrial and business performance benchmarks for the use of advanced BDTs in representative use cases.
• We are also providing insights on how technical benchmarking can help make informed decisions and maximize benefits, as an input to the design of the DataBench Toolbox and as an aid to the definition of an interpretative framework of the relationship between technical and business performance.
• We are writing a handbook describing the main industrial and business performance benchmarks, targeted at industrial users and European technology developers.
3. Summary of data collection and analysis
• IDC's 700+ survey responses (technical questions) → descriptive analysis, statistical analysis, clustering
• 700+ use cases from desk analysis → tagging (same data schema as IDC's survey), descriptive analysis, clustering
• 20+ case studies → qualitative analysis, interpretive framework
4. Insights from the survey – descriptive analytics
• Companies mainly analyze and store gigabytes and terabytes of data, while a small number of companies (less than 10%) deal with petabytes and exabytes.
• Tables and structured data seem to play a prominent role, followed by structured text and graph data.
• Currently, descriptive and diagnostic analytics are the most popular types of analytics among European companies.
• The batch processing approach is the most common, and only 16% of companies are pioneering the management and exploitation of real-time data.
• In the future, companies are planning to move to prescriptive and predictive analytics.
• These results highlight the emerging need to integrate heterogeneous data to effectively exploit all the information gathered by companies.
• The most widely adopted technical performance metric is data quality.
5. Desk analysis contribution
• 703 use cases in total
• 58 use cases per industry on average
• Main sources of information: scientific literature (research papers), Web sites/white papers from IT providers (ICT vendors), and public documentation from ICT 14-15 projects.
• Comparing data from the survey with data from the desk analysis provides mainstream vs. innovation insights.
• A quali-quantitative analysis (tagging) can be found at: http://131.175.15.19/databench/desk-analysis_17_2_2020.xlsx
6. Insights from the desk analysis
• Use cases from the desk analysis mainly deal with terabytes of data.
• Most use cases process data in streaming, as well as with iterative/in-memory processing.
• The most widely used analytics type is by far predictive analytics, while prescriptive, descriptive and diagnostic analytics are each adopted in approximately 30% of use cases.
• The most widely adopted performance metric seems to be throughput.
• Data types are primarily tables and structured data, including structured legacy data, graph and linked data, and text and semi-structured data.
• Use cases store and process highly heterogeneous data, thus stressing the growing need and potential for data integration.
10. Case study analysis methodology
[Process diagram: the methodology spans three phases. Engagement: classification and selection of potential case study candidates (drawing on the DataBench industrial needs survey and the DataBench self-assessment tool) and recruitment of the candidates. Use case analysis: first interview, documentation provided by the candidate (including blueprints of the IT infrastructure, schemas of data sources, etc.), maturity evaluation, on-site visit, analysis of technical challenges, and analysis of business benefits. Follow-up: second interview.]
The methodology was tested in the Whirlpool pilot case study, which allowed us to improve the interview template to cover two important areas of in-depth analysis:
- Analysis of technical challenges
- Analysis of business benefits
As a consequence of the pilot, we introduced a second interview to collect additional information and/or involve other company profiles.
11. Interview template
- Date / interviewer(s) / interviewee(s)
- Company description
- Case study description
- Data characteristics: volume, velocity, variety and variability
- Data sharing and exchange platform use
- Data anonymization and privacy needs
- Data processing and analytics characteristics: volatility, veracity, monetary value, visualization, storage, processing, analytics and machine learning/AI
- Big Data specific challenges: short term, long term
- Technical benchmark adoption: current, short term, long term
- Relevant technical performance metrics
- Expected benefits (including business KPIs): current (measured), short term, long term
12. General insights (1/2)
• We have evidence of business KPIs for a subset of case studies; this evidence is aligned with results from the survey (business impact is in the 4-8% range).
• From the evidence collected so far, an important lesson learnt is that most companies believe that technical benchmarking requires highly specialized skills and a considerable investment. We have found that very few companies have performed an accurate and extensive benchmarking initiative. In this respect, using cloud solutions grants them easier access to a broader set of technologies that they can experiment with.
• On the other hand, they acknowledge the variety and complexity of technical solutions for big data and envision the following risks:
  • The risk of realizing that they have chosen a technology that proves non-scalable over time, either technically or economically.
  • The risk of relying on cloud technologies that might create a lock-in and require a considerable redesign of software to be migrated to other cloud technologies.
  • The risk of discovering that cloud services are expensive, especially as a consequence of scalability, and that technology costs are higher than business benefits (edge vs. cloud decisions).
13. General insights (2/2)
• From a technical benchmarking perspective, it is important that benchmarking is supported by tools that reduce complexity by guiding users along predefined user journeys towards the identification and execution of benchmarks.
• Results from previous benchmarking initiatives are also very useful.
• It is important to have cost estimates of individual technologies and end-to-end solutions, on premises and in the cloud, to support edge vs. cloud decisions.
• There is a growing number of tools designed for a specific use case (e.g. recommendation systems, markdown optimization systems, sensor data normalization, etc.), which represent end-to-end off-the-shelf solutions that are typically outside the scope of benchmarking.
14. Business KPIs for case studies
• Intelligent fulfilment: +5% margin
• Recommendation systems: +3/4% margin
• Markdown: -20% promotional investment, +7% revenues, +5% margin
• Yield prediction in agriculture: +10% precision in yield prediction, corresponding to a +0.3% profit increase from trading
• Rail transport quality of service: +10% logistic efficiency
• Manufacturing production quality: 5% reduction of quality issues
• New business models in manufacturing: cloud-based predictive maintenance service
15. Yield prediction, Agriculture
• A company specialized in earth observation services has designed an innovative yield prediction machine learning algorithm based on Sentinel and Landsat high-resolution satellite information.
• The approach was tested with reference to the needs of a company operating in the financial industry, providing predictions as a support to trading decisions.
• The machine learning algorithms have proved roughly 10% more accurate, supporting better investment decisions.
• The focus was the production of soybeans and corn in the US.
• The export market of soybeans and corn is around 21 billion dollars.
• The delta between projected and actual price was around 3.2%. This figure has been reduced by 10%, with a corresponding gain from trading of around 0.32% on traded volumes.
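To make the arithmetic on this slide explicit: a 10% reduction of the 3.2% projected-vs-actual price delta yields the quoted 0.32% gain on traded volumes. A minimal check, where the dollar figure is only an illustration applying that percentage to the cited export market size:

```python
# Worked arithmetic behind the yield-prediction KPI (illustration only).
price_delta = 0.032        # 3.2% gap between projected and actual price
accuracy_gain = 0.10       # predictions are roughly 10% more accurate

gain_on_traded_volumes = price_delta * accuracy_gain
print(f"Gain on traded volumes: {gain_on_traded_volumes:.2%}")  # 0.32%

# Hypothetical dollar impact if applied to the cited 21-billion-dollar
# soybean/corn export market (assumes all volumes are traded):
export_market_usd = 21e9
print(f"Illustrative gain: ${gain_on_traded_volumes * export_market_usd:,.0f}")
```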
16. Intelligent fulfilment, Retail
• Retail (grocery) industry
• Based on the idea of using machine learning to optimize assortment selection and automated fulfilment at an individual shop level
• Complex AI system including machine learning (sales prediction)
• Piloted in one shop: 5% increase of margins (equivalent to roughly 5 million/year)
• Run on Spark in the cloud on Amazon, at 100 euro per run, per category, per shop
• There are hundreds of categories, shops, and runs …
• Full deployment of the project is currently on hold due to the economic scalability of IT.
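The scalability concern follows directly from the multiplication on this slide. A minimal sketch with illustrative counts (the slide only says "hundreds" of categories, shops, and runs, so the specific numbers below are assumptions):

```python
# Why per-run pricing becomes an economic scalability issue (the counts
# below are assumptions; the slide only says "hundreds" of each).
cost_per_run_eur = 100   # from the slide: per run, per category, per shop
categories = 200         # assumed
shops = 300              # assumed
runs_per_year = 52       # assumed: one optimization run per week

annual_cost_eur = cost_per_run_eur * categories * shops * runs_per_year
print(f"Annual cost at full deployment: {annual_cost_eur / 1e6:.0f} M EUR")
# Costs grow multiplicatively with categories x shops x run frequency,
# which is why full deployment is on hold.
```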
17. Online recommendation system
• Retail (grocery) industry
• Based on the idea of personalizing recommendations at an individual customer level
• System designed to increase margins with up-sell and cross-sell recommendations
• Measured 3-4% impact on margin
• Cross-sell tables too large to be deployed on an on-premises hardware appliance have been simplified to the client-segment level
• The less personalized recommendations have been found to generate 1/10th of the clicks
18. A dangerous shift towards mass market
• Economic scalability issues cause business superficiality
• In turn, business superficiality drives a shift towards the mass market
• For example, off-the-shelf recommendation systems suggest products that are «most frequently purchased», that is, mass-market products, shifting customer behaviour towards purchasing choices that are less profitable for the company (the opposite of up-selling)
• Simplifying recommendations by adopting an off-the-shelf recommendation system is likely to result in a loss of competitiveness
19. Benchmarking can help reduce costs
Examples:
• Choice of the top-performing DB can reduce costs by an order of magnitude
• AI benchmarking
• Choice of the most convenient machine in the cloud
20. Choice of optimal cloud instances
Comparison between the cheapest optimal machine and the most expensive one for two different use cases (hourly prices per instance, US dollars).
Intelligent fulfilment:
  AWS instance | vCPU | Memory (GB) | # instances | Price ($/h) | EMR price ($/h) | Total cost
  c5.9xlarge   |  36  |     72      |      2      |   1.728     |     0.270       | $1,166,832.00
  r5.xlarge    |   4  |     32      |      3      |   0.252     |     0.063       | $275,940.00
  → -76% cost
Targeting:
  AWS instance | vCPU | Memory (GB) | # instances | Price ($/h) | EMR price ($/h) | Total cost
  c5.4xlarge   |  16  |     32      |     45      |   0.768     |     0.170       | $52,678.08
  r5.24xlarge  |  96  |    768      |      2      |   6.048     |     0.270       | $15,769.73
  → -70% cost
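The totals in these tables are consistent with annual cost = (instance price + EMR price) × number of instances × hours of cluster runtime. The hours below are inferred by back-calculation from the table, not stated on the slide:

```python
# Back-checking the totals in the tables above:
# annual cost = (instance price + EMR price) x #instances x cluster hours.
# The hours are inferred by back-calculation; they are not on the slide.
def annual_cost(instance_price, emr_price, n_instances, hours):
    return (instance_price + emr_price) * n_instances * hours

# Intelligent fulfilment (~292,000 cluster hours across all runs):
print(annual_cost(1.728, 0.270, 2, 292_000))  # 1,166,832.00 -> c5.9xlarge
print(annual_cost(0.252, 0.063, 3, 292_000))  # 275,940.00   -> r5.xlarge

# Targeting (1,248 hours/year = 24 h/week x 52 weeks, cf. slide 31):
print(annual_cost(0.768, 0.170, 45, 1_248))   # 52,678.08    -> c5.4xlarge
print(annual_cost(6.048, 0.270, 2, 1_248))    # 15,769.73    -> r5.24xlarge
```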
21. Is the cost issue general?
• Framing the big data architecture by defining building blocks
• Designing a per-use-case architecture by combining building blocks
• Estimating data size and computation time
• Estimating cost in the cloud
• We can tell where the cost issue is and which component should be benchmarked (see the sketch below):
  • Prediction model/AI algorithm (e.g. intelligent fulfilment)
  • Data preparation/query (e.g. targeting in telephone companies)
  • Data management/database massive insert (IoT)
  • Data storage (e.g. satellite data)
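A minimal sketch of the cost-framing procedure this slide describes: combine building blocks into a per-use-case architecture, attach data-size and computation-time estimates, price them, and pick the cost hot spot as the component to benchmark first. All block names, sizes, and rates below are hypothetical illustrations, not DataBench outputs:

```python
# Sketch of per-use-case cost framing; names, sizes, and rates are
# hypothetical illustrations, not figures from the DataBench project.
from dataclasses import dataclass

@dataclass
class BuildingBlock:
    name: str
    data_gb: float        # estimated data size handled by the block
    compute_hours: float  # estimated yearly computation time
    storage_rate: float   # assumed $/GB-year
    compute_rate: float   # assumed $/cluster-hour

    def annual_cost(self) -> float:
        return (self.data_gb * self.storage_rate
                + self.compute_hours * self.compute_rate)

# Combine building blocks into a per-use-case architecture, then find the
# cost hot spot, i.e. the component that should be benchmarked first.
architecture = [
    BuildingBlock("prediction model/AI algorithm", 325, 292_000, 0.023, 0.945),
    BuildingBlock("data preparation/query", 500, 1_248, 0.023, 12.636),
    BuildingBlock("data storage", 50_000, 0, 0.023, 0.0),
]
hot_spot = max(architecture, key=lambda b: b.annual_cost())
print(f"Benchmark first: {hot_spot.name} (${hot_spot.annual_cost():,.0f}/yr)")
```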
25. Intelligent fulfilment: sizing
• Receipts per day: 200,000
• Average products per receipt: 7
• This results in 511,000,000 products sold per year in all shops
• Which means approximately 65 GB/year
• If we use 5 years of receipts to get better results, we would need at least 325 GB of memory
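The sizing chain can be checked directly; the per-item record size below is back-calculated from the 65 GB/year figure and is therefore implied, not stated on the slide:

```python
# Verifying the sizing arithmetic on this slide.
receipts_per_day = 200_000
products_per_receipt = 7

products_per_year = receipts_per_day * products_per_receipt * 365
print(f"{products_per_year:,} products/year")       # 511,000,000

bytes_per_year = 65e9                                # "approximately 65 GB/year"
print(f"~{bytes_per_year / products_per_year:.0f} bytes per sold item")  # ~127, implied

years_of_history = 5
print(f"Memory needed: {65 * years_of_history} GB")  # 325 GB
```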
26. Intelligent fulfilment: simulation
• Four periods are considered: weekdays with and without promotions, weekends with and without promotions.
• Elapsed time for computing the optimal reorder point for each period: around 60 minutes.
• Executions run on Amazon AWS; prices refer to the Amazon Ireland data centre.
• The following tables express the annual cost, using respectively the last one-year and last five-year datasets.
29. Targeting in telecom industry: schema
30. Targeting: results
• Targeting advantages are:
  • Higher campaign conversion rates
  • Up-selling strategies to targeted customers
  • Lower customer fatigue
• Targeting reduces the number of calls to customers, hence reducing the number of call centre operators from 380 to 10.
• It results in a cost reduction of over 90% (from 15M€ to 400K€ yearly).
• This cost reduction enables new marketing models with a higher redemption rate.
31. Targeting: sizing
• Number of customers: 5,000,000 (big companies: 30M)
• Average SMS/MMS per day per customer: 10
• Average calls per day per customer: 7
• Weekly processing of data to produce aggregated profiles
• 24 hours of processing: 12h for data preparation, 12h for targeting
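As a rough check on the weekly processing load, the figures above imply roughly 600 million events per batch; the per-event size used below is an assumption for illustration only:

```python
# Rough event volume implied by the sizing figures above.
customers = 5_000_000
sms_mms_per_day = 10
calls_per_day = 7

events_per_week = customers * (sms_mms_per_day + calls_per_day) * 7
print(f"{events_per_week:,} events per weekly batch")  # 595,000,000

# Assuming ~100 bytes per call/SMS record (illustrative, not on the slide):
print(f"~{events_per_week * 100 / 1e9:.0f} GB of raw data per weekly run")
```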
33. On-going work
• Mapping 300 market technologies on blueprint components
• Mapping 60 benchmarks on market technologies
• Organizing DataBench knowledge around this knowledge base (in cooperation with SINTEF)
34. Contacts
chiara.francalanci@polimi.it
paolo.ravanelli@polimi.it
gianmarco.ruggiero@polimi.it
giulio.costa@polimi.it