At the heart of this DataBench webinar is the goal to share a benchmarking process helping European organisations developing Big Data Technologies to reach for excellence and constantly improve their performance, by measuring their technology development activity against parameters of high business relevance.
The webinar aims to provide the audience with a framework and tools to assess the performance and impact of Big Data and AI technologies, by providing real insights coming from DataBench. In addition, representatives from other projects part of the BDV PPP such as DeepHealth and They-Buy-for-You will participate to share the challenges and opportunities they have identified on the use of Big Data, Analytics, AI. The perspective of other projects that also have looked into benchmarking, such as Track&Now and I-BiDaaS will be introduced.
Virtual BenchLearning - I-BiDaaS - Industrial-Driven Big Data as a Self-Service Solution
1. Dusan Jakovetic
University of Novi Sad, Faculty of Sciences, Serbia
Virtual BenchLearning Webinar
July 8, 2020
This project has received funding from the European Union’s Horizon 2020 research and innovation programme under grant agreement No 780787
Industrial-Driven Big Data as
a Self-Service Solution
2. Brief introduction to I-BiDaaS
I-BiDaaS pipeline, architecture & technologies
I-BiDaaS & BDVA reference model
Benchmarking landscape and opportunities @ I-BiDaaS
Outline
2
4. Consortium
1. FOUNDATION FOR RESEARCH AND TECHNOLOGY HELLAS (FORTH)
2. BARCELONA SUPERCOMPUTING CENTER - CENTRO NACIONAL DE
SUPERCOMPUTACION (BSC)
3. IBM ISRAEL - SCIENCE AND TECHNOLOGY LTD (IBM)
4. CENTRO RICERCHE FIAT SCPA (CRF)
5. SOFTWARE AG (SAG)
6. CAIXABANK, S.A (CAIXA)
7. THE UNIVERSITY OF MANCHESTER (UNIMAN)
8. ECOLE NATIONALE DES PONTS ET CHAUSSEES (ENPC)
9. ATOS SPAIN SA (ATOS)
10. AEGIS IT RESEARCH LTD (AEGIS)
11. INFORMATION TECHNOLOGY FOR MARKET LEADERSHIP (ITML)
12. UNIVERSITY OF NOVI SAD FACULTY OF SCIENCES SERBIA (UNSPMF)
13. TELEFONICA INVESTIGACION Y DESARROLLO SA (TID)
4
5. A complete and safe environment for
methodological big data experimentation
Tool and services to increase the quality of
data analytics
A Big Data as a Self-Service solution
that helps in breaking silos and boosts
EU's data-driven economy
Tools and services for fast ingestion and
consolidation of both realistic and fabricated
data
Tools and services for the management of
heterogeneous infrastructures
Increases impact in research community and
contributes to industrial innovation capacity
5
Key messages
6. • Expert mode
Analyze your DataUsers
• Import your data
• Self-service mode
Data
• Fabricate Data
• Stream & Batch Analytics
• Expert: Upload your code
• Self-service: Select an
algorithm from the pool
Results
• Visualize the results
• Share models
Do it yourself
In a flexible
manner
Break data silos Safe environment Interact with Big Data
technologies
Increase speed of data
analysis
Cope with the rate of data
asset growth
Intra- and inter-
domain data-flow
Benefits of using I-BiDaaS
• Co-develop mode
I-BiDaaS pipeline
• Co-develop: custom end-to-
end application
• Tokenize data
6
7. • Expert mode
Analyze your DataUsers
• Import your data
• Self-service mode
Data
• Fabricate Data
• Stream & Batch Analytics
• Expert: Upload your code
• Self-service: Select an
algorithm from the pool
Results
• Visualize the results
• Share models
Do it yourself
In a flexible
manner
Break data silos Safe environment Interact with Big Data
technologies
Increase speed of data
analysis
Cope with the rate of data
asset growth
Intra- and inter-
domain data-flow
Benefits of using I-BiDaaS
• Co-develop mode
• Co-develop: custom end-to-
end application
• Tokenize data
7
Flexible solution
I-BiDaaS pipeline
8. • Expert mode
Analyze your DataUsers
• Import your data
• Self-service mode
Data
• Fabricate Data
• Stream & Batch Analytics
• Expert: Upload your code
• Self-service: Select an
algorithm from the pool
Results
• Visualize the results
• Share models
Do it yourself
In a flexible
manner
Break data silos Safe environment Interact with Big Data
technologies
Increase speed of data
analysis
Cope with the rate of data
asset growth
Intra- and inter-
domain data-flow
Benefits of using I-BiDaaS
• Co-develop mode
• Co-develop: custom end-to-
end application
• Tokenize data
Data sharing
& breaking silos
8
I-BiDaaS pipeline
9. • Expert mode
Analyze your DataUsers
• Import your data
• Self-service mode
Data
• Fabricate Data
• Stream & Batch Analytics
• Expert: Upload your code
• Self-service: Select an
algorithm from the pool
Results
• Visualize the results
• Share models
Do it yourself
In a flexible
manner
Break data silos Safe environment Interact with Big Data
technologies
Increase speed of data
analysis
Cope with the rate of data
asset growth
Intra- and inter-
domain data-flow
Benefits of using I-BiDaaS
• Co-develop mode
• Co-develop: custom end-to-
end application
• Tokenize data
9
I-BiDaaS pipeline
10. • Expert mode
Analyze your DataUsers
• Import your data
• Self-service mode
Data
• Fabricate Data
• Stream & Batch Analytics
• Expert: Upload your code
• Self-service: Select an
algorithm from the pool
Results
• Visualize the results
• Share models
Do it yourself
In a flexible
manner
Break data silos Safe environment Interact with Big Data
technologies
Increase speed of data
analysis
Cope with the rate of data
asset growth
Intra- and inter-
domain data-flow
Benefits of using I-BiDaaS
• Co-develop mode
• Co-develop: custom end-to-
end application
• Tokenize data
10
I-BiDaaS pipeline
11. The I-BiDaaS solution:
Architecture & technologies
Medium to long term business decisions
Data
Fabrication
Platform
(IBM)
Refined
specifications
for data
fabrication
GPU-accelerated Analytics
(FORTH)
Apama Complex Event
Processing (SAG)
Streaming Analytics
Batch Processing
Advanced ML (UNSPMF)
COMPsProgramming
Model(BSC)
Query Partitioning
Infrastructurelayer: Private cloud; Commodity cluster; GPUs
Pre-defined
Queries
SQL-like
interface
Domain
Language
Programming
API
User Interface
Resource managementand orchestration (ATOS)
Advanced
Visualis.
Advanced IT services for
Big Data processing tasks;
Open source pool of ML
algorithms
Data ingestion
Programming
Interface /
Sequential
Programming
(AEGIS+SAG)
(AEGIS)
COMPsRuntime
(BSC)
Distributed
large scale
layer
Application layer
UniversalMessaging(SAG)
TestData
Fabrication(IBM)
Meta-
data;
Data
descri-
ption
Hecuba tools
(BSC)
Short term decisions
real time alerts
Model structure improvements
Learnedpatterns correlations
11
12. WP2:
Data, user interface, visualization
Technologies:
• IBM TDF
• SAG UM
• AEGIS AVT
Medium to long term business decisions
Data
Fabrication
Platform
(IBM)
Refined
specifications
for data
fabrication
GPU-accelerated Analytics
(FORTH)
Apama Complex Event
Processing (SAG)
Streaming Analytics
Batch Processing
Advanced ML (UNSPMF)
COMPs Programming
Model (BSC)
Query Partitioning
Infrastructure layer: Private cloud; Commodity cluster; GPUs
Pre-defined
Queries
SQL-like
interface
Domain
Language
Programming
API
User Interface
Resource management and orchestration (ATOS)
Advanced
Visualis.
Advanced IT services for
Big Data processing tasks;
Open source pool of ML
algorithms
Programming
Interface /
Sequential
Programming
(AEGIS+SAG)
(AEGIS)
COMPs Runtime
(BSC)
Distributed
large scale
layer
Application layer
UniversalMessaging(SAG)
DataFabrication
Platform(IBM)
Meta-
data;
Data
descri-
ption
Hecuba tools
(BSC)
Short term decisions
real time alerts
Model structure improvements
Learned patterns correlations
12
http://ibidaas.eu/tools
13. 13
Medium to long term business decisions
Data
Fabrication
Platform
(IBM)
Refined
specifications
for data
fabrication
GPU-accelerated Analytics
(FORTH)
Apama Complex Event
Processing (SAG)
Streaming Analytics
Batch Processing
Advanced ML (UNSPMF)
COMPs Programming
Model (BSC)
Query Partitioning
Infrastructure layer: Private cloud; Commodity cluster; GPUs
Pre-defined
Queries
SQL-like
interface
Domain
Language
Programming
API
User Interface
Resource management and orchestration (ATOS)
Advanced
Visualis.
Advanced IT services for
Big Data processing tasks;
Open source pool of ML
algorithms
Programming
Interface /
Sequential
Programming
(AEGIS+SAG)
(AEGIS)
COMPs Runtime
(BSC)
Distributed
large scale
layer
Application layer
UniversalMessaging(SAG)
DataFabrication
Platform(IBM)
Meta-
data;
Data
descri-
ption
Hecuba tools
(BSC)
Short term decisions
real time alerts
Model structure improvements
Learned patterns correlations
WP3:
Batch analytics
Technologies:
• BSC COMPSs
• BSC Hecuba
• BSC Qbeast
• Advanced ML (UNSPMF)
13
http://ibidaas.eu/tools
14. 14
Medium to long term business decisions
Data
Fabrication
Platform
(IBM)
Refined
specifications
for data
fabrication
GPU-accelerated Analytics
(FORTH)
Apama Complex Event
Processing (SAG)
Streaming Analytics
Batch Processing
Advanced ML (UNSPMF)
COMPs Programming
Model (BSC)
Query Partitioning
Infrastructure layer: Private cloud; Commodity cluster; GPUs
Pre-defined
Queries
SQL-like
interface
Domain
Language
Programming
API
User Interface
Resource management and orchestration (ATOS)
Advanced
Visualis.
Advanced IT services for
Big Data processing tasks;
Open source pool of ML
algorithms
Programming
Interface /
Sequential
Programming
(AEGIS+SAG)
(AEGIS)
COMPs Runtime
(BSC)
Distributed
large scale
layer
Application layer
UniversalMessaging(SAG)
DataFabrication
Platform(IBM)
Meta-
data;
Data
descri-
ption
Hecuba tools
(BSC)
Short term decisions
real time alerts
Model structure improvements
Learned patterns correlations
WP4:
Streaming analytics
Technologies:
• SAG Apama CEP
• FORTH GPU-accel. analytics
14
http://ibidaas.eu/tools
15. 15
Medium to long term business decisions
Data
Fabrication
Platform
(IBM)
Refined
specifications
for data
fabrication
GPU-accelerated Analytics
(FORTH)
Apama Complex Event
Processing (SAG)
Streaming Analytics
Batch Processing
Advanced ML (UNSPMF)
COMPs Programming
Model (BSC)
Query Partitioning
Infrastructure layer: Private cloud; Commodity cluster; GPUs
Pre-defined
Queries
SQL-like
interface
Domain
Language
Programming
API
User Interface
Resource management and orchestration (ATOS)
Advanced
Visualis.
Advanced IT services for
Big Data processing tasks;
Open source pool of ML
algorithms
Programming
Interface /
Sequential
Programming
(AEGIS+SAG)
(AEGIS)
COMPs Runtime
(BSC)
Distributed
large scale
layer
Application layer
UniversalMessaging(SAG)
DataFabrication
Platform(IBM)
Meta-
data;
Data
descri-
ption
Hecuba tools
(BSC)
Short term decisions
real time alerts
Model structure improvements
Learned patterns correlations
WP5:
Resource mgmt & integration
Technologies:
• ATOS Resource mgmt
• ITML integration services
15
http://ibidaas.eu/tools
17. 17
BDVA reference model horizontal concerns
& I-BiDaaS
BDV reference model horizontal concern I-BiDaaS module or platform as a whole
Data visualization and user interaction
Advanced visualization module (AEGIS+SAG);
User interface (AEGIS)
Data analytics
Batch processing module (UNSPMF+BSC);
Streaming analytics module (SAG+FORTH)
Data processing architectures expected
advances according to BDVA SRIA
I-BiDaaS platform
Data protection
FORTH commodity cluster privacy preservation through
commodity hardware (Intel SGX); TDF (IBM) for generation of
realistic synthetic data when real data cannot be uploaded to
cloud or similar systems
Data management
COMPSs runtime (BSC); ATOS resource management and
orchestration module
The Cloud and HPC
(efficient usage of Cloud) ATOS resource management and
orchestration module
18. Brief introduction to I-BiDaaS
I-BiDaaS pipeline, architecture & technologies
I-BiDaaS & BDVA reference model
Benchmarking landscape and opportunities @ I-BiDaaS
Outline
18
19. 19
Benchmarking: Technology level
I-BiDaaS partner Technology name Big Data pipeline element Current benchmarks
FORTH GPU accelerator technology Data pre-processing, Streaming
Analytics
Custom benchmark (throughput, latency)
BSC COMPSs Sequential programming model for
distributed architectures
Applications (Own use cases)
BSC Hecuba Data management framework with
easy interface
Applications (Own use cases)
BSC Qbeast Multidimensional indexing and
storage
TCP-H
IBM Test Data Fabrication Synthetic test data fabrication Several open source + commercial products (e.g., Grid
tools of CA) / No known benchmarks yet
SAG Apama Streaming Analytics Platform Streaming Analytics Custom benchmark (throughput)
SAG Universal Messaging Message Broker Custom benchmark (throughput)
SAG WebMethods Integration Platform Integration N/A
SAG MashZone Visualization N/A
AEGIS Advanced visualization and monitoring Visualization and interface N/A
UNSPMF Pool of ML algorithms in
COMPSs/Python
Batch analytics Respective MPI implementation; Sklearn
ATOS Resource management and orchestration
module
Resource management N/A
20. Business Objectives Data Sets Data Size Processing Type Type of Analysis
Telecoms
⁻ improve and optimize
current operations
⁻ Anonymized mobility data
(structured)
⁻ Anonymized call center data
(unstructured)
TB ⁻ batch & streaming ⁻ predictive
⁻ descriptive /
diagnostic
Finance
⁻ improve decision
making
⁻ improve efficiency of Big
Data solutions
⁻ Tokenized online banking
control data (structured)
⁻ Tokenized bank transfer data
(structured)
⁻ Tokenized IP address data
(structured)
PB ⁻ batch
⁻ batch & streaming
⁻ descriptive /
diagnostic
Manufacturing
⁻ improve and optimise
current operations
⁻ improve the quality of
the process and product
⁻ Anonymized SCADA/MES data
(structured)
⁻ Anonymized Aluminum Die-
casting (structured)
GB ⁻ batch
⁻ batch & streaming
⁻ predictive
⁻ diagnostic
Benchmarking: Business, data & analytics level
20
21. I-BiDaaS
Partner
Use Case Most relevant business KPIs
TID Accurate location prediction with high traffic and visibility - Acquisition of insights on the dynamics of
cellular sectors
- Processing costs (cost reduction)
- Customer satisfaction
TID Optimization of placement of telecommunication equipment
TID Quality of service in Call Centers
CAIXA Enhanced control on online banking - Cost reduction
- Data accessibility
- Time efficiency
- End-to-end execution time (from data request
to data provision)
CAIXA Advanced analysis of bank transfer payment in financial terminal
CAIXA Analysis of relationships through IP addresses
CRF Production process of aluminium die-casting - 3 Product quality levels (High. Medium, Low)
- Overall Equipment Effectiveness (OEE),
- Maintenance cost
- Cost reduction
CRF Maintenance and monitoring of production assets
Benchmarking: Business level
21
Editor's Notes
The full name of the I-BiDaaS project is Industrial-driven big data as a self service solution. I-BiDaaS is a European Project, RIA (Research and Innovation Action) that means focused on research but as applied as possible. The duration is 36 months. We started in the beginning of 2018 and we have just completed the 1st year.
The I-BiDaaS consortium has 13 partners. A mixture of Universities, Research Centers, SMEs, Industry.
The I-BiDaaS consortium was constructed in such a way as to put together a group of organizations with high complementarity in terms of technical competence, organizational, business and market experience. All the partner share a track record of proven leadership, technical expertise, extensive experience and critical skills in all the key areas required for the project to achieve its objectives. With respect to the ability to carry out the proposed work, all partners in I-BiDaaS have participated in successful EU projects in the past.