Magneti Marelli, ICT Innovation
Road to Enterprise Architecture for Big Data Applications
Mixing Apache Spark with singletons, wrapping, facade
London, United Kingdom
#SAISEnt4
Company Overview
Magneti Marelli is an international company committed to the design and production of hi-tech systems and components for the automotive sector.
AUTOMOTIVE LIGHTING
(Headlamp, Rearlamp, Lighting and Body Electronics)
ELECTRONICS
(Instrument Clusters, Infotainment & Telematics)
SUSPENSION SYSTEMS AND SHOCK ABSORBERS
(Suspension Systems, Shock Absorbers and Dynamic Systems)
PLASTIC COMPONENTS AND MODULES
(Bumper, Dashboard, Central Console, Pedals, Hand Brake Levers and Fuel System)
AFTERMARKET PARTS & SERVICES
(Mechanical, Body Work, Electrics and Electronic and Consumables)
EXHAUST SYSTEMS
(Manifolds, Catalytic converter, Diesel Particulate Filter and Mufflers)
POWERTRAIN
(Gasoline and Diesel engine control, Electric Motor, Inverter and Transmission)
MOTORSPORT
(Injection Systems, Electronic Control Units, Hybrid Systems, Telemetry Systems, Electric Actuators)
Corporate Presentation, October 6, 2018
Magneti Marelli Worldwide Footprint
[Map of sites by country: USA, Mexico, Brazil, Argentina, Germany, Poland, Czech Rep., Slovakia, Russia, Serbia, Turkey, China, Japan, Korea, Malaysia, India, Italy, Spain, France, Romania]
PP: Production Plant; R&D: R&D Center; AC: Application Center
Big Data storyline
Phases: BUSINESS EXPLORATION PHASE, PROOF OF CONCEPT PHASE, PRODUCTION PHASE
[Chart: CUMULATIVE ROWS PROCESSED, axis from 0bn to 30bn, Apr 2017 to Jul 2018]
• Jan 2017: Big Data group was created
• Feb 2017: First POC approved, data loaded from USB
• Aug 2017: Welding Machines POC
• Sep 2017: Telemetry POC
• Nov 2017: SMT POC
• 29th Jan 2018: Databricks was adopted
• Apr 2018: MARC 1.0 released
• Jun 2018: The SMT project
• Aug 2018: The Metalizers project
The Surface-Mount Technology (SMT) project
Surface-Mount Technology PCB Preparation Assembly Line
Pre Production & Assembly Line
Line stages: Lasermarker, Serigraphy, Automated Optical Inspection Post Printing, Pick and Place, Oven, Automated Optical Inspection Post Reflow
Data sources:
• Timestamp, PCB ID (1 file per Item)
• Timestamp, Temperature, Humidity (1 file per day)
• Timestamp, PCB ID, Soldering Paste, Sensor Data, Temperature, … (1 file per Item)
• NIP, Pick Up Info, Feeder Info (DB SQL)
• NIP, Images, Sensor Data, Anomalies (MDB)
• NIP, Images, Sensor Data, PCB Final Status (DB SQL)
Machine Learning Problems
1. Machine Status Monitoring
2. Bottleneck harmonic model
3. Anomaly Recommender Engine
A Dream Project
1. Production process is well known
2. Data source is clearly defined
3. Need is raised by plant people
4. Algorithmic challenges are clear
Becoming a Nightmare
Success!!!
• So… Where is the data?
• How can I read/access the data?
• How can I be supported by my data scientist colleagues?
• How can I attach a Spark cluster to my Jupyter notebook?
• Who is going to port the notebook to production?
• What do you mean by production?
1. Production process is well known
2. Data source is clearly defined
3. Need is raised by plant people
4. Algorithmic challenges are clear
Enterprise Architecture
The Technology Bazaar
[Reference architecture diagram with the columns Data Sources; Acquire; Store, Transform, Enrich, Organize & Aggregate; Analyze & Deliver Insight]
• Data Sources: Structured Data (SAP, MES, Operational Systems); Unstructured/Semi-structured Data (Audio, Video, Image, IT Log, Text, Doc, External, IoT Feeds, Streaming, Other Sources)
• Ingestion labels: Data Integration, Transform, Aggregate; Stream Ingestion; Real Time; Batch/Micro-batch/CDC; Staging, At Rest, CDC; Streaming, In Motion
• Storage and processing labels: Data Lake (Raw Data); Data Lake (Curated, Enriched & Transformed Data); Logical Data Warehouse; Traditional Enterprise Data Warehouse; Data Marts; Distributed Process; Hadoop; NoSQL; Other; SQL; Master Data Management (MDM); Data Quality (DQ); Machine Learning layer; Data Science Environment
• Delivery labels: Data Access Layer; Data Services; Marketplace or Datasets; Self-Service Data Preparation; Analytic Capabilities (Analyze, Optimize, Forecast, Report, Plan, Discover, Collaborate, Predict, Model); Advanced Analytics
ARCHITECTURE != LIST OF TECHNOLOGIES AND FANCY ARROWS
Keep it simple, stupid
Areas: INGESTION AND STORAGE, EXPLORATION, PRODUCTION, PRESENTATION
ARCHITECTURAL OBJECTS
• Hammer Gateway (HG Job 1, HG Job 2, HG Job n)
• DATA LAKE (Azure Data Lake Store)
• DATA EXPLORATION (Notebooks)
• MCO (mco.read, mco.write, mco.log)
• AZURE FUNCTIONS
• MESSAGE QUEUE
• FORWARDER
• BUSINESS LOGICS (Dbutils+Spark)
• Datamart
• APP
Enterprise Architecture – Data, where are you?
• Where are the CSV files?
• What do you mean by bcp out?
• How can I get a copy in the cloud?
• How can I update data on a regular basis?
Write from scratch a sort of ETL tool: production-ready attempt (10% of times)
USB loading: quick and dirty (90% of times)
Highlighted components: Pick & Place, Hammer Gateway (HG Job 1, HG Job 2, HG Job n), DATA LAKE (Azure Data Lake Store)
Enterprise Architecture – The Jupyter case
• How could I work together with other data scientists?
• How can I deal with computation spikes?
• How can I attach an Apache Spark cluster to my Jupyter?
• Damn, Java Heap Memory Exception: what do you mean?
Highlighted components: DATA EXPLORATION (Notebooks), MCO (mco.read, mco.write, mco.log), AZURE FUNCTIONS, DATA LAKE (Azure Data Lake Store)
One singleton to rule them all
mco
Pattern: Singleton
Use: bring tokens and technical access to notebook
Benefit:
• enhancing security
• enabling access control
• reducing vendor lock-in effect
mco.read
Pattern: Wrapping
Use: take data from data lake knowing only data names
Benefit:
• no one will need to know where data are or how data are stored
• incremental read capability out-of-the-box
• reducing time to port code in production
• reduced reading time
• reduce the vendor lock-in effect (to propagate a new HDFS PAAS vendor on all services is a matter of hours)
mco.log
Pattern: Wrapping
Use: bring developer grade logging capability
Benefit:
• reducing debug time
• enabling process audits
mco.write
Pattern: Wrapping
Use: save data everywhere
Benefit:
• no one will need to know where data must be put or how
• avoid dangerous behaviours such as writing on a SQL with a transformation action (connection pool, my beloved friend…)
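A minimal sketch of how the singleton and the wrapping pattern described above could fit together. The slides do not show the MCO source, so everything here is an assumption made for illustration: the `MCO` class layout, the `register` method, the `ts` field used for incremental reads, and the `smt_oven_sensor` data name are all hypothetical, and plain Python lists stand in for Spark DataFrames and the data lake.

```python
import logging
import threading
from typing import Callable, Dict, List

Row = Dict[str, object]


class MCO:
    """Hypothetical single access point for notebooks (not the real MCO)."""

    _instance = None
    _lock = threading.Lock()

    def __new__(cls) -> "MCO":
        # Singleton: the first call builds the instance, later calls reuse it.
        with cls._lock:
            if cls._instance is None:
                inst = super().__new__(cls)
                inst._catalog = {}      # data name -> (reader, writer)
                inst._watermarks = {}   # data name -> last timestamp read
                inst._logger = logging.getLogger("mco")
                cls._instance = inst
        return cls._instance

    def register(self, name: str, reader: Callable[[], List[Row]],
                 writer: Callable[[List[Row]], None]) -> None:
        # Storage details live here, not in the notebooks.
        self._catalog[name] = (reader, writer)

    def log(self, message: str) -> None:
        self._logger.info(message)

    def read(self, name: str, incremental: bool = False) -> List[Row]:
        reader, _ = self._catalog[name]
        rows = list(reader())
        last_seen = self._watermarks.get(name)
        if incremental and last_seen is not None:
            # Incremental read: keep only rows newer than the last read.
            rows = [r for r in rows if r["ts"] > last_seen]
        if rows:
            self._watermarks[name] = max(r["ts"] for r in rows)
        self.log(f"read {len(rows)} rows from {name}")
        return rows

    def write(self, name: str, rows: List[Row]) -> None:
        _, writer = self._catalog[name]
        writer(rows)
        self.log(f"wrote {len(rows)} rows to {name}")


# Usage: the notebook only knows the data name, never the storage location.
store: List[Row] = [{"ts": 1, "value": 20.5}, {"ts": 2, "value": 21.0}]
mco = MCO()
mco.register("smt_oven_sensor", reader=lambda: store, writer=store.extend)
first = mco.read("smt_oven_sensor", incremental=True)    # both rows
mco.write("smt_oven_sensor", [{"ts": 3, "value": 21.4}])
second = mco.read("smt_oven_sensor", incremental=True)   # only the ts=3 row
```

Because notebooks call `mco.read` and `mco.write` by data name, swapping the storage backend would only touch the registered reader and writer, which is one way to read the "reducing vendor lock-in" benefit listed on the slide.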
Enterprise Architecture – The model is ready!
• Is the code production ready?
• Who is going to port the notebook to production?
• Developer algorithm is wrong: it produces different numbers…
• Ok, I got it! I’ll need a crontab… but where?
Highlighted components: DATA EXPLORATION (Notebooks), BUSINESS LOGICS (Dbutils+Spark), MESSAGE QUEUE, FORWARDER
A For-what? What the hell?
MESSAGE QUEUE services: Clean SMT Data, Cycle Time, Anomaly Det, Super Secret Service, …
1. Clean SMT Data service is the first in the MESSAGE QUEUE and is sent to the FORWARDER. Status is set to WIP.
2. This service is forwarded to Databricks.
3. Databricks gets data through the MCO.
4. Once data is loaded, the Spark code starts the cleaning job.
5. Cleaned data is cached in the DATAMART through the MCO. Status is set to DONE.
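The queue-and-forwarder flow above can be sketched as a small in-memory model. This is an illustration under stated assumptions, not the production component: the `Service` and `Forwarder` classes, the `history` list, and the `clean_smt_data` job are hypothetical names, and a plain function call stands in for submitting the job to Databricks.

```python
from collections import deque
from dataclasses import dataclass
from typing import Callable, Deque, Dict, List, Optional


@dataclass
class Service:
    name: str
    job: Callable[[], None]
    status: str = "TO-DO"   # TO-DO -> WIP -> DONE, as on the slide


class Forwarder:
    """Hypothetical forwarder: takes the first queued service and runs it."""

    def __init__(self) -> None:
        self.queue: Deque[Service] = deque()
        self.history: List[str] = []

    def submit(self, service: Service) -> None:
        self.queue.append(service)

    def run_next(self) -> Service:
        service = self.queue.popleft()      # first service in the queue
        service.status = "WIP"
        self.history.append(f"{service.name}:WIP")
        service.job()                       # stand-in for the remote Spark job
        service.status = "DONE"
        self.history.append(f"{service.name}:DONE")
        return service


# Usage: a toy cleaning job that caches its result in a dict "datamart".
datamart: Dict[str, List[float]] = {}
raw: List[Optional[float]] = [12.0, None, 11.5]


def clean_smt_data() -> None:
    datamart["clean_smt"] = [x for x in raw if x is not None]


forwarder = Forwarder()
forwarder.submit(Service("Clean SMT Data", clean_smt_data))
forwarder.submit(Service("Cycle Time", lambda: None))
done = forwarder.run_next()
```

After `run_next()`, the first service is DONE and its output sits in the datamart, while "Cycle Time" is still waiting as TO-DO.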
Service statuses in the MESSAGE QUEUE: TO-DO, WIP, DONE
A For-what? What the hell?
• What if some datasets could not be moved into the cloud?
• How to deal with super secret business logic?
FORWARDER as the main component for cloud hybridization!
Data needed to run Super Secret Service cannot be moved outside Magneti Marelli servers.
A For-what? What the hell?
Diagram labels: ON PREMISE, EDW
1. Super Secret Service is forwarded to an on-premise Apache Spark cluster.
2. The Apache Spark cluster gets data through the MCO.
3. The outcome is persisted on an on-premise SQL Server.
4. A custom web app allows users to see the job output.
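The hybrid routing idea can be sketched as a lookup from a per-service flag to an execution target. This is only an assumption about how such a decision might be expressed: the `ServiceRequest` type, the `cloud_allowed` flag, and the target functions are hypothetical, and the targets return strings instead of submitting real jobs.

```python
from dataclasses import dataclass
from typing import Callable, Dict


@dataclass(frozen=True)
class ServiceRequest:
    name: str
    cloud_allowed: bool   # hypothetical flag: may the data leave company servers?


Target = Callable[[ServiceRequest], str]


def route(request: ServiceRequest, targets: Dict[str, Target]) -> str:
    # Services whose data must stay in-house never reach the cloud target.
    key = "cloud" if request.cloud_allowed else "on_premise"
    return targets[key](request)


# Stand-in targets; a real forwarder would submit a job instead.
targets: Dict[str, Target] = {
    "cloud": lambda r: f"{r.name} -> Databricks",
    "on_premise": lambda r: f"{r.name} -> on-premise Spark cluster",
}

cleaned = route(ServiceRequest("Clean SMT Data", cloud_allowed=True), targets)
secret = route(ServiceRequest("Super Secret Service", cloud_allowed=False), targets)
```

Keeping the decision in one function means the services themselves do not need to know where they run, which matches the slide's point that the forwarder is the component that enables hybridization.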
A For-what? What the hell? – Predictive balancing
Jobs shown: Clean PS Data, RAM Intensive JOB
1. A first service is submitted by the forwarder.
2. A second service is submitted before the first has finished. The cluster is busy with the other computation.
3. The forwarder can submit the job to any application server, so it creates a new Databricks cluster and submits the job to it. Cluster creation is anticipated using predictive algorithms.
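A rough sketch of the balancing behaviour in the steps above: reuse an idle cluster if one exists, otherwise create a new one, with an optional pre-warm step standing in for the predictive part. The slides do not describe the predictive algorithm, so `prewarm` simply takes an expected number of concurrent jobs as input; `Cluster`, `Pool`, and the cluster names are hypothetical, and no real Databricks API is called.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Cluster:
    name: str
    running: Optional[str] = None   # name of the job in progress, if any


@dataclass
class Pool:
    clusters: List[Cluster] = field(default_factory=list)

    def _create(self) -> Cluster:
        # Stand-in for asking the platform to start a new cluster.
        cluster = Cluster(name=f"cluster-{len(self.clusters) + 1}")
        self.clusters.append(cluster)
        return cluster

    def prewarm(self, expected_concurrent_jobs: int) -> None:
        # Placeholder for the predictive step: create clusters ahead of demand.
        while len(self.clusters) < expected_concurrent_jobs:
            self._create()

    def submit(self, job: str) -> Cluster:
        idle = next((c for c in self.clusters if c.running is None), None)
        cluster = idle if idle is not None else self._create()
        cluster.running = job
        return cluster


# Usage: the second job arrives while the first cluster is still busy.
pool = Pool()
slow = pool.submit("RAM Intensive JOB")
fast = pool.submit("Super Urgent Service")

# With pre-warming, the cluster already exists when the job arrives.
warm_pool = Pool()
warm_pool.prewarm(expected_concurrent_jobs=2)
warm = warm_pool.submit("Clean PS Data")
```

In the first pool the urgent job triggers cluster creation at submit time; in the pre-warmed pool it lands on an existing idle cluster, which is the latency the predictive step is meant to hide.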
MESSAGE QUEUE services: Clean SMT Data, Cycle Time, Anomaly Det, Super Urgent Service, …
Service statuses: TO-DO, WIP, DONE
Figures shown: 300 GB RAM (90%), 2 hours long; 100 GB RAM, 5 minutes long
Clusters shown: Slow Services Cluster (Dbutils+Spark), Fast Services Cluster (Dbutils+Spark)
Enterprise Architecture – Don’t mind about nerd stuff
Data Scientists’ presentation concerns:
• How do I write a web page?
• Do I need to bootstrap?
• MV-what? I thought Spring was just a season!
• Single sign-on? What do you mean?
Highlighted components: DATAMART, APP
Enterprise Architecture (“…and in the darkness bind them”)
Areas: NO-COMPLEXITY DATA INGESTION AND STORAGE, DATA SCIENTIST TOYBOX, PRODUCTION, PRESENTATION
1. 1 day to add a new source
2. Architecture «embeds» the guideline
3. Presentation layer is drag and drop
4. Service queuing ensures enterprise and managed scalability
5. Data scientists do not waste time in boring activities
Components shown: Pick & Place, Hammer Gateway (HG Job 1, HG Job 2, HG Job n), DATA LAKE (Azure Data Lake Store), DATA EXPLORATION (Notebooks), MCO (mco.read, mco.write, mco.log), AZURE FUNCTIONS, MESSAGE QUEUE, FORWARDER, BUSINESS LOGICS (Dbutils+Spark), Datamart, APP
“I have done the deed. Did you hear a noise?”
“The Guide says there is an art to flying,” said Ford, “or rather a knack. The knack lies in learning how to throw yourself at the ground and miss.”
1. Production process is well known
2. Data source is clearly defined
3. Need is raised by plant people
4. Algorithmic challenges are clear
Anomaly Recommender Engine
Use Case
Support maintenance team to prioritize standard and extraordinary maintenance activities.
Benefit
Reducing machine stoppage losses per year per line. Down time reduction.
Description
A summary dashboard shows the health of each part of the line. A drill down with details is available.
Much ado about nothing… ?
Break-even point reached after 8 months
Cost per line reduced by 90% after the first one
Return On Investment: 12X in 3 years
• Databricks and Microsoft PowerBI allow a very cost-effective first project
• Hammer Gateway allows cost-effective ingestion
• MCO enabled Data Scientists to convert notebooks into services with very low effort
• Microsoft Azure and Databricks ensure endless scalability
People behind
Andrea CONDORELLI
Giovanni FAZZI
Manuela DETOMASO
Florindo PALLADINO
THE TEAM
Alessandro SICOLI
Dario CASTELLO
Heinrich-Gerhard SCHUERING
SUPPORTERS

More Related Content

Similar to Road to Enterprise Architecture for Big Data Applications: Mixing Apache Spark with Singletons, Wrapping, and Facade with Andrea Condorelli

Hybrid Transactional/Analytics Processing with Spark and IMDGs
Hybrid Transactional/Analytics Processing with Spark and IMDGsHybrid Transactional/Analytics Processing with Spark and IMDGs
Hybrid Transactional/Analytics Processing with Spark and IMDGs
Ali Hodroj
 
Bogdan Kecman INIT Presentation
Bogdan Kecman INIT PresentationBogdan Kecman INIT Presentation
Bogdan Kecman INIT Presentation
arhismece
 
The Fine Art of Time Travelling - Implementing Event Sourcing - Andrea Saltar...
The Fine Art of Time Travelling - Implementing Event Sourcing - Andrea Saltar...The Fine Art of Time Travelling - Implementing Event Sourcing - Andrea Saltar...
The Fine Art of Time Travelling - Implementing Event Sourcing - Andrea Saltar...
ITCamp
 
Big data bi-mature-oanyc summit
Big data bi-mature-oanyc summitBig data bi-mature-oanyc summit
Big data bi-mature-oanyc summit
Open Analytics
 
Reinventing DDC in the Age of Data Analytics
Reinventing DDC in the Age of Data AnalyticsReinventing DDC in the Age of Data Analytics
Reinventing DDC in the Age of Data Analytics
Memoori
 
L'Internet des objets (IDO)
L'Internet des objets (IDO)L'Internet des objets (IDO)
L'Internet des objets (IDO)
Cisco Canada
 
Lessons learned building a big data analytics engine, from proprietary to ope...
Lessons learned building a big data analytics engine, from proprietary to ope...Lessons learned building a big data analytics engine, from proprietary to ope...
Lessons learned building a big data analytics engine, from proprietary to ope...
J On The Beach
 
RightScale Roadtrip Boston: Accelerate to Cloud
RightScale Roadtrip Boston: Accelerate to CloudRightScale Roadtrip Boston: Accelerate to Cloud
RightScale Roadtrip Boston: Accelerate to Cloud
RightScale
 
Tiny Batches, in the wine: Shiny New Bits in Spark Streaming
Tiny Batches, in the wine: Shiny New Bits in Spark StreamingTiny Batches, in the wine: Shiny New Bits in Spark Streaming
Tiny Batches, in the wine: Shiny New Bits in Spark Streaming
Paco Nathan
 
Get the most out of Oracle Data Guard - OOW version
Get the most out of Oracle Data Guard - OOW versionGet the most out of Oracle Data Guard - OOW version
Get the most out of Oracle Data Guard - OOW version
Ludovico Caldara
 
Apache Spark 2.0: Faster, Easier, and Smarter
Apache Spark 2.0: Faster, Easier, and SmarterApache Spark 2.0: Faster, Easier, and Smarter
Apache Spark 2.0: Faster, Easier, and Smarter
Databricks
 
Bogdan Kecman Advanced Databasing
Bogdan Kecman Advanced DatabasingBogdan Kecman Advanced Databasing
Bogdan Kecman Advanced Databasing
Bogdan Kecman
 
Real-Time Analytics with Confluent and MemSQL
Real-Time Analytics with Confluent and MemSQLReal-Time Analytics with Confluent and MemSQL
Real-Time Analytics with Confluent and MemSQL
SingleStore
 
Cloud Computing ...changes everything
Cloud Computing ...changes everythingCloud Computing ...changes everything
Cloud Computing ...changes everything
Lew Tucker
 
Splunk App for Stream - Einblicke in Ihren Netzwerkverkehr
Splunk App for Stream - Einblicke in Ihren NetzwerkverkehrSplunk App for Stream - Einblicke in Ihren Netzwerkverkehr
Splunk App for Stream - Einblicke in Ihren Netzwerkverkehr
Georg Knon
 
Visual, Interactive, Predictive Analytics for Big Data
Visual, Interactive, Predictive Analytics for Big DataVisual, Interactive, Predictive Analytics for Big Data
Visual, Interactive, Predictive Analytics for Big Data
Arimo, Inc.
 
Cv2017
Cv2017Cv2017
Intro to Spark development
 Intro to Spark development  Intro to Spark development
Intro to Spark development
Spark Summit
 
Angular (v2 and up) - Morning to understand - Linagora
Angular (v2 and up) - Morning to understand - LinagoraAngular (v2 and up) - Morning to understand - Linagora
Angular (v2 and up) - Morning to understand - Linagora
LINAGORA
 
Preventative Maintenance of Robots in Automotive Industry
Preventative Maintenance of Robots in Automotive IndustryPreventative Maintenance of Robots in Automotive Industry
Preventative Maintenance of Robots in Automotive Industry
DataWorks Summit/Hadoop Summit
 

Similar to Road to Enterprise Architecture for Big Data Applications: Mixing Apache Spark with Singletons, Wrapping, and Facade with Andrea Condorelli (20)

Hybrid Transactional/Analytics Processing with Spark and IMDGs
Hybrid Transactional/Analytics Processing with Spark and IMDGsHybrid Transactional/Analytics Processing with Spark and IMDGs
Hybrid Transactional/Analytics Processing with Spark and IMDGs
 
Bogdan Kecman INIT Presentation
Bogdan Kecman INIT PresentationBogdan Kecman INIT Presentation
Bogdan Kecman INIT Presentation
 
The Fine Art of Time Travelling - Implementing Event Sourcing - Andrea Saltar...
The Fine Art of Time Travelling - Implementing Event Sourcing - Andrea Saltar...The Fine Art of Time Travelling - Implementing Event Sourcing - Andrea Saltar...
The Fine Art of Time Travelling - Implementing Event Sourcing - Andrea Saltar...
 
Big data bi-mature-oanyc summit
Big data bi-mature-oanyc summitBig data bi-mature-oanyc summit
Big data bi-mature-oanyc summit
 
Reinventing DDC in the Age of Data Analytics
Reinventing DDC in the Age of Data AnalyticsReinventing DDC in the Age of Data Analytics
Reinventing DDC in the Age of Data Analytics
 
L'Internet des objets (IDO)
L'Internet des objets (IDO)L'Internet des objets (IDO)
L'Internet des objets (IDO)
 
Lessons learned building a big data analytics engine, from proprietary to ope...
Lessons learned building a big data analytics engine, from proprietary to ope...Lessons learned building a big data analytics engine, from proprietary to ope...
Lessons learned building a big data analytics engine, from proprietary to ope...
 
RightScale Roadtrip Boston: Accelerate to Cloud
RightScale Roadtrip Boston: Accelerate to CloudRightScale Roadtrip Boston: Accelerate to Cloud
RightScale Roadtrip Boston: Accelerate to Cloud
 
Tiny Batches, in the wine: Shiny New Bits in Spark Streaming
Tiny Batches, in the wine: Shiny New Bits in Spark StreamingTiny Batches, in the wine: Shiny New Bits in Spark Streaming
Tiny Batches, in the wine: Shiny New Bits in Spark Streaming
 
Get the most out of Oracle Data Guard - OOW version
Get the most out of Oracle Data Guard - OOW versionGet the most out of Oracle Data Guard - OOW version
Get the most out of Oracle Data Guard - OOW version
 
Apache Spark 2.0: Faster, Easier, and Smarter
Apache Spark 2.0: Faster, Easier, and SmarterApache Spark 2.0: Faster, Easier, and Smarter
Apache Spark 2.0: Faster, Easier, and Smarter
 
Bogdan Kecman Advanced Databasing
Bogdan Kecman Advanced DatabasingBogdan Kecman Advanced Databasing
Bogdan Kecman Advanced Databasing
 
Real-Time Analytics with Confluent and MemSQL
Real-Time Analytics with Confluent and MemSQLReal-Time Analytics with Confluent and MemSQL
Real-Time Analytics with Confluent and MemSQL
 
Cloud Computing ...changes everything
Cloud Computing ...changes everythingCloud Computing ...changes everything
Cloud Computing ...changes everything
 
Splunk App for Stream - Einblicke in Ihren Netzwerkverkehr
Splunk App for Stream - Einblicke in Ihren NetzwerkverkehrSplunk App for Stream - Einblicke in Ihren Netzwerkverkehr
Splunk App for Stream - Einblicke in Ihren Netzwerkverkehr
 
Visual, Interactive, Predictive Analytics for Big Data
Visual, Interactive, Predictive Analytics for Big DataVisual, Interactive, Predictive Analytics for Big Data
Visual, Interactive, Predictive Analytics for Big Data
 
Cv2017
Cv2017Cv2017
Cv2017
 
Intro to Spark development
 Intro to Spark development  Intro to Spark development
Intro to Spark development
 
Angular (v2 and up) - Morning to understand - Linagora
Angular (v2 and up) - Morning to understand - LinagoraAngular (v2 and up) - Morning to understand - Linagora
Angular (v2 and up) - Morning to understand - Linagora
 
Preventative Maintenance of Robots in Automotive Industry
Preventative Maintenance of Robots in Automotive IndustryPreventative Maintenance of Robots in Automotive Industry
Preventative Maintenance of Robots in Automotive Industry
 

More from Databricks

DW Migration Webinar-March 2022.pptx
DW Migration Webinar-March 2022.pptxDW Migration Webinar-March 2022.pptx
DW Migration Webinar-March 2022.pptx
Databricks
 
Data Lakehouse Symposium | Day 1 | Part 1
Data Lakehouse Symposium | Day 1 | Part 1Data Lakehouse Symposium | Day 1 | Part 1
Data Lakehouse Symposium | Day 1 | Part 1
Databricks
 
Data Lakehouse Symposium | Day 1 | Part 2
Data Lakehouse Symposium | Day 1 | Part 2Data Lakehouse Symposium | Day 1 | Part 2
Data Lakehouse Symposium | Day 1 | Part 2
Databricks
 
Data Lakehouse Symposium | Day 2
Data Lakehouse Symposium | Day 2Data Lakehouse Symposium | Day 2
Data Lakehouse Symposium | Day 2
Databricks
 
Data Lakehouse Symposium | Day 4
Data Lakehouse Symposium | Day 4Data Lakehouse Symposium | Day 4
Data Lakehouse Symposium | Day 4
Databricks
 
5 Critical Steps to Clean Your Data Swamp When Migrating Off of Hadoop
5 Critical Steps to Clean Your Data Swamp When Migrating Off of Hadoop5 Critical Steps to Clean Your Data Swamp When Migrating Off of Hadoop
5 Critical Steps to Clean Your Data Swamp When Migrating Off of Hadoop
Databricks
 
Democratizing Data Quality Through a Centralized Platform
Democratizing Data Quality Through a Centralized PlatformDemocratizing Data Quality Through a Centralized Platform
Democratizing Data Quality Through a Centralized Platform
Databricks
 
Learn to Use Databricks for Data Science
Learn to Use Databricks for Data ScienceLearn to Use Databricks for Data Science
Learn to Use Databricks for Data Science
Databricks
 
Why APM Is Not the Same As ML Monitoring
Why APM Is Not the Same As ML MonitoringWhy APM Is Not the Same As ML Monitoring
Why APM Is Not the Same As ML Monitoring
Databricks
 
The Function, the Context, and the Data—Enabling ML Ops at Stitch Fix
The Function, the Context, and the Data—Enabling ML Ops at Stitch FixThe Function, the Context, and the Data—Enabling ML Ops at Stitch Fix
The Function, the Context, and the Data—Enabling ML Ops at Stitch Fix
Databricks
 
Stage Level Scheduling Improving Big Data and AI Integration
Stage Level Scheduling Improving Big Data and AI IntegrationStage Level Scheduling Improving Big Data and AI Integration
Stage Level Scheduling Improving Big Data and AI Integration
Databricks
 
Simplify Data Conversion from Spark to TensorFlow and PyTorch
Simplify Data Conversion from Spark to TensorFlow and PyTorchSimplify Data Conversion from Spark to TensorFlow and PyTorch
Simplify Data Conversion from Spark to TensorFlow and PyTorch
Databricks
 
Scaling your Data Pipelines with Apache Spark on Kubernetes
Scaling your Data Pipelines with Apache Spark on KubernetesScaling your Data Pipelines with Apache Spark on Kubernetes
Scaling your Data Pipelines with Apache Spark on Kubernetes
Databricks
 
Scaling and Unifying SciKit Learn and Apache Spark Pipelines
Scaling and Unifying SciKit Learn and Apache Spark PipelinesScaling and Unifying SciKit Learn and Apache Spark Pipelines
Scaling and Unifying SciKit Learn and Apache Spark Pipelines
Databricks
 
Sawtooth Windows for Feature Aggregations
Sawtooth Windows for Feature AggregationsSawtooth Windows for Feature Aggregations
Sawtooth Windows for Feature Aggregations
Databricks
 
Redis + Apache Spark = Swiss Army Knife Meets Kitchen Sink
Redis + Apache Spark = Swiss Army Knife Meets Kitchen SinkRedis + Apache Spark = Swiss Army Knife Meets Kitchen Sink
Redis + Apache Spark = Swiss Army Knife Meets Kitchen Sink
Databricks
 
Re-imagine Data Monitoring with whylogs and Spark
Re-imagine Data Monitoring with whylogs and SparkRe-imagine Data Monitoring with whylogs and Spark
Re-imagine Data Monitoring with whylogs and Spark
Databricks
 
Raven: End-to-end Optimization of ML Prediction Queries
Raven: End-to-end Optimization of ML Prediction QueriesRaven: End-to-end Optimization of ML Prediction Queries
Raven: End-to-end Optimization of ML Prediction Queries
Databricks
 
Processing Large Datasets for ADAS Applications using Apache Spark
Processing Large Datasets for ADAS Applications using Apache SparkProcessing Large Datasets for ADAS Applications using Apache Spark
Processing Large Datasets for ADAS Applications using Apache Spark
Databricks
 
Massive Data Processing in Adobe Using Delta Lake
Massive Data Processing in Adobe Using Delta LakeMassive Data Processing in Adobe Using Delta Lake
Massive Data Processing in Adobe Using Delta Lake
Databricks
 

More from Databricks (20)

DW Migration Webinar-March 2022.pptx
DW Migration Webinar-March 2022.pptxDW Migration Webinar-March 2022.pptx
DW Migration Webinar-March 2022.pptx
 
Data Lakehouse Symposium | Day 1 | Part 1
Data Lakehouse Symposium | Day 1 | Part 1Data Lakehouse Symposium | Day 1 | Part 1
Data Lakehouse Symposium | Day 1 | Part 1
 
Data Lakehouse Symposium | Day 1 | Part 2
Data Lakehouse Symposium | Day 1 | Part 2Data Lakehouse Symposium | Day 1 | Part 2
Data Lakehouse Symposium | Day 1 | Part 2
 
Data Lakehouse Symposium | Day 2
Data Lakehouse Symposium | Day 2Data Lakehouse Symposium | Day 2
Data Lakehouse Symposium | Day 2
 
Data Lakehouse Symposium | Day 4
Data Lakehouse Symposium | Day 4Data Lakehouse Symposium | Day 4
Data Lakehouse Symposium | Day 4
 
5 Critical Steps to Clean Your Data Swamp When Migrating Off of Hadoop
5 Critical Steps to Clean Your Data Swamp When Migrating Off of Hadoop5 Critical Steps to Clean Your Data Swamp When Migrating Off of Hadoop
5 Critical Steps to Clean Your Data Swamp When Migrating Off of Hadoop
 
Democratizing Data Quality Through a Centralized Platform
Democratizing Data Quality Through a Centralized PlatformDemocratizing Data Quality Through a Centralized Platform
Democratizing Data Quality Through a Centralized Platform
 
Learn to Use Databricks for Data Science
Learn to Use Databricks for Data ScienceLearn to Use Databricks for Data Science
Learn to Use Databricks for Data Science
 
Why APM Is Not the Same As ML Monitoring
Why APM Is Not the Same As ML MonitoringWhy APM Is Not the Same As ML Monitoring
Why APM Is Not the Same As ML Monitoring
 
The Function, the Context, and the Data—Enabling ML Ops at Stitch Fix
The Function, the Context, and the Data—Enabling ML Ops at Stitch FixThe Function, the Context, and the Data—Enabling ML Ops at Stitch Fix
The Function, the Context, and the Data—Enabling ML Ops at Stitch Fix
 
Stage Level Scheduling Improving Big Data and AI Integration
Stage Level Scheduling Improving Big Data and AI IntegrationStage Level Scheduling Improving Big Data and AI Integration
Stage Level Scheduling Improving Big Data and AI Integration
 
Simplify Data Conversion from Spark to TensorFlow and PyTorch
Simplify Data Conversion from Spark to TensorFlow and PyTorchSimplify Data Conversion from Spark to TensorFlow and PyTorch
Simplify Data Conversion from Spark to TensorFlow and PyTorch
 
Scaling your Data Pipelines with Apache Spark on Kubernetes
Scaling your Data Pipelines with Apache Spark on KubernetesScaling your Data Pipelines with Apache Spark on Kubernetes
Scaling your Data Pipelines with Apache Spark on Kubernetes
 
Scaling and Unifying SciKit Learn and Apache Spark Pipelines
Scaling and Unifying SciKit Learn and Apache Spark PipelinesScaling and Unifying SciKit Learn and Apache Spark Pipelines
Scaling and Unifying SciKit Learn and Apache Spark Pipelines
 
Sawtooth Windows for Feature Aggregations
Sawtooth Windows for Feature AggregationsSawtooth Windows for Feature Aggregations
Sawtooth Windows for Feature Aggregations
 
Redis + Apache Spark = Swiss Army Knife Meets Kitchen Sink
Redis + Apache Spark = Swiss Army Knife Meets Kitchen SinkRedis + Apache Spark = Swiss Army Knife Meets Kitchen Sink
Redis + Apache Spark = Swiss Army Knife Meets Kitchen Sink
 
Re-imagine Data Monitoring with whylogs and Spark
Re-imagine Data Monitoring with whylogs and SparkRe-imagine Data Monitoring with whylogs and Spark
Re-imagine Data Monitoring with whylogs and Spark
 
Raven: End-to-end Optimization of ML Prediction Queries
Raven: End-to-end Optimization of ML Prediction QueriesRaven: End-to-end Optimization of ML Prediction Queries
Raven: End-to-end Optimization of ML Prediction Queries
 
Processing Large Datasets for ADAS Applications using Apache Spark
Processing Large Datasets for ADAS Applications using Apache SparkProcessing Large Datasets for ADAS Applications using Apache Spark
Processing Large Datasets for ADAS Applications using Apache Spark
 
Massive Data Processing in Adobe Using Delta Lake
Massive Data Processing in Adobe Using Delta LakeMassive Data Processing in Adobe Using Delta Lake
Massive Data Processing in Adobe Using Delta Lake
 

Road to Enterprise Architecture for Big Data Applications: Mixing Apache Spark with Singletons, Wrapping, and Facade with Andrea Condorelli

  • 1. Magneti Marelli, ICT Innovation. Road to Enterprise Architecture for Big Data Applications: Mixing Apache Spark with singletons, wrapping, facade. London, United Kingdom. #SAISEnt4
  • 2. Company Overview. Magneti Marelli is an international company committed to the design and production of hi-tech systems and components for the automotive sector. AUTOMOTIVE LIGHTING (Headlamp, Rearlamp, Lighting and Body Electronics); ELECTRONICS (Instrument Clusters, Infotainment & Telematics); SUSPENSION SYSTEMS AND SHOCK ABSORBERS (Suspension Systems, Shock Absorbers and Dynamic Systems); PLASTIC COMPONENTS AND MODULES (Bumper, Dashboard, Central Console, Pedals, Hand Brake Levers and Fuel System); AFTERMARKET PARTS & SERVICES (Mechanical, Body Work, Electrics and Electronics, and Consumables); EXHAUST SYSTEMS (Manifolds, Catalytic Converter, Diesel Particulate Filter and Mufflers); POWERTRAIN (Gasoline and Diesel Engine Control, Electric Motor, Inverter and Transmission); MOTORSPORT (Injection Systems, Electronic Control Units, Hybrid Systems, Telemetry Systems, Electric Actuators). Corporate Presentation, October 6, 2018.
  • 3. Magneti Marelli Worldwide Footprint (map). Countries shown: USA, Mexico, Brazil, Argentina, Germany, Poland, Czech Rep., Slovakia, Russia, Serbia, Turkey, China, Japan, Korea, Malaysia, India, Italy, Spain, France, Romania. Legend: PP = Production Plant, R&D = R&D Center, AC = Application Center. Corporate Presentation, October 6, 2018.
  • 4. Big Data storyline. Phases: Business Exploration, Proof of Concept, Production. Chart: cumulative rows processed (0bn to 30bn), Apr 2017 to Jul 2018. Milestones: Jan 2017, Big Data group was created; Feb 2017, first POC approved, data loaded from USB; Aug 2017, Welding Machines POC; Sep 2017, Telemetry POC; Nov 2017, SMT POC; 29th Jan 2018, Databricks was adopted; Apr 2018, MARC 1.0 released; Jun 2018, the SMT project; Aug 2018, the Metalizers project.
  • 5. The Surface-Mount Technology (SMT) project. Diagram labels: Surface-Mount Technology; PCB Preparation; Assembly Line; Pre Production & Assembly Line.
  • 6. The Surface-Mount Technology (SMT) project. Diagram labels: Surface-Mount Technology; PCB Preparation; Assembly Line; Pre Production & Assembly Line. Machines: Lasermarker; Serigraphy; Automated Optical Inspection Post Printing; Pick and Place; Oven; Automated Optical Inspection Post Reflow. Data available, as listed on the slide: Timestamp, PCB ID (1 file per item); Timestamp, Temperature, Humidity (1 file per day); Timestamp, PCB ID, Soldering Paste Sensor Data, Temperature, … (1 file per item); NIP, Pick Up Info, Feeder Info (DB SQL); NIP, Images, Sensor Data, Anomalies (MDB); NIP, Images, Sensor Data, PCB Final Status (DB SQL).
  • 7. The Surface-Mount Technology (SMT) project. Diagram labels: Surface-Mount Technology; PCB Preparation; Assembly Line; Pre Production & Assembly Line. Machines: Lasermarker; Serigraphy; Automated Optical Inspection Post Printing; Pick and Place; Oven; Automated Optical Inspection Post Reflow. Data available, as listed on the slide: Timestamp, PCB ID (1 file per item); Timestamp, Temperature, Humidity (1 file per day); Timestamp, PCB ID, Soldering Paste Sensor Data, Temperature, … (1 file per item); NIP, Pick Up Info, Feeder Info (DB SQL); NIP, Images, Sensor Data, Anomalies (MDB); NIP, Images, Sensor Data, PCB Final Status (DB SQL). Machine Learning Problems: 1. Machine Status Monitoring; 2. Bottleneck harmonic model; 3. Anomaly Recommender Engine.
  • 8. A Dream Project. 1. Production process is well known. 2. Data source is clearly defined. 3. Need is raised by plant people. 4. Algorithmic challenges are clear.
  • 9. Becoming a Nightmare. 1. Production process is well known. 2. Data source is clearly defined. 3. Need is raised by plant people. 4. Algorithmic challenges are clear. Success!!! So… • Where is the data? • How can I read/access the data? • How can I be supported by my data scientist colleagues? • How can I attach a Spark cluster to my Jupyter notebook? • Who is going to port the notebook to production? • What do you mean by production?
  • 10. Enterprise Architecture: The Technology Bazaar. ARCHITECTURE != LIST OF TECHNOLOGIES AND FANCY ARROWS. Diagram labels: Data Sources; Structured Data (SAP, Operational Systems, MES, External); Unstructured/Semi-structured Data (Audio, Video, Image, IT Log, Text, Doc, IoT Feeds, Streaming); Other Sources; Acquire; Stream Ingestion, Real Time; Batch/Micro-batch/CDC; Staging, At Rest, CDC; Streaming, In Motion; Store, Transform, Enrich, Organize & Aggregate; Data Integration, Transform, Aggregate; Data Lake (Raw Data); Data Lake (Curated, Enriched & Transformed Data); Logical Data Warehouse; Data Marts; Traditional Enterprise Data Warehouse; Distributed Process; NoSQL; Other/Hadoop; SQL; Master Data Management (MDM); Data Quality (DQ); Analyze & Deliver Insight; Data Services; Marketplace or Datasets; Self-Service Data Preparation / Data Access Layer; Analytic Capabilities (Analyze, Optimize, Forecast, Report, Plan, Discover, Collaborate, Predict, Model); Advanced Analytics; Machine Learning layer; Data Science Environment.
  • 11. Keep it simple, stupid. Areas: INGESTION AND STORAGE; EXPLORATION; PRODUCTION; PRESENTATION. Components: Hammer Gateway (HG Job 1, HG Job 2, HG Job n); DATA LAKE (Azure Data Lake Store); DATA EXPLORATION (Notebooks); ARCHITECTURAL OBJECTS; MCO (mco.read, mco.write, mco.log); AZURE FUNCTIONS; MESSAGE QUEUE; FORWARDER; BUSINESS LOGICS (Dbutils+Spark); Datamart; APP.
  • 12. Enterprise Architecture: Data, where are you? • Where are the csv? • What do you mean by bcp out? • How can I get a copy in the cloud? • How can I update data on a regular basis? Quick and dirty (90% of times): USB loading. Production-ready attempt (10% of times): write a sort of ETL tool from scratch. Components shown: Hammer Gateway (HG Job 1, HG Job 2, HG Job n); Pick & Place; DATA LAKE (Azure Data Lake Store).
  • 13. Enterprise Architecture: The Jupyter case. • How could I work together with other data scientists? • How can I deal with computation spikes? • How can I attach an Apache Spark cluster to my Jupyter? • Damn, Java Heap Memory Exception: what do you mean? Components shown: DATA EXPLORATION (Notebooks); MCO (mco.read, mco.write, mco.log); AZURE FUNCTIONS; DATA LAKE (Azure Data Lake Store).
  • 14. One singleton to rule them all. mco. Pattern: Singleton. Use: bring tokens and technical access to the notebook. Benefits: • enhancing security • enabling access control • reducing the vendor lock-in effect. mco.read. Pattern: Wrapping. Use: take data from the data lake knowing only data names. Benefits: • no one needs to know where data are or how data are stored • incremental read capability out of the box • reduced time to port code to production • reduced reading time • reduced vendor lock-in effect (rolling out a new HDFS PaaS vendor to all services is a matter of hours). mco.log. Pattern: Wrapping. Use: bring developer-grade logging capability. Benefits: • reducing debug time • enabling process audits. mco.write. Pattern: Wrapping. Use: save data everywhere. Benefits: • no one needs to know where data must be put or how • avoids dangerous behaviours such as writing to a SQL database from inside a transformation (connection pool, my beloved friend…).
  • 15. Enterprise Architecture: The model is ready! • Is the code production ready? • Who is going to port the notebook to production? • Developer algorithm is wrong: it produces different numbers… • OK, I got it! I'll need a crontab… but where? Components shown: Hammer Gateway (HG Job 1, HG Job 2, HG Job n); Pick & Place; DATA EXPLORATION (Notebooks); BUSINESS LOGICS (Dbutils+Spark); MESSAGE QUEUE; FORWARDER.
  • 16. A For-what? What the hell? Components: MESSAGE QUEUE; FORWARDER; BUSINESS LOGICS (Dbutils+Spark); DATAMART; APP. Queue entries: Clean SMT Data; Cycle Time Anomaly Det; Super Secret Service; … Statuses: TO-DO, WIP, DONE. Flow: 1. The Clean SMT Data service is the first in the MESSAGE QUEUE and is sent to the FORWARDER. Status is set to WIP. 2. This service is forwarded to Databricks. 3. Databricks gets data through the MCO. 4. Once data is loaded, the Spark code starts the cleaning job. 5. Cleaned data is cached in the DATAMART through the MCO. Status is set to DONE.
  • 17. A For-what? What the hell? Components: MESSAGE QUEUE; FORWARDER; BUSINESS LOGICS (Dbutils+Spark); DATAMART; APP. Queue entries: Clean SMT Data; Cycle Time Anomaly Det; Super Secret Service; … Statuses: TO-DO, WIP, DONE.
  • 18. A For-what? What the hell? • What if some datasets could not be moved into the cloud? • How to deal with super secret business logic? Data needed to run Super Secret Service cannot be moved outside Magneti Marelli servers. FORWARDER as the main component for cloud hybridization! Components: MESSAGE QUEUE; FORWARDER; BUSINESS LOGICS (Dbutils+Spark); DATAMART; APP. Queue entries: Clean SMT Data; Cycle Time Anomaly Det; Super Secret Service; … Statuses: TO-DO, WIP, DONE.
  • 19. A For-what? What the hell? ON PREMISE. Flow: 1. Super Secret Service is forwarded to an on-premise Apache Spark cluster. 2. The Apache Spark cluster gets data through the MCO. 3. The outcome is persisted on an on-premise SQL Server. 4. A custom web app allows users to see the job output. Components: MESSAGE QUEUE; FORWARDER; BUSINESS LOGICS (Dbutils+Spark); DATAMART; EDW; APP. Queue entries: Clean SMT Data; Cycle Time Anomaly Det; Super Secret Service; … Statuses: TO-DO, WIP, DONE.
  • 20. A For-what? What the hell? Predictive balancing. Flow: 1. A first service is submitted by the forwarder. 2. A second service is submitted before the first has finished. The cluster is busy with the other computation. 3. The forwarder can submit the job to any application server, so it creates a new Databricks cluster and submits the job to it. 4. Cluster creation is anticipated using predictive algorithms. Diagram labels: MESSAGE QUEUE (Clean SMT Data; Cycle Time Anomaly Det; Super Urgent Service; …); Clean PS Data; RAM Intensive JOB; Slow Services Cluster (Dbutils+Spark); Fast Services Cluster (Dbutils+Spark); 300 GB RAM (90%), 2 hours long; 100 GB RAM, 5 minutes long. Statuses: TO-DO, WIP, DONE.
  • 21. Enterprise Architecture: Don't mind about nerd stuff. Data Scientists' presentation concerns: • How do I write a web page? • Do I need to bootstrap? • MV-what? I thought Spring was just a season! • Single sign-on? What do you mean? Components shown: MESSAGE QUEUE; FORWARDER; DATAMART; DATA EXPLORATION (Notebooks); BUSINESS LOGICS (Dbutils+Spark); APP.
  • 22. Enterprise Architecture ("…and in the darkness bind them"). Areas: NO-COMPLEXITY DATA INGESTION AND STORAGE; DATA SCIENTIST TOYBOX; PRODUCTION; PRESENTATION. 1. 1 day to add a new source. 2. Architecture «embeds» the guideline. 3. Presentation layer is drag and drop. 4. Service queuing ensures enterprise and managed scalability. 5. Data scientists do not waste time on boring activities. Components: Pick & Place; Hammer Gateway (HG Job 1, HG Job 2, HG Job n); DATA LAKE (Azure Data Lake Store); DATA EXPLORATION (Notebooks); MCO (mco.read, mco.write, mco.log); AZURE FUNCTIONS; MESSAGE QUEUE; FORWARDER; BUSINESS LOGICS (Dbutils+Spark); Datamart; APP.
  • 23. "I have done the deed. Did you hear a noise?" "The Guide says there is an art to flying," said Ford, "or rather a knack. The knack lies in learning how to throw yourself at the ground and miss." 1. Production process is well known. 2. Data source is clearly defined. 3. Need is raised by plant people. 4. Algorithmic challenges are clear.
  • 24. The Surface-Mount Technology (SMT) project. Diagram labels: Surface-Mount Technology; PCB Preparation; Assembly Line; Pre Production & Assembly Line. Machines: Lasermarker; Serigraphy; Automated Optical Inspection Post Printing; Pick and Place; Oven; Automated Optical Inspection Post Reflow. Data available, as listed on the slide: Timestamp, PCB ID (1 file per item); Timestamp, Temperature, Humidity (1 file per day); Timestamp, PCB ID, Soldering Paste Sensor Data, Temperature, … (1 file per item); NIP, Pick Up Info, Feeder Info (DB SQL); NIP, Images, Sensor Data, Anomalies (MDB); NIP, Images, Sensor Data, PCB Final Status (DB SQL). Machine Learning Problems: 1. Machine Status Monitoring; 2. Bottleneck harmonic model; 3. Anomaly Recommender Engine.
  • 25. Anomaly Recommender Engine. Use Case: support the maintenance team in prioritizing standard and extraordinary maintenance activities. Benefit: reducing machine stoppage losses per year per line; downtime reduction. Description: a summary dashboard shows the health of each part of the line. A drill-down with details is available.
  • 26. Much ado about nothing…? Break-even point reached after 8 months. Cost per line reduced by 90% after the first one. Return On Investment: 12X in 3 years. • Databricks and Microsoft Power BI allow a very cost-effective first project • Hammer Gateway allows cost-effective ingestion • MCO enabled data scientists to convert notebooks into services with very low effort • Microsoft Azure and Databricks ensure endless scalability
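
Slide 14 describes mco as a singleton that wraps reading, writing and logging so that notebooks only need data names. The slides do not show the real mco API, so the following is only a minimal sketch of the pattern under stated assumptions: the class name MCO, the register method, the `since` offset used to imitate incremental reads, and the in-memory dictionary standing in for the data lake are all hypothetical.

```python
import logging
from typing import Dict, List, Optional


class MCO:
    """Hypothetical singleton facade; not the real Magneti Marelli mco API."""

    _instance: Optional["MCO"] = None

    def __new__(cls) -> "MCO":
        # Singleton: every caller gets the same configured object, so
        # tokens and locations live in one place instead of in notebooks.
        if cls._instance is None:
            instance = super().__new__(cls)
            instance._catalog = {}   # data name -> physical location
            instance._storage = {}   # in-memory stand-in for the data lake
            instance._logger = logging.getLogger("mco")
            cls._instance = instance
        return cls._instance

    def register(self, name: str, location: str) -> None:
        """Map a logical data name to a physical location (assumed helper)."""
        self._catalog[name] = location

    def log(self, message: str) -> None:
        """Wrapping: one logging entry point for all services."""
        self._logger.info(message)

    def write(self, name: str, rows: List[Dict]) -> None:
        """Wrapping: callers pass a data name, never a path or connection."""
        location = self._catalog[name]
        self._storage.setdefault(location, []).extend(rows)
        self.log(f"wrote {len(rows)} rows to {name}")

    def read(self, name: str, since: int = 0) -> List[Dict]:
        """Wrapping: read by name; `since` imitates an incremental read."""
        location = self._catalog[name]
        rows = self._storage.get(location, [])[since:]
        self.log(f"read {len(rows)} rows from {name}")
        return rows
```

In a notebook this would be used as `mco = MCO()`, then `mco.write("smt_clean", rows)` and `mco.read("smt_clean")`. Changing the storage vendor would then only touch the catalog and the two wrapped methods, which is the lock-in argument the slide makes.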
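
Slides 16 to 19 walk a service through the message queue: it is taken first-in-first-out, marked WIP, forwarded either to the cloud or to an on-premise cluster, and marked DONE when its output is stored. The sketch below imitates that flow in plain Python. It is an illustration only: the Service and Forwarder classes, the on_premise_only flag and the target names are assumptions, the jobs run synchronously, and the predictive cluster creation from slide 20 is not modelled.

```python
from collections import deque
from dataclasses import dataclass
from typing import Callable, Deque, List


@dataclass
class Service:
    """One queue entry (hypothetical structure)."""
    name: str
    run: Callable[[], object]       # the business logic to execute
    on_premise_only: bool = False   # True if data cannot leave company servers
    status: str = "TO-DO"
    target: str = ""
    result: object = None


class Forwarder:
    """Takes services from the queue and routes them to an execution target."""

    def __init__(self) -> None:
        self.queue: Deque[Service] = deque()
        self.history: List[str] = []

    def submit(self, service: Service) -> None:
        self.queue.append(service)

    def route(self, service: Service) -> str:
        # Hybrid routing: restricted services stay on premise.
        return "on-premise" if service.on_premise_only else "cloud"

    def forward_next(self) -> Service:
        # First service in the queue is forwarded and marked WIP.
        service = self.queue.popleft()
        service.status = "WIP"
        service.target = self.route(service)
        self.history.append(f"{service.name}:WIP@{service.target}")
        # Run the job; a real forwarder would submit it to a cluster instead.
        service.result = service.run()
        service.status = "DONE"
        self.history.append(f"{service.name}:DONE")
        return service
```

For example, submitting `Service("Clean SMT Data", clean_job)` followed by `Service("Super Secret Service", secret_job, on_premise_only=True)` and calling `forward_next()` twice would send the first to the cloud target and the second to the on-premise target, where `clean_job` and `secret_job` are placeholder callables.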