In the manufacturing industry, reliability and time to market are key factors in accomplishing business goals. Nowadays, analytics is increasingly deployed to extract insights from data and foster a data-driven culture, achieving greater effectiveness and efficiency in business operations.
In the analytics domain, the real challenges often lie in data collection: heterogeneous and widespread data sources, the choice of ingestion technologies and strategies, the need to ensure a continuous data inflow, and the release of production-ready analytics services that can be integrated into daily operations.
To address those challenges, the Magneti Marelli ICT Innovation team has adopted a structured approach starting from the foundations, that is, by building a distinctive big data architecture known as the Magneti Marelli Architecture (MARC). Unlike common big data architectures, which are built on batch or streaming paradigms, MARC is an event- and service-oriented architecture with the flexibility to manage complex tasks running in the plant DMZ, in the plant network, and in the cloud. It combines traditional patterns for handling data, such as “Service Broker”, “Forwarder”, “Singleton”, “Wrapping”, and “Store and Forward”, with best-of-breed technologies such as Databricks, Microsoft Azure Data Lake Store, Azure SQL, Power BI, and Azure Functions.
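To make the pattern mix concrete, here is a minimal, hypothetical sketch of a Singleton that wraps a SparkSession behind a small facade; the class name and methods are invented for illustration and are not MARC's actual API.

```python
from pyspark.sql import SparkSession

class MarcSparkFacade:
    """Hypothetical Singleton + Facade: one shared SparkSession behind a narrow API."""
    _instance = None

    def __new__(cls):
        # Singleton: build the wrapper (and its session) only once per process.
        if cls._instance is None:
            cls._instance = super().__new__(cls)
            cls._instance._spark = (
                SparkSession.builder
                .appName("marc-service")
                .getOrCreate()
            )
        return cls._instance

    def read_parquet(self, path):
        # Facade: callers depend on this narrow method, not on Spark internals.
        return self._spark.read.parquet(path)

# Every caller gets the same underlying session.
facade = MarcSparkFacade()
assert facade is MarcSparkFacade()
```

The point of wrapping is that plant-side and cloud-side services program against one narrow interface, while the Singleton guarantees a single shared session per process.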
This presentation introduces MARC's key components together with its main integrated services. It also shows how the routine data-management issues mentioned above are addressed and solved with the aid of MARC's structure and related services, with practical examples of incremental data ingestion, incremental data processing, hybrid Spark deployments, and the use of heterogeneous application servers. Finally, it shows how the adoption of a structured approach to building a big data architecture for data management has dramatically increased the demand for analytics services and their effective use by the business to reduce manufacturing costs.
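As a taste of the first of those examples, incremental ingestion is commonly implemented with a high-water mark: persist the newest timestamp seen so far and load only rows beyond it. The sketch below is illustrative, not the MARC implementation; the JDBC URL, table, and column names are placeholders.

```python
from pyspark.sql import SparkSession, functions as F

spark = SparkSession.builder.appName("incremental-ingestion").getOrCreate()

# Assumed inputs: a JDBC source table with a last_update column and a
# previously persisted high-water mark (hard-coded here for illustration).
jdbc_url = "jdbc:sqlserver://example;databaseName=plant"  # placeholder
high_water_mark = "2018-01-01 00:00:00"

increment = (
    spark.read.format("jdbc")
    .option("url", jdbc_url)
    .option("dbtable", "dbo.measurements")  # placeholder table
    .load()
    .filter(F.col("last_update") > F.lit(high_water_mark))
)

# Append only the new rows, then persist the new mark for the next run.
increment.write.mode("append").parquet("/datalake/raw/measurements")
new_mark = increment.agg(F.max("last_update")).first()[0]
```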
This document is a project report submitted by MD Dilshad in partial fulfillment of a diploma in computer science and engineering from Maulana Azad National Urdu University. It details their 24-week industrial training with Web Monk Technology in New Delhi, where they worked on website design and development using HTML, CSS, PHP, and WordPress. The objectives of the training were to gain practical experience in a real work environment, apply academic knowledge, and prepare for future employment.
This document summarizes the internship work of programming a virtual test box to automate testing of Body Control Modules (BCMs) in vehicles. The test box uses National Instruments hardware, the CANoe software, and custom scripts to test BCM signals, messages, and configurations over the CAN network and K-Line interface. The intern programmed test cases in XML, CAPL, and .NET to check over 100 signal and message values across 17 test cases for vehicle exterior lights and door controls. A challenge was fully integrating the K-Line server tool to read diagnostic trouble codes, which is a work in progress. The intern gained experience programming automated tests and interfacing with vehicle networks.
= Manage ontologies and use semantic data in SharePoint with GRASP =
GRASP ("Graph for SharePoint") is the SharePoint solution that introduces ontologies and semantic data into SharePoint. Ontologies are uploaded and managed directly in SharePoint. This fosters collaboration among ontologists and ensures preservation and compliance with the ECM-strategy of your company.
= SPARQL queries in SharePoint =
Ontologies are uploaded into an attached triple store (RDF store) directly from within SharePoint. With the standard query language SPARQL, you can query them and retrieve their data. Additionally, any semantic data accessible via a SPARQL endpoint or triple store can be processed in SharePoint. SPARQL query results are available in SharePoint web parts and SharePoint lists, generating insights for your SharePoint users and workflows.
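As a rough idea of what such a query looks like when issued programmatically against a SPARQL endpoint (GRASP itself surfaces results through web parts, not code), here is a minimal Python sketch using the SPARQLWrapper library; the endpoint URL is a placeholder.

```python
from SPARQLWrapper import SPARQLWrapper, JSON

# Placeholder endpoint; in GRASP the triple store is attached to SharePoint.
sparql = SPARQLWrapper("http://example.org/sparql")
sparql.setQuery("""
    SELECT ?class ?label WHERE {
        ?class a <http://www.w3.org/2002/07/owl#Class> ;
               <http://www.w3.org/2000/01/rdf-schema#label> ?label .
    } LIMIT 10
""")
sparql.setReturnFormat(JSON)

# Each binding row maps variable names to typed values.
for row in sparql.query().convert()["results"]["bindings"]:
    print(row["class"]["value"], row["label"]["value"])
```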
= Applications with GRASP =
GRASP is optimized for companies that pursue a SharePoint-based strategy and want to extend it to cover their ontologies, or that want to use semantic data to improve business processes. Typical industries include pharma, insurance, and manufacturing.
*Central ontology life-cycle management in SharePoint.
*Controlled and standardized user access, backup, and recovery strategies for ontologies.
*Semantic data from ontologies and SPARQL endpoints becomes accessible to SharePoint users and workflows (requires Triplestore Basic, OpenLink Virtuoso, or TopBraid).
DoIP pairs the ISO 13400-2 transport layer with the ISO 14229-5 UDS application layer. Find out how DoIP supports next-generation remote vehicle diagnostics and automotive ECU applications.
https://www.embitel.com/blog/embedded-blog/how-uds-on-ip-or-doip-is-enabling-remote-vehicle-diagnostics
The document discusses the need to reform and improve vocational education in India. It notes that currently, vocational education makes up a small percentage of the education system and is not aligned well with industry needs. The document outlines several problems with the current system, including a lack of private sector involvement, rigid regulations, and few opportunities for career progression or skill upgrading. It also discusses government initiatives to establish a National Vocational Qualification Framework and compares vocational education frameworks in other countries like the UK, Australia, and China. The goal is to make recommendations to help introduce higher-quality vocational education programs in India.
Nettur Technical Training Foundation (NTTF) is a premier technical educational institution established in 1959 that provides corporate training services. It has over 20 training centers across India and over 100 experienced trainers. NTTF offers both on-campus and off-campus customized technical, functional, and soft skills training programs to over 100 corporate clients across various sectors such as automotive, aerospace, construction, food processing, and more. Some of NTTF's major clients include Maruti Suzuki, Ashok Leyland, Tata Motors, and MRF.
The document is a resume for Priyanka Mahajan, who has a B.Tech in Electronics and Communication Engineering. She is currently working as a Technical Recruiter at IMS People in Ahmedabad, India. Her experience includes identifying, screening, and qualifying candidates for engineering positions. She has skills in areas like PCB design, computer networking, programming languages, and simulation software. She also has several certifications and participated in many technical competitions and workshops during her time in college.
Hybrid Transactional/Analytics Processing with Spark and IMDGs - Ali Hodroj
This document discusses hybrid transactional/analytical processing (HTAP) with Apache Spark and in-memory data grids. It begins by introducing the speaker and GigaSpaces. It then discusses how modern applications require both online transaction processing and real-time operational intelligence. The document presents examples from retail and IoT and the goals of minimizing latency while maximizing data analytics locality. It provides an overview of in-memory computing options and describes how GigaSpaces uses an in-memory data grid combined with Spark to achieve HTAP. The document includes deployment diagrams and discusses data grid RDDs and pushing predicates to the data grid. It describes how this was productized as InsightEdge and provides additional innovations and reference architectures.
The document discusses advanced database technologies and techniques. It provides examples of using MySQL, PostgreSQL, and Tokutek databases. It discusses approaches to improving speed, availability, reliability, and scalability of databases. It also covers monitoring databases, optimizing database and query performance, and profiling queries. Examples demonstrate how to optimize queries and access data from different databases.
The Fine Art of Time Travelling - Implementing Event Sourcing - Andrea Saltar... - ITCamp
If there is one common practice in architecting software systems, it is to have them store the last known state of business entities in a relational database. Though widely adopted and well supported by existing development tools, this practice trades ease of implementation for the loss of those entities' history.
Event Sourcing provides a pivotal solution to this problem, giving systems the capability of restoring the state they had at any given point in time. Furthermore, injecting mock-up events and having them replayed by the business logic allows for an easy implementation of simulations and “what if” scenarios.
In this session, Andrea will demonstrate how to design time travelling systems by examining real-world, production-tested solutions.
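To make the idea concrete, here is a minimal, generic event-sourcing sketch (not the speaker's code): state is never stored, only events, and "time travelling" is a replay of events up to a chosen point.

```python
from dataclasses import dataclass
from datetime import datetime

@dataclass
class Event:
    at: datetime
    kind: str      # e.g. "deposited" / "withdrawn"
    amount: int

def replay(events, until):
    """Rebuild an account balance as of `until` by replaying stored events."""
    balance = 0
    for e in sorted(events, key=lambda e: e.at):
        if e.at > until:
            break
        balance += e.amount if e.kind == "deposited" else -e.amount
    return balance

log = [
    Event(datetime(2018, 1, 1), "deposited", 100),
    Event(datetime(2018, 2, 1), "withdrawn", 30),
    Event(datetime(2018, 3, 1), "deposited", 50),
]
print(replay(log, until=datetime(2018, 2, 15)))  # state as of Feb 15: 70
```

Injecting extra mock-up events into `log` before replaying is exactly the "what if" simulation mechanism described above.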
The document discusses the challenges data scientists face in operationalizing big data projects and making the results accessible for broader organizational use. It argues that within the next 18 months, big data will become integrated into standard reporting and analysis used by all employees, not just data scientists. However, current tools like Hadoop are too slow for interactive work. New technologies are needed that provide massively parallel processing and tightly integrate with Hadoop, but also allow for use of existing reporting tools. This will require analytical platforms with in-memory processing capabilities and low latency.
Reinventing DDC in the Age of Data Analytics - Memoori
Reinventing DDC in the Age of Data Analytics! Memoori Talks to Jim Lee, CEO Cimetrics, Anto Budiardjo, CEO Anka Labs & Alper Üzmezler, CTO Anka Labs. Can we rethink the DDC to become data-centric, able to perform analytics and capable of sending data to cloud systems?
The document discusses Cisco's vision for the Internet of Everything (IoE) and how it applies to the manufacturing sector. It addresses some of the challenges manufacturers face, such as disconnected systems and technology silos. It then presents Cisco's proposed architecture for industrial networks, including security frameworks, edge computing, and fog computing to enable distributed data processing at the network edge. This architecture is meant to help manufacturers overcome challenges and leverage IoT/IoE for operational improvements and business benefits.
Lessons learned building a big data analytics engine, from proprietary to open source - Álvaro Santamaria & Joel Brunger - J On The Beach
After spending four years building a proprietary all-in-one streaming analytics engine for financial services, it became clear that open-source was starting to pull ahead. Alvaro will talk about the challenges of creating an IT operations solution for financial services; what to build, what not to build, and how to use open source tools to get past the infrastructure and focus on the business problems that matter.
RightScale Roadtrip Boston: Accelerate to Cloud - RightScale
The Accelerate to Cloud keynote will help you understand the current state of cloud adoption, identify the business value for your organization, and provide you a framework to plot your course to cloud adoption.
Tiny Batches, in the wine: Shiny New Bits in Spark Streaming - Paco Nathan
London Spark Meetup 2014-11-11 @Skimlinks
http://www.meetup.com/Spark-London/events/217362972/
To paraphrase the immortal crooner Don Ho: "Tiny Batches, in the wine, make me happy, make me feel fine." http://youtu.be/mlCiDEXuxxA
Apache Spark provides support for streaming use cases, such as real-time analytics on log files, by leveraging a model called discretized streams (D-Streams). These "micro-batch" computations operate on small time intervals, generally from 500 milliseconds up. One major innovation of Spark Streaming is that it leverages a unified engine. In other words, the same business logic can be used across multiple use cases: streaming, but also interactive, iterative, machine learning, etc.
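A minimal PySpark illustration of the micro-batch model just described, assuming a socket text source; the 1-second batch interval is the discretization step.

```python
from pyspark import SparkContext
from pyspark.streaming import StreamingContext

sc = SparkContext(appName="dstream-demo")
ssc = StreamingContext(sc, batchDuration=1)  # 1-second micro-batches

# Each micro-batch of lines becomes an RDD, processed by the same engine
# (and largely the same code) as a batch job.
lines = ssc.socketTextStream("localhost", 9999)
counts = (
    lines.flatMap(lambda l: l.split())
         .map(lambda w: (w, 1))
         .reduceByKey(lambda a, b: a + b)
)
counts.pprint()

ssc.start()
ssc.awaitTermination()
```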
This talk will compare case studies for production deployments of Spark Streaming, emerging design patterns for integration with popular complementary OSS frameworks, plus some of the more advanced features such as approximation algorithms, and take a look at what's ahead — including the new Python support for Spark Streaming that will be in the upcoming 1.2 release.
Also, let's chat a bit about the new Databricks + O'Reilly developer certification for Apache Spark…
Get the most out of Oracle Data Guard - OOW version - Ludovico Caldara
If you use the Oracle Data Guard feature just for data protection, you are using less than half of its potential. You already pay for it, so why not get the most out of it? In this session I will show how you can use Oracle Data Guard capabilities for common tasks such as database cloning, database migration, and reporting, with the help of other features included in Oracle Database Enterprise Edition.
Apache Spark 2.0: Faster, Easier, and Smarter - Databricks
In this webcast, Reynold Xin from Databricks will be speaking about Apache Spark's new 2.0 major release.
The major themes for Spark 2.0 are:
- Unified APIs: Emphasis on building up higher level APIs including the merging of DataFrame and Dataset APIs
- Structured Streaming: Simplify streaming by building continuous applications on top of DataFrames, allowing streaming, interactive, and batch queries to be unified (see the sketch after this list)
- Tungsten Phase 2: Speed up Apache Spark by 10X
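As a hedged illustration of the Structured Streaming theme, the same DataFrame operations can run as a continuous query; this sketch assumes a socket source and the Spark 2.0-era API.

```python
from pyspark.sql import SparkSession, functions as F

spark = SparkSession.builder.appName("structured-streaming-demo").getOrCreate()

# A streaming DataFrame: the same API as batch DataFrames.
lines = (
    spark.readStream.format("socket")
    .option("host", "localhost").option("port", 9999)
    .load()
)
counts = (
    lines.select(F.explode(F.split(lines.value, " ")).alias("word"))
         .groupBy("word").count()
)

# The query runs continuously, updating its result as data arrives.
query = counts.writeStream.outputMode("complete").format("console").start()
query.awaitTermination()
```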
This document discusses advanced database techniques and monitoring. It begins with an introduction and safe harbor statement. It then discusses issues like speed, availability, reliability and scalability for databases. It provides examples of using MySQL, MySQL Cluster, PostgreSQL and columnar storage. It also discusses monitoring databases and optimizing database queries and models. Sources are provided at the end.
Real-Time Analytics with Confluent and MemSQL - SingleStore
This document discusses enabling real-time analytics for IoT applications. It describes how industries like auto, transportation, energy, warehousing and logistics, and healthcare need real-time analytics to handle streaming data from IoT sensors. It also discusses how Confluent's Kafka stream processing platform can be used to build applications that ingest IoT data at high speeds, transform the data, and power real-time analytics and user interfaces. MemSQL's in-memory database is presented as a fast and scalable storage option to support real-time analytics on the large volumes of IoT data.
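As an illustrative sketch (not Confluent's or MemSQL's code) of how the consumer side of such streaming IoT ingestion commonly looks, here Spark reads a Kafka topic of sensor readings; the broker address and topic name are placeholders, and the spark-sql-kafka package is assumed to be on the classpath.

```python
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("iot-ingest").getOrCreate()

readings = (
    spark.readStream.format("kafka")
    .option("kafka.bootstrap.servers", "broker:9092")  # placeholder broker
    .option("subscribe", "iot-sensors")                # placeholder topic
    .load()
    .selectExpr("CAST(value AS STRING) AS json")       # raw sensor payloads
)

# Downstream, the parsed stream would be written to a fast analytical store.
query = readings.writeStream.format("console").start()
query.awaitTermination()
```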
Lew Tucker discusses the rise of cloud computing and its impact. He defines various cloud service models like SaaS, PaaS, and IaaS. Tucker analogizes the shift to cloud computing from individual data centers generating their own power to today's electrical grid. Major drivers of cloud computing include the growth of web APIs and massive amounts of user-generated data. Tucker outlines how cloud computing changes what developers can access and how applications are designed and scaled.
Splunk App for Stream - Insights into Your Network Traffic - Georg Knon
The document discusses the Splunk App for Stream, which enables real-time insights into private, public and hybrid cloud infrastructures by capturing and analyzing critical events from wire data not found in logs or with other collection methods. It provides an overview of the app, what's new, important features, architecture and deployment, customer success examples, and FAQs.
Visual, Interactive, Predictive Analytics for Big Data - Arimo, Inc.
Adatao Demo at the First Apache Spark Summit, Nikko Hotel, San Francisco, December 2, 2013
A real-time, live demo of the Adatao big data analytics system for both business users and data scientists/engineers. We showed terabyte-scale data modeling in seconds on a 40-node cluster, through a beautiful, user-friendly web app as well as R/RStudio and Python interfaces.
The document outlines the experience and qualifications of an embedded systems engineer, including over 30 years of experience in areas such as microcontroller programming, real-time operating systems, communication protocols, modeling languages, and IDEs/compilers. It provides a detailed list of technical skills and experience with various microcontroller platforms, programming languages, frameworks, and tools for embedded software development.
The document is an agenda for an intro to Spark development class. It includes an overview of Databricks, the history and capabilities of Spark, and the agenda topics which will cover RDD fundamentals, transformations and actions, DataFrames, Spark UIs, and Spark Streaming. The class will include lectures, labs, and surveys to collect information on attendees' backgrounds and goals for the training.
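For readers new to the topics on that agenda, here is a minimal, generic example of the RDD fundamentals such a class covers: transformations are lazy, and actions trigger execution.

```python
from pyspark import SparkContext

sc = SparkContext(appName="rdd-basics")

rdd = sc.parallelize(range(10))           # create an RDD
evens = rdd.filter(lambda x: x % 2 == 0)  # transformation: lazy
squares = evens.map(lambda x: x * x)      # transformation: lazy
print(squares.collect())                  # action: runs the job -> [0, 4, 16, 36, 64]
```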
Angular (v2 and up) - Morning to Understand - LINAGORA
Slides of the talk about Angular, given at the "Matinée Pour Comprendre" organized by Linagora on 22/03/17.
Discover what's new in Angular, why it is more than just a framework (a platform), and how to manage your data with RxJS and Redux.
This document discusses predictive maintenance of robots in the automotive industry using big data analytics. It describes Cisco's Zero Downtime solution which analyzes telemetry data from robots to detect potential failures, saving customers over $40 million by preventing unplanned downtimes. The presentation outlines Cisco's cloud platform and a case study of how robot and plant data is collected and analyzed using streaming and batch processing to predict failures and schedule maintenance. It proposes a next generation predictive platform using machine learning to more accurately detect issues before downtime occurs.
The document discusses migrating a data warehouse to the Databricks Lakehouse Platform. It outlines why legacy data warehouses are struggling, how the Databricks Platform addresses these issues, and key considerations for modern analytics and data warehousing. The document then provides an overview of the migration methodology, approach, strategies, and key takeaways for moving to a lakehouse on Databricks.
Data Lakehouse Symposium | Day 1 | Part 1 - Databricks
The world of data architecture began with applications. Next came data warehouses. Then text was organized into a data warehouse.
Then one day the world discovered a whole new kind of data that was being generated by organizations. The world found that machines generated data that could be transformed into valuable insights. This was the origin of what is today called the data lakehouse. The evolution of data architecture continues today.
Come listen to industry experts describe this transformation of ordinary data into a data architecture that is invaluable to business. Simply put, organizations that take data architecture seriously are going to be at the forefront of business tomorrow.
This is an educational event.
Several of the authors of the book Building the Data Lakehouse will be presenting at this symposium.
Data Lakehouse Symposium | Day 1 | Part 2 - Databricks
The world of data architecture began with applications. Next came data warehouses. Then text was organized into a data warehouse.
Then one day the world discovered a whole new kind of data that was being generated by organizations. The world found that machines generated data that could be transformed into valuable insights. This was the origin of what is today called the data lakehouse. The evolution of data architecture continues today.
Come listen to industry experts describe this transformation of ordinary data into a data architecture that is invaluable to business. Simply put, organizations that take data architecture seriously are going to be at the forefront of business tomorrow.
This is an educational event.
Several of the authors of the book Building the Data Lakehouse will be presenting at this symposium.
The document discusses the challenges of modern data, analytics, and AI workloads. Most enterprises struggle with siloed data systems that make integration and productivity difficult. The future of data lies with a data lakehouse platform that can unify data engineering, analytics, data warehousing, and machine learning workloads on a single open platform. The Databricks Lakehouse platform aims to address these challenges with its open data lake approach and capabilities for data engineering, SQL analytics, governance, and machine learning.
5 Critical Steps to Clean Your Data Swamp When Migrating Off of Hadoop - Databricks
In this session, learn how to quickly supplement your on-premises Hadoop environment with a simple, open, and collaborative cloud architecture that enables you to generate greater value with scaled application of analytics and AI on all your data. You will also learn five critical steps for a successful migration to the Databricks Lakehouse Platform along with the resources available to help you begin to re-skill your data teams.
Democratizing Data Quality Through a Centralized Platform - Databricks
Bad data leads to bad decisions and broken customer experiences. Organizations depend on complete and accurate data to power their business, maintain efficiency, and uphold customer trust. With thousands of datasets and pipelines running, how do we ensure that all data meets quality standards, and that expectations are clear between producers and consumers? Investing in shared, flexible components and practices for monitoring data health is crucial for a complex data organization to rapidly and effectively scale.
At Zillow, we built a centralized platform to meet our data quality needs across stakeholders. The platform is accessible to engineers, scientists, and analysts, and seamlessly integrates with existing data pipelines and data discovery tools. In this presentation, we will provide an overview of our platform's capabilities (a short code sketch follows the list), including:
- Giving producers and consumers the ability to define and view data quality expectations using a self-service onboarding portal
- Performing data quality validations using libraries built to work with Spark
- Dynamically generating pipelines that can be abstracted away from users
- Flagging data that doesn't meet quality standards at the earliest stage and giving producers the opportunity to resolve issues before use by downstream consumers
- Exposing data quality metrics alongside each dataset to provide producers and consumers with a comprehensive picture of health over time
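As a hedged sketch of what such Spark-based validations can look like (generic, not Zillow's platform code), a simple expectation, price present and non-negative, is checked and the failing rows counted:

```python
from pyspark.sql import SparkSession, functions as F

spark = SparkSession.builder.appName("dq-check").getOrCreate()
df = spark.createDataFrame(
    [(1, 250000), (2, None), (3, -10)], ["listing_id", "price"]
)

# Expectation: price is present and non-negative.
failed = df.filter(F.col("price").isNull() | (F.col("price") < 0))

# Expose a simple health metric alongside the dataset.
total, bad = df.count(), failed.count()
print(f"data quality: {total - bad}/{total} rows passed")
```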
Learn to Use Databricks for Data Science - Databricks
Data scientists face numerous challenges throughout the data science workflow that hinder productivity. As organizations continue to become more data-driven, a collaborative environment is more critical than ever — one that provides easier access and visibility into the data, reports and dashboards built against the data, reproducibility, and insights uncovered within the data.. Join us to hear how Databricks’ open and collaborative platform simplifies data science by enabling you to run all types of analytics workloads, from data preparation to exploratory analysis and predictive analytics, at scale — all on one unified platform.
Why APM Is Not the Same As ML Monitoring - Databricks
Application performance monitoring (APM) has become the cornerstone of software engineering allowing engineering teams to quickly identify and remedy production issues. However, as the world moves to intelligent software applications that are built using machine learning, traditional APM quickly becomes insufficient to identify and remedy production issues encountered in these modern software applications.
As a lead software engineer at NewRelic, my team built high-performance monitoring systems including Insights, Mobile, and SixthSense. As I transitioned to building ML Monitoring software, I found the architectural principles and design choices underlying APM to not be a good fit for this brand new world. In fact, blindly following APM designs led us down paths that would have been better left unexplored.
In this talk, I draw upon my (and my team’s) experience building an ML Monitoring system from the ground up and deploying it on customer workloads running large-scale ML training with Spark as well as real-time inference systems. I will highlight how the key principles and architectural choices of APM don’t apply to ML monitoring. You’ll learn why, understand what ML Monitoring can successfully borrow from APM, and hear what is required to build a scalable, robust ML Monitoring architecture.
The Function, the Context, and the Data - Enabling ML Ops at Stitch Fix - Databricks
Autonomy and ownership are core to working at Stitch Fix, particularly on the Algorithms team. We enable data scientists to deploy and operate their models independently, with minimal need for handoffs or gatekeeping. By writing a simple function and calling out to an intuitive API, data scientists can harness a suite of platform-provided tooling meant to make ML operations easy. In this talk, we will dive into the abstractions the Data Platform team has built to enable this. We will go over the interface data scientists use to specify a model and what that hooks into, including online deployment, batch execution on Spark, and metrics tracking and visualization.
Stage Level Scheduling Improving Big Data and AI Integration - Databricks
In this talk, I will dive into the stage-level scheduling feature added to Apache Spark 3.1. Stage-level scheduling extends upon Project Hydrogen by improving big data ETL and AI integration and also enables multiple other use cases. It is beneficial any time the user wants to change container resources between stages in a single Apache Spark application, whether those resources are CPU, memory, or GPUs. One of the most popular use cases is enabling end-to-end scalable deep learning and AI to efficiently use GPU resources. In this type of use case, users read from a distributed file system, do data manipulation and filtering to get the data into a format that the deep learning algorithm needs for training or inference, and then send the data into a deep learning algorithm. Using stage-level scheduling combined with accelerator-aware scheduling enables users to seamlessly go from ETL to deep learning running on the GPU by adjusting the container requirements for different stages in Spark within the same application. This makes writing these applications easier and can help with hardware utilization and costs.
There are other ETL use cases where users want to change CPU and memory resources between stages, for instance when there is data skew or when the data size is much larger in certain stages of the application. In this talk, I will go over the feature details, cluster requirements, the API, and use cases. I will demo how the stage-level scheduling API can be used by Horovod to seamlessly go from data preparation to training using the TensorFlow Keras API on GPUs.
The talk will also touch on other new Apache Spark 3.1 functionality, such as pluggable caching, which can be used to enable faster dataframe access when operating from GPUs.
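A hedged sketch of the stage-level scheduling API as added in Spark 3.1: one RDD stage requests GPU-equipped executors while the rest of the job keeps the default profile (cluster support and dynamic allocation requirements apply).

```python
from pyspark.sql import SparkSession
from pyspark.resource import (
    ResourceProfileBuilder, TaskResourceRequests, ExecutorResourceRequests
)

spark = SparkSession.builder.appName("stage-level-scheduling").getOrCreate()

# ETL stage runs with the application's default resources.
etl = spark.range(1_000_000).selectExpr("id", "id * 2 AS feature")

# Training stage asks for GPU executors via a ResourceProfile.
ereqs = ExecutorResourceRequests().cores(4).memory("8g").resource("gpu", 1)
treqs = TaskResourceRequests().cpus(1).resource("gpu", 1)
profile = ResourceProfileBuilder().require(ereqs).require(treqs).build  # property, not a call

# Stand-in map; real training logic would run here on the GPU executors.
trained = etl.rdd.withResources(profile).map(lambda row: row)
trained.count()
```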
Simplify Data Conversion from Spark to TensorFlow and PyTorch - Databricks
In this talk, I would like to introduce an open-source tool built by our team that simplifies the data conversion from Apache Spark to deep learning frameworks.
Imagine you have a large dataset, say 20 GBs, and you want to use it to train a TensorFlow model. Before feeding the data to the model, you need to clean and preprocess your data using Spark. Now you have your dataset in a Spark DataFrame. When it comes to the training part, you may have the problem: How can I convert my Spark DataFrame to some format recognized by my TensorFlow model?
The existing data conversion process can be tedious. For example, to convert an Apache Spark DataFrame to a TensorFlow Dataset file format, you need to either save the Apache Spark DataFrame on a distributed filesystem in parquet format and load the converted data with third-party tools such as Petastorm, or save it directly in TFRecord files with spark-tensorflow-connector and load it back using TFRecordDataset. Both approaches take more than 20 lines of code to manage the intermediate data files, rely on different parsing syntax, and require extra attention for handling vector columns in the Spark DataFrames. In short, all these engineering frictions greatly reduced the data scientists’ productivity.
The Databricks Machine Learning team contributed a new Spark Dataset Converter API to Petastorm to simplify these tedious data conversion process steps. With the new API, it takes a few lines of code to convert a Spark DataFrame to a TensorFlow Dataset or a PyTorch DataLoader with default parameters.
In the talk, I will use an example to show how to use the Spark Dataset Converter to train a Tensorflow model and how simple it is to go from single-node training to distributed training on Databricks.
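The converter API referred to above looks roughly like this (a sketch based on the Petastorm Spark converter; the cache directory is a placeholder and TensorFlow is assumed to be installed):

```python
from pyspark.sql import SparkSession
from petastorm.spark import SparkDatasetConverter, make_spark_converter

spark = SparkSession.builder.appName("spark-to-tf").getOrCreate()
spark.conf.set(
    SparkDatasetConverter.PARENT_CACHE_DIR_URL_CONF,
    "file:///tmp/petastorm_cache",  # placeholder cache location
)

df = spark.range(100).selectExpr("float(id) AS feature", "float(id % 2) AS label")
converter = make_spark_converter(df)  # materializes the DataFrame behind the scenes

with converter.make_tf_dataset(batch_size=32) as dataset:
    # `dataset` is a tf.data.Dataset of named tuples, ready for model.fit(...)
    for batch in dataset.take(1):
        print(batch.feature.shape, batch.label.shape)
```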
Scaling your Data Pipelines with Apache Spark on Kubernetes - Databricks
There is no doubt Kubernetes has emerged as the next generation of cloud-native infrastructure to support a wide variety of distributed workloads. Apache Spark has evolved to run both machine learning and large-scale analytics workloads. There is growing interest in running Apache Spark natively on Kubernetes. By combining the flexibility of Kubernetes with scalable data processing in Apache Spark, you can run any data and machine learning pipeline on this infrastructure while effectively utilizing the resources at your disposal.
In this talk, Rajesh Thallam and Sougata Biswas will share how to effectively run your Apache Spark applications on Google Kubernetes Engine (GKE) and Google Cloud Dataproc, and orchestrate the data and machine learning pipelines with managed Apache Airflow on GKE (Google Cloud Composer). The following topics will be covered:
- Understanding key traits of Apache Spark on Kubernetes
- Things to know when running Apache Spark on Kubernetes, such as autoscaling
- Demonstrating analytics pipelines running on Apache Spark orchestrated with Apache Airflow on a Kubernetes cluster
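A hedged sketch of the kind of configuration involved when pointing Spark at a Kubernetes API server (the master URL, image, and namespace are placeholders; on GKE/Dataproc much of this is preconfigured):

```python
from pyspark.sql import SparkSession

# Placeholder master URL and image; in practice these come from your cluster.
spark = (
    SparkSession.builder
    .master("k8s://https://kubernetes.example:6443")
    .appName("spark-on-k8s")
    .config("spark.kubernetes.container.image", "example/spark-py:3.1.1")
    .config("spark.kubernetes.namespace", "data-pipelines")
    .config("spark.executor.instances", "4")
    .getOrCreate()
)

spark.range(10).show()
```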
Scaling and Unifying SciKit Learn and Apache Spark Pipelines - Databricks
Pipelines have become ubiquitous, as the need for stringing multiple functions to compose applications has gained adoption and popularity. Common pipeline abstractions such as “fit” and “transform” are even shared across divergent platforms such as Python Scikit-Learn and Apache Spark.
Scaling pipelines at the level of simple functions is desirable for many AI applications; however, it is not directly supported by Ray's parallelism primitives. In this talk, Raghu will describe a pipeline abstraction that takes advantage of Ray's compute model to efficiently scale arbitrarily complex pipeline workflows. He will demonstrate how this abstraction cleanly unifies pipeline workflows across multiple platforms such as Scikit-Learn and Spark, and achieves nearly optimal scale-out parallelism on pipelined computations.
Attendees will learn how pipelined workflows can be mapped to Ray’s compute model and how they can both unify and accelerate their pipelines with Ray.
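As a generic illustration (not the speaker's abstraction), pipeline stages can be expressed as Ray tasks whose futures chain together, letting independent branches run in parallel:

```python
import ray

ray.init()

@ray.remote
def fit_transform(data, scale):
    # Stand-in for a pipeline stage's fit/transform step.
    return [x * scale for x in data]

@ray.remote
def combine(left, right):
    return left + right

# Two branches execute in parallel; `combine` waits on both futures.
a = fit_transform.remote(list(range(5)), 2)
b = fit_transform.remote(list(range(5)), 3)
print(ray.get(combine.remote(a, b)))
```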
Sawtooth Windows for Feature Aggregations - Databricks
In this talk about Zipline, we will introduce a new type of windowing construct called a sawtooth window. We will describe various properties of sawtooth windows that we utilize to achieve online-offline consistency while still maintaining high throughput, low read latency, and tunable write latency for serving machine learning features. We will also talk about a simple deployment strategy for correcting feature drift due to operations that are not abelian groups operating over change data.
We want to present multiple anti-patterns utilizing Redis in unconventional ways to get the maximum out of Apache Spark. All examples presented are tried and tested in production at scale at Adobe. The most common integration is spark-redis, which interfaces with Redis as a DataFrame backing store or as an upstream for Structured Streaming. We deviate from the common use cases to explore where Redis can plug gaps while scaling out high-throughput applications in Spark; a short sketch of the counter niche follows the outline below.
Niche 1: Long-Running Spark Batch Job - Dispatch New Jobs by Polling a Redis Queue
· Why?
o Custom queries on top of a table; we load the data once and query N times
· Why not Structured Streaming?
· Working solution using Redis
Niche 2: Distributed Counters
· Problems with Spark Accumulators
· Utilize Redis hashes as distributed counters
· Precautions for retries and speculative execution
· Pipelining to improve performance
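A minimal sketch of the distributed-counter niche using redis-py from within Spark tasks; `HINCRBY` is atomic on the Redis side, and, as the outline cautions, retried or speculative tasks can double-count unless writes are made idempotent.

```python
import redis
from pyspark import SparkContext

sc = SparkContext(appName="redis-counters")

def count_partition(rows):
    # One connection per partition; HINCRBY is atomic in Redis.
    r = redis.Redis(host="localhost", port=6379)  # placeholder host
    for row in rows:
        r.hincrby("event_counts", row["event_type"], 1)
    # Caution: with task retries/speculation this increments twice;
    # guard with an idempotency key per (stage, partition, attempt) in production.
    return []

data = sc.parallelize([{"event_type": "click"}, {"event_type": "view"}] * 10)
data.mapPartitions(count_partition).count()  # action forces the writes
```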
Re-imagine Data Monitoring with whylogs and Spark - Databricks
In the era of microservices, decentralized ML architectures and complex data pipelines, data quality has become a bigger challenge than ever. When data is involved in complex business processes and decisions, bad data can, and will, affect the bottom line. As a result, ensuring data quality across the entire ML pipeline is both costly, and cumbersome while data monitoring is often fragmented and performed ad hoc. To address these challenges, we built whylogs, an open source standard for data logging. It is a lightweight data profiling library that enables end-to-end data profiling across the entire software stack. The library implements a language and platform agnostic approach to data quality and data monitoring. It can work with different modes of data operations, including streaming, batch and IoT data.
In this talk, we will provide an overview of the whylogs architecture, including its lightweight statistical data collection approach and various integrations. We will demonstrate how the whylogs integration with Apache Spark achieves large scale data profiling, and we will show how users can apply this integration into existing data and ML pipelines.
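A tiny sketch of data profiling with whylogs, assuming the current whylogs v1 Python API on a pandas DataFrame (the Spark integration follows the same profile-centric idea):

```python
import pandas as pd
import whylogs as why

df = pd.DataFrame({"user_id": [1, 2, 3], "amount": [10.5, None, 42.0]})

# Profile the batch: lightweight statistics only, no raw data retained.
profile = why.log(df).profile()

# The profile view summarizes counts, types, null ratios, distributions, etc.
print(profile.view().to_pandas())
```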
Raven: End-to-end Optimization of ML Prediction Queries - Databricks
Machine learning (ML) models are typically part of prediction queries that consist of a data processing part (e.g., for joining, filtering, cleaning, featurization) and an ML part invoking one or more trained models. In this presentation, we identify significant and unexplored opportunities for optimization. To the best of our knowledge, this is the first effort to look at prediction queries holistically, optimizing across both the ML and SQL components.
We will present Raven, an end-to-end optimizer for prediction queries. Raven relies on a unified intermediate representation that captures both data processing and ML operators in a single graph structure.
This allows us to introduce optimization rules that:
(i) reduce unnecessary computations by passing information between the data processing and ML operators,
(ii) leverage operator transformations (e.g., turning a decision tree into a SQL expression or an equivalent neural network) to map operators to the right execution engine, and
(iii) integrate compiler techniques to take advantage of the most efficient hardware backend (e.g., CPU, GPU) for each operator.
We have implemented Raven as an extension to Spark’s Catalyst optimizer to enable the optimization of SparkSQL prediction queries. Our implementation also allows the optimization of prediction queries in SQL Server. As we will show, Raven is capable of improving prediction query performance on Apache Spark and SQL Server by up to 13.1x and 330x, respectively. For complex models, where GPU acceleration is beneficial, Raven provides up to 8x speedup compared to state-of-the-art systems. As part of the presentation, we will also give a demo showcasing Raven in action.
Processing Large Datasets for ADAS Applications using Apache Spark - Databricks
Semantic segmentation is the classification of every pixel in an image/video. The segmentation partitions a digital image into multiple objects to simplify/change the representation of the image into something that is more meaningful and easier to analyze [1][2]. The technique has a wide variety of applications ranging from perception in autonomous driving scenarios to cancer cell segmentation for medical diagnosis.
Exponential growth in the datasets that require such segmentation is driven by improvements in the accuracy and quality of the sensors generating the data extending to 3D point cloud data. This growth is further compounded by exponential advances in cloud technologies enabling the storage and compute available for such applications. The need for semantically segmented datasets is a key requirement to improve the accuracy of inference engines that are built upon them.
Streamlining the accuracy and efficiency of these systems directly affects the value of the business outcome for organizations that are developing such functionalities as a part of their AI strategy.
This presentation details workflows for labeling, preprocessing, modeling, and evaluating performance/accuracy. Scientists and engineers leverage domain-specific features/tools that support the entire workflow from labeling the ground truth, handling data from a wide variety of sources/formats, developing models and finally deploying these models. Users can scale their deployments optimally on GPU-based cloud infrastructure to build accelerated training and inference pipelines while working with big datasets. These environments are optimized for engineers to develop such functionality with ease and then scale against large datasets with Spark-based clusters on the cloud.
Massive Data Processing in Adobe Using Delta Lake - Databricks
At Adobe Experience Platform, we ingest TBs of data every day and manage PBs of data for our customers as part of the Unified Profile offering. At the heart of this is complex ingestion of a mix of normalized and denormalized data, with various linkage scenarios powered by a central Identity Linking Graph. This helps power various marketing scenarios that are activated across platforms and channels such as email and advertisements. We will go over how we built a cost-effective and scalable data pipeline using Apache Spark and Delta Lake, and share our experiences (a sketch of the core upsert follows the outline below).
• What are we storing?
• Multi-source, multi-channel problem
• Data representation and nested schema evolution
• Performance trade-offs with various formats
• Anti-patterns used (String FTW)
• Data manipulation using UDFs
• Writer worries and how to wipe them away
• Staging tables FTW (see the sketch below)
• Data lake replication lag tracking
• Performance time!
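As a hedged illustration of the staging-table pattern mentioned above (the paths and merge key are assumptions, not Adobe's actual pipeline), new batches can land in a staging Delta table and then be merged into the main table so writer conflicts stay contained:

    # Staging-table pattern with Delta Lake: land batches in staging,
    # then MERGE so a single writer owns the big profile table.
    from delta.tables import DeltaTable
    from pyspark.sql import SparkSession

    spark = SparkSession.builder.getOrCreate()  # assumes Delta Lake is configured

    batch = spark.read.json("/mnt/ingest/batch-0042/")           # hypothetical path
    batch.write.format("delta").mode("append").save("/delta/staging_profiles")

    main = DeltaTable.forPath(spark, "/delta/profiles")
    staged = spark.read.format("delta").load("/delta/staging_profiles")

    (main.alias("m")
         .merge(staged.alias("s"), "m.profile_id = s.profile_id")  # hypothetical key
         .whenMatchedUpdateAll()
         .whenNotMatchedInsertAll()
         .execute())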
06-18-2024 Princeton Meetup: Introduction to Milvus (Timothy Spann)
tim.spann@zilliz.com
https://www.linkedin.com/in/timothyspann/
https://x.com/paasdev
https://github.com/tspannhw
https://github.com/milvus-io/milvus
Get Milvused!
https://milvus.io/
Read my Newsletter every week!
https://github.com/tspannhw/FLiPStackWeekly/blob/main/142-17June2024.md
For more unstructured data, AI, and vector database videos, check out the Milvus vector database channel here:
https://www.youtube.com/@MilvusVectorDatabase/videos
Unstructured Data Meetups -
https://www.meetup.com/unstructured-data-meetup-new-york/
https://lu.ma/calendar/manage/cal-VNT79trvj0jS8S7
https://www.meetup.com/pro/unstructureddata/
https://zilliz.com/community/unstructured-data-meetup
https://zilliz.com/event
Twitter/X: https://x.com/milvusio https://x.com/paasdev
LinkedIn: https://www.linkedin.com/company/zilliz/ https://www.linkedin.com/in/timothyspann/
GitHub: https://github.com/milvus-io/milvus https://github.com/tspannhw
Invitation to join Discord: https://discord.com/invite/FjCMmaJng6
Blogs: https://milvusio.medium.com/ https://www.opensourcevectordb.cloud/ https://medium.com/@tspann
Expand LLMs' knowledge by incorporating external data sources into your AI applications.
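A minimal sketch of that idea with Milvus Lite (the embed() function and the collection layout are assumptions): embed documents, store them, and retrieve the closest ones to ground a prompt:

    # Grounding an LLM with external data via Milvus (retrieval step).
    from pymilvus import MilvusClient

    client = MilvusClient("milvus_demo.db")  # Milvus Lite; use a server URI in production
    client.create_collection(collection_name="docs", dimension=384)

    docs = ["Milvus is an open-source vector database.", "It powers RAG pipelines."]
    client.insert(
        collection_name="docs",
        data=[{"id": i, "vector": embed(d), "text": d} for i, d in enumerate(docs)],
    )  # embed() is a hypothetical sentence-embedding function returning 384 floats

    hits = client.search(collection_name="docs", data=[embed("What is Milvus?")],
                         limit=2, output_fields=["text"])
    context = "\n".join(h["entity"]["text"] for h in hits[0])
    # feed `context` plus the user question to the LLM of your choice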
We are pleased to share with you the latest VCOSA statistical report on the cotton and yarn industry for the month of May 2024.
Starting from January 2024, the full weekly and monthly reports will only be available for free to VCOSA members. To access the complete weekly report with figures, charts, and detailed analysis of the cotton fiber market in the past week, interested parties are kindly requested to contact VCOSA to subscribe to the newsletter.
Discover the cutting-edge telemetry solution implemented for Alan Wake 2 by Remedy Entertainment in collaboration with AWS. This comprehensive presentation dives into our objectives, detailing how we utilized advanced analytics to drive gameplay improvements and player engagement.
Key highlights include:
Primary Goals: Implementing gameplay and technical telemetry to capture detailed player behavior and game performance data, fostering data-driven decision-making.
Tech Stack: Leveraging AWS services such as EKS for hosting, WAF for security, Karpenter for instance optimization, S3 for data storage, and OpenTelemetry Collector for data collection. EventBridge and Lambda were used for data compression, while Glue ETL and Athena facilitated data transformation and preparation.
Data Utilization: Transforming raw data into actionable insights with technologies like Glue ETL (PySpark scripts), Glue Crawler, and Athena, culminating in detailed visualizations with Tableau.
Achievements: Successfully managing 700 million to 1 billion events per month at a cost-effective rate, with significant savings compared to commercial solutions. This approach has enabled simplified scaling and substantial improvements in game design, reducing player churn through targeted adjustments.
Community Engagement: Enhanced ability to engage with player communities by leveraging precise data insights, despite having a small community management team.
This presentation is an invaluable resource for professionals in game development, data analytics, and cloud computing, offering insights into how telemetry and analytics can revolutionize player experience and game performance optimization.
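As a hedged illustration of the transformation stage (written in plain PySpark rather than the exact Glue script; the bucket names and event fields are assumptions), raw JSON telemetry can be compacted into partitioned Parquet for Athena:

    # Compact raw JSON telemetry into partitioned Parquet for querying.
    from pyspark.sql import SparkSession, functions as F

    spark = SparkSession.builder.appName("telemetry-etl").getOrCreate()

    events = spark.read.json("s3://aw2-telemetry/raw/2024/06/")   # hypothetical bucket
    daily = (events
             .withColumn("day", F.to_date("event_timestamp"))    # assumed field name
             .groupBy("day", "event_type")
             .agg(F.count("*").alias("events"),
                  F.approx_count_distinct("player_id").alias("players")))

    daily.write.mode("overwrite").partitionBy("day") \
         .parquet("s3://aw2-telemetry/curated/daily_event_stats/")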
Generative Classifiers: Classifying with Bayesian decision theory, Bayes’ rule, Naïve Bayes classifier.
Discriminative Classifiers: Logistic Regression, Decision Trees: Training and Visualizing a Decision Tree, Making Predictions, Estimating Class Probabilities, The CART Training Algorithm, Attribute selection measures- Gini impurity; Entropy, Regularization Hyperparameters, Regression Trees, Linear Support vector machines.
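A short worked example of the syllabus above, putting a generative classifier (Gaussian Naive Bayes) next to two discriminative ones (logistic regression and a CART decision tree with the Gini impurity measure) on the same toy dataset:

    # Generative vs. discriminative classifiers on the iris dataset.
    from sklearn.datasets import load_iris
    from sklearn.model_selection import train_test_split
    from sklearn.naive_bayes import GaussianNB
    from sklearn.linear_model import LogisticRegression
    from sklearn.tree import DecisionTreeClassifier

    X, y = load_iris(return_X_y=True)
    X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

    for model in (GaussianNB(),
                  LogisticRegression(max_iter=1000),
                  DecisionTreeClassifier(criterion="gini", max_depth=3)):
        acc = model.fit(X_tr, y_tr).score(X_te, y_te)
        print(type(model).__name__, f"accuracy={acc:.3f}")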
Road to Enterprise Architecture for Big Data Applications: Mixing Apache Spark with Singletons, Wrapping, and Facade with Andrea Condorelli
1. Magneti Marelli, ICT Innovation
Road to Enterprise Architecture for Big Data Applications
Mixing Apache Spark with singletons, wrapping, facade
London, United Kingdom
#SAISEnt4
2. Company Overview
Magneti Marelli is an international company committed to the design and production of hi-tech systems and components for the automotive sector.
• AUTOMOTIVE LIGHTING (Headlamp, Rearlamp, Lighting and Body Electronics)
• ELECTRONICS (Instrument Clusters, Infotainment & Telematics)
• SUSPENSION SYSTEMS AND SHOCK ABSORBERS (Suspension Systems, Shock Absorbers and Dynamic Systems)
• PLASTIC COMPONENTS AND MODULES (Bumper, Dashboard, Central Console, Pedals, Hand Brake Levers and Fuel System)
• AFTERMARKET PARTS & SERVICES (Mechanical, Body Work, Electrics and Electronics, and Consumables)
• EXHAUST SYSTEMS (Manifolds, Catalytic Converter, Diesel Particulate Filter and Mufflers)
• POWERTRAIN (Gasoline and Diesel Engine Control, Electric Motor, Inverter and Transmission)
• MOTORSPORT (Injection Systems, Electronic Control Units, Hybrid Systems, Telemetry Systems, Electric Actuators)
3. Magneti Marelli Worldwide Footprint
[World map of Magneti Marelli sites: production plants, R&D centers and application centers across the USA, Mexico, Brazil, Argentina, Italy, France, Spain, Germany, Poland, Czech Rep., Slovakia, Romania, Serbia, Russia, Turkey, China, India, Japan, Korea and Malaysia.]
Legend: PP: Production Plant; R&D: R&D Center; AC: Application Center
4. Big Data storyline
[Timeline chart: cumulative rows processed grows from 0bn to roughly 30bn between Jan 2017 and Jul 2018, across a business exploration phase, a proof-of-concept phase and a production phase.]
• Jan 2017: Big Data group was created
• Feb 2017: first POC approved, data loaded from USB
• Aug 2017: Welding Machines POC
• Sep 2017: Telemetry POC
• Nov 2017: SMT POC
• 29 Jan 2018: Databricks was adopted
• Apr 2018: MARC 1.0 released
• Jun 2018: the SMT project
• Aug 2018: the Metalizers project
6. The Surface-Mount Technology (SMT) project
[Diagram: the SMT line, from PCB preparation to the assembly line, with stations Lasermarker, Serigraphy, Automated Optical Inspection (post printing), Pick and Place, Oven and Automated Optical Inspection (post reflow). The stations emit heterogeneous data: one file per item with timestamp and PCB ID; one file per day with timestamp, temperature and humidity; one file per item with timestamp, PCB ID, soldering paste, sensor data, temperature, …; a SQL database with NIP, pick-up info and feeder info; an MDB with NIP, images, sensor data and anomalies; a SQL database with NIP, images, sensor data and PCB final status.]
7. The Surface-Mount Technology (SMT) project
[Same SMT line diagram as slide 6, annotated with the machine learning problems.]
Machine Learning problems:
1. Machine status monitoring
2. Bottleneck harmonic model
3. Anomaly recommender engine
8. A Dream Project
1. Production process is well known
2. Data source is clearly defined
3. Need is raised by plant people
4. Algorithmic challenges are clear
9. Becoming a Nightmare
Success!!! And then:
• So… where is the data?
• How can I read/access the data?
• How can I be supported by my data scientist colleagues?
• How can I attach a Spark cluster to my Jupyter notebook?
• Who is going to port the notebook to production?
• What do you mean with production?
Suddenly all four "dream project" points (known process, defined data source, plant-driven need, clear algorithmic challenges) are in question.
10. Enterprise Architecture
[Diagram: a textbook "logical data warehouse" reference architecture. Data sources (structured data, SAP, MES, IoT feeds, operational systems, audio/video, images, IT logs, text documents, external sources) flow through batch/micro-batch/CDC and stream ingestion into a raw data lake, a curated/enriched/transformed data lake, a traditional enterprise data warehouse and data marts, governed by master data management and data quality, and topped by a data access layer, self-service data preparation, a machine learning layer, a data science environment and analytic capabilities: analyze, optimize, forecast, report, plan, discover, collaborate, predict, model.]
The Technology Bazaar: ARCHITECTURE != LIST OF TECHNOLOGIES AND FANCY ARROWS
11. Keep it simple, stupid
[Diagram: the MARC layers: INGESTION AND STORAGE, EXPLORATION, PRODUCTION, PRESENTATION.]
• Ingestion and storage: Hammer Gateway (HG Job 1…n) feeding the DATA LAKE (Azure Data Lake Store)
• Exploration: DATA EXPLORATION notebooks
• Production (architectural objects): MESSAGE QUEUE, FORWARDER, BUSINESS LOGICS (Dbutils+Spark), MCO (mco.read, mco.write, mco.log) backed by AZURE FUNCTIONS
• Presentation: Datamart and APP
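The talk does not publish MARC's internal schema, so the following is a purely hypothetical sketch of what one entry in the MESSAGE QUEUE could carry: just enough metadata for the FORWARDER to route the service and for the MCO to resolve its data.

    # Hypothetical MARC-style service descriptor (all field names are guesses):
    service = {
        "name": "clean-smt-data",        # the business-logic job to run
        "status": "TO-DO",               # lifecycle: TO-DO -> WIP -> DONE
        "target": "databricks",          # or "on-premise-spark" for restricted data
        "inputs": ["smt.raw"],           # logical names resolved by mco.read
        "outputs": ["smt.clean"],        # logical names resolved by mco.write
    }
    message_queue.put(service)           # hypothetical queue client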
12. Enterprise Architecture: data, where are you?
[Diagram: the Pick & Place data source feeding the architecture. Before MARC there were two routes to the notebooks: writing a sort of ETL tool from scratch (the production-ready tentative, 10% of the time) or USB loading (quick and dirty, 90% of the time). With MARC, Hammer Gateway jobs (HG Job 1…n) land the data in the DATA LAKE (Azure Data Lake Store).]
• Where are the CSV files?
• What do you mean with bcp out?
• How can I get a copy in the cloud?
• How can I update data on a regular basis?
13. Enterprise Architecture: the Jupyter case
• How could I work together with other data scientists?
• How can I deal with computation spikes?
• How can I attach an Apache Spark cluster to my Jupyter?
• Damn, Java Heap Memory Exception: what do you mean?
[Diagram: the same MARC view, highlighting the DATA EXPLORATION notebooks and the MCO (mco.read, mco.write, mco.log) backed by Azure Functions.]
14. One singleton to rule them all
mco
Pattern: Singleton
Use: bring tokens and technical access to the notebook.
Benefits: enhanced security; access control; reduced vendor lock-in.

mco.read
Pattern: Wrapping
Use: take data from the data lake knowing only logical data names.
Benefits: no one needs to know where or how data are stored; incremental-read capability out of the box; less time to port code to production; reduced reading time; reduced vendor lock-in (propagating a new HDFS PaaS vendor to all services is a matter of hours).

mco.log
Pattern: Wrapping
Use: bring developer-grade logging capability.
Benefits: reduced debug time; enables process audits.

mco.write
Pattern: Wrapping
Use: save data anywhere.
Benefits: no one needs to know where or how data must be written; avoids dangerous behaviours such as writing to SQL inside a transformation action (connection pool, my beloved friend…).
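A minimal sketch of these patterns, assuming nothing about MARC internals (the catalog, fetch_token and call_azure_function helpers are hypothetical): a singleton holding credentials and wrapping read/write/log so notebooks never hard-code storage locations.

    class MCO:
        """A minimal sketch of the MCO singleton (names guessed)."""
        _instance = None

        def __new__(cls):
            # Singleton: every notebook and service shares one configured instance.
            if cls._instance is None:
                cls._instance = super().__new__(cls)
                cls._instance._token = fetch_token()        # hypothetical credential broker
                cls._instance._catalog = {                  # logical name -> physical location
                    "smt.raw":   ("adl://marc/datalake/smt/raw",   "parquet"),
                    "smt.clean": ("adl://marc/datamart/smt/clean", "parquet"),
                }
            return cls._instance

        def read(self, name):
            # Wrapping: callers pass only a logical name; storage details live here,
            # so swapping the HDFS/PaaS vendor means editing this class, not every job.
            path, fmt = self._catalog[name]
            return spark.read.format(fmt).load(path)        # `spark` from the notebook

        def write(self, name, df):
            path, fmt = self._catalog[name]
            df.write.format(fmt).mode("append").save(path)  # after actions, never
                                                            # inside a transformation
        def log(self, message):
            call_azure_function("mco-log", message, self._token)  # hypothetical endpoint

    mco = MCO()
    clean = mco.read("smt.raw")   # no paths, no secrets in the notebook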
15. Enterprise Architecture: the model is ready!
• Is the code production ready?
• Who is going to port the notebook to production?
• The developer's algorithm is wrong: it produces different numbers…
• OK, I got it! I'll need a crontab… but where?
[Diagram: the MARC view, highlighting the production path: MESSAGE QUEUE, FORWARDER, BUSINESS LOGICS (Dbutils+Spark), MCO and APP.]
16. A For-what? What the hell?
[Diagram: the MESSAGE QUEUE (Clean SMT Data, Cycle Time Anomaly Det, Super Secret Service, …) feeding the FORWARDER, with per-service status flags (TO-DO, WIP, DONE), the BUSINESS LOGICS (Dbutils+Spark), the DATAMART and the APP.]
How a service runs:
1. The Clean SMT Data service is first in the MESSAGE QUEUE and is sent to the FORWARDER. Its status is set to WIP.
2. The service is forwarded to Databricks.
3. Databricks gets the data through the MCO.
4. Once the data is loaded, the Spark code starts the cleaning job.
5. The cleaned data is cached in the DATAMART through the MCO. The status is set to DONE.
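A hedged sketch of that control loop (all names illustrative, not MARC's actual code): take the next TO-DO service, mark it WIP, submit it to an application server, and mark it DONE when the job finishes.

    import time

    def forwarder_loop(queue, servers):
        while True:
            service = queue.next_todo()              # e.g. "Clean SMT Data"
            if service is None:
                time.sleep(5)
                continue
            queue.set_status(service, "WIP")
            run = servers[service.target].submit(    # e.g. a Databricks workspace;
                service.job, inputs=service.inputs)  # the job reads/writes via the MCO
            run.wait()
            queue.set_status(service, "DONE")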
17. A For-what? What the hell?
[Same diagram as slide 16, with the Clean SMT Data service now marked WIP.]
18. A For-what? What the hell?
[Same diagram; the Super Secret Service is marked as not runnable in the cloud.]
• What if some datasets could not be moved into the cloud?
• How to deal with super-secret business logic?
The FORWARDER is the main component for cloud hybridization! The data needed to run the Super Secret Service cannot be moved outside Magneti Marelli servers.
19. A For-what? What the hell? (ON PREMISE)
[Same diagram, extended with an on-premise enterprise data warehouse.]
1. The Super Secret Service is forwarded to an on-premise Apache Spark cluster.
2. The Apache Spark cluster gets the data through the MCO.
3. The outcome is persisted on an on-premise SQL Server.
4. A custom web app allows users to see the job output.
20. A For-what? What the hell? Predictive balancing
[Diagram: the MESSAGE QUEUE feeding two Databricks clusters, a Slow Services Cluster and a Fast Services Cluster, both running Dbutils+Spark.]
1. A first service (a RAM-intensive job: 300 GB of RAM, 90% of the cluster, 2 hours long) is submitted by the forwarder.
2. A second service (Clean PS Data: 100 GB of RAM, 5 minutes long) is submitted before the first finishes; the cluster is busy with the other computation.
3. The forwarder can submit the job to any application server, so it creates a new Databricks cluster and submits to it. Cluster creation is anticipated using predictive algorithms.
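An illustrative sketch of the predictive-balancing idea (the thresholds and the predict_profile() model are invented): long, RAM-hungry services go to the slow cluster, short ones to the fast cluster, and when the chosen cluster is busy the forwarder creates a fresh one instead of queuing behind a two-hour job.

    def pick_cluster(service, slow_cluster, fast_cluster, create_cluster):
        est = predict_profile(service)          # hypothetical: estimated RAM and runtime
        target = slow_cluster if est.ram_gb > 150 or est.minutes > 30 else fast_cluster
        if target.is_busy():
            # Prediction lets the cluster be created ahead of demand.
            target = create_cluster(ram_gb=est.ram_gb)  # e.g. via the Databricks API
        return target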
21. Enterprise Architecture: don't mind about nerd stuff
[Diagram: the MARC view, highlighting the PRESENTATION layer (DATAMART and APP) downstream of the notebooks and business logics.]
Data scientists' presentation concerns:
• How do I write a web page?
• Do I need to bootstrap?
• MV-what? I thought Spring was just a season!
• Single sign-on? What do you mean?
22. Enterprise Architecture ("…and in the darkness bind them")
[Diagram: the full MARC picture: no-complexity DATA INGESTION AND STORAGE (Pick & Place, Hammer Gateway with HG Job 1…n, Azure Data Lake Store), the DATA SCIENTIST TOYBOX (DATA EXPLORATION notebooks), PRODUCTION (MESSAGE QUEUE, FORWARDER, BUSINESS LOGICS with Dbutils+Spark, MCO with mco.read/mco.write/mco.log on Azure Functions) and PRESENTATION (Datamart, APP).]
1. One day to add a new source
2. The architecture "embeds" the guideline
3. The presentation layer is drag and drop
4. Service queuing ensures enterprise-grade, managed scalability
5. Data scientists do not waste time on boring activities
23. "I have done the deed. Did you hear a noise?"
"The Guide says there is an art to flying," said Ford, "or rather a knack. The knack lies in learning how to throw yourself at the ground and miss."
1. Production process is well known
2. Data source is clearly defined
3. Need is raised by plant people
4. Algorithmic challenges are clear
24. The Surface-Mount Technology (SMT) project
[Same SMT line diagram and machine learning problems as slide 7: machine status monitoring, bottleneck harmonic model, anomaly recommender engine.]
25. Anomaly Recommender Engine
Use case: support the maintenance team in prioritizing standard and extraordinary maintenance activities.
Description: a summary dashboard shows the health of each part of the line; a drill-down with details is available.
Benefit: reduced machine-stoppage losses per year per line; downtime reduction.
26. #SAISEnt4
Much ado about nothing… ?
26
Surface-Mount Technology PCB Preparation Assembly Line
Pre Production & Assembly Line
Break-even point reached after
8 months
Cost per line reduced by 90%
after the first one
Return On Investment: 12X in 3
years
• Databricks and Microsoft PowerBI allow a very cost
effective first project
• Hammer Gateway allows cost effective ingestion
• MCO enabled Data Scientists to convert notebooks
in services with a very very low effort
• Microsoft Azure and Databricks ensure endless
scalability