The document discusses software reliability in the era of big data and real-time processing. It describes how distributed systems like MapReduce and Spark improved reliability over expensive HPC clusters. Frameworks use in-memory computing, immutable data partitions, and checkpointing to tolerate failures. Distributed databases must address consensus and the CAP theorem. Real-time streaming requires techniques like windowing and watermarking to handle late data. The presentation concludes with an overview of a demo platform that collects industrial IoT data, performs real-time processing, and displays results.
IKERLAN.
WHERE
TECHNOLOGY IS
AN ATTITUDE
IKERLAN
P.º José María Arizmendiarrieta, 2 - 20500 Arrasate-Mondragón
T. +34 943712400 F. +34 943796944
THANK YOU
https://github.com/Neuw84/ada_2021/
aconde@ikerlan.es
@neuw84
Editor's Notes
Good afternoon, everybody. I'm Angel Conde from the IKERLAN Technology Centre.
The talk I'm presenting here is called Software Reliability in the Big Data Era, with an industry-minded focus.
Well, I will give a brief introduction about myself. I lead the Data Analytics & Artificial Intelligence Team at IKERLAN. IKERLAN is a research centre and member of the Basque Research & Technology Alliance. These are some of the topics that I work on in my day-to-day.
Let's start the talk with an introduction about how Big Data started with reliability in mind.
The first thing we need to take into account is that a Big Data system is, by definition, a distributed system.
However, we should ask ourselves this question. Can a distributed system be reliable?
Not really: we face all kinds of failures, and that led to the famous eight fallacies of distributed computing.
One could argue that we have High Performance Computing clusters, but they are too expensive for processing the amounts of data gathered by internet companies. Moreover, such systems fail too.
Then… how can we process large amounts of data in a cheap and reliable way?
Google published a paper in 2004 about an approach to processing data on large clusters (MapReduce). Some years later, Yahoo open-sourced its implementation and Hadoop was born… the rest is history.
In the MapReduce model we usually have some map steps chained with reduce steps.
In this figure we can see the diagram for a word count. Word count is the "hello world" of the big data paradigm. A lot of use cases can be ported to this approach, more than you might think at first sight.
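Word count maps naturally onto the model: the map step emits (word, 1) pairs, the shuffle groups the pairs by key, and the reduce step sums each group. Here is a minimal single-process Python sketch of that dataflow (the function names are illustrative, not any framework's API):

```python
from collections import defaultdict

def map_phase(lines):
    # Map: emit a (word, 1) pair for every word in every input line.
    for line in lines:
        for word in line.split():
            yield word.lower(), 1

def shuffle(pairs):
    # Shuffle: group all values by key, as the framework would do
    # over the network between map workers and reduce workers.
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups

def reduce_phase(groups):
    # Reduce: sum the counts for each word.
    return {word: sum(counts) for word, counts in groups.items()}

lines = ["to be or not to be"]
counts = reduce_phase(shuffle(map_phase(lines)))
print(counts["to"])  # 2
```

In a real cluster each phase runs on many workers in parallel, which is why the shuffle's network cost matters so much.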
We can see here that the network load during the shuffle step matters for the performance of the approach. Moreover, for each step the intermediate results are stored on a failure-tolerant storage system.
Memory got cheaper, and therefore the in-memory computing approach was born.
Berkeley published a paper on one approach using this kind of paradigm, and later on a lot of frameworks were born using the in-memory paradigm.
In Spark, in order to be tolerant to failures, the first thing is that everything is immutable. The data is stored in a replicated way in memory. Before execution, a DAG is computed, trying to optimize the different steps of the computation. Moreover, the DAG steps are checkpointed as needed in order to be reliable. If no checkpoint exists, the whole DAG is recomputed.
RDDs are immutable distributed collections of elements of your data that can be stored in memory or on disk across a cluster of machines. The data is partitioned across the machines in your cluster and can be operated on in parallel with a low-level API that offers transformations and actions. RDDs are fault tolerant because they track data lineage information to rebuild lost data automatically on failure.
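Lineage-based recovery can be pictured with a toy model: each dataset remembers its parent and the transformation that produced it, so a lost result can be rebuilt by replaying that chain from the source. This is a simplified sketch of the idea only, not Spark's actual implementation:

```python
class Dataset:
    """Toy immutable dataset that records its lineage (parent + transform)."""
    def __init__(self, data=None, parent=None, transform=None):
        self._cache = list(data) if data is not None else None
        self.parent = parent
        self.transform = transform

    def map(self, fn):
        # Transformations are lazy: we only record the lineage step.
        return Dataset(parent=self, transform=lambda rows: [fn(r) for r in rows])

    def filter(self, pred):
        return Dataset(parent=self, transform=lambda rows: [r for r in rows if pred(r)])

    def collect(self):
        # If the materialized copy was "lost", recompute it by replaying
        # the recorded transformation on the parent's data.
        if self._cache is None:
            self._cache = self.transform(self.parent.collect())
        return self._cache

source = Dataset(data=[1, 2, 3, 4])
result = source.map(lambda x: x * 10).filter(lambda x: x > 15)
print(result.collect())  # [20, 30, 40]
result._cache = None     # simulate losing the computed partition
print(result.collect())  # rebuilt from lineage: [20, 30, 40]
```

Real RDDs do this per partition and in parallel, so only the lost pieces are recomputed, not the whole dataset.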
Next we are going to speak about orchestrators. They are in charge of job scheduling, abstracting cluster resources, etc.
In case of node failure, they try to reschedule the jobs onto other nodes.
Like distributed databases, they need consensus capabilities (e.g. deciding who the leader is).
Well, we are going to change our focus to distributed databases. Those databases are distributed by nature, so we are going to give a brief introduction to their design.
In distributed databases the CAP theorem is famous. It says that in a distributed system you cannot have all three of these properties at once: consistency, availability, and partition tolerance.
For example, you can have consistency and availability, but then you are not tolerant to network partitions.
This theorem seems to provide an easy way to reason about these systems. However, for some of the combinations that framing does not mean very much.
But how did this trend start? The rise of distributed databases was meant to solve internet-scale problems.
There are a lot of NoSQL databases built to solve internet-scale problems. These approaches provide multi-master capabilities, avoid manual sharding…
However, there is no ACID support (consistency) in the majority of these approaches.
(*these issues can be solved by developers on the client side)
And developers wanted their SQL back (e.g. CQL), and companies wanted ACID.
Google changed the landscape again in 2012 with another paper (Spanner).
The thing is that Google has complete control of its backbone network, with multiple physical paths that provide tolerance to failures.
They have an atomic clock in each datacenter in order to run a global time-synchronization protocol.
And with advanced protocols….
After the famous paper, again, some open source databases have already implemented some of the paper's tricks.
Well, let's move on to the next point. Now I will introduce the Industrial IoT.
Let's start with some numbers related to the IoT and the IIoT to show why this is important.
General Electric says that IIoT investment is expected to top…
Accenture predicts that IIoT could add…
McKinsey estimates that it will touch 43% of the global economy.
About the number of things, Gartner says that 20 billion things will be installed by 2020.
Let's see some of the benefits for industry.
Well, the benefits apply to the whole product life cycle, from its development to its end-of-life support.
E.g. supply-demand matching and lead-time reduction
Human resource optimization
Optimization of energy and raw material consumptions
Manufacturing asset optimization (Overall Equipment Effectiveness)
Quality maximization
After sales
……..
All of these concepts are closely related to Industry 4.0.
Next, let's speak about real-time processing of IIoT data.
Late Data and Ordering:
- We can have connectivity issues, such as wireless mobile telecommunications, low signal, etc.
Protocols:
- Most MQTT brokers do not implement QoS 2!
- CoAP is UDP-based: no ordering!
- Badly designed local acquisition systems. Therefore, if we are doing real-time processing of IIoT data, we need a tool that makes it easy to work on unordered incoming data and to build filters for duplicates.
Next I am going to explain the concepts of event time and watermarking for late data. A watermark is a moving threshold in event time that trails behind the maximum event time seen by the query in the processed data.
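The rule just stated can be sketched in a few lines: the watermark trails the maximum event time seen so far by a fixed allowed lateness, and events that fall below the watermark are treated as too late. This is illustrative code only, not any specific engine's API:

```python
class Watermark:
    """Toy watermark: trails the max event time seen by an allowed lateness."""
    def __init__(self, allowed_lateness_s):
        self.allowed_lateness_s = allowed_lateness_s
        self.max_event_time = float("-inf")

    def observe(self, event_time_s):
        # Advance the maximum event time seen by the query so far.
        self.max_event_time = max(self.max_event_time, event_time_s)

    @property
    def value(self):
        # The watermark trails the max event time by the allowed lateness.
        return self.max_event_time - self.allowed_lateness_s

    def is_late(self, event_time_s):
        # Events older than the watermark are dropped or sent to a side output.
        return event_time_s < self.value

wm = Watermark(allowed_lateness_s=10)
for t in [100, 105, 112]:   # event times arriving out of order is fine:
    wm.observe(t)            # only the maximum matters
print(wm.value)          # 102
print(wm.is_late(101))   # True: older than the watermark
print(wm.is_late(103))   # False: within the allowed lateness
```

Real engines advance the watermark per trigger and use it to decide when a window's state can be finalized and discarded.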
Well, in this demo we are using some of the open source Big Data tools:
- For example: we are using NiFi (pronounced "nai-fai") for ingestion and routing
- Kafka for messaging and decoupling
- Spark for real-time processing
- Cassandra as backend storage
- Zeppelin as our web interface
- Mosquitto, an open source MQTT broker
The architecture is the following:
Fake sensor data from two machines is sent to an MQTT broker running in the cloud. This data contains machine status, temperature, etc.
From there, the MQTT data is ingested via NiFi and sent to two topics depending on the machine status.
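The routing step boils down to a content-based rule: inspect the machine status field in each message and pick the target topic. A hypothetical Python sketch of the decision NiFi applies (the topic names and message fields are made up for illustration):

```python
import json

# Hypothetical topic names; in the demo, NiFi routes each message to one of
# two topics depending on the machine status carried in the MQTT payload.
TOPIC_RUNNING = "machines-running"
TOPIC_STOPPED = "machines-stopped"

def route(mqtt_payload: bytes) -> str:
    """Pick the destination topic for one sensor reading based on its status."""
    reading = json.loads(mqtt_payload)
    return TOPIC_RUNNING if reading["status"] == "RUNNING" else TOPIC_STOPPED

msg = json.dumps({"machine": "m1", "status": "RUNNING", "temp_c": 71.5}).encode()
print(route(msg))  # machines-running
```

In NiFi itself this would be expressed declaratively, e.g. with an attribute-routing processor, rather than in code.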
Then we have the real time processing engine, Spark.
This component makes it possible to do real-time analytics on the incoming data and store the results in Cassandra.
For the demo we will use Zeppelin as a way to interact with Spark and Cassandra providing a useful user interface for our analytics.
This kind of architecture or digital platform can run on any cloud or on-premises.
We have come to the end of the demo. I'd just like to thank you for listening and let you know that all the code for this demo is already on GitHub.
Now I would be pleased to take your comments and questions.