IKERLAN.
WHERE
TECHNOLOGY IS
AN ATTITUDE
Software Reliability in the Big Data Era
with an industry-minded focus
Ángel Conde
aconde@ikerlan.es
© 2021. IKERLAN. All rights reserved
About Me
@Neuw84
@IKERLANofficial
Ángel Conde Manjón
Data Analytics & Artificial Intelligence Team Lead @
Big Data
Artificial Intelligence
Distributed Systems
Cloud
BIG DATA “RELIABILITY” OR “FAILURE SURVIVAL”
Distributed systems vs reliability
• Big Data equals distributed processing systems.
But…
“Can a distributed system be reliable?”
• Not really.
- Network partitions.
- Node failures (hardware, software, etc.).
- Clock drift (related to consensus).
* Google nowadays says otherwise…
The starting paradigm shift
• HPC clusters are too expensive (and they fail too).
“How can we process large amounts of data in a cheap and reliable way?”
• Google makes it: MapReduce: Simplified Data Processing on Large Clusters (2004, J. Dean).
• Its open-source implementation, Apache Hadoop, is born.
The rest is history…
The Map Reduce model
* Word Count is the Hello World in the Big Data Paradigm.
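As a minimal single-process sketch of the model (this is not Hadoop's API; the names `map_phase`, `shuffle`, `reduce_phase` and `word_count` are illustrative assumptions), word count can be written as a map step, a shuffle step and a reduce step:

```python
from collections import defaultdict


def map_phase(line):
    """Map: emit a (word, 1) pair for every word in one input line."""
    return [(word.lower(), 1) for word in line.split()]


def shuffle(pairs):
    """Shuffle: group all emitted values by key (the word)."""
    groups = defaultdict(list)
    for word, count in pairs:
        groups[word].append(count)
    return groups


def reduce_phase(word, counts):
    """Reduce: sum the counts collected for one word."""
    return word, sum(counts)


def word_count(lines):
    # On a real cluster each line (or block) would be mapped on a different node;
    # here everything runs sequentially in one process.
    pairs = [pair for line in lines for pair in map_phase(line)]
    return dict(reduce_phase(word, counts) for word, counts in shuffle(pairs).items())
```

Calling `word_count(["hello big data", "hello world"])` counts "hello" twice and every other word once. The point of the split is that map and reduce are independent per key, so a failed task can simply be rerun.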
All fits in memory
• MapReduce is somewhat “slow”: every step is persisted to disk.
• Memory gets cheaper and cheaper…
• Let's do in-memory computing!
Spark: Cluster Computing with Working Sets (M. Zaharia, 2010).
Spark Lineage Model
• Everything is immutable.
• Data is partitioned into replicated chunks (RDDs).
• Before execution, a DAG is computed.
• DAG execution is checkpointed to fault-tolerant storage.
• In case of node failure, it is recomputed from the last checkpoint.
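The lineage idea can be illustrated with a toy sketch in plain Python. It is not Spark's API: the class `LineageDataset` and its methods are hypothetical, and the in-memory source partitions stand in for the last checkpoint on fault-tolerant storage.

```python
class LineageDataset:
    """Toy immutable dataset that remembers how it was derived (its lineage)."""

    def __init__(self, source_partitions, transforms=()):
        self._source = tuple(tuple(p) for p in source_partitions)
        self._transforms = tuple(transforms)  # ordered tuple of per-element functions
        self._cache = {}                      # partition index -> computed rows

    def map(self, fn):
        # Transformations never mutate; they return a new dataset with a longer lineage.
        return LineageDataset(self._source, self._transforms + (fn,))

    def compute(self, index):
        """Return one partition, recomputing it from the source if it is not cached."""
        if index not in self._cache:
            rows = self._source[index]
            for fn in self._transforms:
                rows = tuple(fn(x) for x in rows)
            self._cache[index] = rows
        return self._cache[index]

    def lose_partition(self, index):
        """Simulate a node failure that wipes one computed partition."""
        self._cache.pop(index, None)

    def collect(self):
        return [x for i in range(len(self._source)) for x in self.compute(i)]


base = LineageDataset([[1, 2], [3, 4]])
derived = base.map(lambda x: x * 2).map(lambda x: x + 1)
first = derived.collect()
derived.lose_partition(1)      # the "node" holding partition 1 fails
second = derived.collect()     # partition 1 is rebuilt by replaying the lineage
```

Because every step is immutable and deterministic, replaying the recorded transformations over the surviving input gives the same result as before the failure, so only the lost partition has to be recomputed.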
Orchestrators
• An important piece.
• Abstract the resources of the cluster (CPUs, GPUs, memory).
“I want my Big Data process to run on 200 CPUs and 512 GB of RAM”
• Coordinate all the jobs running in the cluster.
• Relaunch jobs on other nodes in case of failure.
• Like databases, they have consensus capabilities (e.g., for leader election).
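A rough sketch of the first bullets, resource abstraction and relaunch on failure, could look like the following. `Node` and `ToyOrchestrator` are hypothetical names, placement is a naive first-fit, and consensus or leader election is not modelled at all.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    name: str
    cpus: int
    mem_gb: int
    jobs: dict = field(default_factory=dict)  # job name -> (cpus, mem_gb)

    def free(self):
        """Return the (cpus, mem_gb) not yet claimed by running jobs."""
        used_cpus = sum(c for c, _ in self.jobs.values())
        used_mem = sum(m for _, m in self.jobs.values())
        return self.cpus - used_cpus, self.mem_gb - used_mem


class ToyOrchestrator:
    def __init__(self, nodes):
        self.nodes = {n.name: n for n in nodes}

    def submit(self, job, cpus, mem_gb):
        """Place the job on the first node with enough free CPU and memory."""
        for node in self.nodes.values():
            free_cpus, free_mem = node.free()
            if cpus <= free_cpus and mem_gb <= free_mem:
                node.jobs[job] = (cpus, mem_gb)
                return node.name
        raise RuntimeError(f"no node can host {job}")

    def node_failed(self, name):
        """Drop the failed node and relaunch its jobs on the surviving nodes."""
        failed = self.nodes.pop(name)
        return {job: self.submit(job, c, m) for job, (c, m) in failed.jobs.items()}
```

The caller only states what the job needs (`submit("etl", 6, 16)`), never where it runs; that indirection is what lets the orchestrator move work when a node disappears.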
DISTRIBUTED DATABASES
The CAP Theorem
* Pick two
All about consensus
https://jvns.ca/blog/2016/11/19/a-critique-of-the-cap-theorem/
The Rise of NoSQL
• The internet became what it is today some years ago (bringing internet-scale problems).
• Lots of NoSQL solutions appeared to solve internet-scale problems.
o Key-Value
o Document
o Time series
o Graph
• Remember, usually YOU do not have those problems.
• Avoid sharding and multi-master approaches.
• No ACID transaction support.
A new approach
• Again, Google did it: Spanner: Google's Globally-Distributed Database (J. C. Corbett, 2012).
• Complete control of the backbone network, being tolerant to failures.
• Global clock synchronization with atomic clocks.
• Advanced consensus protocols.
The Open Source alternatives
* Nowadays there is a sharp rise of multi-model databases.
INDUSTRIAL INTERNET OF THINGS (INDUSTRY 4.0)
The Industrial Internet of Things (IIoT)
• Investment is expected to top $60 trillion during the next 15 years.
• Could add $14.2T to the global economy by 2030.
• Will touch 43% of the global economy by 2025.
• Gartner: 20 billion IoT things installed by 2024.
Use cases & Key Benefits
+ efficiency
- costs
• Supply-Demand matching and reduction of Time-to-market.
• Human resource optimization.
• Optimization of energy and raw material consumptions.
• Manufacturing asset optimization and OEE improvement.
• Quality Maximization.
• After sales service optimization.
• Environment health & security maximization.
Key Issues in IIoT
REAL TIME PROCESSING APPLIED TO IIOT
Big Data & Real Time Processing
• A table can be seen as a snapshot of streaming data (the stream being an unbounded table).
• Streaming aggregations usually require windows.
• Results are processed at some point (e.g., at the end of a window), producing a “snapshot table”.
• Those snapshots are usually stored in a fault-tolerant storage system.
However… how do we deal with late-arriving data?
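To make the window idea concrete, here is a small sketch in plain Python (no streaming engine involved). It assumes 5-minute tumbling windows aligned to midnight; `window_start` and `windowed_counts` are hypothetical helper names.

```python
from collections import Counter
from datetime import datetime, timedelta


def window_start(event_time, size=timedelta(minutes=5)):
    """Return the start of the tumbling window that contains event_time."""
    midnight = event_time.replace(hour=0, minute=0, second=0, microsecond=0)
    whole_windows = (event_time - midnight) // size  # windows elapsed since midnight
    return midnight + whole_windows * size


def windowed_counts(events, size=timedelta(minutes=5)):
    """Count words per (window start, word); events are (event_time, word) pairs."""
    counts = Counter()
    for event_time, word in events:
        counts[(window_start(event_time, size), word)] += 1
    return dict(counts)
```

Each key of the result is one row of the “snapshot table” for a window. Note that events are bucketed by their event time, not by when they arrive, which is exactly why late data becomes a problem.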
Watermarking
• The (in)famous word count example.
[Figure: event time (11:55 to 12:15) plotted against processing time (12:00 to 12:15) with 5-minute triggers. Events (“hello”,1) carry event times 11:58, 12:03, 12:08, 12:05, 12:03 and 12:14. The 5-minute watermark (last seen event time – 5m) trails the max event time seen. Word count values shown for “hello” at the triggers: 1, 2, 4.]
Event after the watermark is not written to the Sink.
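The watermark rule can be sketched in a few lines of plain Python. This is a simplification under stated assumptions: the watermark is updated after every event, whereas real engines typically advance it once per trigger, and `count_with_watermark` is a hypothetical helper, not a library call. The sample timeline in the usage note is made up and is not the one in the figure.

```python
from collections import Counter
from datetime import datetime, timedelta


def count_with_watermark(events, delay=timedelta(minutes=5)):
    """Count words in arrival order, dropping events older than the watermark.

    events: iterable of (event_time, word) pairs in the order they arrive.
    Returns (counts, dropped).
    """
    counts = Counter()
    dropped = []
    max_event_time = None
    for event_time, word in events:
        # Watermark = max event time seen so far minus the allowed delay.
        if max_event_time is not None and event_time < max_event_time - delay:
            dropped.append((event_time, word))  # too late: behind the watermark
            continue
        counts[word] += 1
        if max_event_time is None or event_time > max_event_time:
            max_event_time = event_time
    return dict(counts), dropped
```

With arrivals stamped 12:03, 12:08, 12:14, 12:05 and 12:11, the watermark sits at 12:09 once 12:14 has been seen, so the 12:05 event is discarded while 12:11 is still counted. A longer delay tolerates more lateness at the cost of keeping window state around for longer.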
HANDS ON DEMO
Demo
Overview
Digital Platform (PaaS)
MQTT - JSON
Filter and routing
Aggregates & raw data
Real-time processing
Cloud
UI
IKERLAN
P.º José María Arizmendiarrieta, 2 - 20500 Arrasate-Mondragón
T. +34 943712400 F. +34 943796944
THANK YOU
https://github.com/Neuw84/ada_2021/
aconde@ikerlan.es
@neuw84

Top 5 Benefits OF Using Muvi Live Paywall For Live StreamsTop 5 Benefits OF Using Muvi Live Paywall For Live Streams
Top 5 Benefits OF Using Muvi Live Paywall For Live Streams
 
Automating Google Workspace (GWS) & more with Apps Script
Automating Google Workspace (GWS) & more with Apps ScriptAutomating Google Workspace (GWS) & more with Apps Script
Automating Google Workspace (GWS) & more with Apps Script
 
Kalyanpur ) Call Girls in Lucknow Finest Escorts Service 🍸 8923113531 🎰 Avail...
Kalyanpur ) Call Girls in Lucknow Finest Escorts Service 🍸 8923113531 🎰 Avail...Kalyanpur ) Call Girls in Lucknow Finest Escorts Service 🍸 8923113531 🎰 Avail...
Kalyanpur ) Call Girls in Lucknow Finest Escorts Service 🍸 8923113531 🎰 Avail...
 

Software Reliability on the Big Data Era

  • 1. IKERLAN. WHERE TECHNOLOGY IS AN ATTITUDE Software reliability on the Big Data ERA with an Industry minded focus Ángel Conde aconde@ikerlan.es
  • 2. © 2021. IKERLAN. All rights reserved. About Me: Ángel Conde Manjón (@Neuw84, @IKERLANofficial), Data Analytics & Artificial Intelligence Team Lead @ IKERLAN. Topics: Big Data, Artificial Intelligence, Distributed Systems, Cloud.
  • 3. BIG DATA “RELIABILITY” OR “FAILURE SURVIVAL”
  • 4. Distributed systems vs reliability: Big Data equals a distributed processing system. But… “Can a distributed system be reliable?” Not really: network partitions; node failures (hardware, software, etc.); clock drift (related to consensus). *Google nowadays says otherwise…
  • 5. The starting paradigm shift: HPC clusters are too expensive (and they fail too). “How can we process large amounts of data in a cheap and reliable way?” Google makes it: MapReduce: Simplified Data Processing on Large Clusters (2004, J. Dean). Yahoo open sources its implementation and Hadoop is born. The rest is history…
  • 6. The Map Reduce model: (figure: word count diagram). *Word Count is the Hello World of the Big Data paradigm.
  • 7. All fits in memory: Map Reduce is somewhat “slow”, as every step is persisted to disk. Memory gets cheaper and cheaper… Let's do in-memory computing! Spark: Cluster Computing with Working Sets (M. Zaharia, 2010).
  • 8. Spark Lineage Model: Everything is immutable. Data is partitioned in replicated chunks (RDD). Before execution, a DAG is computed. DAG execution is checkpointed to failure-tolerant storage. In case of node failure, it is recomputed from the last checkpoint.
  • 9. Orchestrators: An important piece. They abstract the resources of the cluster (CPUs, GPUs, memory): “I want my Big Data process to run on 200 CPUs and 512 GB of RAM”. They coordinate all the jobs running in the cluster. They relaunch jobs on other nodes in case of failure. Like DBs, they have consensus capabilities (e.g., for leader elections).
  • 11. The CAP Theorem: (figure). *Pick two.
  • 12. All about consensus: https://jvns.ca/blog/2016/11/19/a-critique-of-the-cap-theorem/
  • 13. The Rise of NoSQL: The internet became what it is some years ago (aka internet-size problems). Lots of NoSQL solutions to solve internet-scale problems: Key-Value, Document, Time, Graph. Remember, usually YOU do not have those problems. They avoid sharding and offer multi-master approaches. No ACID transaction support.
  • 14. A new approach: Again, Google did it: Spanner: Google's Globally-Distributed Database (J. C. Corbett, 2012). Complete control of the backbone network, being tolerant to failures. Global sync with atomic clocks. Advanced consensus protocols.
  • 15. The Open Source alternatives: (figure). *Nowadays there is a sharp rise of multi-model databases.
  • 16. INDUSTRIAL INTERNET OF THINGS (INDUSTRY 4.0)
  • 17. The Industrial Internet of Things (IIoT): General Electric: investment is expected to top $60 trillion during the next 15 years. Accenture: could add $14.2T to the global economy by 2030. McKinsey: will touch 43% of the global economy by 2025. Gartner: 20 billion IoT things installed by 2024.
  • 18. Use cases & Key Benefits (+ efficiency, - costs): Supply-demand matching and reduction of time-to-market. Human resource optimization. Optimization of energy and raw material consumption. Manufacturing asset optimization and OEE improvement. Quality maximization. After-sales service optimization. Environment, health & security maximization.
  • 19. Key Issues in IIoT
  • 20. REAL TIME PROCESSING APPLIED TO IIOT
  • 21. Big Data & Real Time Processing: A table can be seen as a snapshot of streaming data (e.g. an unbounded table). Streaming aggregations usually require windows. Results are processed at some point (e.g. a window), where we make a “snapshot table”. Those snapshots are usually stored in a failure-tolerant storage system. However… how do we deal with late-arriving data?
  • 22. Watermarking: The (in)famous word count example. (Figure: event time vs. processing time with 5-minute triggers at 12:00, 12:05, 12:10 and 12:15; a 5-minute watermark, i.e. last seen event time minus 5 minutes; events (“hello”, 1) with event times 11:58, 12:03, 12:08, 12:05, 12:03 and 12:14; max event time seen; word count for “hello” per trigger: 1, 2, 4.) Event after the watermark is not written to the sink.
  • 24. Demo
  • 25. Overview: (figure labels) Digital Platform (PaaS); MQTT - JSON; Filter and routing; Aggregates & raw data; Real time processing; Cloud UI.
  • 26. IKERLAN, P.º José María Arizmendiarrieta, 2 - 20500 Arrasate-Mondragón. T. +34 943712400, F. +34 943796944. THANK YOU. https://github.com/Neuw84/ada_2021/ aconde@ikerlan.es @neuw84
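
The watermarking idea from slide 22 can be illustrated with a small pure-Python simulation. This is only a sketch under stated assumptions, not the code used in the talk: the grouping of events into triggers and their arrival order are hypothetical (the slide figure does not survive in the transcript), and the watermark is advanced after every event for simplicity, whereas a real streaming engine may only advance it between micro-batches.

```python
from datetime import datetime, timedelta

WATERMARK_DELAY = timedelta(minutes=5)  # "5 minute watermark" from the slide


def t(hhmm):
    """Parse an HH:MM string into a datetime (the date part is irrelevant here)."""
    return datetime.strptime(hhmm, "%H:%M")


def run_word_count(batches, delay=WATERMARK_DELAY):
    """Count words per trigger, dropping events older than the watermark.

    The watermark is (max event time seen so far - delay). Simplification:
    it is re-evaluated before every event rather than once per trigger.
    """
    counts = {}
    dropped = []
    snapshots = []
    max_event_time = None
    for batch in batches:
        for event_time, word in batch:
            watermark = None if max_event_time is None else max_event_time - delay
            if watermark is not None and event_time < watermark:
                dropped.append((event_time, word))  # too late: never reaches the sink
                continue
            counts[word] = counts.get(word, 0) + 1
            if max_event_time is None or event_time > max_event_time:
                max_event_time = event_time
        snapshots.append(dict(counts))  # the "snapshot table" written at this trigger
    return snapshots, dropped


# Hypothetical arrival pattern: one list per 5-minute trigger (12:00 to 12:15).
batches = [
    [(t("11:58"), "hello")],
    [(t("12:03"), "hello")],
    [(t("12:08"), "hello"), (t("12:05"), "hello")],
    [(t("12:14"), "hello"), (t("12:03"), "hello")],
]

snapshots, dropped = run_word_count(batches)
for trigger, snapshot in zip(["12:00", "12:05", "12:10", "12:15"], snapshots):
    print(trigger, snapshot)
print("dropped:", [(e.strftime("%H:%M"), w) for e, w in dropped])
```

In this run the out-of-order 12:05 event is still counted because it is within 5 minutes of the maximum event time seen (12:08), while the second 12:03 event arrives after 12:14 has been seen and is discarded.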

Editor's Notes

  1. Good afternoon, everybody. I'm Ángel Conde from IKERLAN Technology Centre. The talk I'm presenting here is called Software Reliability on the Big Data Era with an Industry-Minded Focus.
  2. I will give a brief introduction about myself. I lead the Data Analytics & Artificial Intelligence Team at Ikerlan. Ikerlan is a research centre and a member of the Basque Research & Technology Alliance. These are some of the topics that I work on day to day.
  3. Let's start the talk with an introduction to how Big Data started with reliability in mind.
  4. The first thing we need to take into account is that a Big Data system is a distributed system. However, we should ask ourselves this question: can a distributed system be reliable? Not really; we have all kinds of failures. And that led to the famous eight fallacies of distributed computing.
  5. One can say that we have High Performance Computing clusters, but they are too expensive to process the amount of data gathered by internet companies. Moreover, such systems fail too. Then, how can we process large amounts of data in a cheap and reliable way? In 2004, Google published a paper about an approach to processing data on large clusters. Some years later, Yahoo open-sourced its implementation and Hadoop was born. The rest is history.
  6. In the MapReduce model we usually have some map steps chained with reduce steps. In this figure we can see the diagram for a word count. Word count is the hello world of the Big Data paradigm. A lot of use cases can be ported to this approach, more than you may think at first sight. We can see here that the network load in the shuffle steps is important for the performance of the approach. Moreover, for each step the intermediate results are stored on a failure-tolerant storage system.
  7. Memory gets cheaper, and therefore the in-memory computing approach is born. Berkeley published a paper on one approach using this kind of paradigm, and later on a lot of frameworks were born using the in-memory paradigm.
  8. In Spark, in order to be tolerant to failures, the first thing is that everything is immutable. The data is stored in a replicated way in memory. Before execution, a DAG is computed, trying to optimize the different steps of the computation. Moreover, the DAG steps are checkpointed as needed in order to be reliable. If no checkpoint exists, it recomputes the whole DAG. RDDs are immutable distributed collections of elements of your data that can be stored in memory or on disk across a cluster of machines. The data is partitioned across the machines in your cluster and can be operated on in parallel with a low-level API that offers transformations and actions. RDDs are fault tolerant, as they track data lineage information to rebuild lost data automatically on failure.
  9. Next we are going to speak about the orchestrators. They are in charge of job scheduling, abstracting cluster resources, etc. In case of node failure they try to reschedule the jobs onto other nodes. Like distributed databases, they need consensus capabilities (e.g. who is the leader).
  10. Now we are going to shift our focus to distributed databases. Those databases are distributed by nature, and therefore we are going to give a brief introduction to their design.
  11. In distributed databases the CAP theorem is famous. This theorem says that in a distributed system you can't have all three of those features at once. For example, you can have consistency and availability, but then you are not tolerant to network partitions.
  12. This theorem seems to provide an easy way of reasoning about these systems. However, for some combinations, that does not mean very much.
  13. But how did this trend start? The rise of distributed databases was meant to solve internet-size problems. There are a lot of NoSQL databases to solve internet-scale problems. Those approaches provide multi-master capabilities, avoid sharding… However, there is no ACID support (consistency) in the majority of the approaches (*this can be solved by developers on the client side). And the developers wanted their SQL back (e.g. CQL) and companies wanted ACID.
  14. Google changed the landscape again in 2012 with another paper. The thing is that you have complete control of the backbone network, with multiple physical paths that provide tolerance to failures. They have an atomic clock in each datacenter in order to have a global time sync protocol. And with advanced protocols….
  15. After the famous paper, again, some open source databases have already implemented some of the paper's tricks.
  16. Well, let's move on to the next point. Now I will introduce
  17. Let's start with some numbers related to the IoT and the IIoT to show why this is important. General Electric says that IIoT investment is expected to top…. Accenture predicts that the IIoT could add…. McKinsey estimates that it will touch 43% of the global economy. About the number of things, Gartner says that 20 billion things will be installed by 2020.
  18. Let's see some of the benefits for the industry. Well, the benefits apply to the whole product life cycle, from its development to its end-of-life support. E.g. supply-demand matching and reduction of time-to-market; human resource optimization; optimization of energy and raw material consumption; manufacturing asset optimization; Overall Equipment Effectiveness; quality maximization; after-sales…. All of these concepts are closely related to Industry 4.0.
  19. Next, let's speak about real-time processing of IIoT data. Late data and ordering: we can have connectivity issues such as wireless mobile telecommunications, low signal, etc. Protocols: most MQTT brokers do not implement QoS 2!! CoAP is UDP-based, so no ordering!! There are also wrongly designed local acquisition systems. Therefore, if we are doing real-time processing of IIoT data, we need a tool that enables us to work easily on unordered incoming data and to build filters for duplicates easily.
  20. Next I am going to explain the concept of event time & watermarking for late data. A watermark is a moving threshold in event time that trails behind the maximum event time seen by the query in the processed data.
  21. Well, in this demo we are using some of the Big Data open source tools. For example, we are using NiFi (Naifai) for ingestion and routing, Kafka for messaging and decoupling, Spark for real-time processing, Cassandra as backend storage, Zeppelin as our web interface, and the open source MQTT broker called Mosquitto.
  22. The architecture is the following: fake sensor data from two machines is sent to an MQTT broker running in the cloud. This data contains machine status, temperature, etc. From there, MQTT data is ingested via NiFi (naifai) and sent to two topics depending on the machine status. Then we have the real-time processing engine, Spark. This component makes it possible to do real-time analytics on incoming data and store the results in Cassandra. For the demo we will use Zeppelin as a way to interact with Spark and Cassandra, providing a useful user interface for our analytics. This kind of architecture or digital platform can run on any cloud or on-premises.
  23. We have come to the end of the demo. I'd just like to thank (thenk) you for listening and let you know that all the code of this demo is already on GitHub. Now I would be pleased to take your comments and questions.
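
The point in note 19 about duplicated and unordered IIoT data can be made concrete with a short sketch. It is an illustration under assumptions, not the demo's code: the message fields `machine_id`, `seq`, `event_time` and `temperature` are hypothetical, and the input is a small finite buffer rather than a live stream. Duplicates of this kind can appear, for instance, with at-least-once delivery such as MQTT QoS 1.

```python
def deduplicate(messages):
    """Keep the first copy of each message, identified by (machine_id, seq)."""
    seen = set()  # grows without bound here; a real stream would expire old keys
    unique = []
    for msg in messages:
        key = (msg["machine_id"], msg["seq"])
        if key not in seen:
            seen.add(key)
            unique.append(msg)
    return unique


def order_by_event_time(messages):
    """Sort a finite buffer by the time the sensor produced each reading.

    Zero-padded HH:MM strings from the same day sort correctly as text.
    """
    return sorted(messages, key=lambda m: (m["event_time"], m["machine_id"], m["seq"]))


# Hypothetical readings: one duplicate (machine "m1", seq 2) and out-of-order arrival.
incoming = [
    {"machine_id": "m1", "seq": 2, "event_time": "12:03", "temperature": 71.5},
    {"machine_id": "m1", "seq": 1, "event_time": "11:58", "temperature": 70.0},
    {"machine_id": "m1", "seq": 2, "event_time": "12:03", "temperature": 71.5},
    {"machine_id": "m2", "seq": 1, "event_time": "12:01", "temperature": 64.2},
]

cleaned = order_by_event_time(deduplicate(incoming))
for msg in cleaned:
    print(msg["event_time"], msg["machine_id"], msg["seq"], msg["temperature"])
```

On an unbounded stream the `seen` set cannot grow forever, which is one reason to pair such a filter with a watermark: keys older than the watermark can be forgotten.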