Software Reliability in the Big Data Era

Keynote talk at the International Conference on Reliable Software Technologies (AEiC 2021)

IKERLAN.
WHERE
TECHNOLOGY IS
AN ATTITUDE
Software reliability in the Big Data era
with an industry-minded focus
Ángel Conde
aconde@ikerlan.es
© 2021. IKERLAN. All rights reserved
About Me
@Neuw84
@IKERLANofficial
Ángel Conde Manjón
Data Analytics & Artificial Intelligence Team Lead @
Big Data
Artificial Intelligence
Distributed Systems
Cloud
BIG DATA “RELIABILITY” OR “FAILURE SURVIVAL”
Distributed systems vs reliability
• Big Data implies a distributed processing system.
But...
“Can a distributed system be reliable?”
• Not really.
- Network partitions.
- Node failures (hardware, software, etc.).
- Clock drift (related to consensus).
* Google nowadays says otherwise...
The starting paradigm shift
• HPC clusters are too expensive (and they fail too).
“How can we process large amounts of data in a cheap and reliable way?”
• Google makes it: MapReduce: Simplified Data Processing on Large Clusters (J. Dean and S. Ghemawat, 2004).
• Its open-source implementation, Apache Hadoop, is born.
The rest is history...
The Map Reduce model
* Word Count is the “Hello World” of the Big Data paradigm.
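The Word Count example can be sketched in a few lines of plain Python. This is only a single-process illustration of the map, shuffle, and reduce phases, not how Hadoop or any real framework implements them; the function names (`map_phase`, `shuffle_phase`, `reduce_phase`, `word_count`) are hypothetical and chosen for this sketch.

```python
from collections import defaultdict
from typing import Dict, Iterable, Iterator, List, Tuple


def map_phase(document: str) -> Iterator[Tuple[str, int]]:
    """Map: emit a (word, 1) pair for every word in one input split."""
    for word in document.lower().split():
        yield word, 1


def shuffle_phase(pairs: Iterable[Tuple[str, int]]) -> Dict[str, List[int]]:
    """Shuffle: group all emitted values by their key."""
    groups: Dict[str, List[int]] = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key: str, values: List[int]) -> Tuple[str, int]:
    """Reduce: collapse all values of one key into a single count."""
    return key, sum(values)


def word_count(documents: List[str]) -> Dict[str, int]:
    """Run the three phases in sequence (a real cluster runs them in parallel)."""
    emitted = (pair for doc in documents for pair in map_phase(doc))
    grouped = shuffle_phase(emitted)
    return dict(reduce_phase(key, values) for key, values in grouped.items())


if __name__ == "__main__":
    print(word_count(["big data big clusters", "data fails"]))
```

Because each document is mapped independently and each key is reduced independently, a failed map or reduce task can simply be run again on another node, which is where the model gets its failure tolerance.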
Software Reliability on the Big Data Era

  • 1. IKERLAN. WHERE TECHNOLOGY IS AN ATTITUDE. Software Reliability on the Big Data Era, with an industry-minded focus. Ángel Conde, aconde@ikerlan.es
  • 2. About Me. Ángel Conde Manjón (@Neuw84), Data Analytics & Artificial Intelligence Team Lead @ IKERLAN (@IKERLANofficial). Topics: Big Data, Artificial Intelligence, Distributed Systems, Cloud. © 2021. IKERLAN. All rights reserved
  • 3. BIG DATA “RELIABILITY” OR “FAILURE SURVIVAL”
  • 4. Distributed systems vs reliability • Big Data equals a distributed processing system. But... “Can a distributed system be reliable?” • Not really: - Network partitions. - Node failure (hardware, software, etc.). - Clock drift (related to consensus). * Google nowadays says otherwise...
  • 5. The starting paradigm shift • HPC clusters are too expensive (and they fail too). “How can we process large amounts of data in a cheap and reliable way?” • Google makes it: MapReduce: Simplified Data Processing on Large Clusters (2004, J. Dean). • Yahoo open sources its implementation and Hadoop is born. The rest is history...
  • 6. The Map Reduce model * Word Count is the Hello World of the Big Data paradigm.
  • 7. All fits in memory • Map Reduce is somewhat “slow”: every step is persisted to disk. • Memory gets cheaper and cheaper... • Let's do in-memory computing! Spark: Cluster Computing with Working Sets (M. Zaharia, 2010).
  • 8. Spark Lineage Model • Everything is immutable. • Data is partitioned in replicated chunks (RDD). • Before execution, a DAG is computed. • DAG execution is checkpointed to failure-tolerant storage. • In case of node failure, it is recomputed from the last checkpoint.
  • 9. Orchestrators • An important piece. • They abstract the resources of the cluster (CPUs, GPUs, memory). “I want my Big Data process to run on: 200 CPUs, 512 GB RAM” • They coordinate all the jobs running in the cluster. • They relaunch jobs on other nodes in case of failure. • Like DBs, they have consensus capabilities (e.g., for leader elections).
  • 11. The CAP Theorem * Pick two
  • 12. All about consensus https://jvns.ca/blog/2016/11/19/a-critique-of-the-cap-theorem/
  • 13. The Rise of NoSQL • The internet became what it is some years ago (aka internet-size problems). • Lots of NoSQL solutions to solve internet-scale problems. o Key-Value o Document o Time o Graph • Remember, usually YOU do not have those problems. • Avoid sharding, multi-master approaches. • No ACID transaction support.
  • 14. A new approach • Again, Google did it: Spanner: Google's Globally-Distributed Database (J. C. Corbett, 2012) • Complete control of the backbone network, being tolerant to failures. • Atomic clocks for global sync. • Advanced consensus protocols.
  • 15. The Open Source alternatives * Nowadays there is a strong rise of multi-model databases
  • 16. INDUSTRIAL INTERNET OF THINGS (INDUSTRY 4.0)
  • 17. The Industrial Internet of Things (IIoT) • General Electric: investment is expected to top $60 trillion during the next 15 years. • Accenture: could add $14.2T to the global economy by 2030. • McKinsey: will touch 43% of the global economy by 2025. • Gartner: 20 billion IoT things installed by 2024.
  • 18. Use cases & Key Benefits: + efficiency, - costs • Supply-demand matching and reduction of time-to-market. • Human resource optimization. • Optimization of energy and raw material consumption. • Manufacturing asset optimization and OEE improvement. • Quality maximization. • After-sales service optimization. • Environment, health & security maximization.
  • 19. Key Issues in IIoT
  • 20. REAL TIME PROCESSING APPLIED TO IIOT
  • 21. Big Data & Real Time Processing • A table can be seen as a snapshot of streaming data (e.g. an unbounded table). • Streaming aggregations usually require windows. • Results are processed at some point (e.g. a window), where we make a “snapshot table”. • Those snapshots are usually stored in a failure-tolerant storage system. However... how do we deal with late-arriving data?
  • 22. Watermarking • The (in)famous word count example. [Figure: event time vs. processing time, with 5-minute triggers at 12:00, 12:05, 12:10 and 12:15. 5-minute watermark = max event time seen - 5 min. Incoming events, each (“hello”, 1): 11:58, 12:03, 12:08, 12:05, 12:03, 12:14. Word count table for “hello” across the triggers: 1, 2, 4. An event that falls behind the watermark is not written to the sink.]
  • 24. Demo
  • 25. Overview [Figure: Digital Platform (PaaS). Labels: MQTT - JSON; Filter and routing; Aggregates & raw data; Real time processing; Cloud UI.]
  • 26. THANK YOU. IKERLAN, P.º José María Arizmendiarrieta, 2 - 20500 Arrasate-Mondragón. T. +34 943712400, F. +34 943796944. https://github.com/Neuw84/ada_2021/ aconde@ikerlan.es @neuw84
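
The watermarking rule on slide 22 (watermark = max event time seen minus 5 minutes, re-evaluated at each trigger) can be sketched in plain Python. This is a simplified illustration, not the Spark code used in the demo: the function name `run_with_watermark`, the date, and the way the events are grouped into trigger batches are assumptions made for the example, and the sketch drops late events one by one instead of per window.

```python
from datetime import datetime, timedelta


def run_with_watermark(batches, delay=timedelta(minutes=5)):
    """Count words trigger by trigger, dropping events behind the watermark.

    batches: one list of (event_time, word) tuples per trigger (assumed grouping).
    Returns (counts, dropped).
    """
    counts = {}
    dropped = []
    max_event_time = None
    watermark = None  # no threshold until a first trigger has completed

    for batch in batches:
        for event_time, word in batch:
            if watermark is not None and event_time < watermark:
                dropped.append((event_time, word))  # too late, not written to the sink
                continue
            counts[word] = counts.get(word, 0) + 1
            if max_event_time is None or event_time > max_event_time:
                max_event_time = event_time
        # The watermark moves only at the end of a trigger and never goes backwards.
        if max_event_time is not None:
            candidate = max_event_time - delay
            if watermark is None or candidate > watermark:
                watermark = candidate
    return counts, dropped


def t(hour, minute):
    # Arbitrary date, chosen only for the example.
    return datetime(2021, 6, 7, hour, minute)


# Event times taken from the slide; the split into triggers is hypothetical.
batches = [
    [(t(11, 58), "hello")],
    [(t(12, 3), "hello")],
    [(t(12, 8), "hello"), (t(12, 5), "hello")],
    [(t(12, 14), "hello")],
    [(t(12, 3), "hello")],  # arrives after the watermark has passed 12:03
]

counts, dropped = run_with_watermark(batches)
print(counts, dropped)
```

With this grouping, the 12:05 event is still accepted because the watermark is only 11:58 when it arrives, while the second 12:03 event is dropped because the 12:14 event has already pushed the watermark to 12:09.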

Editor's Notes

  1. Good afternoon, everybody. I'm Ángel Conde from IKERLAN Technology Centre. The talk I'm presenting here is called Software Reliability on the Big Data Era, with an industry-minded focus.
  2. I will give a brief introduction about myself. I lead the Data Analytics & Artificial Intelligence Team at Ikerlan. Ikerlan is a research centre and a member of the Basque Research & Technology Alliance. These are some of the topics that I work on day to day.
  3. Let's start the talk with an introduction to how Big Data started with reliability in mind.
  4. The first thing we need to take into account is that a Big Data system is a distributed system. However, we should ask ourselves this question: can a distributed system be reliable? Not really; we have all kinds of failures. And that led to the famous eight fallacies of distributed computing.
  5. One could say that we have High Performance Computing clusters, but they are too expensive to process the amount of data gathered by internet companies. Moreover, such systems fail too. So, how can we process large amounts of data in a cheap and reliable way? In 2004, Google published a paper about an approach to processing data on large clusters. Some years later, Yahoo open-sourced its implementation and Hadoop was born. The rest is history.
  6. In the MapReduce model we usually have some map steps chained with reduce steps. In this figure we can see the diagram for a word count. Word count is the hello world of the Big Data paradigm. A lot of use cases can be ported to this approach, more than you may think at first sight. We can see here that the network load of the shuffle steps is important for the performance of the approach. Moreover, for each step the intermediate results are stored on a failure-tolerant storage system.
  7. Memory got cheaper, and therefore the idea of in-memory computing was born. Berkeley published a paper on one approach using this paradigm, and later on a lot of frameworks were born using the in-memory paradigm.
  8. In Spark, several things make it tolerant to failures. First, everything is immutable. The data is stored in memory in a replicated way. Before execution, a DAG is computed, trying to optimize the different steps of the computation. Moreover, the DAG steps are checkpointed as needed in order to be reliable. If no checkpoint exists, it recomputes the whole DAG. RDDs are immutable distributed collections of elements of your data that can be stored in memory or on disk across a cluster of machines. The data is partitioned across the machines in your cluster and can be operated on in parallel with a low-level API that offers transformations and actions. RDDs are fault tolerant, as they track data lineage information to rebuild lost data automatically on failure.
  9. Next we are going to speak about orchestrators. They are in charge of job scheduling, abstracting cluster resources, etc. In case of node failure they try to reschedule the jobs onto other nodes. Like distributed databases, they need consensus capabilities (e.g. to decide who is the leader).
  10. Now we are going to change our focus to distributed databases. These databases are distributed by nature, and therefore we are going to give a brief introduction to their design.
  11. In distributed databases the CAP theorem is famous. This theorem says that in a distributed system you can't have all three of those features at once. For example, you can have consistency and availability, but then you are not tolerant to network partitions.
  12. This theorem seems to provide an easy way to reason about these systems. However, for some combinations... that does not mean very much.
  13. But how did this trend start? The rise of distributed databases was meant to solve internet-size problems. There are a lot of NoSQL databases that solve internet-scale problems. Those approaches provide multi-master capabilities, avoid sharding... However, there is no ACID support (consistency) in the majority of them (*this can be solved by developers on the client side). And the developers wanted their SQL back (e.g. CQL), and companies wanted ACID.
  14. Google changed the landscape again in 2012 with another paper. The thing is that they have complete control of the backbone network, with multiple physical paths that provide tolerance to failures. They have an atomic clock in each datacenter in order to have a global time sync protocol. And with advanced protocols...
  15. After the famous paper, again... some open source databases have already implemented some of the paper's tricks.
  16. Well, let's move on to the next point. Now I will introduce
  17. Let's start with some numbers related to the IoT and the IIoT to show why this is important. General Electric says that IIoT investment is expected to top... Accenture predicts that IIoT could add... McKinsey estimates that it will touch 43% of the global economy. About the number of things, Gartner says that 20 billion things will be installed by 2020.
  18. Let's see some of the benefits for the industry. Well, the benefits apply to the whole product life cycle, from its development to its end-of-life support. E.g.: supply-demand matching and reduction of time-to-market; human resource optimization; optimization of energy and raw material consumption; manufacturing asset optimization; Overall Equipment Effectiveness; quality maximization; after-sales... All of these concepts are closely related to Industry 4.0.
  19. Next, let's speak about real-time processing of IIoT data. Late data and ordering: we can have connectivity issues such as wireless mobile telecommunications, low signal, etc. Protocols: most MQTT brokers do not implement QoS 2!! CoAP is UDP-based, with no ordering!! There are also wrongly designed local acquisition systems. Therefore, if we are doing real-time processing of IIoT data, we need a tool that enables us to work easily on unordered incoming data and to build filters for duplicates easily.
  20. Next I am going to explain the concept of event time and watermarking for late data. A watermark is a moving threshold in event time that trails behind the maximum event time seen by the query in the processed data.
  21. Well, in this demo we are using some of the Big Data open source tools. For example, we are using NiFi for ingestion and routing, Kafka for messaging and decoupling, Spark for real-time processing, Cassandra as backend storage, Zeppelin as our web interface, and the open source MQTT broker called Mosquitto.
  22. The architecture is the following: fake sensor data from two machines is sent to an MQTT broker running in the cloud. This data contains machine status, temperature, etc. From there, the MQTT data is ingested via NiFi and sent to two topics depending on the machine status. Then we have the real-time processing engine, Spark. This component makes it possible to do real-time analytics on incoming data and store the results in Cassandra. For the demo we will use Zeppelin as a way to interact with Spark and Cassandra, providing a useful user interface for our analytics. This kind of architecture or digital platform can run on any cloud or on-premises.
  23. We have come to the end of the demo. I'd just like to thank you for listening and let you know that all the code of this demo is already on GitHub. Now I would be pleased to take your comments and questions.
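
As a companion to note 6, the map, shuffle, and reduce steps of a word count can be sketched in plain Python. This is a single-process illustration of the model only, not Hadoop and not the demo code; the function names (`map_phase`, `shuffle`, `reduce_phase`, `word_count`) and the sample lines are assumptions made for the example.

```python
from collections import defaultdict


def map_phase(line):
    """Map step: emit a (word, 1) pair for every word in one input line."""
    return [(word.lower(), 1) for word in line.split()]


def shuffle(pairs):
    """Shuffle step: group all emitted values by key.

    On a real cluster this is where data crosses the network.
    """
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key, values):
    """Reduce step: collapse the values of one key into a single count."""
    return key, sum(values)


def word_count(lines):
    """Chain map, shuffle, and reduce over a list of text lines."""
    mapped = [pair for line in lines for pair in map_phase(line)]
    grouped = shuffle(mapped)
    return dict(reduce_phase(key, values) for key, values in grouped.items())


result = word_count(["hello big data", "hello world"])
print(result)
```

Each phase only depends on the output of the previous one, which is what lets a framework persist intermediate results and rerun a single failed step instead of the whole job.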