Methods and Practices for
guaranteed failure in Big Data
PANTELIS NASIKAS
Delusions and Pitfalls
● Who needs schemas, after all?
● My SuperNoYesSQLDB scales on reads AND writes, like... forever!
● The network and all peers will always be there!
● Resource management: not really a concern for my script
● ETL/job management: BYO-DAG tool
Who needs schemas, after all?
● Always define schemas and a versioning process, irrespective of the serialization format
● Schema registries are indispensable
● Metadata management: respect the end users and engineers
● A means to implement data governance
● Builds trust across teams
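
The schema and versioning advice above can be made concrete with a small sketch. The slides do not name a registry product or a compatibility rule, so everything here is an assumption for illustration: `Field`, `SchemaRegistry`, and `is_backward_compatible` are hypothetical names, and the rule used (a new field needs a default, an existing field must keep its type) is one common definition of backward compatibility, not the only one.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Field:
    """One field of a record schema (hypothetical, simplified)."""
    name: str
    type: str
    has_default: bool = False


def is_backward_compatible(old, new):
    """True if a reader using `new` can still decode data written with `old`.

    Assumed rule: a field added in `new` must carry a default, and a field
    present in both versions must keep the same type.
    """
    old_by_name = {f.name: f for f in old}
    for f in new:
        prev = old_by_name.get(f.name)
        if prev is None:
            if not f.has_default:
                return False
        elif prev.type != f.type:
            return False
    return True


class SchemaRegistry:
    """Toy in-memory registry: subject -> ordered list of schema versions."""

    def __init__(self):
        self._subjects = {}

    def register(self, subject, fields):
        versions = self._subjects.setdefault(subject, [])
        if versions and not is_backward_compatible(versions[-1], fields):
            raise ValueError(f"incompatible schema for {subject!r}")
        versions.append(list(fields))
        return len(versions)  # 1-based version number

    def latest(self, subject):
        versions = self._subjects[subject]
        return len(versions), versions[-1]
```

Rejecting an incompatible version at registration time moves the failure from the consumer at read time to the producer at deploy time, which is where the versioning process can act on it.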
My SuperNoYesSQLDB scales on reads AND writes
● Understand the model that the database was built for
● Not all APIs are created equal
● Partitioning and replication are key design elements; keep failures, cluster extensions, and rebalancing in mind
● Data consistency on high-speed writes... spooky. fsync (when, really?)
● Transactions to keep data and denormalized views in sync? Consider alternative options
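
To illustrate why partitioning has to be designed with extension and rebalancing in mind, here is a minimal sketch comparing how many keys change owner when a node is added. Consistent hashing is used as one well-known technique; the slides do not prescribe it. `HashRing`, `stable_hash`, `moved_keys`, and the node names are hypothetical.

```python
import bisect
import hashlib


def stable_hash(value: str) -> int:
    """Deterministic 64-bit hash (Python's built-in hash() is salted per process)."""
    return int.from_bytes(hashlib.md5(value.encode()).digest()[:8], "big")


class HashRing:
    """Consistent-hash ring with virtual nodes (illustrative only)."""

    def __init__(self, nodes=(), vnodes=64):
        self._vnodes = vnodes
        self._hashes = []   # sorted ring positions
        self._owners = []   # node owning the position at the same index
        for node in nodes:
            self.add_node(node)

    def add_node(self, node):
        for i in range(self._vnodes):
            h = stable_hash(f"{node}#{i}")
            idx = bisect.bisect(self._hashes, h)
            self._hashes.insert(idx, h)
            self._owners.insert(idx, node)

    def owner(self, key):
        # First ring position clockwise from the key's hash, wrapping around.
        idx = bisect.bisect(self._hashes, stable_hash(key)) % len(self._hashes)
        return self._owners[idx]


def moved_keys(before, after):
    """Keys whose owner differs between two key -> owner mappings."""
    return [k for k in before if before[k] != after[k]]
```

With naive `hash(key) % node_count` placement, growing from 4 to 5 nodes reassigns most keys; with the ring, only the keys claimed by the new node move. That difference is the rebalancing cost paid in network and disk traffic during an extension.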
The network and all peers will always be there!
● The Network IS reliable, P. Bailis & K. Kingsbury
● Hardware IS NOT (always) reliable
● Prepare for failure
● Test systems under failing hardware/software
● Learn your APIs (e.g. what happens when partitions fail or move?)
● Where is your place in CAP? Are you CP or CA?
● What is acceptable for your case?
● How does my database recover after failure?
● Can this introduce new problems?
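
"Prepare for failure" and "test under failing software" can be sketched together: a client-side retry with exponential backoff, exercised against a peer that injects faults. This is a minimal sketch under stated assumptions, not a recommendation from the slides. `FlakyPeer` and `call_with_retry` are hypothetical, and the `sleep` parameter exists only so a test can run without waiting.

```python
import time


class FlakyPeer:
    """Fake remote peer that fails a fixed number of times before answering."""

    def __init__(self, failures_before_success):
        self.remaining_failures = failures_before_success
        self.calls = 0

    def fetch(self):
        self.calls += 1
        if self.remaining_failures > 0:
            self.remaining_failures -= 1
            raise ConnectionError("injected fault")
        return "ok"


def call_with_retry(fn, attempts=4, base_delay=0.1, sleep=time.sleep):
    """Call fn(), retrying on ConnectionError with exponential backoff.

    The delay doubles after each failed attempt; the last failure is re-raised
    so the caller can decide what is acceptable for its case.
    """
    for attempt in range(attempts):
        try:
            return fn()
        except ConnectionError:
            if attempt == attempts - 1:
                raise
            sleep(base_delay * 2 ** attempt)
```

Retries are only safe when the remote operation is idempotent; retrying a non-idempotent write is one way recovery can introduce new problems.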
ETL/job management: BYO-DAG tool
● Data pipelines and lineage
● Who generates what, and at what time?
● Failure! Who is next? What does rescheduling mean to dependents?
● How can I really find dependents?
● Let me build that API! It’s just software, after all.
● Build testable pipelines without any need for production data...
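
The question of finding dependents has a direct answer once lineage is recorded as a graph. The sketch below assumes a hypothetical pipeline described as a mapping from each job to the upstream jobs it reads from; `PIPELINE`, `downstream`, and `rerun_plan` are illustrative names. It uses `graphlib.TopologicalSorter` from the Python standard library (3.9+), and runs on a tiny made-up graph, so no production data is needed to test it.

```python
from collections import deque
from graphlib import TopologicalSorter

# Hypothetical lineage: job -> set of upstream jobs it reads from.
PIPELINE = {
    "raw_events": set(),
    "users": set(),
    "clean_events": {"raw_events"},
    "sessions": {"clean_events"},
    "daily_report": {"sessions", "users"},
}


def downstream(graph, job):
    """All jobs that depend on `job`, directly or transitively."""
    children = {}
    for node, deps in graph.items():
        for dep in deps:
            children.setdefault(dep, set()).add(node)
    seen, queue = set(), deque([job])
    while queue:
        for child in children.get(queue.popleft(), ()):
            if child not in seen:
                seen.add(child)
                queue.append(child)
    return seen


def rerun_plan(graph, failed):
    """The failed job plus its dependents, in an order that respects dependencies."""
    affected = downstream(graph, failed) | {failed}
    order = TopologicalSorter(graph).static_order()
    return [job for job in order if job in affected]
```

For example, `rerun_plan(PIPELINE, "clean_events")` lists `clean_events`, then `sessions`, then `daily_report`, which is what rescheduling means to the dependents of a failed job.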
Resource management: not really a concern
● Every job / task / stream / long-running service should be constrained and, ideally, isolated
● Database / filesystem access from third parties as well
● Understand your jobs’ requirements (communication cost model, effect of partitioning and shuffling on CPU/memory/network)
● You don’t want to preempt your multi-terabyte batch job just before the end
● Orchestrate small, well-defined tasks
● Do not assume large resource allocations
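
One way to read "orchestrate small, well-defined tasks" is to split a large batch into chunks and checkpoint each one, so that a preemption costs a single chunk instead of the whole run. This is a minimal sketch under that assumption; `Preempted`, `run_job`, and the `should_preempt` hook are hypothetical, and a real checkpoint would live in durable storage rather than an in-memory dict.

```python
class Preempted(Exception):
    """Raised when the (hypothetical) scheduler takes the task's slot away."""


def run_job(chunks, process, checkpoint, should_preempt=lambda: False):
    """Process (chunk_id, payload) pairs, skipping chunks already checkpointed.

    `checkpoint` maps chunk_id -> result and is updated after every chunk,
    so a later call with the same mapping resumes where the last one stopped.
    """
    for chunk_id, payload in chunks:
        if chunk_id in checkpoint:
            continue  # finished in an earlier attempt
        if should_preempt():
            raise Preempted(chunk_id)
        checkpoint[chunk_id] = process(payload)
    return checkpoint
```

Calling `run_job` again with the same `checkpoint` after a `Preempted` error redoes none of the completed chunks, which also keeps each task's resource request small and predictable.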
Pantelis Nasikas
pantelis.nasikas@agileactors.com
Thank you,

Voxxed Athens 2018 - Methods and Practices for Guaranteed Failure in Big Data
