What we see in the modern data store world is a race between different approaches to achieve a distributed and resilient storage of data. Most applications need a stateful layer which holds the data. There are at least three necessary ingredients which are everything else than trivial to combine and of course even more challenging when heading for an acceptable performance. Over the past years there has been significant progress in respect in both the science and practical implementations of such data stores. In his talk Max Neunhoeffer will introduce the audience to some of the needed ingredients, address the difficulties of their interplay and show four modern approaches of distributed open-source data stores.
Topics are:
– Challenges in developing a distributed, resilient data store
– Consensus, distributed transactions, distributed query optimization and execution
– The inner workings of ArangoDB, Cassandra, Cockroach and RethinkDB
The talk will touch complex and difficult computer science, but will at the same time be accessible to and enjoyable by a wide range of developers.
The computer science behind a modern disributed data storeJ On The Beach
What we see in the modern data store world is a race between different approaches to achieve a distributed and resilient storage of data. Every application needs a stateful layer which holds the data. There are at least three necessary components which are everything else than trivial to combine, and, of course, even more challenging when heading for an acceptable performance.
Over the past years there has been significant progress in both the science and practical implementations of such data stores. In his talk Max Neunhoeffer will introduce the audience to some of the needed ingredients, address the difficulties of their interplay and show four modern approaches of distributed open-source data stores (ArangoDB, Cassandra, Cockroach and RethinkDB).
The Computer Science Behind a modern Distributed DatabaseArangoDB Database
What we see in the modern data store world is a race between different approaches to achieve a distributed and resilient storage of data. Every application needs a stateful layer which holds the data. There are several different necessary components which are anything but trivial to combine, and, of course, even more challenging when attempting to optimize for performance. Over the past years there has been significant progress in both the science and practical implementations of such data stores. In this talk Dan Larkin-York will introduce the audience to some of the challenges, address the difficulties of their interplay, and cover key approaches taken by some of the industry’s leaders (ArangoDB, Cassandra, CockroachDB, MarkLogic, and more).
Data FAIRport Skunkworks: Common Repository Access Via Meta-Metadata Descript...datascienceiqss
It would be useful to be able to discover what kinds of data are contained in the myriad general-purpose public data repositories. It would be even better if it were possible to query that data and/or have that data conform to a particular context-dependent data format. This was the ambition of the Data FAIRport project. I will be demonstrating the "strawman" demonstration of a fully-functional Data FAIRport, where the meta/data in a public repository can be "projected" into one of a number of different context-dependent formats, such that it can be cross-queried in combination with the (potentially "projected") data from other repositories.
The computational requirements of next generation sequencing is placing a huge demand on IT organisations .
Building compute clusters is now a well understood and relatively straightforward problem. However, NGS sequencing applications require large amounts of storage, and high IO rates.
This talk details our approach for providing storage for next-gen sequencing applications.
Talk given at BIO-IT World, Europe, 2009.
Data FAIRport Prototype & Demo - Presentation to Elsevier, Jul 10, 2015Mark Wilkinson
A discussion and demonstration of a functional Data FAIRport, using W3C's Linked Data Platform, Ruben Verborgh's Linked Data Fragments, and Hydra's hypermedia controlled vocabularies. This is the output of the "Skunkworks" working group of the larger Data FAIRport project (http://datafairport.org).
The computer science behind a modern disributed data storeJ On The Beach
What we see in the modern data store world is a race between different approaches to achieve a distributed and resilient storage of data. Every application needs a stateful layer which holds the data. There are at least three necessary components which are everything else than trivial to combine, and, of course, even more challenging when heading for an acceptable performance.
Over the past years there has been significant progress in both the science and practical implementations of such data stores. In his talk Max Neunhoeffer will introduce the audience to some of the needed ingredients, address the difficulties of their interplay and show four modern approaches of distributed open-source data stores (ArangoDB, Cassandra, Cockroach and RethinkDB).
The Computer Science Behind a modern Distributed DatabaseArangoDB Database
What we see in the modern data store world is a race between different approaches to achieve a distributed and resilient storage of data. Every application needs a stateful layer which holds the data. There are several different necessary components which are anything but trivial to combine, and, of course, even more challenging when attempting to optimize for performance. Over the past years there has been significant progress in both the science and practical implementations of such data stores. In this talk Dan Larkin-York will introduce the audience to some of the challenges, address the difficulties of their interplay, and cover key approaches taken by some of the industry’s leaders (ArangoDB, Cassandra, CockroachDB, MarkLogic, and more).
Data FAIRport Skunkworks: Common Repository Access Via Meta-Metadata Descript...datascienceiqss
It would be useful to be able to discover what kinds of data are contained in the myriad general-purpose public data repositories. It would be even better if it were possible to query that data and/or have that data conform to a particular context-dependent data format. This was the ambition of the Data FAIRport project. I will be demonstrating the "strawman" demonstration of a fully-functional Data FAIRport, where the meta/data in a public repository can be "projected" into one of a number of different context-dependent formats, such that it can be cross-queried in combination with the (potentially "projected") data from other repositories.
The computational requirements of next generation sequencing is placing a huge demand on IT organisations .
Building compute clusters is now a well understood and relatively straightforward problem. However, NGS sequencing applications require large amounts of storage, and high IO rates.
This talk details our approach for providing storage for next-gen sequencing applications.
Talk given at BIO-IT World, Europe, 2009.
Data FAIRport Prototype & Demo - Presentation to Elsevier, Jul 10, 2015Mark Wilkinson
A discussion and demonstration of a functional Data FAIRport, using W3C's Linked Data Platform, Ruben Verborgh's Linked Data Fragments, and Hydra's hypermedia controlled vocabularies. This is the output of the "Skunkworks" working group of the larger Data FAIRport project (http://datafairport.org).
UnConference for Georgia Southern Computer Science March 31, 2015Christopher Curtin
I presented to the Georgia Southern Computer Science ACM group. Rather than one topic for 90 minutes, I decided to do an UnConference. I presented them a list of 8-9 topics, let them vote on what to talk about, then repeated.
Each presentation was ~8 minutes, (Except Career) and was by no means an attempt to explain the full concept or technology. Only to wake up their interest.
Basho and Riak at GOTO Stockholm: "Don't Use My Database."Basho Technologies
What are common use cases for NoSQL? When should I avoid NoSQL? When is RDBMS just fine?
This presentation, delivered at the GOTO NoSQL Roadshow events in London and Stockholm in November of 2011 by Basho co-founder and COO, Antony Falco, take a no-BS look at the tradeoffs one must make to gain the advantages offered by distributed databases like Riak.
Software and the Concurrency Revolution : NotesSubhajit Sahu
Highlighted notes of article while studying Concurrent Data Structures, CSE:
Software and the Concurrency Revolution
Herb Sutter
Software Architect, Microsoft
Software Development Consultant, www.gotw.ca/training
Herb Sutter is a prominent C++ expert. He is also a book author and was a columnist for Dr. Dobb's Journal. He joined Microsoft in 2002 as a platform evangelist for Visual C++ .NET, rising to lead software architect for C++/CLI.
Containers and Developer Defined Data Centers - Evan Powell - Keynote in Bang...CodeOps Technologies LLP
DevOps and Containers go hand in hand. DevOps industry is expected to benefit significantly benefit from the container eco-system and technology. This keynote talks about the challenges and opportunities around deploying containers into production use cases.
Cache Consistency – Requirements and its packet processing Performance implic...Michelle Holley
The audio starts at 9:00 due to a glitch while recording.
The topic can be as well stated as “All Processor Accesses are not created equally” – H/W and S/W Synergy – Methods and Mechanism:
- The developers in the audience, throughout the class, will wear different hats – first as a hard design engineer and look at platform design choices.
- Then they will wear the hat of driver developer and see the requirements and assumptions of driver on the hardware behavior.
- Then they will wear the hat of system architect and venture out to find out choices as how s/w and h/w can be in synergy and communicate – thereby potential optimum implementation can be made.
About the presenter: M Jay (Muthurajan Jayakumar) has worked with the DPDK team since 2009. He joined Intel in 1991 and has been in various roles and divisions: 64-bit CPU front side bus architect, 64 bit HAL developer, among others, before he joined the DPDK team. M Jay holds 21 US patents, both individually and jointly, all issued while working at Intel. M Jay was awarded the Intel Achievement Award in 2016, Intel's highest honor based on innovation and results.
Modern data lakes are now built on cloud storage, helping organizations leverage the scale and economics of object storage while simplifying overall data storage and analysis flow
Доклад посвящен архитектуре Tarantool и ее эволюции. Кирилл рассказал, почему важно располагать базу данных и сервер приложений в одном адресном пространстве, почему Tarantool сделали однопоточным и зачем базе-в-памяти нужен механизм хранения данных на диске. Затем Кирилл рассказал про последние наработки команды создателей Tarantool: зачем мы добавили синтаксис SQL и как это может решить ваши задачи.
A Study of Variable-Role-based Feature Enrichment in Neural Models of CodeAftab Hussain
Understanding variable roles in code has been found to be helpful by students
in learning programming -- could variable roles help deep neural models in
performing coding tasks? We do an exploratory study.
- These are slides of the talk given at InteNSE'23: The 1st International Workshop on Interpretability and Robustness in Neural Software Engineering, co-located with the 45th International Conference on Software Engineering, ICSE 2023, Melbourne Australia
More Related Content
Similar to OSDC 2018 | The Computer science behind a modern distributed data store by Max Neunhöffer
UnConference for Georgia Southern Computer Science March 31, 2015Christopher Curtin
I presented to the Georgia Southern Computer Science ACM group. Rather than one topic for 90 minutes, I decided to do an UnConference. I presented them a list of 8-9 topics, let them vote on what to talk about, then repeated.
Each presentation was ~8 minutes, (Except Career) and was by no means an attempt to explain the full concept or technology. Only to wake up their interest.
Basho and Riak at GOTO Stockholm: "Don't Use My Database."Basho Technologies
What are common use cases for NoSQL? When should I avoid NoSQL? When is RDBMS just fine?
This presentation, delivered at the GOTO NoSQL Roadshow events in London and Stockholm in November of 2011 by Basho co-founder and COO, Antony Falco, take a no-BS look at the tradeoffs one must make to gain the advantages offered by distributed databases like Riak.
Software and the Concurrency Revolution : NotesSubhajit Sahu
Highlighted notes of article while studying Concurrent Data Structures, CSE:
Software and the Concurrency Revolution
Herb Sutter
Software Architect, Microsoft
Software Development Consultant, www.gotw.ca/training
Herb Sutter is a prominent C++ expert. He is also a book author and was a columnist for Dr. Dobb's Journal. He joined Microsoft in 2002 as a platform evangelist for Visual C++ .NET, rising to lead software architect for C++/CLI.
Containers and Developer Defined Data Centers - Evan Powell - Keynote in Bang...CodeOps Technologies LLP
DevOps and Containers go hand in hand. DevOps industry is expected to benefit significantly benefit from the container eco-system and technology. This keynote talks about the challenges and opportunities around deploying containers into production use cases.
Cache Consistency – Requirements and its packet processing Performance implic...Michelle Holley
The audio starts at 9:00 due to a glitch while recording.
The topic can be as well stated as “All Processor Accesses are not created equally” – H/W and S/W Synergy – Methods and Mechanism:
- The developers in the audience, throughout the class, will wear different hats – first as a hard design engineer and look at platform design choices.
- Then they will wear the hat of driver developer and see the requirements and assumptions of driver on the hardware behavior.
- Then they will wear the hat of system architect and venture out to find out choices as how s/w and h/w can be in synergy and communicate – thereby potential optimum implementation can be made.
About the presenter: M Jay (Muthurajan Jayakumar) has worked with the DPDK team since 2009. He joined Intel in 1991 and has been in various roles and divisions: 64-bit CPU front side bus architect, 64 bit HAL developer, among others, before he joined the DPDK team. M Jay holds 21 US patents, both individually and jointly, all issued while working at Intel. M Jay was awarded the Intel Achievement Award in 2016, Intel's highest honor based on innovation and results.
Modern data lakes are now built on cloud storage, helping organizations leverage the scale and economics of object storage while simplifying overall data storage and analysis flow
Доклад посвящен архитектуре Tarantool и ее эволюции. Кирилл рассказал, почему важно располагать базу данных и сервер приложений в одном адресном пространстве, почему Tarantool сделали однопоточным и зачем базе-в-памяти нужен механизм хранения данных на диске. Затем Кирилл рассказал про последние наработки команды создателей Tarantool: зачем мы добавили синтаксис SQL и как это может решить ваши задачи.
A Study of Variable-Role-based Feature Enrichment in Neural Models of CodeAftab Hussain
Understanding variable roles in code has been found to be helpful by students
in learning programming -- could variable roles help deep neural models in
performing coding tasks? We do an exploratory study.
- These are slides of the talk given at InteNSE'23: The 1st International Workshop on Interpretability and Robustness in Neural Software Engineering, co-located with the 45th International Conference on Software Engineering, ICSE 2023, Melbourne Australia
Top Features to Include in Your Winzo Clone App for Business Growth (4).pptxrickgrimesss22
Discover the essential features to incorporate in your Winzo clone app to boost business growth, enhance user engagement, and drive revenue. Learn how to create a compelling gaming experience that stands out in the competitive market.
Large Language Models and the End of ProgrammingMatt Welsh
Talk by Matt Welsh at Craft Conference 2024 on the impact that Large Language Models will have on the future of software development. In this talk, I discuss the ways in which LLMs will impact the software industry, from replacing human software developers with AI, to replacing conventional software with models that perform reasoning, computation, and problem-solving.
May Marketo Masterclass, London MUG May 22 2024.pdfAdele Miller
Can't make Adobe Summit in Vegas? No sweat because the EMEA Marketo Engage Champions are coming to London to share their Summit sessions, insights and more!
This is a MUG with a twist you don't want to miss.
Prosigns: Transforming Business with Tailored Technology SolutionsProsigns
Unlocking Business Potential: Tailored Technology Solutions by Prosigns
Discover how Prosigns, a leading technology solutions provider, partners with businesses to drive innovation and success. Our presentation showcases our comprehensive range of services, including custom software development, web and mobile app development, AI & ML solutions, blockchain integration, DevOps services, and Microsoft Dynamics 365 support.
Custom Software Development: Prosigns specializes in creating bespoke software solutions that cater to your unique business needs. Our team of experts works closely with you to understand your requirements and deliver tailor-made software that enhances efficiency and drives growth.
Web and Mobile App Development: From responsive websites to intuitive mobile applications, Prosigns develops cutting-edge solutions that engage users and deliver seamless experiences across devices.
AI & ML Solutions: Harnessing the power of Artificial Intelligence and Machine Learning, Prosigns provides smart solutions that automate processes, provide valuable insights, and drive informed decision-making.
Blockchain Integration: Prosigns offers comprehensive blockchain solutions, including development, integration, and consulting services, enabling businesses to leverage blockchain technology for enhanced security, transparency, and efficiency.
DevOps Services: Prosigns' DevOps services streamline development and operations processes, ensuring faster and more reliable software delivery through automation and continuous integration.
Microsoft Dynamics 365 Support: Prosigns provides comprehensive support and maintenance services for Microsoft Dynamics 365, ensuring your system is always up-to-date, secure, and running smoothly.
Learn how our collaborative approach and dedication to excellence help businesses achieve their goals and stay ahead in today's digital landscape. From concept to deployment, Prosigns is your trusted partner for transforming ideas into reality and unlocking the full potential of your business.
Join us on a journey of innovation and growth. Let's partner for success with Prosigns.
OpenMetadata Community Meeting - 5th June 2024OpenMetadata
The OpenMetadata Community Meeting was held on June 5th, 2024. In this meeting, we discussed about the data quality capabilities that are integrated with the Incident Manager, providing a complete solution to handle your data observability needs. Watch the end-to-end demo of the data quality features.
* How to run your own data quality framework
* What is the performance impact of running data quality frameworks
* How to run the test cases in your own ETL pipelines
* How the Incident Manager is integrated
* Get notified with alerts when test cases fail
Watch the meeting recording here - https://www.youtube.com/watch?v=UbNOje0kf6E
First Steps with Globus Compute Multi-User EndpointsGlobus
In this presentation we will share our experiences around getting started with the Globus Compute multi-user endpoint. Working with the Pharmacology group at the University of Auckland, we have previously written an application using Globus Compute that can offload computationally expensive steps in the researcher's workflows, which they wish to manage from their familiar Windows environments, onto the NeSI (New Zealand eScience Infrastructure) cluster. Some of the challenges we have encountered were that each researcher had to set up and manage their own single-user globus compute endpoint and that the workloads had varying resource requirements (CPUs, memory and wall time) between different runs. We hope that the multi-user endpoint will help to address these challenges and share an update on our progress here.
Software Engineering, Software Consulting, Tech Lead, Spring Boot, Spring Cloud, Spring Core, Spring JDBC, Spring Transaction, Spring MVC, OpenShift Cloud Platform, Kafka, REST, SOAP, LLD & HLD.
OpenFOAM solver for Helmholtz equation, helmholtzFoam / helmholtzBubbleFoamtakuyayamamoto1800
In this slide, we show the simulation example and the way to compile this solver.
In this solver, the Helmholtz equation can be solved by helmholtzFoam. Also, the Helmholtz equation with uniformly dispersed bubbles can be simulated by helmholtzBubbleFoam.
Unleash Unlimited Potential with One-Time Purchase
BoxLang is more than just a language; it's a community. By choosing a Visionary License, you're not just investing in your success, you're actively contributing to the ongoing development and support of BoxLang.
Introducing Crescat - Event Management Software for Venues, Festivals and Eve...Crescat
Crescat is industry-trusted event management software, built by event professionals for event professionals. Founded in 2017, we have three key products tailored for the live event industry.
Crescat Event for concert promoters and event agencies. Crescat Venue for music venues, conference centers, wedding venues, concert halls and more. And Crescat Festival for festivals, conferences and complex events.
With a wide range of popular features such as event scheduling, shift management, volunteer and crew coordination, artist booking and much more, Crescat is designed for customisation and ease-of-use.
Over 125,000 events have been planned in Crescat and with hundreds of customers of all shapes and sizes, from boutique event agencies through to international concert promoters, Crescat is rigged for success. What's more, we highly value feedback from our users and we are constantly improving our software with updates, new features and improvements.
If you plan events, run a venue or produce festivals and you're looking for ways to make your life easier, then we have a solution for you. Try our software for free or schedule a no-obligation demo with one of our product specialists today at crescat.io
Globus Connect Server Deep Dive - GlobusWorld 2024Globus
We explore the Globus Connect Server (GCS) architecture and experiment with advanced configuration options and use cases. This content is targeted at system administrators who are familiar with GCS and currently operate—or are planning to operate—broader deployments at their institution.
Listen to the keynote address and hear about the latest developments from Rachana Ananthakrishnan and Ian Foster who review the updates to the Globus Platform and Service, and the relevance of Globus to the scientific community as an automation platform to accelerate scientific discovery.
5. Resilience and Consensus
The Problem
A modern data store is distributed, because it needs to scale out and/or
be resilient.
6. Resilience and Consensus
The Problem
A modern data store is distributed, because it needs to scale out and/or
be resilient.
Different parts of the system need to agree on things.
7. Resilience and Consensus
The Problem
A modern data store is distributed, because it needs to scale out and/or
be resilient.
Different parts of the system need to agree on things.
Consensus is the art to achieve this as well as possible in software.
This is relatively easy, if things are good, but very hard, if:
8. Resilience and Consensus
The Problem
A modern data store is distributed, because it needs to scale out and/or
be resilient.
Different parts of the system need to agree on things.
Consensus is the art to achieve this as well as possible in software.
This is relatively easy, if things are good, but very hard, if:
the network has outages,
the network has dropped or delayed or duplicated packets,
disks fail (and come back with corrupt data),
machines fail (and come back with old data),
racks fail (and come back with or without data).
9. Resilience and Consensus
The Problem
A modern data store is distributed, because it needs to scale out and/or
be resilient.
Different parts of the system need to agree on things.
Consensus is the art to achieve this as well as possible in software.
This is relatively easy, if things are good, but very hard, if:
the network has outages,
the network has dropped or delayed or duplicated packets,
disks fail (and come back with corrupt data),
machines fail (and come back with old data),
racks fail (and come back with or without data).
(And we have not even talked about malicious attacks and enemy action.)
10. Paxos and Raft
Traditionally, one uses the Paxos Consensus Protocol (1998).
More recently, Raft (2013) has been proposed.
Paxos is a challenge to understand and to implement efficiently.
Various variants exist.
Raft is designed to be understandable.
11. Paxos and Raft
Traditionally, one uses the Paxos Consensus Protocol (1998).
More recently, Raft (2013) has been proposed.
Paxos is a challenge to understand and to implement efficiently.
Various variants exist.
Raft is designed to be understandable.
My advice:
First try to understand Paxos for some time (do not implement it!), then
enjoy the beauty of Raft,
12. Paxos and Raft
Traditionally, one uses the Paxos Consensus Protocol (1998).
More recently, Raft (2013) has been proposed.
Paxos is a challenge to understand and to implement efficiently.
Various variants exist.
Raft is designed to be understandable.
My advice:
First try to understand Paxos for some time (do not implement it!), then
enjoy the beauty of Raft, but do not implement it either!
13. Paxos and Raft
Traditionally, one uses the Paxos Consensus Protocol (1998).
More recently, Raft (2013) has been proposed.
Paxos is a challenge to understand and to implement efficiently.
Various variants exist.
Raft is designed to be understandable.
My advice:
First try to understand Paxos for some time (do not implement it!), then
enjoy the beauty of Raft, but do not implement it either!
Use some battle-tested implementation you trust!
14. Paxos and Raft
Traditionally, one uses the Paxos Consensus Protocol (1998).
More recently, Raft (2013) has been proposed.
Paxos is a challenge to understand and to implement efficiently.
Various variants exist.
Raft is designed to be understandable.
My advice:
First try to understand Paxos for some time (do not implement it!), then
enjoy the beauty of Raft, but do not implement it either!
Use some battle-tested implementation you trust!
But most importantly: DO NOT TRY TO INVENT YOUR OWN!
15. Raft in a slide
An odd number of servers each keep a persisted log of events.
16. Raft in a slide
An odd number of servers each keep a persisted log of events.
Everything is replicated to everybody.
17. Raft in a slide
An odd number of servers each keep a persisted log of events.
Everything is replicated to everybody.
They democratically elect a leader with absolute majority.
18. Raft in a slide
An odd number of servers each keep a persisted log of events.
Everything is replicated to everybody.
They democratically elect a leader with absolute majority.
Only the leader may append to the replicated log.
19. Raft in a slide
An odd number of servers each keep a persisted log of events.
Everything is replicated to everybody.
They democratically elect a leader with absolute majority.
Only the leader may append to the replicated log.
An append only counts when a majority has persisted and confirmed it.
20. Raft in a slide
An odd number of servers each keep a persisted log of events.
Everything is replicated to everybody.
They democratically elect a leader with absolute majority.
Only the leader may append to the replicated log.
An append only counts when a majority has persisted and confirmed it.
Very smart logic to ensure a unique leader and automatic recovery from failure.
21. Raft in a slide
An odd number of servers each keep a persisted log of events.
Everything is replicated to everybody.
They democratically elect a leader with absolute majority.
Only the leader may append to the replicated log.
An append only counts when a majority has persisted and confirmed it.
Very smart logic to ensure a unique leader and automatic recovery from failure.
It is all a lot of fun to get right, but it is proven to work.
22. Raft in a slide
An odd number of servers each keep a persisted log of events.
Everything is replicated to everybody.
They democratically elect a leader with absolute majority.
Only the leader may append to the replicated log.
An append only counts when a majority has persisted and confirmed it.
Very smart logic to ensure a unique leader and automatic recovery from failure.
It is all a lot of fun to get right, but it is proven to work.
One puts a key/value store on top, the log contains the changes.
25. Sorting
The Problem
Data stores need indexes. In practice, we need to sort things.
Most published algorithms are rubbish on modern hardware.
26. Sorting
The Problem
Data stores need indexes. In practice, we need to sort things.
Most published algorithms are rubbish on modern hardware.
The problem is no longer the
comparison computations but the data movement
.
27. Sorting
The Problem
Data stores need indexes. In practice, we need to sort things.
Most published algorithms are rubbish on modern hardware.
The problem is no longer the
comparison computations but the data movement
.
Since the time when I was a kid and have played with an Apple IIe,
compute power in one core has increased by ×20000
a single memory access by ×40
and now we have 32 cores in a CPU
this means computation has outpaced memory access by ×1280!
28. Idea for a parallel sorting algorithm: Merge Sort
Min−Heap:
sorted
merged
29. Idea for a parallel sorting algorithm: Merge Sort
Min−Heap:
sorted
merged
Nearly all comparisons hit the L2 cache!
30. Log structured merge trees (LSM-trees)
The Problem
People rightfully expect from a data store, that it
can hold more data than the available RAM,
works well with SSDs and spinning rust,
allows fast bulk inserts into large data sets, and
provides fast reads in a hot set that fits into RAM.
31. Log structured merge trees (LSM-trees)
The Problem
People rightfully expect from a data store, that it
can hold more data than the available RAM,
works well with SSDs and spinning rust,
allows fast bulk inserts into large data sets, and
provides fast reads in a hot set that fits into RAM.
Traditional B-tree based structures often fail to deliver with the last 2.
32. Log structured merge trees (LSM-trees)
(Source: http://www.benstopford.com/2015/02/14/log-structured-merge-trees/, Author: Ben Stopford, License: Creative Commons)
33. Log structured merge trees (LSM-trees)
LSM-trees — summary
writes first go into memtables,
all files are sorted and immutable,
compaction happens in the background,
merge sort can be used,
all writes use sequential I/O,
Bloom filters or Cuckoo filters for fast reads,
=⇒ good write throughput and reasonable read performance,
used in ArangoDB, BigTable, Cassandra, HBase, InfluxDB, LevelDB,
MongoDB, RocksDB, SQLite4 and WiredTiger, etc.
34. Hybrid Logical Clocks (HLC)
The Problem
Clocks in different nodes of distributed systems are not in sync.
35. Hybrid Logical Clocks (HLC)
The Problem
Clocks in different nodes of distributed systems are not in sync.
general relativity poses fundamental obstructions to synchronizity,
in practice, clock skew happens,
Google can use atomic clocks,
even with NTP (network time protocol) we have to live with ≈ 20ms.
36. Hybrid Logical Clocks (HLC)
The Problem
Clocks in different nodes of distributed systems are not in sync.
general relativity poses fundamental obstructions to synchronizity,
in practice, clock skew happens,
Google can use atomic clocks,
even with NTP (network time protocol) we have to live with ≈ 20ms.
Therefore, we cannot compare time stamps from different nodes!
37. Hybrid Logical Clocks (HLC)
The Problem
Clocks in different nodes of distributed systems are not in sync.
general relativity poses fundamental obstructions to synchronizity,
in practice, clock skew happens,
Google can use atomic clocks,
even with NTP (network time protocol) we have to live with ≈ 20ms.
Therefore, we cannot compare time stamps from different nodes!
Why would this help?
establish “happened after” relationship between events,
e.g. for conflict resolution, log sorting, detecting network delays,
time to live could be implemented easily.
38. Hybrid Logical Clocks (HLC)
The Idea
Every computer has a local clock, and we use NTP to synchronize.
39. Hybrid Logical Clocks (HLC)
The Idea
Every computer has a local clock, and we use NTP to synchronize.
If two events on different machines are linked by causality, the cause
should have a smaller time stamp than the effect.
40. Hybrid Logical Clocks (HLC)
The Idea
Every computer has a local clock, and we use NTP to synchronize.
If two events on different machines are linked by causality, the cause
should have a smaller time stamp than the effect.
causality ⇐⇒ a message is sent
Send a time stamp with every message. The HLC always returns a value
> max(local clock, largest time stamp ever seen).
41. Hybrid Logical Clocks (HLC)
The Idea
Every computer has a local clock, and we use NTP to synchronize.
If two events on different machines are linked by causality, the cause
should have a smaller time stamp than the effect.
causality ⇐⇒ a message is sent
Send a time stamp with every message. The HLC always returns a value
> max(local clock, largest time stamp ever seen).
Causality is preserved, time can “catch up” with logical time eventually.
http://muratbuffalo.blogspot.com.es/2014/07/hybrid-logical-clocks.html
42. Distributed ACID Transactions
Atomic either happens in its entirety or not at all
Consistent reading sees a consistent state, writing preser-
vers consistency
Isolated concurrent transactions do not see each other
Durable committed writes are preserved after shut-
down and crashes
43. Distributed ACID Transactions
Atomic either happens in its entirety or not at all
Consistent reading sees a consistent state, writing preser-
vers consistency
Isolated concurrent transactions do not see each other
Durable committed writes are preserved after shut-
down and crashes
(All relatively doable when transactions happen one after another!)
44. Distributed ACID Transactions
The Problem
In a distributed system:
How to make sure, that all nodes agree on whether
the transaction has happened? (Atomicity)
45. Distributed ACID Transactions
The Problem
In a distributed system:
How to make sure, that all nodes agree on whether
the transaction has happened? (Atomicity)
How to create a consistent snapshot across nodes? (Consistency)
46. Distributed ACID Transactions
The Problem
In a distributed system:
How to make sure, that all nodes agree on whether
the transaction has happened? (Atomicity)
How to create a consistent snapshot across nodes? (Consistency)
How to hide ongoing activities until commit? (Isolation)
47. Distributed ACID Transactions
The Problem
In a distributed system:
How to make sure, that all nodes agree on whether
the transaction has happened? (Atomicity)
How to create a consistent snapshot across nodes? (Consistency)
How to hide ongoing activities until commit? (Isolation)
How to handle lost nodes? (Durability)
48. Distributed ACID Transactions
The Problem
In a distributed system:
How to make sure, that all nodes agree on whether
the transaction has happened? (Atomicity)
How to create a consistent snapshot across nodes? (Consistency)
How to hide ongoing activities until commit? (Isolation)
How to handle lost nodes? (Durability)
We have to take replication, resilience and failover into account.
49. Distributed ACID Transactions
WITHOUT
Distributed databases without ACID transactions:
ArangoDB, BigTable, Couchbase, Datastax, Dynamo, Elastic, HBase,
MongoDB, RethinkDB, Riak, and lots more . . .
WITH
Distributed databases with ACID transactions:
(ArangoDB,) CockroachDB, FoundationDB, Spanner
50. Distributed ACID Transactions
WITHOUT
Distributed databases without ACID transactions:
ArangoDB, BigTable, Couchbase, Datastax, Dynamo, Elastic, HBase,
MongoDB, RethinkDB, Riak, and lots more . . .
WITH
Distributed databases with ACID transactions:
(ArangoDB,) CockroachDB, FoundationDB, Spanner
=⇒ Very few distributed engines promise ACID, because this is hard!
52. Distributed ACID Transactions
Basic Idea
Use Multi Version Concurrency Control (MVCC), i.e. multiple revisions of
a data item are kept.
Do writes and replication decentrally and distributed, without them
becoming visible from other transactions.
53. Distributed ACID Transactions
Basic Idea
Use Multi Version Concurrency Control (MVCC), i.e. multiple revisions of
a data item are kept.
Do writes and replication decentrally and distributed, without them
becoming visible from other transactions.
Then have some place, where there is a switch, which decides when the
transaction becomes visible.
54. Distributed ACID Transactions
Basic Idea
Use Multi Version Concurrency Control (MVCC), i.e. multiple revisions of
a data item are kept.
Do writes and replication decentrally and distributed, without them
becoming visible from other transactions.
Then have some place, where there is a switch, which decides when the
transaction becomes visible. These “switches” need to
be persisted somewhere (durability),
scale out (no bottleneck for commit/abort),
be replicated (no single point of failure),
be resilient in case of fail-over (fault-tolerance).
55. Distributed ACID Transactions
Basic Idea
Use Multi Version Concurrency Control (MVCC), i.e. multiple revisions of
a data item are kept.
Do writes and replication decentrally and distributed, without them
becoming visible from other transactions.
Then have some place, where there is a switch, which decides when the
transaction becomes visible. These “switches” need to
be persisted somewhere (durability),
scale out (no bottleneck for commit/abort),
be replicated (no single point of failure),
be resilient in case of fail-over (fault-tolerance).
Transaction visibility needs to be implemented (MVCC), time stamps play
a crucial role.