Shared-data systems try hard to provide data consistency, system availability and tolerance to network partitions.
In a distributed system it is impossible to provide all three of these guarantees simultaneously at any given moment in time.
The purpose of the talk is to show the mechanisms used by data storage systems such as Dynamo and BigTable to satisfy two of these guarantees at a time.
3. WHAT A DISTRIBUTED SYSTEM IS
“A distributed system is a software system in which
components located on networked computers communicate
and coordinate their actions by passing messages”
4. DISTRIBUTED SYSTEMS
EXAMPLES
5. DISTRIBUTED SYSTEMS
REPLICATION
6. REPLICATED SERVICE
PROPERTIES
CONSISTENCY
AVAILABILITY
7. CONSISTENCY
The result of operations will be predictable
8. CONSISTENCY
Strong consistency
all replicas return the same value for the same object
9. CONSISTENCY
Strong consistency
all replicas return the same value for the same object
Weak consistency
different replicas can return different values for the same object
11. STRONG VS WEAK
CONSISTENCY
Strong consistency
Atomic, consistent, isolated, durable database
Weak consistency
Basically Available Soft-state Eventual consistency database
12. EXAMPLE
CONSISTENCY
put(price, 10)
13. EXAMPLE
CONSISTENCY
get(price)
price = 10
17. PARTITION TOLERANCE
the system continues to operate even in the presence of partitions
18. PARTITION TOLERANCE
Network failure
nodes are split into groups on either side of a faulty network element (switch, backbone)
Process failure
the system is split into two groups: correct nodes and crashed nodes
19. CAP THEOREM
“Of three properties of shared-data systems
(data consistency, system availability and
tolerance to network partitions) only two can
be achieved at any given moment in time.”
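20. THE PROOF
CAP THEOREM
[Figure: two replicas, both holding price = 0, end up in partition 1 and partition 2. At t1 a client issues put(price, 10) on one side; at t2 a client issues get(price) on the other side. That replica either answers price = 0 (not consistent) or gives no response (not available).]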
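The argument on the proof slide can be replayed with a small simulation. This is only a sketch: the `Replica` class, the `partitioned` flag and the `prefer_consistency` switch are hypothetical names introduced here, not part of any real system.

```python
class Replica:
    """One copy of the data; every replica starts with price = 0."""

    def __init__(self):
        self.data = {"price": 0}


def put(primary, secondary, key, value, partitioned):
    """Write to the primary and replicate only if the link is up."""
    primary.data[key] = value
    if not partitioned:
        secondary.data[key] = value


def get(replica, key, partitioned, prefer_consistency):
    """Read from a replica that may be cut off from the latest write."""
    if partitioned and prefer_consistency:
        return None  # no response: consistent, but not available
    return replica.data[key]  # always answers: may be stale under partition


r1, r2 = Replica(), Replica()
put(r1, r2, "price", 10, partitioned=True)  # t1, on one side of the partition
stale = get(r2, "price", partitioned=True, prefer_consistency=False)  # t2
refused = get(r2, "price", partitioned=True, prefer_consistency=True)
print(stale, refused)  # 0 None
```

While the partition lasts, the second replica can only choose between the stale answer and no answer, which is the choice the theorem describes.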
25. REQUIREMENTS
DYNAMO
“customers should be able to view and add items
to their shopping cart even if disks are failing,
network routes are flapping, or data centers are
being destroyed by tornados.”
26. REQUIREMENTS
DYNAMO
“customers should be able to view and add items
to their shopping cart even if disks are failing,
network routes are flapping, or data centers are
being destroyed by tornados.”
➡ reliable
➡ highly scalable
➡ always available
27. SIMPLE INTERFACE
DYNAMO
get(key)
locates the object replicas associated with the key and returns a
single object or a list of objects with conflicting versions
along with a context.
put(key, context, object)
determines where the replicas of the object should be
placed based on the associated key. The context
includes information such as the version of the object.
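To make the shape of this interface concrete, here is a toy single-process stand-in. It is an assumption-laden sketch: `TinyStore` and `Context` are hypothetical, the context is reduced to a plain version number, and nothing here reflects how Dynamo actually stores or replicates data.

```python
from dataclasses import dataclass, field


@dataclass
class Context:
    """Opaque metadata handed back to the store on put(); here just a version."""
    version: int = 0


@dataclass
class TinyStore:
    """Hypothetical in-memory stand-in for the get/put interface."""
    items: dict = field(default_factory=dict)  # key -> list of (version, object)

    def get(self, key):
        """Return all stored versions of the object plus a context."""
        versions = self.items.get(key, [])
        objects = [obj for _, obj in versions]
        latest = max((v for v, _ in versions), default=0)
        return objects, Context(latest)

    def put(self, key, context, obj):
        """Store obj, keeping any version the caller had not seen as a sibling."""
        unseen = [(v, o) for v, o in self.items.get(key, []) if v > context.version]
        self.items[key] = unseen + [(context.version + 1, obj)]


store = TinyStore()
_, ctx = store.get("cart")
store.put("cart", ctx, ["book"])
store.put("cart", ctx, ["pen"])  # same stale context: creates a conflict
objects, ctx = store.get("cart")
print(objects)  # [['book'], ['pen']]
store.put("cart", ctx, ["book", "pen"])  # reconciled write with fresh context
```

The second `get` returns both conflicting versions, and a `put` with the fresh context replaces them with the reconciled object.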
28. REPLICATION: THE CHOICE
DYNAMO
Synchronous replica coordination
‣ strong consistency
‣ availability is traded off
Optimistic replication technique
‣ high availability
‣ conflicts are possible
29. CONFLICTS: WHEN
DYNAMO
At write time
‣ writes may be rejected
At read time
‣ “always writable” datastore
30. CONFLICTS: WHO
DYNAMO
The data store
‣ e.g. “last write wins” policy
The application
‣ resolution as an implementation detail
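The two choices of who resolves a conflict can be compared on a shopping cart. This is an illustrative sketch: the `timestamp` and `items` fields and both function names are assumptions made for the example.

```python
def last_write_wins(versions):
    """Data-store policy: keep the version with the highest timestamp."""
    return max(versions, key=lambda v: v["timestamp"])["items"]


def merge_carts(versions):
    """Application policy: union the carts so no added item is lost."""
    merged = set()
    for version in versions:
        merged |= set(version["items"])
    return sorted(merged)


conflicting = [
    {"timestamp": 1, "items": ["book", "pen"]},
    {"timestamp": 2, "items": ["book", "lamp"]},
]
print(last_write_wins(conflicting))  # ['book', 'lamp']
print(merge_carts(conflicting))      # ['book', 'lamp', 'pen']
```

The store-level policy silently drops the pen, while the application-level merge keeps every added item because it knows what a cart means.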
31. A RING TO RULE THEM ALL
DYNAMO
32. PARTITIONING: THE RING
DYNAMO
[Figure: nodes A, B, C, D, E, F, G placed on a ring; a DATA item is hashed to a position on the ring.]
33. REPLICATION
DYNAMO
[Figure: the same ring of nodes A to G, with a DATA item hashed onto it.]
N = 3: D will store keys in the ranges (A, B], (B, C], (C, D]
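The ring and the N = 3 replication rule can be sketched with fixed node positions. The positions, `RING_SIZE` and the function names below are assumptions chosen to keep the example deterministic; MD5 is used only as a convenient hash for placing a key on the ring.

```python
import bisect
import hashlib

RING_SIZE = 100
# Hypothetical positions for nodes A..G, in clockwise order.
NODES = [(10, "A"), (20, "B"), (30, "C"), (40, "D"), (50, "E"), (60, "F"), (70, "G")]
POSITIONS = [position for position, _ in NODES]


def ring_position(key):
    """Hash a key onto the ring."""
    digest = hashlib.md5(key.encode("utf-8")).hexdigest()
    return int(digest, 16) % RING_SIZE


def preference_list(position, n=3):
    """First node clockwise at or after position, plus its n - 1 successors."""
    start = bisect.bisect_left(POSITIONS, position) % len(NODES)
    return [NODES[(start + i) % len(NODES)][1] for i in range(n)]


print(preference_list(15))  # ['B', 'C', 'D']  key in (A, B]
print(preference_list(25))  # ['C', 'D', 'E']  key in (B, C]
print(preference_list(35))  # ['D', 'E', 'F']  key in (C, D]
print(preference_list(45))  # ['E', 'F', 'G']  key in (D, E]: D is not involved
```

Node D appears in the list exactly for keys in (A, B], (B, C] and (C, D], which matches the ranges given on the replication slide.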
34. DATA VERSIONING
DYNAMO
put()
may return before the update has been propagated to
all replicas.
get()
a subsequent get() may return an object that does not
have the latest update
36. RECONCILIATION
DYNAMO
Syntactic reconciliation
‣ a new version subsumes the previous one
Semantic reconciliation
‣ conflicting versions of the same object have to be merged
38. VECTOR CLOCK
DYNAMO
Definition
‣ list of (node, counter) pairs
39. VECTOR CLOCK
DYNAMO
Definition
‣ list of (node, counter) pairs
D1 [Sx,1] (write handled by Sx)
40. VECTOR CLOCK
DYNAMO
Definition
‣ list of (node, counter) pairs
D1 [Sx,1] (write handled by Sx)
D2 [Sx,2] (write handled by Sx)
41. VECTOR CLOCK
DYNAMO
Definition
‣ list of (node, counter) pairs
D1 [Sx,1] (write handled by Sx)
D2 [Sx,2] (write handled by Sx)
D3 [Sx,2], [Sy,1] (write handled by Sy)
42. VECTOR CLOCK
DYNAMO
Definition
‣ list of (node, counter) pairs
D1 [Sx,1] (write handled by Sx)
D2 [Sx,2] (write handled by Sx)
D3 [Sx,2], [Sy,1] (write handled by Sy)
D4 [Sx,2], [Sz,1] (write handled by Sz)
43. VECTOR CLOCK
DYNAMO
Definition
‣ list of (node, counter) pairs
D1 [Sx,1] (write handled by Sx)
D2 [Sx,2] (write handled by Sx)
D3 [Sx,2], [Sy,1] (write handled by Sy)
D4 [Sx,2], [Sz,1] (write handled by Sz)
D5 [Sx,3], [Sy,1], [Sz,1] (reconciled and written by Sx)
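The D1 to D5 sequence can be reproduced with a few lines of code. This is a minimal sketch that represents a clock as a dictionary from node name to counter; the helper names are assumptions made for the example.

```python
def increment(clock, node):
    """Return a copy of clock with node's counter advanced by one."""
    updated = dict(clock)
    updated[node] = updated.get(node, 0) + 1
    return updated


def descends(a, b):
    """True if clock a has seen everything recorded in clock b."""
    return all(a.get(node, 0) >= counter for node, counter in b.items())


def concurrent(a, b):
    """True if neither clock descends from the other (a real conflict)."""
    return not descends(a, b) and not descends(b, a)


def merge(a, b):
    """Element-wise maximum of two clocks."""
    return {node: max(a.get(node, 0), b.get(node, 0)) for node in a.keys() | b.keys()}


d1 = increment({}, "Sx")              # [Sx,1]
d2 = increment(d1, "Sx")              # [Sx,2]
d3 = increment(d2, "Sy")              # [Sx,2], [Sy,1]
d4 = increment(d2, "Sz")              # [Sx,2], [Sz,1]
d5 = increment(merge(d3, d4), "Sx")   # [Sx,3], [Sy,1], [Sz,1]
print(concurrent(d3, d4), descends(d5, d3), descends(d5, d4))  # True True True
```

D2 subsumes D1, so that case is syntactic reconciliation; D3 and D4 are concurrent, so they need the semantic reconciliation that produces D5.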
44. PUT() AND GET()
DYNAMO
R
‣ minimum number of nodes that must participate
in a successful read operation
W
‣ minimum number of nodes that must participate
in a successful write operation
45. PUT() AND GET()
DYNAMO
put()
‣ the coordinator generates the vector clock for the new version and
writes the new version locally
‣ the new version is sent to the N highest-ranked reachable nodes
‣ the write is successful if at least W-1 nodes respond
get()
‣ the coordinator requests all existing versions of the data for the key
‣ the coordinator waits for R responses before returning the result
‣ the coordinator returns all the versions that are causally unrelated
‣ the divergent versions are reconciled and written back
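A short check shows how R and W interact. In a quorum-like configuration R + W > N, so every read set shares at least one node with every write set. The brute-force test below is a sketch with hypothetical function names, not part of Dynamo.

```python
from itertools import combinations


def quorums_overlap(n, r, w):
    """Check by brute force that every read set meets every write set."""
    nodes = range(n)
    return all(
        set(read_set) & set(write_set)
        for write_set in combinations(nodes, w)
        for read_set in combinations(nodes, r)
    )


def write_succeeds(replica_acks, w):
    """The coordinator already wrote locally, so W - 1 more acks are enough."""
    return replica_acks >= w - 1


print(quorums_overlap(3, 2, 2))  # True:  R + W > N
print(quorums_overlap(3, 1, 1))  # False: R + W <= N
print(write_succeeds(1, 2))      # True
```

Lowering R or W reduces latency because fewer nodes have to answer, at the price of reads that may miss the latest write.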
46. SLOPPY QUORUM
DYNAMO
[Figure: ring of nodes A, B, C, D, E, F, G with N = 3.]
47. WHY IS AP?
DYNAMO
‣ requests are served even if some replicas are not available
‣ if a node is down, the write is stored on another node
‣ consistency conflicts are resolved at read time or in the
background
‣ eventually, all the replicas will converge
‣ concurrent read/write operations can make distinct clients
see distinct versions of the same key
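The rule that a write goes to another node when a replica is down can be sketched as a walk over the ring that skips unhealthy nodes. The function below is a simplified illustration: `RING`, `sloppy_targets` and the hint format are assumptions, and the hint stands for the intended replica that the substitute node covers for.

```python
RING = ["A", "B", "C", "D", "E", "F", "G"]


def sloppy_targets(first_replica, down, n=3):
    """Walk the ring clockwise and pick the first n healthy nodes.

    Returns (node, hint) pairs; hint names the intended replica that the
    node is standing in for, or None when the node is a regular replica.
    """
    start = RING.index(first_replica)
    walk = [RING[(start + i) % len(RING)] for i in range(len(RING))]
    intended = walk[:n]
    skipped = [node for node in intended if node in down]
    targets = []
    for node in walk:
        if len(targets) == n:
            break
        if node in down:
            continue
        hint = skipped.pop(0) if node not in intended else None
        targets.append((node, hint))
    return targets


print(sloppy_targets("A", down={"A"}))
# [('B', None), ('C', None), ('D', 'A')]
```

With A unreachable, D accepts the write together with a hint that it belongs to A, so the write still reaches three nodes and availability is preserved.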
49. REQUIREMENTS
GOOGLE BIGTABLE
‣ scale to petabytes of data
‣ thousands of machines
‣ high availability
‣ high performance
50. DATA MODEL
GOOGLE BIGTABLE
‣ sparse, distributed, persistent multi-dimensional
sorted map
(row: string, column: string, time: int64) → string
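The map signature above can be mimicked with nested dictionaries. This is a toy, single-process sketch: `TinyTable` and its methods are hypothetical, and it says nothing about distribution or persistence.

```python
class TinyTable:
    """Toy stand-in for the (row, column, time) -> string map."""

    def __init__(self):
        self.cells = {}  # (row, column) -> {timestamp: value}

    def put(self, row, column, timestamp, value):
        self.cells.setdefault((row, column), {})[timestamp] = value

    def get(self, row, column, timestamp=None):
        """Return the value at timestamp, or the most recent one if omitted."""
        versions = self.cells[(row, column)]
        if timestamp is None:
            timestamp = max(versions)
        return versions[timestamp]

    def scan(self):
        """Return the (row, column) keys in lexicographic order."""
        return sorted(self.cells)


table = TinyTable()
table.put("com.example", "contents:", 1, "<html>v1")
table.put("com.example", "contents:", 2, "<html>v2")
table.put("com.cnn.www", "language:", 1, "en")
print(table.get("com.example", "contents:"))     # <html>v2
print(table.get("com.example", "contents:", 1))  # <html>v1
print(table.scan())
# [('com.cnn.www', 'language:'), ('com.example', 'contents:')]
```

The map is sparse because only the cells that were written exist, and it is sorted because scans walk the keys in lexicographic order.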
51. ROWS
GOOGLE BIGTABLE
‣ row keys are arbitrary strings
‣ reads and writes under a single row key are atomic
‣ data is maintained in lexicographic order by row key
‣ each row range is called a tablet
maps.google.com → com.google.maps
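Reversing the hostname, as in the example above, keeps pages of the same domain next to each other once rows are sorted. A small sketch, where `row_key` is a hypothetical helper:

```python
def row_key(hostname, path=""):
    """Reverse the hostname components: maps.google.com -> com.google.maps."""
    return ".".join(reversed(hostname.split("."))) + path


hosts = ["maps.google.com", "www.cnn.com", "mail.google.com"]
print(sorted(row_key(host) for host in hosts))
# ['com.cnn.www', 'com.google.mail', 'com.google.maps']
print(row_key("www.cnn.com", "/foo"))  # com.cnn.www/foo
```

Because the two google.com hosts sort next to each other, they are likely to fall in the same row range.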
52. COLUMNS
GOOGLE BIGTABLE
‣ column keys are grouped into sets: column families
‣ a column family must be created before data can be
stored under any column key in that family
‣ a column key is named as family:qualifier
‣ access control and both disk and memory
accounting are performed at the column-family level
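The family:qualifier naming and the rule that a family must exist first can be illustrated with a small schema check. The `Families` class is a hypothetical sketch, not the BigTable API.

```python
class Families:
    """Toy schema check: a family must exist before its columns are written."""

    def __init__(self):
        self.families = set()
        self.data = {}  # (row, family, qualifier) -> value

    def create_family(self, name):
        self.families.add(name)

    def put(self, row, column_key, value):
        # Split "family:qualifier" at the first colon.
        family, _, qualifier = column_key.partition(":")
        if family not in self.families:
            raise KeyError(f"column family {family!r} has not been created")
        self.data[(row, family, qualifier)] = value


schema = Families()
schema.create_family("anchor")
schema.put("com.cnn.www", "anchor:cnnsi.com", "cnn")
try:
    schema.put("com.cnn.www", "language:", "en")
    rejected = False
except KeyError:
    rejected = True  # the "language" family was never created
print(rejected)  # True
```

Qualifiers such as cnnsi.com need no declaration, so new columns can appear freely inside an existing family.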
53. TIMESTAMPS
GOOGLE BIGTABLE
[Figure: row com.example, column CONTENTS: holding two versions of <html>…, one at timestamp t1 and one at timestamp t2.]
54. DATA MODEL: EXAMPLE
GOOGLE BIGTABLE
Column families: LANGUAGE:, CONTENTS:, ANCHOR:CNNSI.COM, ANCHOR:MYLOOK.CA
Row keys (sorted rows):
com.example: LANGUAGE: = en; CONTENTS: = <!DOCTYPE html PUBLIC …
com.cnn.www: LANGUAGE: = en; CONTENTS: = <!DOCTYPE html PUBLIC …; ANCHOR:CNNSI.COM = “cnn”; ANCHOR:MYLOOK.CA = “cnn.com”
com.cnn.www/foo: LANGUAGE: = en; CONTENTS: = <!DOCTYPE html PUBLIC …
55. DIFFERENCES WITH RDBMS
GOOGLE BIGTABLE
RDBMS: query language. BIGTABLE: specific API
RDBMS: joins. BIGTABLE: no referential integrity
RDBMS: explicit sorting. BIGTABLE: sorting defined a priori in the column family
56. ARCHITECTURE
GOOGLE BIGTABLE
Google File System (GFS)
‣ stores data files and logs
Google SSTable
‣ stores BigTable data
Chubby
‣ highly available distributed lock service
57. COMPONENTS
GOOGLE BIGTABLE
library
‣ linked into every client
one master server
‣ assigning tablets to tablet servers
‣ detecting the addition and expiration of tablet servers
‣ balancing tablet-server load
‣ garbage collection of files in GFS
‣ handling schema changes
many tablet servers
‣ each manages a set of tablets (typically ten to a thousand)
‣ handle read and write requests to their tablets
‣ split tablets that have grown too large
58. COMPONENTS
GOOGLE BIGTABLE
[Figure: a Client, one Master server and three Tablet servers, with arrows labelled Metadata and read/write.]
59. STARTUP AND GROWTH
GOOGLE BIGTABLE
[Figure: tablet location hierarchy. A Chubby file points to the Root tablet (the 1st Metadata tablet), which points to the other metadata tablets, which in turn point to the user tablets UserTable1 … UserTableN.]
60. TABLET ASSIGNMENT
GOOGLE BIGTABLE
tablet server
‣ when started, creates and acquires a lock in Chubby
master
‣ grabs a unique master lock in Chubby
‣ scans Chubby to find live tablet servers
‣ asks each live tablet server which tablets it already serves
‣ scans the Metadata table to learn the full set of tablets
‣ builds the set of unassigned tablets, for future tablet
assignment
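The last two master steps amount to a set difference between the tablets found in the Metadata table and the tablets that live servers already report. A minimal sketch, where the tablet ids, server names and function name are all hypothetical:

```python
def unassigned_tablets(all_tablets, server_reports):
    """Tablets found in the metadata scan that no live server reported."""
    assigned = set()
    for tablets in server_reports.values():
        assigned |= set(tablets)
    return sorted(set(all_tablets) - assigned)


metadata_scan = ["t1", "t2", "t3", "t4"]          # hypothetical tablet ids
live_servers = {"ts-a": ["t1"], "ts-b": ["t3"]}   # what each live server reported
print(unassigned_tablets(metadata_scan, live_servers))  # ['t2', 't4']
```

The resulting set is what the master can hand out to tablet servers that have room for more tablets.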
61. WHY IS CP?
GOOGLE BIGTABLE
‣ master death causes services to stop functioning
‣ tablet server death makes its tablets unavailable
‣ Chubby death leaves BigTable unable to execute
synchronization operations and to serve client requests
‣ Google File System is a CP system
62. $ WHOAMI
Andrea Giuliano
@bit_shark
www.andreagiuliano.it
64. REFERENCES
G. DeCandia et al. “Dynamo: Amazon’s Highly Available Key-value Store”
F. Chang et al. “Bigtable: A Distributed Storage System for Structured Data”
Assets:
https://farm1.staticflickr.com/41/86744006_0026864df8_b_d.jpg
https://farm9.staticflickr.com/8305/7883634326_4e51a1a320_b_d.jpg
https://farm5.staticflickr.com/4145/4958650244_65b2eddffc_b_d.jpg
https://farm4.staticflickr.com/3677/10023456065_e54212c52e_b_d.jpg
https://farm4.staticflickr.com/3076/2871264822_261dafa44c_o_d.jpg
https://farm1.staticflickr.com/7/6111406_30005bdae5_b_d.jpg
https://farm4.staticflickr.com/3928/15416585502_92d5e608c7_b_d.jpg
https://farm8.staticflickr.com/7046/6873109431_d3b5199f7d_b_d.jpg
https://farm4.staticflickr.com/3007/2835755867_c530b0e0c6_o_d.jpg
https://farm3.staticflickr.com/2788/4202444169_2079db9580_o_d.jpg
https://farm1.staticflickr.com/55/129619657_907b480c7c_b_d.jpg
https://farm5.staticflickr.com/4046/4368269562_b3e05e3f06_b_d.jpg
https://farm8.staticflickr.com/7344/12137775834_d0cecc5004_k_d.jpg
https://farm5.staticflickr.com/4073/4895191036_1cb9b58d75_b_d.jpg
https://farm4.staticflickr.com/3144/3025249284_b77dec2d29_o_d.jpg
https://www.flickr.com/photos/avardwoolaver/7137096221