Overview of Cassandra architecture. Learn about how data is read and written into a Cassandra cluster. Internal gossip protocol. Some key data structure Cassandra uses like bloom filters, consistent hashing.
Flink is a framework for distributed stream and batch data processing. It defines workflows as directed acyclic graphs (DAGs) of tasks that can be distributed across worker nodes for scalability. It supports both bounded and unbounded streams, maintains fault tolerance through snapshots, and is capable of exactly-once processing through stream alignment during checkpoints. The master coordinates scheduling while workers execute tasks, and stateful operations are handled through windows that collect events over time or size thresholds.
A coordination service like Zookeeper helps distributed applications coordinate by providing common services like synchronization, configuration sharing, naming, and leader election. Zookeeper uses an ensemble of servers running as a cluster. It stores data in a hierarchical namespace of znodes. Clients can read and write znodes, set watches on znodes to get notified of changes, and rely on Zookeeper to handle session and server failures in a transparent way. Some common usage recipes for Zookeeper include barriers for synchronization, cluster management using ephemeral znodes, queues using sequential znodes, locks for mutual exclusion, and leader election.
The document discusses various approaches for debugging distributed systems, including offline and online methods. Offline methods discussed are logging using libraries like liblog, pervasive debuggers, time travel using virtual machines, and model checking using tools like MaceMC. Online methods discussed are D3S and CrystalBall. The document provides details on how several of these approaches work, including liblog which intercepts system calls to log data, time-travel virtual machines which take checkpoints to replay execution, and how model checkers represent distributed systems as state machines to check properties.
This document discusses stateful streaming data pipelines using Apache Apex. It introduces Apache Apex and describes its key components like tuples, operators, and the directed acyclic graph (DAG) structure. It then discusses challenges around checkpointing large operator state and introduces managed state and spillable data structures as solutions. Managed state incrementally checkpoints state to disk and allows configuring memory thresholds. Spillable data structures decouple data from serialization and provide map, list, and set interfaces to stored data. Examples demonstrate building complex data structures on top of managed state.
1. The document provides solutions to problems regarding database replication.
2. For a read-only replicated database, availability improves as more replicas are added. However, for an update-only replicated database, availability can decrease if the replication protocol requires updating all replicas for a transaction to commit.
3. The replication protocol described, where transactions execute on one server and propagate updates to the other server within the transaction boundary using two-phase locking and two-phase commit, does not provide one-copy serializability. A history is provided as a counterexample.
Flink Forward Berlin 2017: Tzu-Li (Gordon) Tai - Managing State in Apache FlinkFlink Forward
Over the past year, we've seen users build entire event-driven applications such as social networks on top of Apache Flink (Drivetribe.com), elevating the importance of state management in Flink to a whole new level. Users are placing more and more data as state in Flink, using it as a replacement to conventional databases. With such mission-critical data entrusted to Flink, we need to provide similar capabilities of a database to users. One of such capabilities is being flexible in how data is persisted and represented. Specifically, how can I change how my state is serialized and stored, or even the schema of my state, as business logic changes over time? In this talk, we'll provide details on the latest state management features in Flink that allows users to do exactly that. We'll talk about how Flink manages the state for you, how it provides flexibility to the user to adapt to evolving state serialization formats and schemas, and the best practices when working with it.
Flink Forward Berlin 2017: Stefan Richter - A look at Flink's internal data s...Flink Forward
The document discusses Flink's use of internal data structures to efficiently support checkpointing. It describes how Flink uses RocksDB, a log-structured merge tree database, as a backend to support asynchronous and incremental checkpoints. RocksDB allows checkpoints to be taken with low overhead by creating immutable snapshots of the on-disk data structures. It also facilitates incremental checkpoints by efficiently detecting state changes between checkpoints based on the creation and deletion of immutable sorted string tables. The document provides an example to illustrate how RocksDB integrates with the distributed filesystem and job manager to support incremental checkpointing. It also discusses how Flink uses a copy-on-write hash map approach with the heap state backend to support asynchronous checkpoints while detecting state
Flink Forward SF 2017: Stephan Ewen - Experiences running Flink at Very Large...Flink Forward
This talk shares experiences from deploying and tuning Flink steam processing applications for very large scale. We share lessons learned from users, contributors, and our own experiments about running demanding streaming jobs at scale. The talk will explain what aspects currently render a job as particularly demanding, show how to configure and tune a large scale Flink job, and outline what the Flink community is working on to make the out-of-the-box for experience as smooth as possible. We will, for example, dive into - analyzing and tuning checkpointing - selecting and configuring state backends - understanding common bottlenecks - understanding and configuring network parameters
Flink is a framework for distributed stream and batch data processing. It defines workflows as directed acyclic graphs (DAGs) of tasks that can be distributed across worker nodes for scalability. It supports both bounded and unbounded streams, maintains fault tolerance through snapshots, and is capable of exactly-once processing through stream alignment during checkpoints. The master coordinates scheduling while workers execute tasks, and stateful operations are handled through windows that collect events over time or size thresholds.
A coordination service like Zookeeper helps distributed applications coordinate by providing common services like synchronization, configuration sharing, naming, and leader election. Zookeeper uses an ensemble of servers running as a cluster. It stores data in a hierarchical namespace of znodes. Clients can read and write znodes, set watches on znodes to get notified of changes, and rely on Zookeeper to handle session and server failures in a transparent way. Some common usage recipes for Zookeeper include barriers for synchronization, cluster management using ephemeral znodes, queues using sequential znodes, locks for mutual exclusion, and leader election.
The document discusses various approaches for debugging distributed systems, including offline and online methods. Offline methods discussed are logging using libraries like liblog, pervasive debuggers, time travel using virtual machines, and model checking using tools like MaceMC. Online methods discussed are D3S and CrystalBall. The document provides details on how several of these approaches work, including liblog which intercepts system calls to log data, time-travel virtual machines which take checkpoints to replay execution, and how model checkers represent distributed systems as state machines to check properties.
This document discusses stateful streaming data pipelines using Apache Apex. It introduces Apache Apex and describes its key components like tuples, operators, and the directed acyclic graph (DAG) structure. It then discusses challenges around checkpointing large operator state and introduces managed state and spillable data structures as solutions. Managed state incrementally checkpoints state to disk and allows configuring memory thresholds. Spillable data structures decouple data from serialization and provide map, list, and set interfaces to stored data. Examples demonstrate building complex data structures on top of managed state.
1. The document provides solutions to problems regarding database replication.
2. For a read-only replicated database, availability improves as more replicas are added. However, for an update-only replicated database, availability can decrease if the replication protocol requires updating all replicas for a transaction to commit.
3. The replication protocol described, where transactions execute on one server and propagate updates to the other server within the transaction boundary using two-phase locking and two-phase commit, does not provide one-copy serializability. A history is provided as a counterexample.
Flink Forward Berlin 2017: Tzu-Li (Gordon) Tai - Managing State in Apache FlinkFlink Forward
Over the past year, we've seen users build entire event-driven applications such as social networks on top of Apache Flink (Drivetribe.com), elevating the importance of state management in Flink to a whole new level. Users are placing more and more data as state in Flink, using it as a replacement to conventional databases. With such mission-critical data entrusted to Flink, we need to provide similar capabilities of a database to users. One of such capabilities is being flexible in how data is persisted and represented. Specifically, how can I change how my state is serialized and stored, or even the schema of my state, as business logic changes over time? In this talk, we'll provide details on the latest state management features in Flink that allows users to do exactly that. We'll talk about how Flink manages the state for you, how it provides flexibility to the user to adapt to evolving state serialization formats and schemas, and the best practices when working with it.
Flink Forward Berlin 2017: Stefan Richter - A look at Flink's internal data s...Flink Forward
The document discusses Flink's use of internal data structures to efficiently support checkpointing. It describes how Flink uses RocksDB, a log-structured merge tree database, as a backend to support asynchronous and incremental checkpoints. RocksDB allows checkpoints to be taken with low overhead by creating immutable snapshots of the on-disk data structures. It also facilitates incremental checkpoints by efficiently detecting state changes between checkpoints based on the creation and deletion of immutable sorted string tables. The document provides an example to illustrate how RocksDB integrates with the distributed filesystem and job manager to support incremental checkpointing. It also discusses how Flink uses a copy-on-write hash map approach with the heap state backend to support asynchronous checkpoints while detecting state
Flink Forward SF 2017: Stephan Ewen - Experiences running Flink at Very Large...Flink Forward
This talk shares experiences from deploying and tuning Flink steam processing applications for very large scale. We share lessons learned from users, contributors, and our own experiments about running demanding streaming jobs at scale. The talk will explain what aspects currently render a job as particularly demanding, show how to configure and tune a large scale Flink job, and outline what the Flink community is working on to make the out-of-the-box for experience as smooth as possible. We will, for example, dive into - analyzing and tuning checkpointing - selecting and configuring state backends - understanding common bottlenecks - understanding and configuring network parameters
Locks are used in distributed systems to coordinate access to shared resources and ensure consistency. There are different types of locks like read/write locks that can be granted. A distributed lock manager implements locking and allows processes to acquire locks on resources in a hierarchy. This prevents issues like lost updates and deadlocks. Examples of distributed lock managers include Chubby, ZooKeeper and Redis.
This document discusses the Pulsar connector for Apache Flink 1.14. It provides an overview of StreamNative, which offers both stream storage with Apache Pulsar and stream processing with Flink. It then covers the timeline of contributions to the Pulsar connector for Flink and how it has evolved. Finally, it describes the design of the new Pulsar source connector for Flink that uses the FLIP-27 source interface, including how it handles Pulsar subscription modes and implements split enumeration, reading, and processing in a way that supports both batch and streaming workloads.
Cassandra uses a coordinator node to handle read and write requests from clients. The coordinator determines which replica nodes contain the requested data and sends the request to those nodes. For reads, the coordinator may send direct read requests to some replicas and digest requests to others depending on the consistency level. If inconsistencies are found, a read repair will update out-of-date replicas in the background. The consistency level controls how many replicas must respond for the operation to be considered successful and whether read repairs are done synchronously or asynchronously. Eventual consistency is achieved when the consistency level is set low enough that stale reads may occur.
GopherCon 2017 - Writing Networking Clients in Go: The Design & Implementati...wallyqs
The NATS Go client is the canonical implementation of a client for the NATS Messaging System, and from the beginning it was designed for high performance. In this talk, we will cover its APIs and dissect how the client internal engine works to get the most out of Go to achieve maximum throughput.
Cassandra is a distributed database that provides consistency, availability, and partition tolerance. It uses consistent hashing to distribute data across nodes and achieve high availability even if some nodes fail. Cassandra uses virtual nodes and replication to improve data distribution and allow adding or removing nodes without complex rebalancing.
The document discusses scaling state management in Apache Flink streaming applications to very large state. It describes how Flink uses state sharding and increasing operator parallelism to scale stateful computation. For fault tolerance, it discusses scaling checkpointing by making checkpoints asynchronous and less frequent, and scaling recovery by replicating state so fewer operators need recovery. It presents work in progress on incremental checkpointing and recovery to further optimize state management for large, stateful streaming applications.
The document discusses various topics related to processes and threads including thread usage in nondistributed systems, multithreaded server models, the X-Window system, client-side software for distribution transparency, object adapters, code migration in heterogeneous systems, and software agents in distributed systems. It provides details on thread implementation, reasons for migrating code, models for code migration, and agent communication languages. Key concepts covered include context switching, multithreaded servers, binding of clients to servers, object registration and activation policies, maintaining a migration stack, and FIPA ACL message types and examples.
Logging Last Resource Optimization for Distributed Transactions in Oracle We...Gera Shegalov
Logging Last Resource Optimization for Distributed Transactions in Oracle WebLogic Server describes optimizing distributed transactions by designating the last resource as a non-XA "logging last resource" (LLR). This allows skipping XA protocol calls for the last resource, reducing latency. The transaction manager logs to the LLR table, which acts as the combined transaction log. If the LLR commit succeeds, other resources are committed; if it fails, the transaction aborts globally. This provides the same ACID guarantees with lower overhead compared to a standard two-phase commit protocol.
Flink Forward SF 2017: Till Rohrmann - Redesigning Apache Flink’s Distributed...Flink Forward
As stream processing engines become more and more popular and are used in different environments, the demand to support different deployment scenarios increases. Depending on the user's infrastructure, a stream processor might be run on a bare metal cluster in standalone mode, deployed via Apache Yarn and Mesos, or run in a containerized environment. In order to fulfill the requirements of different deployment options and to provide enough flexibility for the future, the Flink community has recently started to redesign Flink's distributed architecture. This talk will explain the limitations of the old architecture and how they are solved with the new design. We will present the new building blocks of a Flink cluster and demonstrate, using the example of Flink's Mesos and Docker support, how they can be combined to run Flink nearly everywhere.
This document summarizes an Elixir concurrency workshop that covers processes, the actor model, sending and receiving messages, spawning processes, linking processes, and the OTP behaviors GenServer and Supervisor. It includes code examples of spawning processes, sending/receiving messages, implementing a basic GenServer, and exercises to implement a stack as a GenServer and refactor a server-client example to use GenServer.
The document discusses data modeling goals and examples for Cassandra. It provides guidance on keeping related data together on disk, avoiding normalization, and modeling time series data. Examples covered include mapping time series data points to Cassandra rows and columns, querying time slices, bucketing data, and eventually consistent transaction logging to provide atomicity. The document aims to help with common Cassandra modeling questions and patterns.
Transactions and Concurrency Control in distributed systems. Transaction properties, classification, and transaction implementation. Flat, Nested, and Distributed transactions. Inconsistent Retrievals, Lost Update, Dirty Read, and Premature Writes Problem
Building your own Distributed System The easy way - Cassandra Summit EU 2014Kévin LOVATO
Although Cassandra is well known for its ability to scale and handle heavy load, the team at Abc Arbitrage has preferred to expose its capacity to act as a distributed system.
In this presentation, Kévin Lovato, Software Engineer, will focus on the creation of their home-made Service Bus's Directory which relies on Cassandra to behave as a full-fledged distributed system.
Everything You Thought You Already Knew About OrchestrationLaura Frank Tacho
Do you understand how quorum, consensus, leader election, and different scheduling algorithms can impact your running application? Could you explain these concepts to the rest of your team? Come learn about the algorithms that power all modern container orchestration platforms, and walk away with actionable steps to keep your highly available services highly available.
Chapter 12 transactions and concurrency controlAbDul ThaYyal
This document provides an overview and summary of key concepts related to transactions and concurrency control in distributed systems:
- Transactions allow a sequence of operations to be atomic and isolated despite crashes or concurrent operations. They ensure objects remain in a consistent state.
- Concurrency control techniques like locking and timestamp ordering ensure transactions are isolated and avoid problems like lost updates or inconsistent retrievals that could occur without synchronization.
- Transactions must commit atomically so their effects are durable even after crashes, or abort with no effect. Serializability ensures transactions have an effect equivalent to running serially one at a time.
MaxScale uses an asynchronous and multi-threaded architecture to route client queries to backend database servers. Each thread creates its own epoll instance to monitor file descriptors for I/O events, avoiding locking between threads. Listening sockets are added to a global epoll file descriptor that notifies threads when clients connect, allowing connections to be distributed evenly across threads. This architecture improves performance over the previous single epoll instance approach.
Seattle Cassandra Meetup - Cassandra 1.2 - Eddie Satterlybtoddb
The document summarizes the new features in Cassandra 1.2, including concurrent schema changes, virtual nodes, off-heap storage of metadata, improved disk failure handling, query profiling and tracing tools, atomic batches, enhancements to CQL3 including ALTER KEYSPACE, and the introduction of collections for storing multiple items in a column.
This document provides an overview of Cassandra, including its data model, APIs, architecture, partitioning, replication, consistency, failure handling, and local persistence. Cassandra is a distributed database modeled after Amazon's Dynamo and Google's Bigtable. It uses a gossip-based protocol for cluster management and provides tunable consistency levels.
Cassandra & Python - Springfield MO User GroupAdam Hutson
Adam Hutson gave an overview of Cassandra and how to use it with Python. Key points include:
- Cassandra is a distributed database with no single point of failure and linear scalability. It favors availability over consistency.
- The Python driver allows connecting to Cassandra clusters and executing queries using prepared statements, batches, and custom consistency levels.
- Best practices include reusing a single session object, specifying keyspaces, authorizing connections, and shutting down clusters to avoid resource leaks.
Locks are used in distributed systems to coordinate access to shared resources and ensure consistency. There are different types of locks like read/write locks that can be granted. A distributed lock manager implements locking and allows processes to acquire locks on resources in a hierarchy. This prevents issues like lost updates and deadlocks. Examples of distributed lock managers include Chubby, ZooKeeper and Redis.
This document discusses the Pulsar connector for Apache Flink 1.14. It provides an overview of StreamNative, which offers both stream storage with Apache Pulsar and stream processing with Flink. It then covers the timeline of contributions to the Pulsar connector for Flink and how it has evolved. Finally, it describes the design of the new Pulsar source connector for Flink that uses the FLIP-27 source interface, including how it handles Pulsar subscription modes and implements split enumeration, reading, and processing in a way that supports both batch and streaming workloads.
Cassandra uses a coordinator node to handle read and write requests from clients. The coordinator determines which replica nodes contain the requested data and sends the request to those nodes. For reads, the coordinator may send direct read requests to some replicas and digest requests to others depending on the consistency level. If inconsistencies are found, a read repair will update out-of-date replicas in the background. The consistency level controls how many replicas must respond for the operation to be considered successful and whether read repairs are done synchronously or asynchronously. Eventual consistency is achieved when the consistency level is set low enough that stale reads may occur.
GopherCon 2017 - Writing Networking Clients in Go: The Design & Implementati...wallyqs
The NATS Go client is the canonical implementation of a client for the NATS Messaging System, and from the beginning it was designed for high performance. In this talk, we will cover its APIs and dissect how the client internal engine works to get the most out of Go to achieve maximum throughput.
Cassandra is a distributed database that provides consistency, availability, and partition tolerance. It uses consistent hashing to distribute data across nodes and achieve high availability even if some nodes fail. Cassandra uses virtual nodes and replication to improve data distribution and allow adding or removing nodes without complex rebalancing.
The document discusses scaling state management in Apache Flink streaming applications to very large state. It describes how Flink uses state sharding and increasing operator parallelism to scale stateful computation. For fault tolerance, it discusses scaling checkpointing by making checkpoints asynchronous and less frequent, and scaling recovery by replicating state so fewer operators need recovery. It presents work in progress on incremental checkpointing and recovery to further optimize state management for large, stateful streaming applications.
The document discusses various topics related to processes and threads including thread usage in nondistributed systems, multithreaded server models, the X-Window system, client-side software for distribution transparency, object adapters, code migration in heterogeneous systems, and software agents in distributed systems. It provides details on thread implementation, reasons for migrating code, models for code migration, and agent communication languages. Key concepts covered include context switching, multithreaded servers, binding of clients to servers, object registration and activation policies, maintaining a migration stack, and FIPA ACL message types and examples.
Logging Last Resource Optimization for Distributed Transactions in Oracle We...Gera Shegalov
Logging Last Resource Optimization for Distributed Transactions in Oracle WebLogic Server describes optimizing distributed transactions by designating the last resource as a non-XA "logging last resource" (LLR). This allows skipping XA protocol calls for the last resource, reducing latency. The transaction manager logs to the LLR table, which acts as the combined transaction log. If the LLR commit succeeds, other resources are committed; if it fails, the transaction aborts globally. This provides the same ACID guarantees with lower overhead compared to a standard two-phase commit protocol.
Flink Forward SF 2017: Till Rohrmann - Redesigning Apache Flink’s Distributed...Flink Forward
As stream processing engines become more and more popular and are used in different environments, the demand to support different deployment scenarios increases. Depending on the user's infrastructure, a stream processor might be run on a bare metal cluster in standalone mode, deployed via Apache Yarn and Mesos, or run in a containerized environment. In order to fulfill the requirements of different deployment options and to provide enough flexibility for the future, the Flink community has recently started to redesign Flink's distributed architecture. This talk will explain the limitations of the old architecture and how they are solved with the new design. We will present the new building blocks of a Flink cluster and demonstrate, using the example of Flink's Mesos and Docker support, how they can be combined to run Flink nearly everywhere.
This document summarizes an Elixir concurrency workshop that covers processes, the actor model, sending and receiving messages, spawning processes, linking processes, and the OTP behaviors GenServer and Supervisor. It includes code examples of spawning processes, sending/receiving messages, implementing a basic GenServer, and exercises to implement a stack as a GenServer and refactor a server-client example to use GenServer.
The document discusses data modeling goals and examples for Cassandra. It provides guidance on keeping related data together on disk, avoiding normalization, and modeling time series data. Examples covered include mapping time series data points to Cassandra rows and columns, querying time slices, bucketing data, and eventually consistent transaction logging to provide atomicity. The document aims to help with common Cassandra modeling questions and patterns.
Transactions and Concurrency Control in distributed systems. Transaction properties, classification, and transaction implementation. Flat, Nested, and Distributed transactions. Inconsistent Retrievals, Lost Update, Dirty Read, and Premature Writes Problem
Building your own Distributed System The easy way - Cassandra Summit EU 2014Kévin LOVATO
Although Cassandra is well known for its ability to scale and handle heavy load, the team at Abc Arbitrage has preferred to expose its capacity to act as a distributed system.
In this presentation, Kévin Lovato, Software Engineer, will focus on the creation of their home-made Service Bus's Directory which relies on Cassandra to behave as a full-fledged distributed system.
Everything You Thought You Already Knew About OrchestrationLaura Frank Tacho
Do you understand how quorum, consensus, leader election, and different scheduling algorithms can impact your running application? Could you explain these concepts to the rest of your team? Come learn about the algorithms that power all modern container orchestration platforms, and walk away with actionable steps to keep your highly available services highly available.
Chapter 12 transactions and concurrency controlAbDul ThaYyal
This document provides an overview and summary of key concepts related to transactions and concurrency control in distributed systems:
- Transactions allow a sequence of operations to be atomic and isolated despite crashes or concurrent operations. They ensure objects remain in a consistent state.
- Concurrency control techniques like locking and timestamp ordering ensure transactions are isolated and avoid problems like lost updates or inconsistent retrievals that could occur without synchronization.
- Transactions must commit atomically so their effects are durable even after crashes, or abort with no effect. Serializability ensures transactions have an effect equivalent to running serially one at a time.
MaxScale uses an asynchronous and multi-threaded architecture to route client queries to backend database servers. Each thread creates its own epoll instance to monitor file descriptors for I/O events, avoiding locking between threads. Listening sockets are added to a global epoll file descriptor that notifies threads when clients connect, allowing connections to be distributed evenly across threads. This architecture improves performance over the previous single epoll instance approach.
Seattle Cassandra Meetup - Cassandra 1.2 - Eddie Satterlybtoddb
The document summarizes the new features in Cassandra 1.2, including concurrent schema changes, virtual nodes, off-heap storage of metadata, improved disk failure handling, query profiling and tracing tools, atomic batches, enhancements to CQL3 including ALTER KEYSPACE, and the introduction of collections for storing multiple items in a column.
This document provides an overview of Cassandra, including its data model, APIs, architecture, partitioning, replication, consistency, failure handling, and local persistence. Cassandra is a distributed database modeled after Amazon's Dynamo and Google's Bigtable. It uses a gossip-based protocol for cluster management and provides tunable consistency levels.
Cassandra & Python - Springfield MO User GroupAdam Hutson
Adam Hutson gave an overview of Cassandra and how to use it with Python. Key points include:
- Cassandra is a distributed database with no single point of failure and linear scalability. It favors availability over consistency.
- The Python driver allows connecting to Cassandra clusters and executing queries using prepared statements, batches, and custom consistency levels.
- Best practices include reusing a single session object, specifying keyspaces, authorizing connections, and shutting down clusters to avoid resource leaks.
DeepSort is a 'scalable and efficiency-optimized distributed general sorting engine.’ DeepSort enables a fluent data flow that shares the limited memory space and minimizes data movement, which makes it to be highly efficient at a large scale.
Drivers connect applications to Cassandra clusters and maintain connections to nodes. They probe clusters to discover nodes, token ranges, and latency. Drivers are data-aware and can route queries to appropriate replicas or fail over if needed. Cassandra clusters can span multiple data centers for redundancy, workload separation, and geographic distribution of data and queries. Configuration files like cassandra.yaml and cassandra-env.sh are used to configure memory, data storage, caching, and other settings. Cassandra clusters should be provisioned on commodity servers using tools like cassandra-stress to test workloads and estimate needed nodes.
Lab Seminar 2009 12 01 Message Drop Reduction And Movementtharindanv
The document describes a proposed cluster-based ad hoc network for a home environment using description-based addressing and routing. Key points:
- Nodes are clustered based on their physical locations in the home. Cluster heads route messages between clusters.
- Descriptions like "kitchen television" are used for addressing instead of IP addresses. Routing uses partial name matches to route in the general direction of the destination.
- A directional routing cache stores partial names of other clusters to route between clusters not directly connected. It is divided proportionally between outgoing links.
- Simulations show this approach achieves 100% delivery success rate with low message misdirection compared to a single non-directional cache. Introdu
Handling Data in Mega Scale Web SystemsVineet Gupta
The document discusses several challenges faced by large-scale web companies in managing enormous and rapidly growing amounts of data. It provides examples of architectures developed by companies like Google, Amazon, Facebook and others to distribute data and queries across thousands of servers. Key approaches discussed include distributed databases, data partitioning, replication, and eventual consistency.
Counting and sorting are basic tasks that distributed systems rely on. The document discusses different approaches for distributed counting and sorting, including software combining trees, counting networks, and sorting networks. Counting networks like bitonic and periodic networks have depth of O(log2w) where w is the network width. Sorting networks can sort in the same time complexity by exploiting an isomorphism between counting and sorting networks. Sample sorting is also discussed as a way to sort large datasets across multiple threads.
The document discusses several networking concepts:
- The OSI reference model created standardized layers for networking.
- Subnetting networks optimizes performance by dividing large networks into smaller logical subnets. Methods for determining subnet masks and allocating addresses are covered.
- RAID (redundant array of independent disks) systems provide data redundancy and improved performance through techniques like disk striping, mirroring, and parity. The various RAID levels trade off these properties differently.
- Active Directory and DNS are protocols for centralized network management and translation of names to network addresses respectively. DHCP dynamically allocates IP addresses to devices on a network. IOPS is a metric for storage performance.
The key difference between distributed and uniprocessor systems is interprocess communication in distributed systems. The OSI model defines layers for networking including physical, data link, network, transport, and application layers. Remote Procedure Call (RPC) allows calling procedures on remote systems similarly to local calls by marshalling parameters and results. Group communication enables one-to-many and one-to-all communication using multicast and broadcast. Asynchronous Transfer Mode (ATM) networks use fixed size cells over virtual circuits to efficiently support both constant and bursty network traffic.
Lab Seminar 2009 06 17 Description Based Ad Hoc Networkstharindanv
This document proposes a description-based addressing and routing approach for cluster-based ad-hoc networks in home environments. It clusters nodes based on their physical locations and uses description-based identifiers and routing. Messages are routed towards destination descriptions stored in directional buffers. Simulation results showed this approach achieved over 90% delivery success initially, and close to 100% after upgrading to directional buffers to prevent wrong-direction routing. Future work includes studying node mobility and optimizing buffer sizes.
Grokking Techtalk #40: Consistency and Availability tradeoff in database clusterGrokking VN
Những năm gần đây, cùng với sự bùng nổ của các startup cùng các loại công nghệ như máy học, lượng dữ liệu phát sinh cần thu thập và xử lý trong các hệ thống ngày càng tăng cao.
Chính vì vậy, đối với các hệ thống lớn thì việc lưu trữ và xử lý dữ liệu trên một node database đã không đáp ứng được nữa, đòi hỏi phải sử dụng nhiều node kết nối với nhau để hình thành database cluster.
Đối với các database cluster nói riêng và hệ thống Distributed System nói chung, có khá nhiều chủ đề thú vị để đào sâu. Trong buổi thảo luận này, chúng ta sẽ giới hạn trong việc khảo sát về cách ba hệ thống Redis, Elastic Search và Cassandra tổ chức cluster cũng như sự trade-off giữa tính nhất quán (consistency) và khả năng đáp ứng (availability) của ba hệ thống này.
- Speaker: Lộc Võ - Lead Software Engineer @ Grab
OSPF is an intra-domain routing protocol that uses a link-state algorithm to calculate the shortest path to destinations within an autonomous system. It divides an autonomous system into areas to limit routing updates and allows for route summarization between areas. OSPF uses hello packets to discover neighbors, database description packets to exchange routing information, link-state request packets to request updates, and link-state acknowledgment packets to acknowledge receipt of updates.
Pacemaker is a high availability cluster resource manager that can be used to provide high availability for MySQL databases. It monitors MySQL instances and replicates data between nodes using replication. If the primary MySQL node fails, Pacemaker detects the failure and fails over to the secondary node, bringing the MySQL service back online without downtime. Pacemaker manages shared storage and virtual IP failover to ensure connections are direct to the active MySQL node. It is important to monitor replication state and lag to ensure data consistency between nodes.
Windows Server 2003 supports two types of clusters: server clusters and network load balancing (NLB) clusters. Server clusters provide high availability for applications like databases and messaging servers, connecting each node to shared storage. NLB clusters balance load for applications like web servers, assigning each node a cloned data set. Designing an effective cluster involves considering availability needs, scaling options, and using tools like the Network Load Balancing Manager and Cluster Administrator.
Near Real time Indexing Kafka Messages to Apache Blur using Spark StreamingDibyendu Bhattacharya
My presentation at recently concluded Apache Big Data Conference Europe about the Reliable Low Level Kafka Spark Consumer I developed and an use case of real time indexing to Apache Blur using this consumer
- OSPF is a link-state routing protocol that was developed in 1991 as an improvement over the distance vector routing protocol RIP. It is based on the Bellman-Ford algorithm.
- OSPF networks can be divided into sub-domains called areas. Areas limit the scope of route information distribution and reduce the number of routes that need to be propagated. All routers within an area must be connected.
- The backbone area, with an ID of 0.0.0.0, acts as a hub that connects all other areas and distributes routing information between them. It must remain continuously connected.
Distributed databases allow data to be stored across multiple computers or sites connected through a network. The data is logically interrelated but physically distributed. A distributed database management system (DDBMS) makes the distribution transparent to users and allows sites to operate autonomously while participating in global applications. Key aspects of DDBMS include distributed transactions, concurrency control, data fragmentation and replication, distributed query processing, and ensuring transparency of the distribution.
This document discusses NoSQL databases and how they differ from traditional relational databases. NoSQL databases are designed for large scale data storage needs and do not require a fixed schema. They prioritize high performance, availability, and scalability over strict consistency. The document then describes key aspects of NoSQL databases like their use of non-SQL queries, flexible data models, and eventual consistency.
Link state routing protocols work by having each node independently construct a map of the network connectivity and calculate the best path to every destination. Each node shares connectivity information with its neighbors by flooding link state advertisements. This allows each node to independently calculate the shortest path tree and routing table for the network. Key steps include determining neighbors, distributing link state information through flooding, running Dijkstra's algorithm to calculate the shortest path tree from each node to every other, and using this to populate each node's routing table.
Link state routing protocols work by having each node independently construct a map of the network connectivity and calculate the best path to every destination. Each node sends link state advertisements containing its connections to all other nodes. As these advertisements are flooded through the network, each node builds an identical map and independently calculates the shortest paths using an algorithm like Dijkstra's. The routing table is then filled in based on the first node along the shortest path from the root node to each destination in the shortest path tree.
Sudheer Mechineni, Head of Application Frameworks, Standard Chartered Bank
Discover how Standard Chartered Bank harnessed the power of Neo4j to transform complex data access challenges into a dynamic, scalable graph database solution. This keynote will cover their journey from initial adoption to deploying a fully automated, enterprise-grade causal cluster, highlighting key strategies for modelling organisational changes and ensuring robust disaster recovery. Learn how these innovations have not only enhanced Standard Chartered Bank’s data infrastructure but also positioned them as pioneers in the banking sector’s adoption of graph technology.
Communications Mining Series - Zero to Hero - Session 1DianaGray10
This session provides introduction to UiPath Communication Mining, importance and platform overview. You will acquire a good understand of the phases in Communication Mining as we go over the platform with you. Topics covered:
• Communication Mining Overview
• Why is it important?
• How can it help today’s business and the benefits
• Phases in Communication Mining
• Demo on Platform overview
• Q/A
In the rapidly evolving landscape of technologies, XML continues to play a vital role in structuring, storing, and transporting data across diverse systems. The recent advancements in artificial intelligence (AI) present new methodologies for enhancing XML development workflows, introducing efficiency, automation, and intelligent capabilities. This presentation will outline the scope and perspective of utilizing AI in XML development. The potential benefits and the possible pitfalls will be highlighted, providing a balanced view of the subject.
We will explore the capabilities of AI in understanding XML markup languages and autonomously creating structured XML content. Additionally, we will examine the capacity of AI to enrich plain text with appropriate XML markup. Practical examples and methodological guidelines will be provided to elucidate how AI can be effectively prompted to interpret and generate accurate XML markup.
Further emphasis will be placed on the role of AI in developing XSLT, or schemas such as XSD and Schematron. We will address the techniques and strategies adopted to create prompts for generating code, explaining code, or refactoring the code, and the results achieved.
The discussion will extend to how AI can be used to transform XML content. In particular, the focus will be on the use of AI XPath extension functions in XSLT, Schematron, Schematron Quick Fixes, or for XML content refactoring.
The presentation aims to deliver a comprehensive overview of AI usage in XML development, providing attendees with the necessary knowledge to make informed decisions. Whether you’re at the early stages of adopting AI or considering integrating it in advanced XML development, this presentation will cover all levels of expertise.
By highlighting the potential advantages and challenges of integrating AI with XML development tools and languages, the presentation seeks to inspire thoughtful conversation around the future of XML development. We’ll not only delve into the technical aspects of AI-powered XML development but also discuss practical implications and possible future directions.
A tale of scale & speed: How the US Navy is enabling software delivery from l...sonjaschweigert1
Rapid and secure feature delivery is a goal across every application team and every branch of the DoD. The Navy’s DevSecOps platform, Party Barge, has achieved:
- Reduction in onboarding time from 5 weeks to 1 day
- Improved developer experience and productivity through actionable findings and reduction of false positives
- Maintenance of superior security standards and inherent policy enforcement with Authorization to Operate (ATO)
Development teams can ship efficiently and ensure applications are cyber ready for Navy Authorizing Officials (AOs). In this webinar, Sigma Defense and Anchore will give attendees a look behind the scenes and demo secure pipeline automation and security artifacts that speed up application ATO and time to production.
We will cover:
- How to remove silos in DevSecOps
- How to build efficient development pipeline roles and component templates
- How to deliver security artifacts that matter for ATO’s (SBOMs, vulnerability reports, and policy evidence)
- How to streamline operations with automated policy checks on container images
Climate Impact of Software Testing at Nordic Testing DaysKari Kakkonen
My slides at Nordic Testing Days 6.6.2024
Climate impact / sustainability of software testing discussed on the talk. ICT and testing must carry their part of global responsibility to help with the climat warming. We can minimize the carbon footprint but we can also have a carbon handprint, a positive impact on the climate. Quality characteristics can be added with sustainability, and then measured continuously. Test environments can be used less, and in smaller scale and on demand. Test techniques can be used in optimizing or minimizing number of tests. Test automation can be used to speed up testing.
Building RAG with self-deployed Milvus vector database and Snowpark Container...Zilliz
This talk will give hands-on advice on building RAG applications with an open-source Milvus database deployed as a docker container. We will also introduce the integration of Milvus with Snowpark Container Services.
Full-RAG: A modern architecture for hyper-personalizationZilliz
Mike Del Balso, CEO & Co-Founder at Tecton, presents "Full RAG," a novel approach to AI recommendation systems, aiming to push beyond the limitations of traditional models through a deep integration of contextual insights and real-time data, leveraging the Retrieval-Augmented Generation architecture. This talk will outline Full RAG's potential to significantly enhance personalization, address engineering challenges such as data management and model training, and introduce data enrichment with reranking as a key solution. Attendees will gain crucial insights into the importance of hyperpersonalization in AI, the capabilities of Full RAG for advanced personalization, and strategies for managing complex data integrations for deploying cutting-edge AI solutions.
20 Comprehensive Checklist of Designing and Developing a WebsitePixlogix Infotech
Dive into the world of Website Designing and Developing with Pixlogix! Looking to create a stunning online presence? Look no further! Our comprehensive checklist covers everything you need to know to craft a website that stands out. From user-friendly design to seamless functionality, we've got you covered. Don't miss out on this invaluable resource! Check out our checklist now at Pixlogix and start your journey towards a captivating online presence today.
Unlocking Productivity: Leveraging the Potential of Copilot in Microsoft 365, a presentation by Christoforos Vlachos, Senior Solutions Manager – Modern Workplace, Uni Systems
Cosa hanno in comune un mattoncino Lego e la backdoor XZ?Speck&Tech
ABSTRACT: A prima vista, un mattoncino Lego e la backdoor XZ potrebbero avere in comune il fatto di essere entrambi blocchi di costruzione, o dipendenze di progetti creativi e software. La realtà è che un mattoncino Lego e il caso della backdoor XZ hanno molto di più di tutto ciò in comune.
Partecipate alla presentazione per immergervi in una storia di interoperabilità, standard e formati aperti, per poi discutere del ruolo importante che i contributori hanno in una comunità open source sostenibile.
BIO: Sostenitrice del software libero e dei formati standard e aperti. È stata un membro attivo dei progetti Fedora e openSUSE e ha co-fondato l'Associazione LibreItalia dove è stata coinvolta in diversi eventi, migrazioni e formazione relativi a LibreOffice. In precedenza ha lavorato a migrazioni e corsi di formazione su LibreOffice per diverse amministrazioni pubbliche e privati. Da gennaio 2020 lavora in SUSE come Software Release Engineer per Uyuni e SUSE Manager e quando non segue la sua passione per i computer e per Geeko coltiva la sua curiosità per l'astronomia (da cui deriva il suo nickname deneb_alpha).
Threats to mobile devices are more prevalent and increasing in scope and complexity. Users of mobile devices desire to take full advantage of the features
available on those devices, but many of the features provide convenience and capability but sacrifice security. This best practices guide outlines steps the users can take to better protect personal devices and information.
Enchancing adoption of Open Source Libraries. A case study on Albumentations.AIVladimir Iglovikov, Ph.D.
Presented by Vladimir Iglovikov:
- https://www.linkedin.com/in/iglovikov/
- https://x.com/viglovikov
- https://www.instagram.com/ternaus/
This presentation delves into the journey of Albumentations.ai, a highly successful open-source library for data augmentation.
Created out of a necessity for superior performance in Kaggle competitions, Albumentations has grown to become a widely used tool among data scientists and machine learning practitioners.
This case study covers various aspects, including:
People: The contributors and community that have supported Albumentations.
Metrics: The success indicators such as downloads, daily active users, GitHub stars, and financial contributions.
Challenges: The hurdles in monetizing open-source projects and measuring user engagement.
Development Practices: Best practices for creating, maintaining, and scaling open-source libraries, including code hygiene, CI/CD, and fast iteration.
Community Building: Strategies for making adoption easy, iterating quickly, and fostering a vibrant, engaged community.
Marketing: Both online and offline marketing tactics, focusing on real, impactful interactions and collaborations.
Mental Health: Maintaining balance and not feeling pressured by user demands.
Key insights include the importance of automation, making the adoption process seamless, and leveraging offline interactions for marketing. The presentation also emphasizes the need for continuous small improvements and building a friendly, inclusive community that contributes to the project's growth.
Vladimir Iglovikov brings his extensive experience as a Kaggle Grandmaster, ex-Staff ML Engineer at Lyft, sharing valuable lessons and practical advice for anyone looking to enhance the adoption of their open-source projects.
Explore more about Albumentations and join the community at:
GitHub: https://github.com/albumentations-team/albumentations
Website: https://albumentations.ai/
LinkedIn: https://www.linkedin.com/company/100504475
Twitter: https://x.com/albumentations
Unlock the Future of Search with MongoDB Atlas_ Vector Search Unleashed.pdfMalak Abu Hammad
Discover how MongoDB Atlas and vector search technology can revolutionize your application's search capabilities. This comprehensive presentation covers:
* What is Vector Search?
* Importance and benefits of vector search
* Practical use cases across various industries
* Step-by-step implementation guide
* Live demos with code snippets
* Enhancing LLM capabilities with vector search
* Best practices and optimization strategies
Perfect for developers, AI enthusiasts, and tech leaders. Learn how to leverage MongoDB Atlas to deliver highly relevant, context-aware search results, transforming your data retrieval process. Stay ahead in tech innovation and maximize the potential of your applications.
#MongoDB #VectorSearch #AI #SemanticSearch #TechInnovation #DataScience #LLM #MachineLearning #SearchTechnology
Introducing Milvus Lite: Easy-to-Install, Easy-to-Use vector database for you...Zilliz
Join us to introduce Milvus Lite, a vector database that can run on notebooks and laptops, share the same API with Milvus, and integrate with every popular GenAI framework. This webinar is perfect for developers seeking easy-to-use, well-integrated vector databases for their GenAI apps.
Introducing Milvus Lite: Easy-to-Install, Easy-to-Use vector database for you...
Cassandra Architecture
1.
2. Distributed
Peer to Peer
Client
● There is no leader/follower.
● Each node is aware of keys held by other nodes and coordinates with that node to fetch the
data.
● Depending on the replication factor & consistency level the coordinator talks to one of more
nodes before returning the response to the client.
● Every table defines a partition key.
● Data is distributed across the various nodes in the cluster using the hash on the partition
key. Uses Consistent hashing algo.
● Partitions are replicated across multiple nodes to prevent single point of failure.
3. Replication copies of the data across
multiple nodes within/across the DCs.
Replication Factor (RF) denotes the no of
copies.
Set at the keyspace level.
Snitch: Is a strategy to identify the DC and
Rack the node belongs to. This identity
can be manually shared across all nodes
or via Gossiping.
Coordinator is aware of the RF/keyspace
and coordinates the writes upto that factor
to the various nodes within/across DCs.
Hinted Handoff - While the replica node is
down the coordinator will delay the
transmission to that node by persisting
that data locally. It can retransmits it once
that replica node is back online.
Cassandra configuration sets the duration
for holding such data before handoff.
Replication & Consistency Consistency is an agreeable factor across
the nodes that ensures the acceptance of
a read/write.
Consistency can be set for both
read/writes.
Consistency levels (CL) can be set from
low to high (ONE, LOCAL_QUOROUM,
QUORUM, ALL)
CL is a trade off b/w consistency and
availability.
Read Repair: Coordinator performs a read
repair on some/all of the replicas that
have trailing versions. Depending on the
CL this can be done async during a read
request.
4. Gossip Each node stores info about itself and
every other node in its Knowledge base.
Each node initiates the gossip every
second with 2 or 3 other nodes to share
its knowledge base.
Knowledge Base:
Each node increments its heartbeat
version every second.
When it receives a gossip from other
node, it checks each nodes heart beat
version and updates if it had received the
latest version.
Optimization to reduce message
bandwidth during gossiping
Gossip is initiated with a SYN to the
receiving node.
SYN: Just a digest - no AppState included
Receiving node ACKs back to the sender.
ACK: Digest for the trailing versions &
detailed (includes AppState) for leading
versions.
Sender updates the trailing versions and
Acks back with the detailed info for the
requested trailing versions on the other
end.
EndPt State: <IP of a node>
HeartBeat State:
Generation: 10
Version: 34
Application State;
Status::
Norma/Removed/Arrived…
DataCenter:
Rack:
Load:
Severity:
….
EndPt State: <IP of a
node>...
Knowledge Base
5. Mem table
Commit log
Client
Write Path Client writes to both commit log and
memtable. In the event of the node
failures, the memtable can be constructed
from the commit log.
Commit log is append only, does not
maintain any order.
Memtable is partitioned by partition key
and ordered by clustering columns.
Eventually memtable grows out of size
and is flushed to disk (SSTable). SSTable
is immutable so with each flush a new
SSTable file is created.
SSTable holds each partition
Compaction is a process of merging
numerous sstable files into one. It relies
on timestamp of each row to resolve dups.
SSTable 1
SSTable 1
SSTable 1
SSTable
Compaction
Flushing
Disk
Memory
23, USA 4
23, USA 8
23, Mexico 7
55, Korea 9
23, USA 5
55, Korea 9
23, Mexico 7
23, USA 4
23, USA 5
23, USA 8
23, Mexico 7
23, USA 4
55, Korea 9
23, USA 5
23, USA 8
55, China 20
55, China 40
55, Korea 9
23, Mexico 7
23, USA 4
23, USA 5
23, USA 8
Replica Node
Coordinator
Bloom Filters
6. Read Path
Mem table
Client
SSTable 1
SSTable 1
SSTable 1
SSTable
Compaction
Flushing
DiskMemory
Partition
Index
Summary
Index
Key
Cache
(LRU)
Order of search during a Read:
Coordinator node calls one of the replica
node for the requested partition key.
Replica Node first looks in the Mem table.
If not found, follows the below path until
the key is found.
Bloom filters help determine two things.
The key doesn’t exist in the sstable or the
key may exist in the sstable.
Key Cache, An LRU cache with partition
key & value is the offset of the partition in
the SSTable file.
Summary Index is range based index for
the keys in the partition index and their
offset.
Partition Index is the indexed lookup on
the partition key and the offset of the
partition in the SSTable file.
Replica Node
Coordinator
Bloom Filters
Bloom Filters
Bloom Filters
9. Consistent Hashing
Given a set of key/value pairs, hashing is strategy to
spread each pair evenly as possible, so that we can fetch
them in almost constant time by their key.
Consistent hashing is one such hashing strategy to spread
the keys in a distributed env.
The hash of keys are hypothetically spread on ring. The
position the key takes on the ring can be anywhere b/w 0 -
360 based on hash of key (mostly mod on the hash).
The stores/server that hosts these key are also given a
position on the ring (e.g., A, B, C…)
The key is stored on the server that is found first, while
traversing the ring in anti-clockwise direction from the keys
position.
E.g., key Steve @ 352.3 finds server C @ 81.7
If we maintain a sorted list of server and their position, a
quick binary search will point us to the server where the
key can be found eliminating the need to query all servers.
Keys can be replicated on succeeding servers to avoid
SPF (Single point of failures).
10. Consistent Hashing
Although the keys are spread over several servers, the
distribution may not be even due to the uneven clustering
of the key in real world (names starting with a certain
alphabet may be more common).
In such scenarios, to overcome the load on an individual
server, we define virtual servers. What this means is we
provide multiple positions for the same server simulating
multiple instances of the same server across the ring.
With ref to the pic here, the refined sorted list of servers
will now have virtual instances of servers a1, a2, b2, c3
etc... Thereby distributed the load on C to B and A as well.
11. Bloom Filters
It's a probabilistic data structure to determine if an element is present in the set of not.
It consists of a set of n bits & a collection of independent hash functions. Each of which return a no between 0 to n-1 representing one of
the nth bit.
Writes:
A key is run thru the collection of hash functions. The resulting nth bit is flipped on to mark the elements presence.
Reads:
A key is run thru the collection of hash functions. Iff all the resulting nth bit is turned on, we can ensure that the key MAY be present in the
underlying set. Even if one of them is not flipped on, we can GUARANTEE that the key is not present.