[Download the slide to get the entire talk in the form of presentation note embedded in the ppt] Apache ZooKeeper is the chosen leader in distributed coordination. In this talk, I have explored the atomic elements of Apache ZooKeeper, how it fits everything together and some of its popular use cases. For ZooKeeper simplicity is the key and as a consumer of the API, our imagination enables us to push the limits of the ZooKeeper world.
Zookeeper is a distributed coordination service that provides naming, configuration, synchronization, and group services. It allows distributed processes to coordinate with each other through a shared hierarchical namespace of data registers called znodes. Zookeeper follows a leader-elected consensus protocol to guarantee atomic broadcast of state updates from the leader to followers. It uses a hierarchical namespace of znodes similar to a file system to store configuration data and other application-defined metadata. Zookeeper provides services like leader election, group membership, synchronization, and configuration management that are essential for distributed systems.
Apache Zookeeper Explained: Tutorial, Use Cases and Zookeeper Java API ExamplesBinu George
This document provides an overview of Apache Zookeeper, a distributed coordination service. It discusses Zookeeper's data model, use of ZAB protocol for consensus, and common use cases such as naming, configuration management, and leader election. It also provides a brief Java client API example.
ZooKeeper is an open-source coordination service for distributed applications that provides common services like naming, configuration management, synchronization, and groups. It uses a hierarchical data model where data is stored as znodes that can be configured with watches, versions, access control, and more. Common uses include distributed queues, leader election, and group membership. Recipes demonstrate how to implement queues and group membership using ZooKeeper.
Basically everything you need to get started on your Zookeeper training, and setup apache Hadoop high availability with QJM setup with automatic failover.
So we're running Apache ZooKeeper. Now What? By Camille Fournier Hakka Labs
The ZooKeeper framework was originally built at Yahoo! to make it easy for the company’s applications to access configuration information in a robust and easy-to-understand way, but it has since grown to offer a lot of features that help coordinate work across distributed clusters. Apache Zookeeper became a de-facto standard for coordination service and used by Storm, Hadoop, HBase, ElasticSearch and other distributed computing frameworks.
[Download the slide to get the entire talk in the form of presentation note embedded in the ppt] Apache ZooKeeper is the chosen leader in distributed coordination. In this talk, I have explored the atomic elements of Apache ZooKeeper, how it fits everything together and some of its popular use cases. For ZooKeeper simplicity is the key and as a consumer of the API, our imagination enables us to push the limits of the ZooKeeper world.
Zookeeper is a distributed coordination service that provides naming, configuration, synchronization, and group services. It allows distributed processes to coordinate with each other through a shared hierarchical namespace of data registers called znodes. Zookeeper follows a leader-elected consensus protocol to guarantee atomic broadcast of state updates from the leader to followers. It uses a hierarchical namespace of znodes similar to a file system to store configuration data and other application-defined metadata. Zookeeper provides services like leader election, group membership, synchronization, and configuration management that are essential for distributed systems.
Apache Zookeeper Explained: Tutorial, Use Cases and Zookeeper Java API ExamplesBinu George
This document provides an overview of Apache Zookeeper, a distributed coordination service. It discusses Zookeeper's data model, use of ZAB protocol for consensus, and common use cases such as naming, configuration management, and leader election. It also provides a brief Java client API example.
ZooKeeper is an open-source coordination service for distributed applications that provides common services like naming, configuration management, synchronization, and groups. It uses a hierarchical data model where data is stored as znodes that can be configured with watches, versions, access control, and more. Common uses include distributed queues, leader election, and group membership. Recipes demonstrate how to implement queues and group membership using ZooKeeper.
Basically everything you need to get started on your Zookeeper training, and setup apache Hadoop high availability with QJM setup with automatic failover.
So we're running Apache ZooKeeper. Now What? By Camille Fournier Hakka Labs
The ZooKeeper framework was originally built at Yahoo! to make it easy for the company’s applications to access configuration information in a robust and easy-to-understand way, but it has since grown to offer a lot of features that help coordinate work across distributed clusters. Apache Zookeeper became a de-facto standard for coordination service and used by Storm, Hadoop, HBase, ElasticSearch and other distributed computing frameworks.
Distributed system coordination by zookeeper and introduction to kazoo python...Jimmy Lai
Zookeeper is a coordination tool to let people build distributed systems easier. In this slides, the author summarizes the usage of zookeeper and provides Kazoo Python library as example.
This is Apache ZooKeeper session.
ZooKeeper is a centralized service for maintaining configuration information, naming, providing distributed synchronization, and providing group services.
By the end of this presentation you should be fairly clear about Apache ZooKeeper.
To watch the video or know more about the course, please visit
http://www.knowbigdata.com/page/big-data-and-hadoop-online-instructor-led-training
Introduction to Apache ZooKeeper | Big Data Hadoop Spark Tutorial | CloudxLabCloudxLab
Big Data with Hadoop & Spark Training: http://bit.ly/2kvXlPd
This CloudxLab Introduction to Apache ZooKeeper tutorial helps you to understand ZooKeeper in detail. Below are the topics covered in this tutorial:
1) Data Model
2) Znode Types
3) Persistent Znode
4) Sequential Znode
5) Architecture
6) Election & Majority Demo
7) Why Do We Need Majority?
8) Guarantees - Sequential consistency, Atomicity, Single system image, Durability, Timeliness
9) ZooKeeper APIs
10) Watches & Triggers
11) ACLs - Access Control Lists
12) Usecases
13) When Not to Use ZooKeeper
ZooKeeper is a highly available, scalable, distributed configuration, consensus, group membership, leader election, naming and coordination service. It provides a hierarchical namespace and basic operations like create, delete, and read data. It is useful for building distributed applications and services like queues. Future releases will focus on monitoring improvements, read-only mode, and failure detection models. The community is working on features like children for ephemeral nodes and viewing session information.
ZooKeeper is a coordination service for distributed systems that allows nodes to perform tasks correctly. It provides features like atomic operations, order guarantees, and high availability. ZooKeeper uses the Zab protocol to elect a leader and achieve consensus. It stores data in a hierarchical structure of znodes that can be persistent or ephemeral. Software load balancers can use ZooKeeper to store cluster configuration and direct traffic. The Apache Curator library provides a higher-level API for common ZooKeeper patterns like leader election and locks.
This document discusses ZooKeeper, an open-source server that enables distributed coordination. It provides instructions for installing ZooKeeper, describes ZooKeeper's data tree and API, and exercises for interacting with ZooKeeper including creating znodes, using watches, and setting up an ensemble across multiple servers.
The document provides an overview and summary of Curator, a client library for Apache ZooKeeper. Curator aims to simplify ZooKeeper development by providing a friendlier API, handling retries, and implementing common patterns ("recipes") like leader election and locks. It consists of a client, framework, and recipes components. The framework handles connection management and retries, while recipes implement distributed primitives. Details about common recipes like locks and leader election are provided.
Centralized Application Configuration with Spring and Apache ZookeeperRyan Gardner
From talk given at Spring One 2gx Dallas, 2014
Application configuration is an evolution. It starts as a hard-coded strings in your application and hopefully progresses to something external, such as a file or system property that can be changed without deployment. But what happens when other enterprise concerns enter the mix, such as audit requirements or access control around who can make changes? How do you maintain the consistency of values across too many application servers to manage at one time from a terminal window? The next step in the application configuration evolution is centralized configuration that can be accessed by your applications as they move through your various environments on their way to production. Such a service transfers the ownership of configuration from the last developer who touched the code to a well-versed application owner who is responsible for the configuration of the application across all environments. At Dealer.com, we have created one such solution that relies on Apache ZooKeeper to handle the storage and coordination of the configuration data and Spring to handle to the retrieval, creation and registration of configured objects in each application. The end result is a transparent framework that provides the same configured objects that could have been created using a Spring configuration, configuration file and property value wiring. This talk will cover both the why and how of our solution, with a focus on how we leveraged the powerful attributes of both Apache ZooKeeper and Spring to rid our application of local configuration files and provide a consistent mechanism for application configuration in our enterprise.
This document provides a guide for developing distributed applications that use ZooKeeper. It discusses ZooKeeper's data model including znodes, ephemeral nodes, and sequence nodes. It describes ZooKeeper sessions, watches, consistency guarantees, and available bindings. It provides an overview of common ZooKeeper operations like connecting, reads, writes, and handling watches. It also discusses program structure, common problems, and troubleshooting. The guide is intended to help developers understand key ZooKeeper concepts and how to integrate ZooKeeper coordination services into their distributed applications.
ZooKeeper is a distributed coordination service that allows distributed applications to synchronize data and configuration information. It uses a data model of directories and files, called znodes, that can contain small amounts of structured data. ZooKeeper maintains data consistency through a leader election process and quorum-based consensus algorithm called Paxos. It provides applications with synchronization primitives and configuration maintenance in a highly-available and reliable way.
ZooKeeper is an open-source coordination service for distributed applications that provides common services like leader election, configuration management, and locks in a simple interface to help distributed processes coordinate actions and share information. It provides guarantees around consistency, reliability, and timeliness to applications using its hierarchical data model and APIs. Popular distributed systems like Hadoop and Kafka use ZooKeeper for tasks such as cluster management, metadata storage, and detecting node failures.
This document outlines a presentation on developing distributed applications with Akka and Akka Cluster. It introduces Akka as a toolkit for building highly concurrent, distributed, and fault tolerant applications. It discusses concurrency paradigms like actors, dataflow, and software transactional memory. Live demos are presented showing actors, Akka remoting and clustering, and consistent replicated data types. The presentation emphasizes building distributed systems with Akka's actor model and using features like routers, deployment, and CRDTs to manage distributed state.
The document discusses the .NET driver for Cassandra. It provides an overview of the driver and how to connect to Cassandra and execute queries from .NET applications. Key points covered include how to connect to a cluster, execute queries using simple and prepared statements, handle paging of large result sets, and map query results to .NET objects. Examples are provided showing common operations like creating a session, executing queries, and updating data using batches in a .NET application connecting to Cassandra.
This document discusses Knewton's use of ZooKeeper and PettingZoo to implement distributed machine learning on a Python cluster. It begins by explaining what ZooKeeper is and how it provides services for distributed synchronization. It then discusses the state of ZooKeeper libraries for Python, including incomplete bindings and lack of high-level recipes. PettingZoo is introduced as Knewton's library that implements common ZooKeeper recipes for Python, allowing their machine learning models to be sharded and scaled across multiple machines. Distributed discovery, distributed bags, leader queues, and role matching are highlighted as key recipes that enable dynamic reconfiguration and load balancing of their distributed system.
This talk covers why Apache Zookeeper is a good fit for coordinating processes in a distributed environment, prior Python attempts at a client and the current state of the art Python client library, how unifying development efforts to merge several Python client libraries has paid off, features available to Python processes, and how to gracefully handle failures in a set of distributed processes.
This document discusses integrating the Python driver for Cassandra into Python applications. It covers connecting to Cassandra, executing queries, prepared statements, asynchronous queries, object mapping with cqlengine, and best practices for application development including using virtual environments. The presentation aims to make working with Cassandra from Python straightforward and high performing.
The document discusses continuous deployment and practices at Disqus for releasing code frequently. It emphasizes shipping code as soon as it is ready after it has been reviewed, passes automated tests, and some level of QA. It also discusses keeping development simple, integrating code changes through automated testing, using metrics for reporting, and doing progressive rollouts of new features to subsets of users.
Container monitoring for resource and application metrics with cAdvisor. Shipping monitoring information with the container so it is monitored irrespective of the host it runs on.
Intro to monitoring in distributed systems, cAdvisor, heapster, kubedash, kubernetes
Running High Performance & Fault-tolerant Elasticsearch Clusters on DockerSematext Group, Inc.
This document discusses running Elasticsearch clusters on Docker containers. It describes how Docker containers are more lightweight than virtual machines and have less overhead. It provides examples of running official Elasticsearch Docker images and customizing configurations. It also covers best practices for networking, storage, constraints, and high availability when running Elasticsearch on Docker.
Zookeeper is a centralized service for maintaining configuration information, naming, providing distributed synchronization, and leader election. It allows for distributed applications to synchronize transactions and configuration updates. Zookeeper uses a data model of znodes that can be persistent, ephemeral, or sequential. Clients can set watches on znodes to receive notifications of changes. Zookeeper provides consistency guarantees including sequential consistency, atomicity, and a single system image. Many large companies and open source projects use Zookeeper for coordination across distributed systems.
Introduction to ZooKeeper - TriHUG May 22, 2012mumrah
Presentation given at TriHUG (Triangle Hadoop User Group) on May 22, 2012. Gives a basic overview of Apache ZooKeeper as well as some common use cases, 3rd party libraries, and "gotchas"
Demo code available at https://github.com/mumrah/trihug-zookeeper-demo
Slides for presentation on ZooKeeper I gave at Near Infinity (www.nearinfinity.com) 2012 spring conference.
The associated sample code is on GitHub at https://github.com/sleberknight/zookeeper-samples
Distributed system coordination by zookeeper and introduction to kazoo python...Jimmy Lai
Zookeeper is a coordination tool to let people build distributed systems easier. In this slides, the author summarizes the usage of zookeeper and provides Kazoo Python library as example.
This is Apache ZooKeeper session.
ZooKeeper is a centralized service for maintaining configuration information, naming, providing distributed synchronization, and providing group services.
By the end of this presentation you should be fairly clear about Apache ZooKeeper.
To watch the video or know more about the course, please visit
http://www.knowbigdata.com/page/big-data-and-hadoop-online-instructor-led-training
Introduction to Apache ZooKeeper | Big Data Hadoop Spark Tutorial | CloudxLabCloudxLab
Big Data with Hadoop & Spark Training: http://bit.ly/2kvXlPd
This CloudxLab Introduction to Apache ZooKeeper tutorial helps you to understand ZooKeeper in detail. Below are the topics covered in this tutorial:
1) Data Model
2) Znode Types
3) Persistent Znode
4) Sequential Znode
5) Architecture
6) Election & Majority Demo
7) Why Do We Need Majority?
8) Guarantees - Sequential consistency, Atomicity, Single system image, Durability, Timeliness
9) ZooKeeper APIs
10) Watches & Triggers
11) ACLs - Access Control Lists
12) Usecases
13) When Not to Use ZooKeeper
ZooKeeper is a highly available, scalable, distributed configuration, consensus, group membership, leader election, naming and coordination service. It provides a hierarchical namespace and basic operations like create, delete, and read data. It is useful for building distributed applications and services like queues. Future releases will focus on monitoring improvements, read-only mode, and failure detection models. The community is working on features like children for ephemeral nodes and viewing session information.
ZooKeeper is a coordination service for distributed systems that allows nodes to perform tasks correctly. It provides features like atomic operations, order guarantees, and high availability. ZooKeeper uses the Zab protocol to elect a leader and achieve consensus. It stores data in a hierarchical structure of znodes that can be persistent or ephemeral. Software load balancers can use ZooKeeper to store cluster configuration and direct traffic. The Apache Curator library provides a higher-level API for common ZooKeeper patterns like leader election and locks.
This document discusses ZooKeeper, an open-source server that enables distributed coordination. It provides instructions for installing ZooKeeper, describes ZooKeeper's data tree and API, and exercises for interacting with ZooKeeper including creating znodes, using watches, and setting up an ensemble across multiple servers.
The document provides an overview and summary of Curator, a client library for Apache ZooKeeper. Curator aims to simplify ZooKeeper development by providing a friendlier API, handling retries, and implementing common patterns ("recipes") like leader election and locks. It consists of a client, framework, and recipes components. The framework handles connection management and retries, while recipes implement distributed primitives. Details about common recipes like locks and leader election are provided.
Centralized Application Configuration with Spring and Apache ZookeeperRyan Gardner
From talk given at Spring One 2gx Dallas, 2014
Application configuration is an evolution. It starts as a hard-coded strings in your application and hopefully progresses to something external, such as a file or system property that can be changed without deployment. But what happens when other enterprise concerns enter the mix, such as audit requirements or access control around who can make changes? How do you maintain the consistency of values across too many application servers to manage at one time from a terminal window? The next step in the application configuration evolution is centralized configuration that can be accessed by your applications as they move through your various environments on their way to production. Such a service transfers the ownership of configuration from the last developer who touched the code to a well-versed application owner who is responsible for the configuration of the application across all environments. At Dealer.com, we have created one such solution that relies on Apache ZooKeeper to handle the storage and coordination of the configuration data and Spring to handle to the retrieval, creation and registration of configured objects in each application. The end result is a transparent framework that provides the same configured objects that could have been created using a Spring configuration, configuration file and property value wiring. This talk will cover both the why and how of our solution, with a focus on how we leveraged the powerful attributes of both Apache ZooKeeper and Spring to rid our application of local configuration files and provide a consistent mechanism for application configuration in our enterprise.
This document provides a guide for developing distributed applications that use ZooKeeper. It discusses ZooKeeper's data model including znodes, ephemeral nodes, and sequence nodes. It describes ZooKeeper sessions, watches, consistency guarantees, and available bindings. It provides an overview of common ZooKeeper operations like connecting, reads, writes, and handling watches. It also discusses program structure, common problems, and troubleshooting. The guide is intended to help developers understand key ZooKeeper concepts and how to integrate ZooKeeper coordination services into their distributed applications.
ZooKeeper is a distributed coordination service that allows distributed applications to synchronize data and configuration information. It uses a data model of directories and files, called znodes, that can contain small amounts of structured data. ZooKeeper maintains data consistency through a leader election process and quorum-based consensus algorithm called Paxos. It provides applications with synchronization primitives and configuration maintenance in a highly-available and reliable way.
ZooKeeper is an open-source coordination service for distributed applications that provides common services like leader election, configuration management, and locks in a simple interface to help distributed processes coordinate actions and share information. It provides guarantees around consistency, reliability, and timeliness to applications using its hierarchical data model and APIs. Popular distributed systems like Hadoop and Kafka use ZooKeeper for tasks such as cluster management, metadata storage, and detecting node failures.
This document outlines a presentation on developing distributed applications with Akka and Akka Cluster. It introduces Akka as a toolkit for building highly concurrent, distributed, and fault tolerant applications. It discusses concurrency paradigms like actors, dataflow, and software transactional memory. Live demos are presented showing actors, Akka remoting and clustering, and consistent replicated data types. The presentation emphasizes building distributed systems with Akka's actor model and using features like routers, deployment, and CRDTs to manage distributed state.
The document discusses the .NET driver for Cassandra. It provides an overview of the driver and how to connect to Cassandra and execute queries from .NET applications. Key points covered include how to connect to a cluster, execute queries using simple and prepared statements, handle paging of large result sets, and map query results to .NET objects. Examples are provided showing common operations like creating a session, executing queries, and updating data using batches in a .NET application connecting to Cassandra.
This document discusses Knewton's use of ZooKeeper and PettingZoo to implement distributed machine learning on a Python cluster. It begins by explaining what ZooKeeper is and how it provides services for distributed synchronization. It then discusses the state of ZooKeeper libraries for Python, including incomplete bindings and lack of high-level recipes. PettingZoo is introduced as Knewton's library that implements common ZooKeeper recipes for Python, allowing their machine learning models to be sharded and scaled across multiple machines. Distributed discovery, distributed bags, leader queues, and role matching are highlighted as key recipes that enable dynamic reconfiguration and load balancing of their distributed system.
This talk covers why Apache Zookeeper is a good fit for coordinating processes in a distributed environment, prior Python attempts at a client and the current state of the art Python client library, how unifying development efforts to merge several Python client libraries has paid off, features available to Python processes, and how to gracefully handle failures in a set of distributed processes.
This document discusses integrating the Python driver for Cassandra into Python applications. It covers connecting to Cassandra, executing queries, prepared statements, asynchronous queries, object mapping with cqlengine, and best practices for application development including using virtual environments. The presentation aims to make working with Cassandra from Python straightforward and high performing.
The document discusses continuous deployment and practices at Disqus for releasing code frequently. It emphasizes shipping code as soon as it is ready after it has been reviewed, passes automated tests, and some level of QA. It also discusses keeping development simple, integrating code changes through automated testing, using metrics for reporting, and doing progressive rollouts of new features to subsets of users.
Container monitoring for resource and application metrics with cAdvisor. Shipping monitoring information with the container so it is monitored irrespective of the host it runs on.
Intro to monitoring in distributed systems, cAdvisor, heapster, kubedash, kubernetes
Running High Performance & Fault-tolerant Elasticsearch Clusters on DockerSematext Group, Inc.
This document discusses running Elasticsearch clusters on Docker containers. It describes how Docker containers are more lightweight than virtual machines and have less overhead. It provides examples of running official Elasticsearch Docker images and customizing configurations. It also covers best practices for networking, storage, constraints, and high availability when running Elasticsearch on Docker.
Zookeeper is a centralized service for maintaining configuration information, naming, providing distributed synchronization, and leader election. It allows for distributed applications to synchronize transactions and configuration updates. Zookeeper uses a data model of znodes that can be persistent, ephemeral, or sequential. Clients can set watches on znodes to receive notifications of changes. Zookeeper provides consistency guarantees including sequential consistency, atomicity, and a single system image. Many large companies and open source projects use Zookeeper for coordination across distributed systems.
Introduction to ZooKeeper - TriHUG May 22, 2012mumrah
Presentation given at TriHUG (Triangle Hadoop User Group) on May 22, 2012. Gives a basic overview of Apache ZooKeeper as well as some common use cases, 3rd party libraries, and "gotchas"
Demo code available at https://github.com/mumrah/trihug-zookeeper-demo
Slides for presentation on ZooKeeper I gave at Near Infinity (www.nearinfinity.com) 2012 spring conference.
The associated sample code is on GitHub at https://github.com/sleberknight/zookeeper-samples
Curators are responsible for managing museum collections. They select items for display, direct acquisitions and loans, conduct research, and oversee conservation efforts. Effective curation requires documenting each item, providing access for researchers while protecting collections, implementing preventative conservation measures, controlling pests, planning for emergencies, and establishing policies for deaccessioning items. Curators ensure collections are well-managed, preserved, and made available to educate the public.
This document summarizes a presentation about SolrCloud shard splitting. It introduces the presenter and his background with Apache Lucene and Solr. The presentation covers an overview of SolrCloud, how documents are routed to shards in SolrCloud, the SolrCloud collections API, and the new functionality for splitting shards in Solr 4.3 to allow dynamic resharding of collections without downtime. It provides details on the shard splitting mechanism and tips for using the new functionality.
This document discusses scaling Solr using SolrCloud. It provides an overview of Solr history and architectures. It then describes how SolrCloud addresses limitations of earlier architectures by utilizing Apache ZooKeeper for coordination across Solr nodes and shards. Key concepts discussed include collections, shards, replicas, and routing queries across shards. The document also covers configuration topics like caches, indexing tuning, and monitoring.
This document discusses scaling search with Apache SolrCloud. It provides an introduction to Solr and how scaling search was difficult in previous versions due to manually managing shards and replicas. SolrCloud makes scaling easier by utilizing ZooKeeper for centralized configuration and management across a cluster. Nodes can be added to a SolrCloud cluster and will automatically be configured and assigned as shards or replicas. This allows for effortless scaling, fault tolerance, and load balancing. The document promotes upcoming features in Solr 4 and demonstrates indexing and querying in a SolrCloud cluster.
This document discusses SolrCloud failover and testing. It provides an overview of how SolrCloud uses ZooKeeper to elect an overseer node to monitor cluster state and automatically create a new replica on an available node when one goes down, allowing failover capability. It also discusses challenges with distributed testing and recommends focusing more on backfilling tests when changing code, fixing frequently failing tests, and adding more unit tests to improve Solr's testing culture.
Scaling SolrCloud to a Large Number of Collections - Fifth Elephant 2014Shalin Shekhar Mangar
This document discusses scaling SolrCloud to support large numbers of document collections. It begins by introducing SolrCloud and some of its key capabilities and terminology. It then describes four problems that can arise at large scale: high cluster state load, overseer performance issues, inflexible data management, and limitations with data export. For each problem, solutions are proposed that were implemented in Apache Solr to improve scalability, such as splitting the cluster state, optimizing the overseer, enabling more flexible data splitting and migration, and allowing distributed deep paging exports. The document concludes by describing efforts to test SolrCloud at massive scale through automated tools and cloud infrastructure.
This document discusses using Apache Geode and ActiveMQ Artemis to build a scalable IoT platform. It introduces IoT and the MQTT protocol. ActiveMQ Artemis is described as a high performance message broker that is embeddable and supports clustering. Geode is presented as a distributed in-memory data platform for building data-intensive applications that require high performance, scalability, and availability. Example users of Geode include large companies handling billions of records and thousands of transactions per second. Key capabilities of Geode like regions, functions, querying, and continuous queries are summarized.
Windows 8 apps can access data from services in several ways:
- They can call ASMX, WCF, and REST services asynchronously using HttpClient and retrieve responses.
- They can access oData services using the oData client library.
- They can retrieve RSS feeds using SyndicationClient and parse the responses.
- They can perform background transfers using BackgroundDownloader.
- They can update tiles periodically by polling a service and setting updates.
Organizations continue to adopt Solr because of its ability to scale to meet even the most demanding workflows. Recently, LucidWorks has been leading the effort to identify, measure, and expand the limits of Solr. As part of this effort, we've learned a few things along the way that should prove useful for any organization wanting to scale Solr. Attendees will come away with a better understanding of how sharding and replication impact performance. Also, no benchmark is useful without being repeatable; Tim will also cover how to perform similar tests using the Solr-Scale-Toolkit in Amazon EC2.
Apache ZooKeeper is an open-source distributed coordination service that helps manage large sets of hosts. It implements coordination protocols to provide a consistent view of shared state across distributed applications or servers. ZooKeeper uses a hierarchical namespacing system called znodes to store configuration data and other information. It ensures highly reliable distributed coordination through features like leader election, group membership, and notifications.
Cassandra is an open source, distributed, decentralized, elastically scalable, highly available, and fault-tolerant database. It originated at Facebook in 2007 to solve their inbox search problem. Some key companies using Cassandra include Twitter, Facebook, Digg, and Rackspace. Cassandra's data model is based on Google's Bigtable and its distribution design is based on Amazon's Dynamo.
Paul Dix, CTO and co-founder of InfluxData, discussed the future of InfluxDB and the release of InfluxDB 2.0 Open Source. He explained that InfluxDB 2.0 has been rebuilt from the ground up to address limitations of the original InfluxDB like lack of distributed features and poor performance for high cardinality analytics data. The new database, called InfluxDB IOx, uses a columnar data store with parquet files and is designed to be distributed, federated, and able to run analytics at scale on high cardinality data.
Cassandra Tools and Distributed Administration (Jeffrey Berger, Knewton) | C*...DataStax
At Knewton we operate across five different VPCs a total of 29 clusters, each ranging from 3 nodes to 24 nodes. For a team of three to maintain this is not herculean, however good tools to diagnose issues and gather information in a distributed manner are vital to moving quickly and minimizing engineering time spent.
The database team at Knewton has been successfully using a combination of Ansible and custom open sourced tools to maintain and improve the Cassandra deployment at Knewton. I will be talking about several of these tools and giving examples of how we are using them. Specifically I will discuss the cassandra-tracing tool, which analyzes the contents of the system_traces keyspace, and the cassandra-stat tool, which gives real-time output of the operations of a cassandra cluster. Distributed administration with ad-hoc Ansible will also be covered and I will walk through examples of using these commands to identify and remediate clusterwide issues.
About the Speaker
Jeffrey Berger Lead Database Engineer, Knewton
Dr. Jeffrey Berger is currently the lead database engineer at Knewton, an education tech startup in NYC. He joined the tech scene in NYC in 2013 and spent two years working with MongoDB, becoming a certified MongoDB administrator and a MongoDB Master. He received his Cassandra Administrator certification at Cassandra Summit 2015. He holds a Ph.D. in Theoretical Physics from Penn State and spent several years working on high energy nuclear interactions.
Node.js is an event-driven, asynchronous JavaScript runtime that allows JavaScript to be used for server-side scripting. It uses an event loop model that maps events to callbacks to handle concurrent connections without blocking. This allows Node.js applications to scale to many users. Modules in Node.js follow the CommonJS standard and can export functions and objects to be used by other modules. The event emitter pattern is commonly used to handle asynchronous events. Node.js is well-suited for real-time applications with intensive I/O operations but may not be the best choice for CPU-intensive or enterprise applications.
As the popularity of PostgreSQL continues to soar, many companies are exploring ways of migrating their application database over. At Redgate Software, we recently added PostgreSQL as an optional data store for SQL Monitor, our flagship monitoring application, after nearly 18 years of being backed exclusively by SQL Server. Knowing that others will be taking this journey in the near future, we'd like to discuss what we learned. In this training, we'll discuss the planning that needs to take place before a migration begins, including datatype changes, PostgreSQL configuration modifications, and query differences. This will be a mix of slides and demo from our own learnings, as well as those of some clients we've helped along the way.
DISQUS is a comment system that handles high volumes of traffic, with up to 17,000 requests per second and 250 million monthly visitors. They face challenges in unpredictable spikes in traffic and ensuring high availability. Their architecture includes over 100 servers split between web servers, databases, caching, and load balancing. They employ techniques like vertical and horizontal data partitioning, atomic updates, delayed signals, consistent caching, and feature flags to scale their large Django application.
What We Learned About Cassandra While Building go90 (Christopher Webster & Th...DataStax
Go90 is a mobile entertainment platform offering access to live and on demand videos. We built the web services platform and social features like activity feed for go90 by making heavy use of Cassandra and Scala, and would like to share what we learned during development and while operating go90. In this presentation, we cover our data model evolution from the initial prototypes to the current production version and the significant performance gain by using a better data model. We will explain how we apply time series data modeling and the benefits of using expiring columns with DateTieredCompactionStrategy. We will also talk about interesting experiences related to table modifications, tombstones and table pagination. On the operations side, we will discuss our findings on java driver usage, performance, monitoring, cluster maintenance, version upgrade, 2-way ssl and many more. We hope you can learn from our mistakes instead of making them yourself!
About the Speakers
Christopher Webster Software Engineer, AOL
Christopher Webster works on the web services platform for the go90 AOL project. Previously he was a Computer Scientist for the Mission Control Technologies project at NASA Ames Center. Chris worked as a senior staff engineer at Sun Microsystems for Project zembly, the cloud development and deployment environment as well as technical lead in many NetBeans projects. Chris is an author of the NetBeans Field Guide and Assemble the Social Web With Zembly.
Thomas Ng Software Engineer, AOL
Thomas Ng is a software engineer at AOL, building web services for the go90 mobile entertainment platform using Cassandra, Scala and Kafka.
OrigoDB is an in-memory database toolkit that allows writing and data to exist in the same process. It uses write-ahead command logging and snapshots for persistence. The document discusses OrigoDB's architecture, data modeling approaches, testing strategies, hosting options, and configuration capabilities like different persistence modes and kernels. It provides examples of using OrigoDB for various applications and demonstrates its immutability and server capabilities.
This document provides an overview of SQL Server internals including:
- The query processing pipeline including parsing, optimizing, and executing queries.
- How the optimizer evaluates and chooses the most efficient query execution plan.
- The role of indexes, statistics, and parallelism in query optimization.
- Transaction logging and the different SQL Server recovery models.
- Key SQL Server memory structures like the buffer pool and query plan cache.
- Threading and scheduling within the SQL Server query processor.
Learning Objectives - This module will cover Advance HBase concepts. You will also learn what Zookeeper is all about, how it helps in monitoring a cluster, why HBase uses Zookeeper and how to Build Applications with Zookeeper.
This document discusses troubleshooting Oracle WebLogic performance issues. It outlines various tools that can be used for troubleshooting including operating system tools like sar and vmstat, Java tools like jps and jstat, and WebLogic-specific tools like the WebLogic Diagnostics Framework. It also covers taking thread dumps, configuring WebLogic logging and debugging options, and using the Oracle Diagnostic Logging framework.
This document discusses deploying and managing Apache Solr at scale. It introduces the Solr Scale Toolkit, an open source tool for deploying and managing SolrCloud clusters in cloud environments like AWS. The toolkit uses Python tools like Fabric to provision machines, deploy ZooKeeper ensembles, configure and start SolrCloud clusters. It also supports benchmark testing and system monitoring. The document demonstrates using the toolkit and discusses lessons learned around indexing and query performance at scale.
You can find the first part of this presentation here: https://www.slideshare.net/secret/pAvK8Qd9f07oa
This presentation takes a deep dive into how the Million Song Library, a microservices-based application, was built using the Netflix Stack, Cassandra and Datastax.
To learn more about Million Song Library and its components visit the project on GitHub: https://github.com/kenzanlabs/million-song-library
Lea
Kubernetes is an open-source system for managing containerized applications across multiple hosts. It includes key components like Pods, Services, ReplicationControllers, and a master node for managing the cluster. The master maintains state using etcd and schedules containers on worker nodes, while nodes run the kubelet daemon to manage Pods and their containers. Kubernetes handles tasks like replication, rollouts, and health checking through its API objects.
Similar to Apache zookeeper seminar_trinh_viet_dung_03_2016 (20)
Removing Uninteresting Bytes in Software FuzzingAftab Hussain
Imagine a world where software fuzzing, the process of mutating bytes in test seeds to uncover hidden and erroneous program behaviors, becomes faster and more effective. A lot depends on the initial seeds, which can significantly dictate the trajectory of a fuzzing campaign, particularly in terms of how long it takes to uncover interesting behaviour in your code. We introduce DIAR, a technique designed to speedup fuzzing campaigns by pinpointing and eliminating those uninteresting bytes in the seeds. Picture this: instead of wasting valuable resources on meaningless mutations in large, bloated seeds, DIAR removes the unnecessary bytes, streamlining the entire process.
In this work, we equipped AFL, a popular fuzzer, with DIAR and examined two critical Linux libraries -- Libxml's xmllint, a tool for parsing xml documents, and Binutil's readelf, an essential debugging and security analysis command-line tool used to display detailed information about ELF (Executable and Linkable Format). Our preliminary results show that AFL+DIAR does not only discover new paths more quickly but also achieves higher coverage overall. This work thus showcases how starting with lean and optimized seeds can lead to faster, more comprehensive fuzzing campaigns -- and DIAR helps you find such seeds.
- These are slides of the talk given at IEEE International Conference on Software Testing Verification and Validation Workshop, ICSTW 2022.
HCL Notes and Domino License Cost Reduction in the World of DLAUpanagenda
Webinar Recording: https://www.panagenda.com/webinars/hcl-notes-and-domino-license-cost-reduction-in-the-world-of-dlau/
The introduction of DLAU and the CCB & CCX licensing model caused quite a stir in the HCL community. As a Notes and Domino customer, you may have faced challenges with unexpected user counts and license costs. You probably have questions on how this new licensing approach works and how to benefit from it. Most importantly, you likely have budget constraints and want to save money where possible. Don’t worry, we can help with all of this!
We’ll show you how to fix common misconfigurations that cause higher-than-expected user counts, and how to identify accounts which you can deactivate to save money. There are also frequent patterns that can cause unnecessary cost, like using a person document instead of a mail-in for shared mailboxes. We’ll provide examples and solutions for those as well. And naturally we’ll explain the new licensing model.
Join HCL Ambassador Marc Thomas in this webinar with a special guest appearance from Franz Walder. It will give you the tools and know-how to stay on top of what is going on with Domino licensing. You will be able lower your cost through an optimized configuration and keep it low going forward.
These topics will be covered
- Reducing license cost by finding and fixing misconfigurations and superfluous accounts
- How do CCB and CCX licenses really work?
- Understanding the DLAU tool and how to best utilize it
- Tips for common problem areas, like team mailboxes, functional/test users, etc
- Practical examples and best practices to implement right away
Programming Foundation Models with DSPy - Meetup SlidesZilliz
Prompting language models is hard, while programming language models is easy. In this talk, I will discuss the state-of-the-art framework DSPy for programming foundation models with its powerful optimizers and runtime constraint system.
Let's Integrate MuleSoft RPA, COMPOSER, APM with AWS IDP along with Slackshyamraj55
Discover the seamless integration of RPA (Robotic Process Automation), COMPOSER, and APM with AWS IDP enhanced with Slack notifications. Explore how these technologies converge to streamline workflows, optimize performance, and ensure secure access, all while leveraging the power of AWS IDP and real-time communication via Slack notifications.
Driving Business Innovation: Latest Generative AI Advancements & Success StorySafe Software
Are you ready to revolutionize how you handle data? Join us for a webinar where we’ll bring you up to speed with the latest advancements in Generative AI technology and discover how leveraging FME with tools from giants like Google Gemini, Amazon, and Microsoft OpenAI can supercharge your workflow efficiency.
During the hour, we’ll take you through:
Guest Speaker Segment with Hannah Barrington: Dive into the world of dynamic real estate marketing with Hannah, the Marketing Manager at Workspace Group. Hear firsthand how their team generates engaging descriptions for thousands of office units by integrating diverse data sources—from PDF floorplans to web pages—using FME transformers, like OpenAIVisionConnector and AnthropicVisionConnector. This use case will show you how GenAI can streamline content creation for marketing across the board.
Ollama Use Case: Learn how Scenario Specialist Dmitri Bagh has utilized Ollama within FME to input data, create custom models, and enhance security protocols. This segment will include demos to illustrate the full capabilities of FME in AI-driven processes.
Custom AI Models: Discover how to leverage FME to build personalized AI models using your data. Whether it’s populating a model with local data for added security or integrating public AI tools, find out how FME facilitates a versatile and secure approach to AI.
We’ll wrap up with a live Q&A session where you can engage with our experts on your specific use cases, and learn more about optimizing your data workflows with AI.
This webinar is ideal for professionals seeking to harness the power of AI within their data management systems while ensuring high levels of customization and security. Whether you're a novice or an expert, gain actionable insights and strategies to elevate your data processes. Join us to see how FME and AI can revolutionize how you work with data!
Full-RAG: A modern architecture for hyper-personalizationZilliz
Mike Del Balso, CEO & Co-Founder at Tecton, presents "Full RAG," a novel approach to AI recommendation systems, aiming to push beyond the limitations of traditional models through a deep integration of contextual insights and real-time data, leveraging the Retrieval-Augmented Generation architecture. This talk will outline Full RAG's potential to significantly enhance personalization, address engineering challenges such as data management and model training, and introduce data enrichment with reranking as a key solution. Attendees will gain crucial insights into the importance of hyperpersonalization in AI, the capabilities of Full RAG for advanced personalization, and strategies for managing complex data integrations for deploying cutting-edge AI solutions.
UiPath Test Automation using UiPath Test Suite series, part 5DianaGray10
Welcome to UiPath Test Automation using UiPath Test Suite series part 5. In this session, we will cover CI/CD with devops.
Topics covered:
CI/CD with in UiPath
End-to-end overview of CI/CD pipeline with Azure devops
Speaker:
Lyndsey Byblow, Test Suite Sales Engineer @ UiPath, Inc.
Sudheer Mechineni, Head of Application Frameworks, Standard Chartered Bank
Discover how Standard Chartered Bank harnessed the power of Neo4j to transform complex data access challenges into a dynamic, scalable graph database solution. This keynote will cover their journey from initial adoption to deploying a fully automated, enterprise-grade causal cluster, highlighting key strategies for modelling organisational changes and ensuring robust disaster recovery. Learn how these innovations have not only enhanced Standard Chartered Bank’s data infrastructure but also positioned them as pioneers in the banking sector’s adoption of graph technology.
UiPath Test Automation using UiPath Test Suite series, part 6DianaGray10
Welcome to UiPath Test Automation using UiPath Test Suite series part 6. In this session, we will cover Test Automation with generative AI and Open AI.
UiPath Test Automation with generative AI and Open AI webinar offers an in-depth exploration of leveraging cutting-edge technologies for test automation within the UiPath platform. Attendees will delve into the integration of generative AI, a test automation solution, with Open AI advanced natural language processing capabilities.
Throughout the session, participants will discover how this synergy empowers testers to automate repetitive tasks, enhance testing accuracy, and expedite the software testing life cycle. Topics covered include the seamless integration process, practical use cases, and the benefits of harnessing AI-driven automation for UiPath testing initiatives. By attending this webinar, testers, and automation professionals can gain valuable insights into harnessing the power of AI to optimize their test automation workflows within the UiPath ecosystem, ultimately driving efficiency and quality in software development processes.
What will you get from this session?
1. Insights into integrating generative AI.
2. Understanding how this integration enhances test automation within the UiPath platform
3. Practical demonstrations
4. Exploration of real-world use cases illustrating the benefits of AI-driven test automation for UiPath
Topics covered:
What is generative AI
Test Automation with generative AI and Open AI.
UiPath integration with generative AI
Speaker:
Deepak Rai, Automation Practice Lead, Boundaryless Group and UiPath MVP
In the rapidly evolving landscape of technologies, XML continues to play a vital role in structuring, storing, and transporting data across diverse systems. The recent advancements in artificial intelligence (AI) present new methodologies for enhancing XML development workflows, introducing efficiency, automation, and intelligent capabilities. This presentation will outline the scope and perspective of utilizing AI in XML development. The potential benefits and the possible pitfalls will be highlighted, providing a balanced view of the subject.
We will explore the capabilities of AI in understanding XML markup languages and autonomously creating structured XML content. Additionally, we will examine the capacity of AI to enrich plain text with appropriate XML markup. Practical examples and methodological guidelines will be provided to elucidate how AI can be effectively prompted to interpret and generate accurate XML markup.
Further emphasis will be placed on the role of AI in developing XSLT, or schemas such as XSD and Schematron. We will address the techniques and strategies adopted to create prompts for generating code, explaining code, or refactoring the code, and the results achieved.
The discussion will extend to how AI can be used to transform XML content. In particular, the focus will be on the use of AI XPath extension functions in XSLT, Schematron, Schematron Quick Fixes, or for XML content refactoring.
The presentation aims to deliver a comprehensive overview of AI usage in XML development, providing attendees with the necessary knowledge to make informed decisions. Whether you’re at the early stages of adopting AI or considering integrating it in advanced XML development, this presentation will cover all levels of expertise.
By highlighting the potential advantages and challenges of integrating AI with XML development tools and languages, the presentation seeks to inspire thoughtful conversation around the future of XML development. We’ll not only delve into the technical aspects of AI-powered XML development but also discuss practical implications and possible future directions.
GraphSummit Singapore | The Future of Agility: Supercharging Digital Transfor...Neo4j
Leonard Jayamohan, Partner & Generative AI Lead, Deloitte
This keynote will reveal how Deloitte leverages Neo4j’s graph power for groundbreaking digital twin solutions, achieving a staggering 100x performance boost. Discover the essential role knowledge graphs play in successful generative AI implementations. Plus, get an exclusive look at an innovative Neo4j + Generative AI solution Deloitte is developing in-house.
Pushing the limits of ePRTC: 100ns holdover for 100 daysAdtran
At WSTS 2024, Alon Stern explored the topic of parametric holdover and explained how recent research findings can be implemented in real-world PNT networks to achieve 100 nanoseconds of accuracy for up to 100 days.
Infrastructure Challenges in Scaling RAG with Custom AI modelsZilliz
Building Retrieval-Augmented Generation (RAG) systems with open-source and custom AI models is a complex task. This talk explores the challenges in productionizing RAG systems, including retrieval performance, response synthesis, and evaluation. We’ll discuss how to leverage open-source models like text embeddings, language models, and custom fine-tuned models to enhance RAG performance. Additionally, we’ll cover how BentoML can help orchestrate and scale these AI components efficiently, ensuring seamless deployment and management of RAG systems in the cloud.
TrustArc Webinar - 2024 Global Privacy SurveyTrustArc
How does your privacy program stack up against your peers? What challenges are privacy teams tackling and prioritizing in 2024?
In the fifth annual Global Privacy Benchmarks Survey, we asked over 1,800 global privacy professionals and business executives to share their perspectives on the current state of privacy inside and outside of their organizations. This year’s report focused on emerging areas of importance for privacy and compliance professionals, including considerations and implications of Artificial Intelligence (AI) technologies, building brand trust, and different approaches for achieving higher privacy competence scores.
See how organizational priorities and strategic approaches to data security and privacy are evolving around the globe.
This webinar will review:
- The top 10 privacy insights from the fifth annual Global Privacy Benchmarks Survey
- The top challenges for privacy leaders, practitioners, and organizations in 2024
- Key themes to consider in developing and maintaining your privacy program
GraphSummit Singapore | The Art of the Possible with Graph - Q2 2024Neo4j
Neha Bajwa, Vice President of Product Marketing, Neo4j
Join us as we explore breakthrough innovations enabled by interconnected data and AI. Discover firsthand how organizations use relationships in data to uncover contextual insights and solve our most pressing challenges – from optimizing supply chains, detecting fraud, and improving customer experiences to accelerating drug discoveries.
Cosa hanno in comune un mattoncino Lego e la backdoor XZ?Speck&Tech
ABSTRACT: A prima vista, un mattoncino Lego e la backdoor XZ potrebbero avere in comune il fatto di essere entrambi blocchi di costruzione, o dipendenze di progetti creativi e software. La realtà è che un mattoncino Lego e il caso della backdoor XZ hanno molto di più di tutto ciò in comune.
Partecipate alla presentazione per immergervi in una storia di interoperabilità, standard e formati aperti, per poi discutere del ruolo importante che i contributori hanno in una comunità open source sostenibile.
BIO: Sostenitrice del software libero e dei formati standard e aperti. È stata un membro attivo dei progetti Fedora e openSUSE e ha co-fondato l'Associazione LibreItalia dove è stata coinvolta in diversi eventi, migrazioni e formazione relativi a LibreOffice. In precedenza ha lavorato a migrazioni e corsi di formazione su LibreOffice per diverse amministrazioni pubbliche e privati. Da gennaio 2020 lavora in SUSE come Software Release Engineer per Uyuni e SUSE Manager e quando non segue la sua passione per i computer e per Geeko coltiva la sua curiosità per l'astronomia (da cui deriva il suo nickname deneb_alpha).
3. Overview – What is ZooKeeper?
• An open source, high-performance
coordination service for distributed
application.
• Exposes common services in simple
interface:
• Naming
• Configuration management
• Locks & synchronization
• Groups services
• Build your own on it for specific needs
4. Overview – Who uses ZooKeeper?
• Companies:
• Yahoo!
• Zynga
• Rackspace
• Linkedlin
• Netflix, and many more…
• Projects:
• Apache Map/Reduce (Yarn)
• Apache HBase
• Apache Kafka
• Apache Storm
• Neo4j, and many more…
5. Overview – ZooKeeper Use Cases
• Configuration Management
• Cluster member nodes bootstrapping configuration from a
centralized source in unattended way
• Distributed Cluster Management
• Node join / leave
• Node statuses in real time
• Naming service – e.g. DNS
• Distributed synchronization – locks, barriers, queues
• Leader election in a distributed system
6. The ZooKeeper Service (ZKS)
• ZooKeeper Service is replicated over a set of machines
• All machines store a copy of the data (in-memory)
• A leader is elected on service startup
• Clients only connect to a single ZooKeeper server and maintain a
TCP connection
7. The ZKS - Sessions
• Before executing any request, client must establish a
session with service
• All operations client summits to service are associated to
a session
• Client initially connects to any server in ensemble, and
only to single server.
• Session offer order guarantees – requests in session are
executed in FIFO order
8. The ZKS – Session States and Lifetime
• Main possible states: CONNECTING, CONNECTED,
CLOSED, NOT_CONNECTED
9. The ZooKeeper Data Model (ZDM)
• Hierarchal name space
• Each node is called as a ZNode
• Every ZNode has data (given as byte[])
and can optionally have children
• ZNode paths:
• Canonical, absolute, slash-separated
• No relative references
• Names can have Unicode characters
• ZNode maintain stat structure
10. ZDM - Versions
• Eash Znode has version number, is incremented every
time its data changes
• setData and delete take version as input, operation
succeeds only if client’s version is equal to server’s one
11. ZDM – ZNodes – Stat Structure
• The Stat structure for each znode in ZooKeeper is made
up of the following fields:
• czxid
• mzxid
• pzxid
• ctime
• mtime
• dataVersion
• cversion
• aclVersion
• ephemeralOwner
• dataLength
• numChildren
12. ZDM – Types of ZNode
• Persistent ZNode
• Have lifetime in ZooKeeper’s namespace until they’re explicitly
deleted (can be deleted by delete API call)
• Ephemeral ZNode
• Is deleted by ZooKeeper service when the creating client’s session
ends
• Can also be explicitly deleted
• Are not allowed to have children
• Sequential Znode
• Is assigned a sequence number by ZooKeeper as a part of name
during creation
• Sequence number is integer (4bytes) with format of 10 digits with 0
padding. E.g. /path/to/znode-0000000001
14. ZDM – Znode – Reads & Writes
• Read requests are processed locally at the ZooKeeper
server to which client is currently connected
• Write requests are forwarded to leader and go through
majority consensus before a response is generated
15. ZDM – Consistency Guarantees
• Sequential Consistency
• Atomicity
• Single System Image
• Reliability
• Timeliness (Eventual Consistency)
16. ZDM - Watches
• A watch event is one-time trigger, sent to client that set
watch, which occurs when data for which watch was set
changes.
• Watches allow clients to get notifications when a znode
changes in any way (NodeChildrenChanged,
NodeCreated, NodeDataChanged,NodeDeleted)
• All of read operations – getData(), getChildren(), exists()
– have option of setting watch
• ZooKeeper Guarantees about Watches:
• Watches are ordered, order of watch events corresponds to the
order of the updates
• A client will see a watch event for znode it is watching before
seeing the new data that corresponds to that znode
18. ZDM – Access Control List
• ZooKeeper uses ACLs to control access to its znodes
• ACLs are made up of pairs of (scheme:id, permission)
• Build-in ACL schemes
• world: has single id, anyone
• auth: doesn’t use any id, represents any authenticated user
• digest: use a username:password
• host: use the client host name as ACL id identity
• ip: use the client host IP as ACL id identity
• ACL Permissions:
• CREATE
• READ
• WRITE
• DELETE
• ADMIN
• E.g. (ip:192.168.0.0/16, READ)
19. Recipe #1: Queue
• A distributed queue is very common data structure used in
distributed systems.
• Producer: generate / create new items and put them into
queue
• Consumer: remove items from queue and process them
• Addition and removal of items follow ordering of FIFO
20. Recipe #1: Queue (cont)
• A ZNode will be designated to hold a queue instance,
queue-znode
• All queue items are stored as znodes under queue-znode
• Producers add an item to queue by creating znode under
queue-znode
• Consumers retrieve items by getting and then deleting a
child from queue-znode
QUEUE-ZNODE : “queue instance”
|-- QUEUE-0000000001 : “item1”
|-- QUEUE-0000000002 : “item2”
|-- QUEUE-0000000003 : “item3”
21. Recipe #1: Queue (cont)
• Let /_QUEUE_ represent top-level znode, is called queue-
znode
• Producer put something into queue by creating a
SEQUENCE_EPHEMERAL znode with name “queue-N”,
N is monotonically increasing number
create (“queue-”, SEQUENCE_EPHEMARAL)
• Consumer process getChildren() call on queue-znode
with watch event set to true
M = getChildren(/_QUEUE_, true)
• Client picks up items from list and continues processing
until reaching the end of the list, and then check again
• The algorithm continues until get_children() returns
empty list
22. Recipe #2: Group Membership
• A persistent Znode /membership represent the root of the
group in ZooKeeper tree
• Any client that joins the cluster creates ephemeral znode
under /membership to locate memberships in tree and set
a watch on /membership
• When another node joins or leaves the cluster, this node
gets a notification and becomes aware of the change in
group membership
23. Recipe #2: Group Membership (cont)
• Let /_MEMBERSHIP_ represent root of group membership
• Client joining the group create ephemeral nodes under root
• All members of group will register for watch events on
/_MEMBERSHIP, thereby being aware of other members in
group
L = getChildren(“/_MEMBERSHIP”, true)
• When new client joins group, all other members are notified
• Similarly, a client leaves due to failure or otherwise,
ZooKeeper automatically delete node, trigger event
• Live members know which node joined or left by looking at
the list of children L
Centralized and highly reliable (simple) data registry
Unattended = without the owner present
- Each server maintains an in-core database, which represents the entire state of the ZooKeeper namespace. To ensure that updates are durable, and thus recoverable in the event of a server crash, updates are logged to a local disk. Also, the writes are serialized to the disk before they are applied to the in-memory database
- (3) The client initially connects to any server in the ensemble, and only to a single server. It uses a TCP connection to communicate with the server, but the session may be moved to a different server if the client has not heard from its current server for some time. Moving a session to a different server is handled transparently by the ZooKeeper client library
- (4)
- A session starts at the NOT_CONNECTED state and transitions to CONNECTING (arrow 1) with the initialization of the ZooKeeper client.
- Normally, the connection to a ZooKeeper server succeeds and the session transitions to CONNECTED (arrow 2).
- When the client loses its connection to the ZooKeeper server or doesn’t hear from the server, it transitions back to CONNECTING (arrow 3) and tries to find another ZooKeeper server. If it is able to find another server or to reconnect to the original server, it transitions back to CONNECTED once the server confirms that the session is still valid.
- Otherwise, it declares the session expired and transitions to CLOSED (arrow 4).
- The application can also explicitly close the session (arrows 4 and 5)
- Each znode has a version number associated with it that is incremented every time its data changes
Zxid: Each change will have a unique zxid and if zxid1 is smaller than zxid2 then zxid1 happened before zxid2. zxid is 64-bit integer = 32bits EPOCH and 32bits COUNTER
Czxid: The zxid of the change that caused this znode to be created
Mzxid: The zxid of the change that last modified this znode
Pzxid: This is the transaction ID for a znode change that pertains to adding or removing children
Ctime: The time in milliseconds from epoch when this znode was created
Mtime: The time in milliseconds from epoch when this znode was last modified
dataVersion: The number of changes to the data of this znode
cVersion: The number of changes to the children of this znode
aclversion: The number of changes to the ACL of this znode
ephemeralOwner: The session id of the owner of this znode if the znode is an ephemeral node. If it is not an ephemeral node, it will be zero
dataLength: The length of the data field of this znode
numChildren: The number of children of this znode
ZNode's type is set at its creation time
(1) Persistent znodes are useful for storing data that needs to be highly available and accessible by all the components of a distributed application. For example, an application can store the configuration data in a persistent znode. The data as well as the znode will exist even if the creator client dies
(2) An end to a client's session can happen because of disconnection due to a client crash or explicit termination of the connection
The concept of ephemeral znodes can be used to build distributed applicationswhere the components need to know the state of the other constituent components or resources. For example, a distributed group membership service can be implemented by using ephemeral znodes. The property of ephemeral nodes getting deleted when the creator client's session ends can be used as an analogue of a node that is joining or leaving a distributed cluster. Using the membership service, any node is able discover the members of the group at any particular time.
READ requests such as exists(), getData(), and getChildren() are processed locally by the ZooKeeper server where the client is connected. This makes the read operations very fast in ZooKeeper
WRITE or update requests such as create(), delete(), and setData() are forwarded to the leader in the ensemble. The leader carries out the client request as a transaction. This transaction is similar to the concept of a transaction in a database management system
A ZooKeeper transaction also comprises all the steps required to successfully execute the request as a single work unit, and the updates are applied atomically
- Sequential Consistency: Updates from a client will be applied in the order that they were sent
- Atomicity: Updates either succeed or fail -- there are no partial results
- Single System Image: A client sees the same view of the service regardless of the ZK server it connects to.
- Reliability: Updates persists once applied, till overwritten by some clients. If a client gets a successful return code, the update will have been applied
- Timeliness: The clients’ view of the system is guaranteed to be up-to-date within a certain time bound. (Eventual Consistency)
CREATE: you can create a child node
READ: you can get data from a node and list its children.
WRITE: you can set data for a node
DELETE: you can delete a child node
ADMIN: you can set permissions
world has a single id, anyone, that represents anyone.
auth doesn't use any id, represents any authenticated user.
digest uses a username:password string to generate MD5 hash which is then used as an ACL ID identity. Authentication is done by sending the username:password in clear text. When used in the ACL the expression will be the username:base64 encoded SHA1 password digest.
host uses the client host name as an ACL ID identity. The ACL expression is a hostname suffix. For example, the ACL expression host:corp.com matches the ids host:host1.corp.com and host:host2.corp.com, but not host:host1.store.com.
ip uses the client host IP as an ACL ID identity. The ACL expression is of the form addr/bits where the most significant bits of addr are matched against the most significant bits of the client host IP
The FIFO order of the items is maintained using sequential property of znode provided by ZooKeeper. When a producer process creates a znode for a queue item, it sets the sequential flag. This lets ZooKeeper append the znode name with a monotonically increasing sequence number as the suffix. ZooKeeper guarantees that the sequence numbers are applied in order and are not reused. The consumer process processes the items in the correct order by looking at the sequence number of the znode.