Salvatore Sanfilippo – How Redis Cluster works, and why
In this talk the algorithmic details of Redis Cluster will be exposed in order to show the design tensions in the clustered version of a high-performance database supporting complex data types, the selected tradeoffs, and their effect on the availability and consistency of the resulting solution. Other solutions in the design space that were not chosen will be illustrated for completeness.
Introduction to memcached, a caching service designed for optimizing performance and scaling in the web stack, seen from the perspective of MySQL/PHP users. Given for 2nd-year students of the professional bachelor in ICT at Kaho St. Lieven, Gent.
Meta/Facebook's database serving social workloads runs on top of MyRocks (MySQL on RocksDB). This means our performance and reliability depend a lot on RocksDB. Beyond MyRocks, we also have other important systems running on top of RocksDB. We have learned many lessons from operating and debugging RocksDB at scale.
In this session, we will offer an overview of RocksDB, key differences from InnoDB, and share a few interesting lessons learned from production.
Mario Molina, Software Engineer
CDC systems are usually used to identify changes in data sources, then capture and replicate those changes to other systems. Companies use CDC to sync data across systems, migrate to the cloud, or apply stream processing, among other uses.
In this presentation we’ll see CDC patterns, how to use them with Apache Kafka, and do a live demo!
https://www.meetup.com/Mexico-Kafka/events/277309497/
This is the presentation I gave at JavaDay Kiev 2015 on the architecture of Apache Spark. It covers the memory model, the shuffle implementations, data frames and some other high-level stuff, and can be used as an introduction to Apache Spark.
Evening out the uneven: dealing with skew in Flink | Flink Forward
Flink Forward San Francisco 2022.
When running Flink jobs, skew is a common problem that results in wasted resources and limited scalability. In the past years, we have helped our customers and users solve various skew-related issues in their Flink jobs or clusters. In this talk, we will present the different types of skew that users often run into: data skew, key skew, event time skew, state skew, and scheduling skew, and discuss solutions for each of them. We hope this will serve as a guideline to help you reduce skew in your Flink environment.
by
Jun Qin & Karl Friedrich
Deep Dive into Spark SQL with Advanced Performance Tuning with Xiao Li & Wenc... | Databricks
Spark SQL is a highly scalable and efficient relational processing engine with easy-to-use APIs and mid-query fault tolerance. It is a core module of Apache Spark. Spark SQL can process, integrate and analyze data from diverse data sources (e.g., Hive, Cassandra, Kafka and Oracle) and file formats (e.g., Parquet, ORC, CSV, and JSON). This talk will dive into the technical details of Spark SQL, spanning the entire lifecycle of a query execution. The audience will get a deeper understanding of Spark SQL and learn how to tune its performance.
[Meetup] A successful migration from Elasticsearch to ClickHouse | Vianney FOUCAULT
Paris ClickHouse meetup 2019: How Contentsquare successfully migrated to ClickHouse!
Discover the subtleties of a migration to ClickHouse: what to check beforehand, then how to operate ClickHouse in production.
Redis is an advanced in-memory key-value store designed for a world where "Memory is the new disk and disk is the new tape". Redis has some unique properties -- like blazing read and write speed, rich atomic operations and asynchronous persistence -- which make it ideally suited to a number of situations.
The tutorial covers an introduction to Redis, its data types, performance, sweet spots, design considerations and best practices, and adopters. It begins with an introduction to Redis and its features, including a brief history, its characteristics, and language support. A data types section follows, covering strings, lists, hashes, sets, and sorted sets, along with not thinking of Redis as an RDBMS, installation, atomicity of commands, and key expiration.
It also covers operations on lists, sets, sorted sets, hashes, and keys, as well as Redis administration commands. A performance section looks at factors such as hardware, payload size, and batch size, and includes benchmarks, a demo of Redis, data durability, and the advantages of persistence. Next comes a section on the sweet spots of Redis, such as cache server, tag cloud, auto-completion, and activity feeds, with case studies including a video marketing platform and a content publishing app.
A related section on best practices covers design considerations such as avoiding excessively long keys, using human-readable keys, remembering that all data must fit in memory, and treating polyglot persistence as a smart choice. The last section lists some adopters of Redis, such as Stack Overflow, Craigslist, GitHub, Instagram, and Blizzard Entertainment.
BlueStore, A New Storage Backend for Ceph, One Year In | Sage Weil
BlueStore is a new storage backend for Ceph OSDs that consumes block devices directly, bypassing the local XFS file system that is currently used. Its design is motivated by everything we've learned about OSD workloads and interface requirements over the last decade, and everything that has worked well and not so well when storing objects as files in local file systems like XFS, btrfs, or ext4. BlueStore has been under development for a bit more than a year now, and has reached a state where it is becoming usable in production. This talk will cover the BlueStore design, how it has evolved over the last year, and what challenges remain before it can become the new default storage backend.
Amazon S3 Best Practice and Tuning for Hadoop/Spark in the Cloud | Noritaka Sekiyama
Amazon S3 Best Practice and Tuning for Hadoop/Spark in the Cloud (Hadoop / Spark Conference Japan 2019)
# English version #
http://hadoop.apache.jp/hcj2019-program/
Kubernetes dealing with storage and persistence | Janakiram MSV
Storage is a critical part of running containers, and Kubernetes offers some powerful primitives for managing it. This webinar discusses various strategies for adding persistence to containerised workloads.
Introduction and Overview of Apache Kafka, TriHUG July 23, 2013 | mumrah
Apache Kafka is a new breed of messaging system built for the "big data" world. Coming out of LinkedIn (and donated to Apache), it is a distributed pub/sub system built in Scala. It has been an Apache TLP now for several months with the first Apache release imminent. Built for speed, scalability, and robustness, Kafka should definitely be one of the data tools you consider when designing distributed data-oriented applications.
The talk will cover a general overview of the project and technology, with some use cases, and a demo.
How to build a streaming Lakehouse with Flink, Kafka, and Hudi | Flink Forward
Flink Forward San Francisco 2022.
With a real-time processing engine like Flink and a transactional storage layer like Hudi, it has never been easier to build end-to-end low-latency data platforms connecting sources like Kafka to data lake storage. Come learn how to blend Lakehouse architectural patterns with real-time processing pipelines with Flink and Hudi. We will dive deep on how Flink can leverage the newest features of Hudi like multi-modal indexing that dramatically improves query and write performance, data skipping that reduces the query latency by 10x for large datasets, and many more innovations unique to Flink and Hudi.
by
Ethan Guo & Kyle Weller
Running MariaDB in multiple data centers | MariaDB plc
MariaDB is often deployed in multiple data centers for high availability and/or disaster recovery. Tim Tadeo, Senior Sales Engineer, walks through real-world use cases and the topologies customers have created to leverage multiple data centers. He also discusses important considerations and how to address them, as well as more advanced options such as dedicated binlog servers for cross–data center replication.
Webinar slides: An Introduction to Performance Monitoring for PostgreSQL | Severalnines
To operate PostgreSQL efficiently, you need to have insight into database performance and make sure it is at optimal levels.
With that in mind, we dive into monitoring PostgreSQL for performance in this webinar replay.
PostgreSQL offers many metrics through various status overviews and commands, but which ones really matter to you? How do you trend and alert on them? What is the meaning behind the metrics? And what are some of the most common causes for performance problems in production?
We discuss this and more in ordinary, plain DBA language. We also have a look at some of the tools available for PostgreSQL monitoring and trending; and we’ll show you how to leverage ClusterControl’s PostgreSQL metrics, dashboards, custom alerting and other features to track and optimize the performance of your system.
AGENDA
- PostgreSQL architecture overview
- Performance problems in production
- Common causes
- Key PostgreSQL metrics and their meaning
- Tuning for performance
- Performance monitoring tools
- Impact of monitoring on performance
- How to use ClusterControl to identify performance issues
- Demo
SPEAKER
Sebastian Insausti, Support Engineer at Severalnines, has loved technology since his childhood, when he took his first computer course (Windows 3.11). From that moment he knew what his profession would be. He has since built up experience with MySQL, PostgreSQL, HAProxy, WAF (ModSecurity), Linux (RedHat, CentOS, OL, Ubuntu server), Monitoring (Nagios), Networking and Virtualization (VMWare, Proxmox, Hyper-V, RHEV).
Prior to joining Severalnines, Sebastian worked as a consultant to state companies in security, database replication and high availability scenarios. He’s also a speaker and has given a few talks locally on InnoDB Cluster and MySQL Enterprise together with an Oracle team. Before that, he worked for a Mexican company as head of the sysadmin department, as well as for a local ISP (Internet Service Provider), where he managed customers' servers and connectivity.
This webinar builds upon a related blog post by Sebastian: https://severalnines.com/blog/performance-cheat-sheet-postgresql.
Noah Davis & Luke Melia of Weplay share a series of examples of Redis in the real world. In doing so, they cover a survey of Redis' features, approach, history and philosophy. Most examples are drawn from the Weplay team's experience using Redis to power features on Weplay.com, a social site for youth sports.
An insight into NoSQL solutions implemented at RTV Slovenia and elsewhere, what problems we are trying to solve and an introduction to solving them with Redis.
Talk given at #wwwh @ Ljubljana, 30.1.2013 by me, Tit Petric
Dynomite: A Highly Available, Distributed and Scalable Dynamo Layer--Ioannis ... | Redis Labs
Dynomite is a thin, distributed dynamo layer for different storage engines and protocols. Currently at Netflix, we are focusing on using Redis as the storage engine. Dynomite supports multi-datacenter replication and is designed for high availability. In the age of high scalability and big data, Dynomite’s design goal is to turn single-server datastore solutions into peer-to-peer, linearly scalable, clustered systems while still preserving the native client/server protocols of the datastores, e.g., the Redis protocol. In this talk, we are going to present Dynomite's recent features and the Dyno client. Both projects are open source and available to the community.
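Introduction to Redis 3.0 and its features and improvements. What are the differences between Redis, Memcached, and Aerospike? The strengths of Redis, and how to steer clear of its weaknesses.
This session introduces Redis 3.0 and its history, and explores the features and improvements of Redis. It also analyzes the differences among Redis, Memcached, and Aerospike, to help you understand and judge which fits future business requirements. Finally, it shares the scenarios Redis is suited to, along with fallback or integration options for the scenarios where it is not. The session is suitable for Redis beginners, for those who want to understand Redis in depth, and for anyone who has been inexplicably burned by Redis.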
An introduction to Redis and a customized framework based on StackExchange.Redis, updated to use the singleton pattern and JSON configuration mapping with the Redis instance group and name concept.
Redis Developers Day 2014 - Redis Labs Talks | Redis Labs
These are the slides that the Redis Labs team used to accompany the session that we gave during the first ever Redis Developers Day on October 2nd, 2014, in London. They include some of the ideas we've come up with to tackle operational challenges in the hyper-dense, multi-tenant Redis deployments that our service, Redis Cloud, consists of.
Redis Day Keynote Salvatore Sanfilippo Redis Labs | Redis Labs
Redis' seventh birthday was recently celebrated with the community, several contributors and users. This is Salvatore's keynote as he kicked off Redis Day in Tel Aviv.
LANs are constantly evolving, build your XYZ Account Network with that baked-in…
Extreme Networks brings XYZ Account simplicity, agility, and optimized performance to your most strategic business asset. The data center is critically important to business operations in the enterprise, but often organizations have difficulty leveraging their data centers as a strategic business asset. At Extreme Networks, we focus on providing an Intelligent Enterprise Data Center Network that’s purpose-built for enterprise requirements. Our OneFabric Data Center Solution:
XoS “can be like an elastic Fabric” for XYZ Account Network…
Demand for application availability has changed how applications are hosted in today’s datacenter. Evolutionary changes have occurred throughout the various elements of the data center, starting with server and storage virtualization and network virtualization. Motivations for server virtualization were initially associated with massive cost reduction and redundancy but have now evolved to focus on greater scalability and agility within the data center. Data center focused LAN technologies have taken a similar path; with a goal of redundancy and then to create a more scalable fabric within and between data centers.
As vendors continue to tout networking architectures that decouple software from hardware, bare-metal switches are moving into the spotlight. These switches, built on merchant silicon, deliver a lower-cost and more flexible switching alternative. Extreme Purple Metal switches are open enough to allow our customers to choose their network architecture based on their specific needs without going all the way to bare metal. We believe in the disaggregation of traditional enterprise networking. Extreme uses merchant silicon rather than custom ASICs. Custom ASICs have fallen behind. Unless a vendor can build and compete against merchant silicon, there's no point in doing custom ASICs.
OMI - The Missing Piece of a Modular, Flexible and Composable Computing World | Allan Cantle
These slides are part of a "Trends in Memory Desegregation" Webinar published in March 2021. You can see the webinar recording here https://youtu.be/g0QEX5qE8kE.
The presentation slides show how the Open Memory Interface, OMI, is a critical system architecture building block that lets our industry easily build the Domain Specific Architectures of the future, as defined by the gods of Computing Architecture, John Hennessy and David Patterson.
VMworld 2017 - Top 10 things to know about vSAN | Duncan Epping
In this session Cormac Hogan and I go over the top 10 things to know about vSAN. This is based on two years of questions/answers from our field and customers. Useful for any VMware vSAN customer!
#STO1264BU #STO1264BE
Disaggregated Networking - The Drivers, the Software & The High Availability | Open Networking Summit
Disaggregation is real… This trend started with SDN and the separation of the data plane and the control plane. The scope has expanded to include the separation of hardware and software, and has created a whole new industry of white boxes: general purpose X86 commodity hardware. All three markets (Cloud, Enterprise and Carriers) are now engaged in various solutions inside the data center. Disaggregation is impacting all parts of the network, including the Access and Edge layers.
ScyllaDB Open Source 5.0 is the latest evolution of our monstrously fast and scalable NoSQL database – powering instantaneous experiences with massive distributed datasets.
Join us to learn about ScyllaDB Open Source 5.0, which represents the first milestone in ScyllaDB V. ScyllaDB 5.0 introduces a host of functional, performance and stability improvements that resolve longstanding challenges of legacy NoSQL databases.
We’ll cover:
- New capabilities including a new IO model and scheduler, Raft-based schema updates, automated tombstone garbage collection, optimized reverse queries, and support for the latest AWS EC2 instances
- How ScyllaDB 5.0 fits into the evolution of ScyllaDB – and what to expect next
- The first look at benchmarks that quantify the impact of ScyllaDB 5.0's numerous optimizations
This will be an interactive session with ample time for Q & A – bring us your questions and feedback!
Concurrency in Distributed Systems : Leslie Lamport papers | Subhajit Sahu
In computer science, concurrency is the ability of different parts or units of a program, algorithm, or problem to be executed out-of-order or in partial order, without affecting the final outcome. This allows for parallel execution of the concurrent units, which can significantly improve overall speed of the execution in multi-processor and multi-core systems. In more technical terms, concurrency refers to the decomposability property of a program, algorithm, or problem into order-independent or partially-ordered components or units.
A number of mathematical models have been developed for general concurrent computation including Petri nets, process calculi, the parallel random-access machine model, the actor model and the Reo Coordination Language.
This slide deck explains how best to deal with state in scalable systems, i.e. pushing it to the system boundaries (client, data store) and trying to avoid state in between.
It then arbitrarily picks two scenarios, one in the frontend and one in the backend of a system, and shows concrete techniques for dealing with them.
The frontend part examines how to deal with the session state of servlet containers in scalable scenarios and introduces the concept of a shared session cache layer. An example implementation using Redis is also shown.
The backend part examines how to deal with the potential data inconsistencies that can occur if maximum availability of the data store is required and eventual consistency is used. The usual way is to resolve inconsistencies manually by implementing business-specific logic or, even worse, by asking the user to resolve them. A purely technical solution called CRDTs (Conflict-free Replicated Data Types) is then shown. CRDTs, based on sound mathematical concepts, are self-stabilizing data structures that offer a generic way to resolve inconsistencies in an eventually consistent data store. Besides some theory, some examples are shown to give a feeling for how CRDTs work in practice.
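To make the CRDT idea concrete, here is a minimal sketch of one of the simplest CRDTs, a grow-only counter (G-Counter). It is not taken from the slide deck; the class name `GCounter` and its methods are hypothetical, chosen only for illustration. Each replica increments its own slot, and merging takes the per-replica maximum, so replicas converge regardless of the order in which they exchange state.

```python
from dataclasses import dataclass, field
from typing import Dict


@dataclass
class GCounter:
    """Illustrative grow-only counter CRDT (hypothetical names, not from the deck)."""

    node_id: str
    counts: Dict[str, int] = field(default_factory=dict)

    def increment(self, amount: int = 1) -> None:
        # A replica only ever increases its own slot.
        if amount < 0:
            raise ValueError("a G-Counter can only grow")
        self.counts[self.node_id] = self.counts.get(self.node_id, 0) + amount

    def value(self) -> int:
        # The counter value is the sum over all replica slots.
        return sum(self.counts.values())

    def merge(self, other: "GCounter") -> None:
        # Per-slot maximum is commutative, associative and idempotent,
        # which is what lets replicas converge without coordination.
        for node, count in other.counts.items():
            self.counts[node] = max(self.counts.get(node, 0), count)


if __name__ == "__main__":
    a, b = GCounter("a"), GCounter("b")
    a.increment(2)   # update accepted on replica a
    b.increment(3)   # concurrent update accepted on replica b
    a.merge(b)
    b.merge(a)
    print(a.value(), b.value())
```

Because merging is idempotent, exchanging the same state twice does not change the result, which is why no business-specific conflict resolution is needed for this data type.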
When it Absolutely, Positively, Has to be There: Reliability Guarantees in Ka... | confluent
In the financial industry, losing data is unacceptable. Financial firms are adopting Kafka for their critical applications. Kafka provides the low latency, high throughput, high availability, and scale that these applications require. But can it also provide complete reliability? As a system architect, when asked “Can you guarantee that we will always get every transaction,” you want to be able to say “Yes” with total confidence.
In this session, we will go over everything that happens to a message – from producer to consumer, and pinpoint all the places where data can be lost – if you are not careful. You will learn how developers and operation teams can work together to build a bulletproof data pipeline with Kafka. And if you need proof that you built a reliable system – we’ll show you how you can build the system to prove this too.
Jay Kreps on Project Voldemort Scaling Simple Storage At LinkedIn | LinkedIn
Jay Kreps on Project Voldemort Scaling Simple Storage At LinkedIn. This was a presentation made at QCon 2009 and is embedded on LinkedIn's blog - http://blog.linkedin.com/
Serial ATA (SATA) is a computer bus interface that connects host bus adapters to mass storage devices such as hard disk drives and optical drives. Serial ATA replaces the older AT Attachment standard (later referred to as Parallel ATA or PATA), offering several advantages over the older interface: reduced cable size and cost (seven conductors instead of 40 or 80), native hot swapping, faster data transfer through higher signalling rates, and more efficient transfer through an (optional) I/O queuing protocol.
In the big data world, our data stores communicate over an asynchronous, unreliable network to provide a facade of consistency. However, to really understand the guarantees of these systems, we must understand the realities of networks and test our data stores against them.
Jepsen is a tool which simulates network partitions in data stores and helps us understand the guarantees of our systems and its failure modes. In this talk, I will help you understand why you should care about network partitions and how can we test datastores against partitions using Jepsen. I will explain what Jepsen is and how it works and the kind of tests it lets you create. We will try to understand the subtleties of distributed consensus, the CAP theorem and demonstrate how different data stores such as MongoDB, Cassandra, Elastic and Solr behave under network partitions. Finally, I will describe the results of the tests I wrote using Jepsen for Apache Solr and discuss the kinds of rare failures which were found by this excellent tool.
Transactions and Concurrency Control PatternsJ On The Beach
Transactions and Concurrency Control Patterns by Vlad Mihalcea
Transactions and Concurrency Control are very of paramount importance when it comes to enterprise systems data integrity. However, this topic is very tough since you have to understand the inner workings of the database system, its concurrency control design choices (e.g. 2PL, MVCC), transaction isolation levels and locking schemes.
In this presentation, I’m going to explain what data anomalies can happen depending on the transaction isolation level, with references to Oracle, SQL Server, PostgreSQL, and MySQL.
I will also demonstrate that database transactions are not enough, especially for multi-request web flows. For this reason, I’m going to present multiple application-level transaction patterns based on both optimistic and pessimistic locking mechanisms.
Last, I’m going to talk about Concurrency Control strategies used in the Hibernate second-level caching mechanism, which can boost performance without compromising strong consistency.
Salvatore Sanfilippo – How Redis Cluster works, and why - NoSQL matters Barcelona 2014
3. Go Cluster
• Redis Cluster must serve the same use cases as Redis.
• Tradeoffs are inherent in distributed systems (DS).
• CAP? Merge values? Strong consistency and
consensus? How to replicate values?
4. CP systems
CAP: the price of consistency is added latency
[Diagram: Client, S1, S2, S3, S4]
5. CP systems
Reply to client after majority ACKs
[Diagram: Client, S1, S2, S3, S4]
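As a rough illustration of "reply to client after majority ACKs", here is a minimal Python sketch. The function names are hypothetical and the node count of four is taken from the S1..S4 labels in the diagram; this is not how any particular system implements it.

```python
def majority(n_nodes):
    """Smallest number of nodes that forms a strict majority of n_nodes."""
    return n_nodes // 2 + 1


def can_reply(acks, n_nodes):
    """True once enough nodes have acknowledged the write.

    `acks` counts every node that has the write, including the one
    that received it from the client (an assumption of this sketch).
    """
    return acks >= majority(n_nodes)


if __name__ == "__main__":
    # With four nodes, three acknowledgements are needed before replying.
    for acks in range(1, 5):
        print(acks, can_reply(acks, 4))
```

The client waits for the slowest node in the fastest majority, which is where the added latency of a CP design comes from.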
6. And… there is the disk
[Diagram: S1, S2, S3, each with its own disk]
CP algorithms may require fsync-before-ack.
Durability and consistency are not always orthogonal.
7. AP systems
Eventual consistency with merges?
(note: merge is not strictly part of EC)
[Diagram: two clients, servers S1 and S2, and two divergent values of A]
A = {1,2,3,8,12,13,14}
A = {2,3,8,11,12,1}
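One possible merge function for the two divergent values of A is set union. The slide does not say which merge is intended, so union is an assumption here, and `merge` is a hypothetical name:

```python
def merge(a, b):
    """Merge two divergent replicas of a set by taking their union.

    Union is commutative, associative and idempotent, so replicas
    converge regardless of the order in which they exchange state.
    """
    return a | b


replica_1 = {1, 2, 3, 8, 12, 13, 14}
replica_2 = {2, 3, 8, 11, 12, 1}

if __name__ == "__main__":
    print(sorted(merge(replica_1, replica_2)))
```

Note that a plain union can never express a removal: an element deleted on one side reappears after the merge, which is one reason merge semantics have to be chosen per data type.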
8. Many kinds of consistencies
• The “C” in CAP is strong consistency.
• It is not the only available tradeoff, of course.
• Consistency is the set of liveness and safety
properties a given system provides.
• Eventual consistency: like saying nothing at all.
What liveness/safety properties if not “C”?
18. Global slots config
• A master FAIL state triggers a failover.
• Cluster needs a coherent view of configuration.
• Who is serving this slot currently?
• Slots config must eventually converge.
19. Raft and failover
• Config propagation is solved using ideas from the
Raft algorithm (just a subset).
• Raft is a consensus algorithm built on top of
different “layers”.
• Raft paper is already a classic (highly
recommended).
• Full Raft not needed for Redis Cluster slots config.
20. Failover and config
[Diagram labels: Failed; Slave ×3; Master ×3]
Epoch = Epoch+1 (logical clock)
Vote for me!
21. Too easy?
• Why don’t we need full Raft?
• Because our config is idempotent: when the
partition heals we can throw away old slot configs
in favor of newer versions.
• The same algorithm is used in Sentinel v2 and works
well.
22. Config propagation
• After a successful failover, the new slot config is
broadcast.
• If there are partitions, when they heal, config will
get updated (broadcast from time to time, plus
stale config detection and UPDATE messages).
• Config with the greater Epoch always wins.
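The "greater Epoch always wins" rule can be sketched as below. This is only an illustration under assumptions: `SlotOwner`, `apply_update` and the node names are hypothetical, and the real cluster bus messages carry more than this.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SlotOwner:
    node: str   # hypothetical node name
    epoch: int  # epoch under which the node claims the slot


def apply_update(table, slot, claim):
    """Accept a claim for `slot` only if its epoch is strictly greater.

    Returns True when the local table changed. Re-applying the same or
    an older claim is a no-op, so updates can be re-broadcast freely.
    """
    current = table.get(slot)
    if current is None or claim.epoch > current.epoch:
        table[slot] = claim
        return True
    return False


if __name__ == "__main__":
    table = {}
    apply_update(table, 42, SlotOwner("old-master", 1))
    apply_update(table, 42, SlotOwner("promoted-slave", 2))
    # A stale claim arriving after the partition heals is ignored.
    apply_update(table, 42, SlotOwner("old-master", 1))
    print(table[42])
```

Because stale or duplicate updates are simply dropped, nodes can apply messages in any order and still end up with the same table.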
23. Redis Cluster consistency?
• Eventually consistent: last failover wins.
• In the “vanilla” design, the number of lost writes is unbounded.
• Mechanisms to avoid unbounded data loss.
27. More data safety?
• OP logging until the async ACK is received.
• Replayed to the master when the node turns into a slave.
• “Safe” connections, on demand.
• Example: SADD (idempotent + commutative).
• SET-LWW foo bar <wall-clock>.
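To see why replaying logged operations is safe for SADD and for a last-write-wins SET, consider the sketch below. It models the idea only: `sadd` and `set_lww` are hypothetical Python functions over plain dicts, and breaking clock ties by comparing values is an assumption of this sketch, not something the slide specifies.

```python
def sadd(store, key, member):
    """Set add: applying it twice, or in any order, gives the same set."""
    store.setdefault(key, set()).add(member)


def set_lww(store, key, value, wall_clock):
    """Last-write-wins set: keep the write with the greatest timestamp."""
    candidate = (wall_clock, value)
    current = store.get(key)
    # Tuples compare by timestamp first, then by value, so equal
    # timestamps still resolve the same way on every node.
    if current is None or candidate > current:
        store[key] = candidate


if __name__ == "__main__":
    ops = [("foo", "bar", 100), ("foo", "baz", 105)]
    in_order, replayed_late = {}, {}
    for op in ops:
        set_lww(in_order, *op)
    for op in reversed(ops):
        set_lww(replayed_late, *op)
    print(in_order == replayed_late)
```

An operation without these properties, such as an increment, cannot be replayed blindly: applying it twice changes the result.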
28. Multi key ops
• Hey hashtags!
• {user:1000}.following {user:1000}.followers.
• Unavailable for small windows, but no data
exchange between nodes.
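The hash tag idea can be sketched in Python as follows. The slot count (16384) and the CRC16 variant come from the Redis Cluster specification rather than from the slides shown here, and `hash_tag` and `key_slot` are hypothetical helper names; treat this as an illustration, not a reference implementation.

```python
import binascii

SLOTS = 16384  # number of hash slots in Redis Cluster


def hash_tag(key):
    """Return the part of the key that is hashed.

    If the key contains a non-empty {...} section, only the text between
    the first '{' and the first '}' after it is hashed.
    """
    start = key.find(b"{")
    if start != -1:
        end = key.find(b"}", start + 1)
        if end != -1 and end != start + 1:
            return key[start + 1:end]
    return key


def key_slot(key):
    """Map a key to a hash slot: CRC16 (XMODEM variant) modulo 16384."""
    return binascii.crc_hqx(hash_tag(key), 0) % SLOTS


if __name__ == "__main__":
    for k in (b"{user:1000}.following", b"{user:1000}.followers"):
        print(k.decode(), key_slot(k))
```

Both keys hash only `user:1000`, so they land in the same slot and therefore on the same node, which is what makes multi-key operations on them possible without exchanging data between nodes.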
29. Multi key ops
(availability)
• Single key ops: always available during resharding.
• Multi key ops, available if:
• No manual resharding of this hash slot in progress.
• Resharding in progress, but the source or destination
node has all the keys.
• Otherwise we get a -TRYAGAIN error.
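The availability rule for multi-key operations can be written as a small decision function. This is a simplified model with hypothetical names: it looks only at the node that received the command and leaves out the redirections (-MOVED, -ASK) a real cluster also uses.

```python
def multi_key_reply(slot_resharding, keys_present):
    """Decide how a node answers a multi-key command on one hash slot.

    slot_resharding: True if the slot is currently being resharded.
    keys_present:    one boolean per key, True if this node holds it.
    """
    if not slot_resharding:
        return "OK"          # no resharding: serve normally
    if all(keys_present):
        return "OK"          # resharding, but this node has every key
    return "-TRYAGAIN"       # keys are split between source and destination


if __name__ == "__main__":
    print(multi_key_reply(False, [True, True]))
    print(multi_key_reply(True, [True, True]))
    print(multi_key_reply(True, [True, False]))
```

A client that receives -TRYAGAIN can simply retry after a short delay, since the window closes once the keys involved have all moved.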