An overview and discussion on indexing data in Redis to facilitate fast and efficient data retrieval. Presented on September 22nd, 2014 to the Redis Tel Aviv Meetup.
Analyze Virtual Machine Overhead Compared to Bare Metal with Tracing - ScyllaDB
Running a virtual machine will obviously add some overhead compared to running on bare metal. This is expected. But there are some cases where the overhead is much higher than expected. This talk discusses using tracing to analyze this overhead from a Linux host running KVM. Ideally, the guest would also be running Linux to get a more detailed explanation of the events, but analysis can still be done when the guest is something else.
High-Performance Networking Using eBPF, XDP, and io_uring - ScyllaDB
In the networking world there are a number of ways to increase performance over naive use of basic Berkeley sockets. These techniques range from polling blocking sockets, through non-blocking sockets driven by epoll, all the way to completely bypassing the Linux kernel for maximum network performance, talking directly to the network interface card with something like DPDK or Netmap. All these tools have their place, and generally occupy a spectrum from convenience to performance. But in recent years that landscape has changed massively. The tools available to the average Linux systems developer have improved, from the creation of io_uring to the expansion of BPF from a simple filtering language into a full-on programming environment embedded directly in the kernel. Along with that came XDP (eXpress Data Path), the Linux kernel's answer to kernel-bypass networking. AF_XDP is the new socket type created by this feature, and it generally works very similarly to something like DPDK. History lessons out of the way, this talk will look into and discuss the merits of this technology, its place in the broader ecosystem, and how it can be used to attain the highest level of performance possible. It will dive into crucial details, such as how AF_XDP works and how it can be integrated into a larger system, and finally cover more advanced topics such as request sharding/load balancing. There will be a detailed look at the design of AF_XDP, the eBPF code used, and the userspace code required to drive it all. It will also include performance numbers from this setup compared to regular kernel networking, and, most importantly, how to put all this together to handle as much data as possible on a single modern multi-core system.
Systemd: the modern Linux init system you will learn to love - Alison Chaiken
The talk combines a design overview of systemd with some tutorial information about how to configure it. Systemd's features and pitfalls are illustrated by short demos and real-life examples. Files used in the demos are listed under "Presentations" at http://she-devel.com/
Video of the live presentation will appear here:
http://www.meetup.com/Silicon-Valley-Linux-Technology/events/208133972/
PostgreSQL continuous backup and PITR with BarmanEDB
How can I achieve an RPO of 5 minutes for the backups of my PostgreSQL databases? And what about RPO=0 for zero data loss backups? This talk will give you answers to those questions, by guiding you through an overview of Disaster Recovery of PostgreSQL databases with Barman, covering its key concepts and providing useful patterns and tips.
Anatomy of a Container: Namespaces, cgroups & Some Filesystem Magic - LinuxCon - Jérôme Petazzoni
Containers are everywhere. But what exactly is a container? What are they made from? What's the difference between LXC, systemd-nspawn, Docker, and the other container systems out there? And why should we bother about specific filesystems?
In this talk, Jérôme will show the individual roles and behaviors of the components making up a container: namespaces, control groups, and copy-on-write systems. Then, he will use them to assemble a container from scratch, and highlight the differences (and likenesses) with existing container systems.
Optimizing Kubernetes Resource Requests/Limits for Cost-Efficiency and Latenc... - Henning Jacobs
Kubernetes has the concept of resource requests and limits. Pods get scheduled on the nodes based on their requests and optionally limited in how much of the resource they can consume. Understanding and optimizing resource requests/limits is crucial both for reducing resource "slack" and ensuring application performance/low-latency. This talk shows our approach to monitoring and optimizing Kubernetes resources for 80+ clusters to achieve cost-efficiency and reducing impact for latency-critical applications. All shown tools are Open Source and can be applied to most Kubernetes deployments.
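The "slack" mentioned above is simply the gap between what a container requests and what it actually uses. A minimal sketch of that computation, with hypothetical numbers and container names (this is not the tooling shown in the talk):

```python
# A minimal sketch of the "slack" computation described above: slack is the
# requested resource minus actual usage, per container. Numbers are made up.

def cpu_slack(requests_millicores, usage_millicores):
    """Per-container CPU slack in millicores (negative means over-use)."""
    return {name: requests_millicores[name] - usage_millicores.get(name, 0)
            for name in requests_millicores}

# Requests as declared in pod specs vs. usage as reported by a metrics API:
requests = {"api": 1000, "worker": 500, "cache": 250}
usage = {"api": 120, "worker": 480, "cache": 300}

slack = cpu_slack(requests, usage)
# "api" requests 1000m but uses ~120m: 880m of slack that could be reclaimed.
```

Summing this gap across all pods and clusters is what turns over-provisioning into a concrete cost figure.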
MySQL InnoDB Cluster: Management and Troubleshooting with MySQL Shell - Miguel Araújo
MySQL InnoDB Cluster and MySQL Shell session presented at Oracle Code One 2019.
Abstract:
MySQL InnoDB Cluster provides a built-in high-availability solution for MySQL. Combining MySQL Group Replication with MySQL Router and MySQL Shell into an integrated full-stack solution, InnoDB Cluster provides easy setup and management of MySQL instances into a fault-tolerant database service. MySQL Shell is the “control panel” of InnoDB Cluster, enabling the easy and straightforward configuration and management of InnoDB clusters by providing a scriptable and interactive API: the AdminAPI. Recent enhancements and features added to MySQL Shell make the management of InnoDB clusters even more powerful and smoother. Attend this session to get an overview of the latest developments and improved InnoDB Cluster administration tasks.
Notes:
The slideshow includes a video that cannot be seen in slideshare/PDF. If you're interested in it please check the following blog post: https://mysqlhighavailability.com/mysql-innodb-cluster-automatic-node-provisioning/
Create a Varnish cluster in Kubernetes for Drupal caching - DrupalCon North A... - Ovadiah Myrgorod
Varnish is a caching proxy usually used for high-profile Drupal sites. However, configuring Varnish is not an easy task and requires a lot of work. It is even more difficult when it comes to creating a scalable cluster of Varnish nodes.
Fortunately, there is a solution. I’ve been working on the kube-httpcache project (https://github.com/mittwald/kube-httpcache), which takes care of many things such as routing, scaling, broadcasting, config reloading, etc.
If you need to run more than one instance of Varnish, this session is for you. You will learn how to:
* Launch a single instance of Varnish in Kubernetes.
* Configure Varnish for Drupal.
* Scale Varnish from 1 to N nodes as part of the cluster.
* Make your Varnish cluster resilient.
* Reload Varnish configs on the fly.
* Properly invalidate cache for multiple Varnish nodes.
This session requires some basic understanding of Docker and Kubernetes; however, I will provide some intro if you are new to it.
Join this session and enjoy!
This tutorial is an introduction to Debian packaging. It teaches prospective developers how to modify existing packages, how to create their own packages, and how to interact with the Debian community. In addition to the main tutorial, it includes three practical sessions on modifying the 'grep' package, and packaging the 'gnujump' game and a Java library.
The engineering challenges of designing for low latency execution include tightly controlling the time it takes to detect the onset of a latency excursion and to diagnose its most likely cause. In modern x-as-a-service (XaaS) forms of distributed applications, the points at which latency is experienced by a service consumer are separated by many layers of modular abstractions from the underlying system hardware. This separation makes it difficult to pinpoint the causes of latency pushouts and to apply corrective actions in a timely manner. The classic performance methodology of profiling ‘cycles’ of work may be broadly successful in surfacing higher levels of latency, but it is not very effective in determining the causes of short-duration latency surges; to determine those, it is frequently necessary to:
• trace execution
• pinpoint when a significant latency stretch out occurs
• establish its correlation with a nearby precursor or a set of precursor events
Each of these steps can incur significant overheads; further, one has to be concerned that even modest overheads from tracing risk contributing to tail latencies. Not just the detection of the onset of a latency excursion, but also the identification of why it occurs, must be completed quickly so that if a corrective action is possible, it can be taken promptly. Similarly, if no recourse to curb the latency of a slice of computation is available at some point in time, then it is ideal that steps to minimize the impact of the exception are put into effect as early as possible.
In our talk, we present an approach that complements the very low overhead software tracing provided by KUtrace. It uses eBPF to trigger a collection of additional data at very low overhead from the hardware performance monitoring unit (PMU) so that latency excursions within a span of execution can be examined in a timely manner. We will describe the use of PMU capabilities like precise event-based sampling (PEBS) and timed last branch records (Timed LBRs) in close proximity to events of interest to extract critical clues. We will further discuss planned future work to integrate in-band network telemetry (INT) into these tracing flows.
Amazon EC2 provides a broad selection of instance types to accommodate a diverse mix of workloads. In this session, we provide an overview of the Amazon EC2 instance platform, key platform features, and the concept of instance generations. We dive into the current generation design choices of the different instance families, including the General Purpose, Compute Optimized, Storage Optimized, Memory Optimized, and GPU instance families. We also detail best practices and share performance tips for getting the most out of your Amazon EC2 instances.
This presentation covers the Linux operating system from the basics onward. I hope you find it very helpful; it was one of the most appreciated presentations when I gave it to my class.
Tracing MariaDB server with bpftrace - MariaDB Server Fest 2021 - Valeriy Kravchuk
Bpftrace is a relatively new eBPF-based open source tracer for modern Linux versions (kernels 5.x.y) that is useful for analyzing production performance problems and troubleshooting software. Basic usage of the tool, as well as bpftrace one-liners and advanced scripts useful for MariaDB DBAs, are presented. Problems of MariaDB Server dynamic tracing with bpftrace, some possible solutions, and alternative tracing tools are discussed.
Ceph: Open Source Storage Software Optimizations on Intel® Architecture for C... - Odinot Stanislas
After a short introduction to distributed storage and a description of Ceph, Jian Zhang presents some interesting benchmarks in this deck: sequential tests, random tests, and above all a comparison of results before and after optimization. The configuration parameters touched and the optimizations applied (large page numbers, OMAP data on a separate disk, ...) deliver at least a 2x performance improvement.
Redis Use Patterns (DevconTLV June 2014) - Itamar Haber
An introduction to Redis for the SQL practitioner, covering data types and common use cases.
The video of this session can be found at: https://www.youtube.com/watch?v=8Unaug_vmFI
Power to the People: Redis Lua Scripts - Itamar Haber
Redis is the Sun.
Earth is your application.
Imagine that the Moon is stuck in the middle of the Sun.
You send non-melting rockets (scripts) with robots
(commands) and cargo (data) back and forth…
Redis is an advanced key-value store or a data structure server. This presentation will cover the following topics:
* An overview of Redis
* Data Structures
* Basics of Setup and Installation
* Basics of Administration
* Programming with Redis
* Considerations of Running Redis in a Virtual Machine
* Redis Resources
There will be a number of demonstrations to help explain some of the concepts being presented.
Get more than a cache back! The Microsoft Azure Redis Cache (NDC Oslo) - Maarten Balliauw
Serving up content on the Internet is something our web sites do daily. But are we doing this in the fastest way possible? How are users in faraway countries experiencing our apps? Why do we have three webservers serving the same content over and over again? In this session, we’ll explore the Azure Content Delivery Network or CDN, a service which makes it easy to serve up blobs, videos and other content from servers close to our users. We’ll explore simple file serving as well as some more advanced, dynamic edge caching scenarios.
A list of all URLs in the deck is at: https://gist.github.com/itamarhaber/87e8c8c7126fbfb3f722
A lightning talk filled to the brim with knowledge and tips about Redis, data structures, performance, and RAM, and how to take Redis to the max.
Build a Geospatial App with Redis 3.2 - Andrew Bass, Sean Yesmunt, Sergio Prad... - Redis Labs
We created an app to find nearby running partners, and to demonstrate Redis Data structures and functions. In this talk, we will review the data structures and walk through our NodeJS app that depends solely on Redis Geospatial Indexes. Functions demoed are GEOADD, ZREM, GEOHASH, GEOPOS, GEODIST, GEORADIUS, GEORADIUSBYMEMBER
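Under the hood, the geo commands listed above store members in a sorted set keyed by geohash-encoded scores and compute distances with the haversine great-circle formula. An illustrative sketch in pure Python (not Redis source code) of the math behind GEODIST:

```python
# An illustrative sketch (not Redis source code) of the distance math behind
# GEODIST: the haversine great-circle formula on a spherical Earth.
from math import radians, sin, cos, asin, sqrt

EARTH_RADIUS_M = 6372797.560856  # Earth radius used by Redis's geo commands

def haversine_m(lon1, lat1, lon2, lat2):
    """Great-circle distance in meters between two (lon, lat) points."""
    lon1, lat1, lon2, lat2 = map(radians, (lon1, lat1, lon2, lat2))
    h = (sin((lat2 - lat1) / 2) ** 2
         + cos(lat1) * cos(lat2) * sin((lon2 - lon1) / 2) ** 2)
    return 2 * EARTH_RADIUS_M * asin(sqrt(h))

# Palermo and Catania, the classic GEOADD/GEODIST example (~166 km apart):
d = haversine_m(13.361389, 38.115556, 15.087269, 37.502669)
```

GEORADIUS then combines this distance test with a geohash range scan over the sorted set to find nearby members efficiently.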
RespClient - Minimal Redis Client for PowerShell - Yoshifumi Kawai
RespClient is a minimal RESP (REdis Serialization Protocol) client for C# and PowerShell.
https://github.com/neuecc/RespClient
at Japan PowerShell User Group #3
#jpposh
High Performance Redis - Tague Griffith, GoPro - Redis Labs
High Performance Redis looks at a wide range of techniques, from programming to system tuning, to deploy and maintain an extremely high-performing Redis cluster. From the operational perspective, the talk lays out multiple techniques for clustering (sharding) Redis systems and examines how the different approaches impact performance. The talk further examines system settings (Linux network parameters, Redis system settings) and how they impact performance, both good and bad. Finally, for the developer, we look at how different ways of structuring data demonstrate different performance characteristics.
Using Redis as Distributed Cache for ASP.NET apps - Peter Kellner, 73rd Stre... - Redis Labs
In this session I will build from scratch a Microsoft ASP.NET website that caches WebAPI REST calls with MSOpenTech’s Redis implementation, both while developing in Visual Studio and when running on a Windows server under IIS. I will show you how to build a safe, reusable caching library in C# that can be used in any .NET project. I will also demonstrate how to use the Redis cache services available on Microsoft’s Azure cloud platform. Further, I’ll demonstrate a real-world web site that uses Azure Redis Cache and show statistics on how Redis improves performance consistently and reliably.
Scalable Streaming Data Pipelines with Redis - Avram Lyon
Slides for talk presented at LA Redis meetup, April 16, 2016 at Scopely.
This is a draft of a session to be presented at Redis Conference 2016.
Description:
Scopely's portfolio of social and mid-core games generates billions of events each day, covering everything from in-game actions to advertising to game engine performance. As this portfolio grew in the past two years, Scopely moved all event analysis from third-party hosted solutions to a new event analytics pipeline on top of Redis and Kinesis, dramatically reducing operating costs and enabling new real-time analysis and more efficient warehousing. Our solution receives events over HTTP and SQS and provides real-time aggregation using a custom Redis-backed application, as well as prompt loads into HDFS for batch analyses.
Recently, we migrated our realtime layer from a pure Redis datastore to a hybrid datastore with recent data in Redis and older data in DynamoDB, retaining performance while further reducing costs. In this session we will describe our experience building, tuning and monitoring this pipeline, and the role of Redis in supporting handling of Kinesis worker failover, deployment, and idempotence, in addition to its more visible role in data aggregation. This session is intended be helpful for those building streaming data systems and looking for solutions for aggregation and idempotence.
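The idempotence role Redis plays above can be sketched with plain Python containers standing in for the Redis structures (event shape and names are hypothetical): each event carries a unique id, and the aggregator counts an event only the first time that id is seen, so a Kinesis worker replaying a batch after failover cannot double-count.

```python
# A sketch of idempotent event aggregation, with Python containers standing
# in for Redis structures. Event shape and ids are made up for illustration.

class IdempotentAggregator:
    def __init__(self):
        self.seen_ids = set()  # in production: a Redis SET of event ids
        self.counts = {}       # in production: a Redis HASH of counters

    def ingest(self, event):
        if event["id"] in self.seen_ids:
            return False       # duplicate delivery (e.g. replay): ignore
        self.seen_ids.add(event["id"])
        self.counts[event["type"]] = self.counts.get(event["type"], 0) + 1
        return True

agg = IdempotentAggregator()
agg.ingest({"id": "e1", "type": "install"})
agg.ingest({"id": "e1", "type": "install"})  # replay after worker failover
# agg.counts["install"] is still 1
```

With Redis, the membership check and counter increment can be made atomic (e.g. in a Lua script), which is what makes the pattern safe under concurrent workers.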
As a data scientist I frequently need to create web apps to provide interactive functionality, deliver data APIs or simply publish results. It is now easier than ever to deploy your data driven web app by using cloud based application platforms to do the heavy lifting. Cloud Foundry (http://cloudfoundry.org) is an open source public and private cloud platform that enables simple app deployment, scaling and connectivity to data services like PostgreSQL, MongoDB, Redis and Cassandra.
Resources: http://www.ianhuston.net/2015/01/cloud-foundry-for-data-science-talk/
After more than 5 years of doing this, I think I managed to capture the essence of the beast quite neatly. Here's what matters about Redis, the open source in-memory data structure store, IMO.
Simon Elliston Ball – When to NoSQL and When to Know SQL - NoSQL matters Barc... - NoSQLmatters
Simon Elliston Ball – When to NoSQL and When to Know SQL
With NoSQL, NewSQL and plain old SQL, there are so many tools around that it’s not always clear which is the right one for the job. This is a look at a series of NoSQL technologies, comparing them against traditional SQL technology. I’ll compare real use cases and show how they are solved with both NoSQL options and traditional SQL servers, and then see who wins. We’ll look at some code and architecture examples that fit a variety of NoSQL techniques, and some where SQL is a better answer. We’ll see some big data problems, little data problems, and a bunch of new and old database technologies to find whatever it takes to solve the problem. By the end you’ll hopefully know more NoSQL, and maybe even have a few new tricks with SQL, and what’s more, how to choose the right tool for the job.
Introduction to Databases - query optimizations for MySQL - Márton Kodok
This was module 6 part of a course of Web technologies. We cover relational databases, advantages/disadvantages. How to leverage MySQL index usage, and query optimizations. In the final part also mention NoSQL databases.
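The payoff of index usage covered in the module can be illustrated with a small Python analogy (not MySQL internals): without an index the server scans every row, which is O(n); with a B-tree-like sorted structure it can binary-search straight to the match, which is O(log n).

```python
# A Python analogy for index usage: full table scan vs. binary search over
# a sorted "index" of (key, row position) pairs. Table contents are made up.
import bisect

rows = [{"id": i, "email": "user%d@example.com" % i} for i in range(10000)]

def full_scan(email):
    """What happens without an index: examine every row."""
    return [r for r in rows if r["email"] == email]

# The "index": key/position pairs kept in sorted order, like a B-tree leaf.
index = sorted((r["email"], i) for i, r in enumerate(rows))
keys = [k for k, _ in index]

def index_lookup(email):
    """With an index: binary search, then fetch only the matching rows."""
    pos = bisect.bisect_left(keys, email)
    hits = []
    while pos < len(keys) and keys[pos] == email:
        hits.append(rows[index[pos][1]])
        pos += 1
    return hits
```

This is also why indexes are not free: the sorted structure must be maintained on every INSERT and UPDATE, which is the trade-off EXPLAIN-driven tuning weighs.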
Code is not text! How graph technologies can help us to understand our code b... - Andreas Dewes
Today, we almost exclusively think of code in software projects as a collection of text files. The tools that we use (version control systems, IDEs, code analyzers) also use text as the primary storage format for code. In fact, the belief that “code is text” is so deeply ingrained in our heads that we never question its validity or even become aware of the fact that there are other ways to look at code.
In my talk I will explain why treating code as text is a very bad idea which actively holds back our understanding and creates a range of problems in large software projects. I will then show how we can overcome (some of) these problems by treating and storing code as data, and more specifically as a graph. I will show specific examples of how we can use this approach to improve our understanding of large code bases, increase code quality and automate certain aspects of software development.
Finally, I will outline my personal vision of the future of programming, which is a future where we no longer primarily interact with code bases using simple text editors. I will also give some ideas on how we might get to that future.
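The "code as data" idea is easy to demonstrate: Python's own ast module turns source text into a tree, from which a call graph falls out in a few lines. A toy sketch (not the tooling from the talk):

```python
# A toy demonstration of treating code as data rather than text: parse
# source into an AST and extract a call graph of function -> callees.
import ast

source = """
def fetch(url): return url
def parse(doc): return doc
def crawl(url): return parse(fetch(url))
"""

def call_graph(src):
    """Map each function name to the set of names it calls directly."""
    graph = {}
    for node in ast.walk(ast.parse(src)):
        if isinstance(node, ast.FunctionDef):
            graph[node.name] = {n.func.id for n in ast.walk(node)
                                if isinstance(n, ast.Call)
                                and isinstance(n.func, ast.Name)}
    return graph

g = call_graph(source)
# g["crawl"] == {"fetch", "parse"}; g["fetch"] == set()
```

Load such edges into a graph database and queries like "what breaks if I change this function?" become one-line traversals instead of text searches.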
Rust was initially designed as a systems language. The ecosystem has grown fast, and we can now develop full web apps with it. This talk walks through the various components that make up a simple web app and how they fit nicely together with the language features.
This talk was given at the DevFest conference in Toulouse, France, in November 2018.
NoSQL - MongoDB. Agility, scalability, performance. I am going to talk about the basics of NoSQL and MongoDB. Why do some projects require RDBMSs and others NoSQL databases? What are the pros and cons of using NoSQL vs. SQL? How is data stored and transferred in MongoDB? What query language is used? How does MongoDB support high availability and automatic failover with the help of replication? What is sharding and how does it help to support scalability? Also covered is the newest level of concurrency: collection-level and document-level locking.
AWS July Webinar Series - Getting Started with Amazon DynamoDB - Amazon Web Services
This webinar provides an overview of Amazon DynamoDB, a fast, flexible, and fully managed NoSQL database service for Mobile, Web, AdTech, IoT and Gaming applications that need consistent, single-digit millisecond latency at any scale. The webinar will cover key topics around the general architecture of DynamoDB, data types, throughput provisioning, querying and indexing, and recent features.
The webinar includes a live demo of the basic operations used to read and write data to a DynamoDB table, and how the concept of provisioned IO affects the throughput of these operations.
Learning Objectives:
Enable users to understand how DynamoDB works so that they can evaluate and use DynamoDB as the data store for their application
Rails is a great Ruby-based framework for producing web sites quickly and effectively. Here are a bunch of tips and best practices aimed at the Ruby newbie.
Starting with v4, modules hold a promise for changing how Redis is used and developed for. Enabling custom data types and commands, Redis Modules build upon and extend the core functionality to handle any use case.
The video of the webinar given with these slides is at: https://youtu.be/EglSYFodaqw
How I Implemented the #1 Requested Feature In Redis In Less than 1 Hour with ...Itamar Haber
These are the slides from my RedisConf 18 session about developing Redis modules. It focuses on the new keyspace notifications API and the blocking API, by showing how to develop the Ze POP module (https://github.com/itamarhaber/zpop).
P.S. no pineapples or bananas were hurt.
An introduction and status update on Redis' upcoming new data structure - Stream - that is not unlike a log, has some Apache Kafka-like thingamajigs and can also be used for time series data
Developing a Redis Module - Hackathon KickoffItamar Haber
Slides deck for kicking off Redis Labs' Modules Hackathon - https://www.hackerearth.com/sprints/redislabs-hackathon-global
Video of the webinar is at: https://youtu.be/LPxx4QPyUPw
Leveraging Probabilistic Data Structures for Real Time Analytics with Redis M...Itamar Haber
An introduction to Redis, Redis' new modules API and probabilistic data structures (PDS). Budding data scientists and big-data gurus use PDSs for estimating that which is difficult to be accurately counted.
The slides we used at the first meetup hosted at Redis Labs' TLV offices :)
Touches on some of the more notable user-facing functionality in the newest Redis version, as well as interesting internal optimizations with major gains.
#RedisTLV: www.meetup.com/Tel-Aviv-Redis-Meetup/events/227594422/
Recording: https://www.youtube.com/watch?v=qHkXVY2LpwU
External links: https://gist.github.com/itamarhaber/dddc3d4d9c19317b1477
Applications today are required to process massive amounts of data and return responses in real time. Simply storing Big Data is no longer enough; insights must be gleaned and decisions made as soon as data rushes in. In-memory databases like Redis provide the blazing fast speeds required for sub-second application response times. Using a combination of in-memory Redis and disk-based MongoDB can significantly reduce the “digestive” challenge associated with processing high velocity data.
Redis & MongoDB: Stop Big Data Indigestion Before It StartsItamar Haber
Efficiently digesting data in large volumes can prove to be challenging for any database. The challenges are compounded when this influx must be analyzed on the fly, or "tasted", to satisfy the sophisticated palates of modern apps. Luckily, there are several proven remedies you can concoct with Redis to help with potential indigestion.
The URLs from the presentation are also available at: https://gist.github.com/itamarhaber/325e515c1715a12ef132
Opendatabay - Open Data Marketplace.pptxOpendatabay
Opendatabay.com unlocks the power of data for everyone. Open Data Marketplace fosters a collaborative hub for data enthusiasts to explore, share, and contribute to a vast collection of datasets.
First ever open hub for data enthusiasts to collaborate and innovate. A platform to explore, share, and contribute to a vast collection of datasets. Through robust quality control and innovative technologies like blockchain verification, opendatabay ensures the authenticity and reliability of datasets, empowering users to make data-driven decisions with confidence. Leverage cutting-edge AI technologies to enhance the data exploration, analysis, and discovery experience.
From intelligent search and recommendations to automated data productisation and quotation, Opendatabay's AI-driven features streamline the data workflow. Finding the data you need shouldn't be complex. Opendatabay simplifies the data acquisition process with an intuitive interface and robust search tools. Effortlessly explore, discover, and access the data you need, allowing you to focus on extracting valuable insights. Opendatabay breaks new ground with dedicated, AI-generated synthetic datasets.
Leverage these privacy-preserving datasets for training and testing AI models without compromising sensitive information. Opendatabay prioritizes transparency by providing detailed metadata, provenance information, and usage guidelines for each dataset, ensuring users have a comprehensive understanding of the data they're working with. By leveraging a powerful combination of distributed ledger technology and rigorous third-party audits Opendatabay ensures the authenticity and reliability of every dataset. Security is at the core of Opendatabay. Marketplace implements stringent security measures, including encryption, access controls, and regular vulnerability assessments, to safeguard your data and protect your privacy.
Levelwise PageRank with Loop-Based Dead End Handling Strategy : SHORT REPORT ...Subhajit Sahu
Abstract — Levelwise PageRank is an alternative method of PageRank computation which decomposes the input graph into a directed acyclic block-graph of strongly connected components, and processes them in topological order, one level at a time. This enables calculation of ranks in a distributed fashion without per-iteration communication, unlike the standard method where all vertices are processed in each iteration. It however comes with a precondition of the absence of dead ends in the input graph. Here, the native non-distributed performance of Levelwise PageRank was compared against Monolithic PageRank on a CPU as well as a GPU. To ensure a fair comparison, Monolithic PageRank was also performed on a graph where vertices were split by components. Results indicate that Levelwise PageRank is about as fast as Monolithic PageRank on the CPU, but quite a bit slower on the GPU. The slowdown on the GPU is likely caused by a large submission of small workloads, and is expected to be a non-issue when the computation is performed on massive graphs.
Data Centers - Striving Within A Narrow Range - Research Report - MCG - May 2...pchutichetpong
M Capital Group (“MCG”) expects demand to grow and supply to evolve, facilitated through institutional investment rotation out of offices and into work from home (“WFH”), alongside the ever-expanding need for data storage as global internet usage expands, with experts predicting 5.3 billion users by 2023. These market factors will be underpinned by technological changes, such as progressing cloud services and edge sites, allowing the industry to see strong expected annual growth of 13% over the next 4 years.
Whilst competitive headwinds remain, represented through the recent second bankruptcy filing of Sungard, which blames “COVID-19 and other macroeconomic trends including delayed customer spending decisions, insourcing and reductions in IT spending, energy inflation and reduction in demand for certain services”, the industry has seen key adjustments, where MCG believes that engineering cost management and technological innovation will be paramount to success.
MCG reports that the more favorable market conditions expected over the next few years, helped by the winding down of pandemic restrictions and a hybrid working environment will be driving market momentum forward. The continuous injection of capital by alternative investment firms, as well as the growing infrastructural investment from cloud service providers and social media companies, whose revenues are expected to grow over 3.6x larger by value in 2026, will likely help propel center provision and innovation. These factors paint a promising picture for the industry players that offset rising input costs and adapt to new technologies.
According to M Capital Group: “Specifically, the long-term cost-saving opportunities available from the rise of remote managing will likely aid value growth for the industry. Through margin optimization and further availability of capital for reinvestment, strong players will maintain their competitive foothold, while weaker players exit the market to balance supply and demand.”
Empowering the Data Analytics Ecosystem: A Laser Focus on Value
The data analytics ecosystem thrives when every component functions at its peak, unlocking the true potential of data. Here's a laser focus on key areas for an empowered ecosystem:
1. Democratize Access, Not Data:
Granular Access Controls: Provide users with self-service tools tailored to their specific needs, preventing data overload and misuse.
Data Catalogs: Implement robust data catalogs for easy discovery and understanding of available data sources.
2. Foster Collaboration with Clear Roles:
Data Mesh Architecture: Break down data silos by creating a distributed data ownership model with clear ownership and responsibilities.
Collaborative Workspaces: Utilize interactive platforms where data scientists, analysts, and domain experts can work seamlessly together.
3. Leverage Advanced Analytics Strategically:
AI-powered Automation: Automate repetitive tasks like data cleaning and feature engineering, freeing up data talent for higher-level analysis.
Right-Tool Selection: Strategically choose the most effective advanced analytics techniques (e.g., AI, ML) based on specific business problems.
4. Prioritize Data Quality with Automation:
Automated Data Validation: Implement automated data quality checks to identify and rectify errors at the source, minimizing downstream issues.
Data Lineage Tracking: Track the flow of data throughout the ecosystem, ensuring transparency and facilitating root cause analysis for errors.
5. Cultivate a Data-Driven Mindset:
Metrics-Driven Performance Management: Align KPIs and performance metrics with data-driven insights to ensure actionable decision making.
Data Storytelling Workshops: Equip stakeholders with the skills to translate complex data findings into compelling narratives that drive action.
Benefits of a Precise Ecosystem:
Sharpened Focus: Precise access and clear roles ensure everyone works with the most relevant data, maximizing efficiency.
Actionable Insights: Strategic analytics and automated quality checks lead to more reliable and actionable data insights.
Continuous Improvement: Data-driven performance management fosters a culture of learning and continuous improvement.
Sustainable Growth: Empowered by data, organizations can make informed decisions to drive sustainable growth and innovation.
By focusing on these precise actions, organizations can create an empowered data analytics ecosystem that delivers real value by driving data-driven decisions and maximizing the return on their data investment.
2. A Little About Myself
A Redis Geek and Chief Developer Advocate at Redis Labs
I write at http://redislabs.com/blog and edit the Redis Watch newsletter at http://redislabs.com/redis-watch-archive
3. Motivation
● Redis is a Key-Value datastore -> fetching (is always) by (primary) key is fast
● Searching for keys is expensive - SCAN (or, god forbid, the "evil" KEYS command)
● Searching for values in keys requires a full (hash) table scan & sending the data to the client for processing
5. antirez is Right
● Redis is a "database SDK"
● Indices imply some kind of schema (and there's none in Redis)
● Redis wasn't made for indexing
● ...
But despite the Creator's humble opinion, sometimes you still need a fast way to search :)
6. So What is an Index?
"A database index is a data structure that improves the speed of data retrieval operations"
Wikipedia, 2014
Space-Time Tradeoff
7. What Can be Indexed?
Data: Key -> Value
Index: Value -> Key
• Values can be numbers or strings
• Can be derived from "opaque" values: JSONs, data structures (e.g. Hash), functions, …
8. Index Operations Checklist
1. Create index from existing data
2. Update the index on
a. Addition of new values
b. Updates of existing values
c. Deletion of keys (and also RENAME/MIGRATE…)
3. Drop the index
4. If needed do index housekeeping
5. Access keys using the index
9. A Simple Example: Reverse Lookup
Assume the following database, where every user has a single unique email address:
HMSET user:1 id "1" email "dfucbitz@terah.net"
How would you go about efficiently fetching the user's ID given an email address?
10. Reverse Lookup (Pseudo) Recipe
def idxEmailAdd(email, id): # 2.a
    if not r.setnx("_email:" + email, id):
        raise Exception("INDEX_EXISTS")

def idxEmailCreate(): # 1
    for u in r.scan_iter("user:*"):
        id, email = r.hmget(u, "id", "email")
        idxEmailAdd(email, id)
11. Reverse Lookup Recipe, more admin
def idxEmailDel(email): # 2.c
    r.delete("_email:" + email)

def idxEmailUpdate(id, old, new): # 2.b
    idxEmailDel(old)
    idxEmailAdd(new, id)

def idxEmailDrop(): ... # similar to Create
14. Reverse Lookup Recipe, Analysis
● Asymptotic computational complexity:
o Creating the index: O(N), N is no. of values
o Adding a new value to the index: O(1)
o Deleting a value from the index: O(1)
o Updating a value: O(1) + O(1) = O(1)
o Deleting the index: O(N), N is no. of values
● What about memory? Every key in Redis takes up some extra space...
15. Hash Index
_email = { "dfucbitz@terah.net": 1,
"foo@bar.baz": 2 ... }
● Small lookups (e.g. countries) → single key
● Big lookups → partitioned to "buckets" (e.g. by email address hash value)
More info: http://redis.io/topics/memory-optimization
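A minimal sketch of how such bucketing might look (the bucket count and the helper's name are illustrative choices, not from the slides):

```python
import hashlib

def bucket_key(email, buckets=1024):
    # Partition one big lookup into many small Hash keys by hashing the
    # email address; the index entry then lives at:
    #   HSET bucket_key(email) email id
    # (a bucket count of 1024 is an arbitrary, illustrative choice)
    h = int(hashlib.md5(email.encode()).hexdigest(), 16) % buckets
    return "_email:{}".format(h)

print(bucket_key("dfucbitz@terah.net"))  # some "_email:<n>" bucket
```

Keeping each bucket small lets Redis use its memory-efficient ziplist encoding for the Hash, which is the point of the memory-optimization link above.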
17. Uniqueness
The lookup recipe makes the assumption that every user has a single email address and that it's unique (i.e. 1:1 relationship).
What happens if several keys (users) have the same indexed value (email)?
18. Non-Uniqueness with Lists
Use lists instead of using Redis' strings/hashes.
To add:
r.lpush("_email:" + email, id) # 2.a
Simple. What about accessing the list for writes or reads? Naturally, getting all the list's members is O(N) but...
19. What?!? WTF do you mean O(N)?!?
Because a Redis List is essentially a linked list, traversing it requires up to N operations (LINDEX, LRANGE…). That means that updates & deletes are O(N).
Conclusion: suitable when N (i.e. number of duplicate index entries) is smallish (e.g. < 10)
20. OT: A Tip for Traversing Lists
Lists don't have LSCAN, but with RPOPLPUSH you can easily do a circular list pattern and go over all the members in O(N) w/o copying the entire list.
More at: http://redis.io/commands/rpoplpush
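An in-memory sketch of the circular pattern, using a deque to stand in for the Redis List (the real thing would issue RPOPLPUSH src src against the server):

```python
from collections import deque

def visit_all(lst):
    # Rotate the list once around, mimicking RPOPLPUSH src src:
    # pop the tail element (RPOP) and push it back onto the head (LPUSH).
    # After len(lst) steps every member has been seen exactly once and
    # the list is back in its original order.
    seen = []
    for _ in range(len(lst)):
        member = lst.pop()       # RPOP
        lst.appendleft(member)   # LPUSH
        seen.append(member)
    return seen

ids = deque(["1", "2", "3"])
print(visit_all(ids))  # ['3', '2', '1'] - every member, tail first
```

Because each rotation is a single O(1) command, a concurrent writer can LPUSH new members without invalidating the traversal.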
21. Back to Non-Uniqueness - Hashes
Use Hashes to store multiple index values:
r.hset("_email:" + email, id, "") # 2.a
Great - still O(1). How about deleting?
r.hdel("_email:" + email, id) # 2.b
Another O(1).
22. Non-Uniqueness, Sets Variant
r.sadd("_email:" + email, id) # 2.a
Great - still O(1). How about deleting?
r.srem("_email:" + email, id) # 2.b
Another O(1).
23. List vs. Hash vs. Set for NUIVs*
* Non-Unique Index Value
● Memory: List ~= Set ~= Hash (N < 100)
● Performance: List < Set, Hash
● Unlike a List's elements, Set members and Hash fields are:
o Unique - meaning you can't index the same key more than once (makes sense)
o Unordered - a non-issue for this type of index
o SCANable
● Forget Lists, use Sets or Hashes.
24. Forget Hashes, Sets are Better
Because of the Set operations:
SUNION, SDIFF, SINTER
Endless possibilities, including
matchmaking:
SINTER _interest:devops _hair:blond _gender:...
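A sketch of what SINTER buys you, with plain Python sets standing in for the index Sets (key names and members here are illustrative):

```python
# In-memory stand-ins for Set-based indices; members are user IDs.
index = {
    "_interest:devops": {"1", "4", "7"},
    "_hair:blond":      {"4", "7", "9"},
    "_gender:f":        {"2", "4"},
}

def sinter(*keys):
    # SINTER: the members present in every one of the given Sets
    result = set(index[keys[0]])
    for key in keys[1:]:
        result &= index[key]
    return result

print(sinter("_interest:devops", "_hair:blond", "_gender:f"))  # {'4'}
```

On the server SINTER does this in one round trip, intersecting the smallest Set first; the client never has to fetch the full member lists.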
25. [This Slide has No Title]
NULL means no value and Redis is all about values.
When needed, arbitrarily decide on a value for NULLs (e.g. "<null>") and handle it appropriately in code.
26. Index Cardinality (~= unique values)
● High cardinality/no duplicates -> use a Hash
● Some duplicates -> use a Hash and "pointers" to Sets
_email = { "dfucbitz@terah.net": 1,
"foo@bar.baz": "*" ...}
_email:foo@bar.baz = { 2, 3 }
● Low cardinality is, however, another story...
27. Low Cardinality
When an indexed attribute has a small number of possible values (e.g. Boolean, gender...):
● If distribution of values is 50:50, consider not indexing it at all
● If distribution is heavily unbalanced (5:95), index only the smaller subset and full-scan the rest
● Use a bitmap index if possible
28. Bitmap Index
Assumption: key names are ordered
How: a Bitset where a bit's position maps to a key and the bit's value is the indexed value:
first bit -> dfucbitz is online
_isLoggedIn = /100…/
second bit -> foo isn't logged in
29. Bitmap Index, cont.
More than 2 values? Use n Bitsets, where n is the number of possible indexed values, e.g.:
_isFromTerah = /100.../
_isFromEarth = /010.../
Bonus: BITOP AND / OR / XOR / NOT
BITOP NOT _ET _isFromEarth
BITOP AND onlineET _isLoggedIn _ET
30. Interlude: Redis Indices Save Space
Consider the following: in a relational database you need "x2" space: for the indexed data (stored in a table) and for the index itself.
With most Redis indices, you don't have to store the indexed data -> space saved :)
31. Numerical Ranges with Sorted Sets
Numerical values, including timestamps (epoch), are trivially indexed with a Sorted Set:
ZADD _yearOfBirth 1972 "1" 1961 "2"...
ZADD _lastLogin 1411245569 "1"
Use ZRANGEBYSCORE and ZREVRANGEBYSCORE for range queries
32. Ordered "Composite" Numerical Indices
Use Sorted Set scores that are constructed by the sort (range) order. Store two values in one score using the integer and fractional parts:
user:1 = { "id": "1", "weightKg": "82", "heightCm": "218", ... }
score = weightKg + ( heightCm / 1000 )
33. "Composite" Numerical Indices, cont.
For more "complex" sorts (up to 53 bits of precision), you can construct the score like so:
user:1 = { "id": "1", "weightKg": "82", "heightCm": "218", "IQ": "100", ... }
score = weightKg * 1000000 + heightCm * 1000 + IQ
Adapted from:
http://www.dr-josiah.com/2013/10/multi-column-sql-like-sorting-in-redis.html
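The packing arithmetic from the slide, with an unpack helper added for illustration. Each column gets a fixed decimal band, and the total stays well under 2**53, so it survives the double-precision score intact:

```python
def pack_score(weight_kg, height_cm, iq):
    # Sort by weight first, then height, then IQ:
    # each field occupies its own decimal band within the score.
    return weight_kg * 1000000 + height_cm * 1000 + iq

def unpack_score(score):
    # Reverse the packing with integer division/modulo.
    weight_kg, rest = divmod(score, 1000000)
    height_cm, iq = divmod(rest, 1000)
    return weight_kg, height_cm, iq

score = pack_score(82, 218, 100)
print(score)                # 82218100
print(unpack_score(score))  # (82, 218, 100)
```

The band widths are the design constraint: here heightCm and IQ must each stay below 1000 or they would bleed into the neighboring column.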
34. Full Text Search (Almost) (v2.8.9+)
ZRANGEBYLEX on Sorted Set members that have the same score is handy for searches with a wildcard suffix, i.e. dfuc*, a-la autocomplete: http://autocomplete.redis.io/
Tip: by storing the reversed string (gnirts) you can also do wildcard-prefix searches, i.e. *terah.net, just as easily.
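The lexical range trick can be sketched with a sorted list standing in for the same-score members (the sample addresses besides dfucbitz's are made up, and the "\xff" sentinel assumes ASCII members):

```python
import bisect

# Same-score Sorted Set members kept in lexical order (in-memory sketch)
members = sorted(["dfucbitz@terah.net", "dfudge@foo.org", "foo@bar.baz"])

def prefix_search(members, prefix):
    # Equivalent of ZRANGEBYLEX key [prefix (prefix\xff:
    # everything that sorts between the prefix itself and the prefix
    # followed by the highest single-byte character.
    lo = bisect.bisect_left(members, prefix)
    hi = bisect.bisect_left(members, prefix + "\xff")
    return members[lo:hi]

print(prefix_search(members, "dfu"))   # both dfu* addresses
print(prefix_search(members, "dfuc"))  # ['dfucbitz@terah.net']
```

For the reversed-string tip, you would store each member reversed and run the same prefix search on the reversed query.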
35. Another Nice Thing With Sorted Sets
By combining the use of two of these, it is possible to map ranges to keys (or just data). For example, which range does 5 fall into?
ZADD min 1 "low" 4 "medium" 7 "high"
ZADD max 3 "low" 6 "medium" 9 "high"
ZREVRANGEBYSCORE min 5 -inf LIMIT 0 1
ZRANGEBYSCORE max 5 +inf LIMIT 0 1
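A sketch of the two-Sorted-Set lookup, with (score, label) lists standing in for the min and max Sets; the gap-handling behavior is an illustrative interpretation of why both queries are needed:

```python
# In-memory sketch of the two-Sorted-Set range lookup
mins = [(1, "low"), (4, "medium"), (7, "high")]
maxs = [(3, "low"), (6, "medium"), (9, "high")]

def lookup_range(mins, maxs, value):
    # ZREVRANGEBYSCORE min value -inf LIMIT 0 1: range with largest min <= value
    below = max(((s, label) for s, label in mins if s <= value), default=None)
    # ZRANGEBYSCORE max value +inf LIMIT 0 1: range with smallest max >= value
    above = min(((s, label) for s, label in maxs if s >= value), default=None)
    if below and above and below[1] == above[1]:
        return below[1]  # both queries agree: value is inside this range
    return None          # value falls in a gap between ranges

print(lookup_range(mins, maxs, 5))    # medium (4 <= 5 <= 6)
print(lookup_range(mins, maxs, 3.5))  # None (between "low" and "medium")
```

Checking that both queries return the same label is what catches values that land in the gaps between ranges.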
36. Binary Trees
Everybody knows that binary trees are really useful for searching and other stuff. You can store a binary tree as an array in a Sorted Set:
(Happy 80th Birthday!)
37. Why stop at binary trees? BTrees!
@thinkingfish from Twitter explained that they took the BSD implementation of BTrees and welded it into Redis (open source rulez!). This allows them to do efficient (speed-wise, not memory-wise) key and range lookups.
http://highscalability.com/blog/2014/9/8/how-twitter-uses-redis-to-scale-105tb-ram-39mm-qps-10000-ins.html
38. Index Atomicity & Consistency
In a relational database the index is (hopefully) always in sync with the data.
You can strive for that in Redis, but:
• Your code will be much more complex
• Performance will suffer
• There will be bugs/edge cases/extreme uses…
39. The Opposite of Atomicity & Consistency
On the other extreme, you could consider implementing indexing with a:
• Periodical process (lazy indexing)
• Producer/Consumer pattern (i.e. queue)
• Keyspace notifications
You won't have any guarantees, but you'll be offloading the index creation from the app.
40. Indices, Lua & Clustering
Server-side scripting is an obvious consideration for implementing a lot (if not all) of the indexing logic. But ...
… in a cluster setup, a script runs on a single shard and can only access the keys there -> no guarantee that a key and an index are on the same shard.
41. Don't Think – Copy-Paste!
For even more "inspiration" you can review the source code of popular ORM libraries for Redis, for example:
• https://github.com/josiahcarlson/rom
• https://github.com/yohanboniface/redis-limpyd