RedisConf17 - Pain-free Pipelining

The document discusses how pipelining commands can improve performance when making requests to Redis over a network. It shows that pipelining multiple commands together can increase throughput significantly by reducing the number of round trips needed to the server. The document provides a benchmark showing single commands getting 17k requests/second while pipelined commands achieve 260k requests/second. It also demonstrates how to simulate network latency using Toxiproxy to throttle connections and see even larger gains from pipelining when there is network overhead. The RedPipe library is introduced as a way to pipeline commands while still maintaining a familiar API and handling responses with futures to avoid blocking.

PAIN-FREE PIPELINING
John Loehrer
Joya Communications
efficient Redis network i/o

SINGLE VS. PIPELINE
Single commands:
• Client: INCR X
• Server: 1
• Client: INCR X
• Server: 2
• Client: INCR X
• Server: 3
Pipelined:
• Client: INCR X
• Client: INCR X
• Client: INCR X
• Server: 1
• Server: 2
• Server: 3
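
On the wire, the two modes send the same bytes; the difference is when the client writes and when it waits. A minimal sketch, assuming the standard Redis protocol (RESP) framing of a command as an array of bulk strings. The helper name encode_command is hypothetical and does not come from the slides:

```python
def encode_command(*args):
    """Encode one command as a RESP array of bulk strings."""
    parts = [b"*%d\r\n" % len(args)]
    for arg in args:
        data = str(arg).encode("utf-8")
        parts.append(b"$%d\r\n%s\r\n" % (len(data), data))
    return b"".join(parts)


# Single: each command is written on its own, and the client waits for
# the reply before writing the next one.
single_writes = [encode_command("INCR", "X") for _ in range(3)]

# Pipelined: the same three commands go out back to back in one write,
# and the three replies are read afterwards.
pipelined_write = b"".join(single_writes)
```

The server does the same work either way; pipelining only removes the waiting between commands.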

COMPARE THROUGHPUT
Single commands vs Pipelined

QUICK BENCHMARK
redis-benchmark -t get -c 1
• single: 17k req/s
• pipelined: 260k req/s

LOOPBACK != PROD
need to simulate real network conditions

TOXIPROXY THROTTLING
https://github.com/Shopify/toxiproxy

CONFIGURE TOXIPROXY
• toxiproxy-cli create redis -l localhost:26379 -u localhost:6379
• toxiproxy-cli toxic add redis -t latency -a latency=1

NETWORK BENCHMARK
1 ms throttle
• single: 216 req/s
• pipelined: 4300 req/s
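
Using only the numbers quoted on the two benchmark slides, pipelining comes out roughly 15x faster on loopback and roughly 20x faster behind the 1 ms throttle. A small sketch of that arithmetic; the helper names are illustrative and not from the talk:

```python
# Throughput figures quoted on the benchmark slides (requests per second).
benchmarks = {
    "loopback": {"single": 17_000, "pipelined": 260_000},
    "1 ms throttle": {"single": 216, "pipelined": 4_300},
}


def speedup(single, pipelined):
    """How many times faster the pipelined run is."""
    return pipelined / single


def ms_per_request(req_per_sec):
    """Average wall-clock cost of one request, in milliseconds."""
    return 1000.0 / req_per_sec


for name, result in benchmarks.items():
    ratio = speedup(result["single"], result["pipelined"])
    cost = ms_per_request(result["single"])
    print(f"{name}: {ratio:.1f}x faster pipelined, "
          f"{cost:.3f} ms per single request")
```

The per-request cost of single commands rises from well under a tenth of a millisecond on loopback to several milliseconds once latency is added, which is why the loopback benchmark understates the benefit.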

LOW-HANGING FRUIT
Optimize CPU or Network?
[Chart: CPU vs. Network time for localhost and production, axis 0 to 60]

OPTIMIZE RTT
mitigate cost of network round-trips
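
The cost being mitigated can be modeled as one fixed delay per round trip. A minimal sketch with an in-memory stand-in for the server; FakeServer and estimated_seconds are hypothetical names, and no real Redis connection is involved:

```python
class FakeServer:
    """In-memory stand-in for Redis that counts network round trips."""

    def __init__(self):
        self.data = {}
        self.round_trips = 0

    def send(self, keys):
        """Run INCR on every key in the batch, costing one round trip."""
        self.round_trips += 1
        replies = []
        for key in keys:
            self.data[key] = self.data.get(key, 0) + 1
            replies.append(self.data[key])
        return replies


def estimated_seconds(round_trips, rtt_ms):
    """Time spent waiting on the network, ignoring server CPU time."""
    return round_trips * rtt_ms / 1000.0


single = FakeServer()
for _ in range(100):
    single.send(["X"])           # one round trip per INCR

pipelined = FakeServer()
pipelined.send(["X"] * 100)      # one round trip for all 100 INCRs
```

Both servers end with X at 100, but the single-command client paid for 100 round trips and the pipelined client paid for one.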

REFACTORING HURTS
pipelining api is different from single calls

ASYNC I/O ISSUES
• one connection for each command
• requires different programming approach

LEARN FROM MY MISERY
After years of effort …

Reference Implementation
redpipe.readthedocs.io

FAMILIAR API
RedPipe works almost like redis-py

HOW IS IT DIFFERENT?
• redis-py pipelines make you wait till payday
• RedPipe gives you instant credit

WHAT’S THE POINT?
Code Reuse & Flexibility

• assign data before you have it
• return the response before it has been calculated
• allows logical encapsulation even while pipelining

FUTURES WRAP RESPONSES
Use them like the real thing
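
The idea of a future that wraps a response can be sketched in a few lines. This is an illustration of the concept only, not RedPipe's actual implementation; the Future and Pipeline classes below are hypothetical, and a plain dict stands in for the Redis server:

```python
class Future:
    """Placeholder for a reply that has not arrived yet."""

    def __init__(self):
        self._result = None
        self._ready = False

    def set(self, value):
        self._result = value
        self._ready = True

    @property
    def result(self):
        if not self._ready:
            raise RuntimeError("pipeline has not executed yet")
        return self._result

    # Delegate a few operations so the future can stand in for its value.
    def __int__(self):
        return int(self.result)

    def __eq__(self, other):
        return self.result == other

    def __repr__(self):
        return repr(self.result)


class Pipeline:
    """Queues commands and resolves their futures in one batch."""

    def __init__(self, store):
        self.store = store       # dict standing in for a Redis server
        self.queue = []

    def incr(self, key):
        future = Future()
        self.queue.append((key, future))
        return future            # handed back before anything is sent

    def execute(self):
        for key, future in self.queue:
            self.store[key] = self.store.get(key, 0) + 1
            future.set(self.store[key])
        self.queue = []

    def __enter__(self):
        return self

    def __exit__(self, exc_type, exc, tb):
        if exc_type is None:
            self.execute()
        return False


def bump_counter(pipe, key):
    """Reusable helper: returns its answer before the answer exists."""
    return pipe.incr(key)


store = {}
with Pipeline(store) as pipe:
    first = bump_counter(pipe, "X")
    second = bump_counter(pipe, "X")

print(first, second)
```

The helper can return its result immediately, so callers keep an ordinary function-call style while the commands are still batched.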

WRAP ANOTHER PIPE
passes the commands upstream
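
Wrapping one pipe in another can be pictured as the inner pipe handing its queued commands to the outer one instead of sending them itself. A minimal sketch under that assumption; the Pipe class and record_visit helper are hypothetical, a dict stands in for Redis, and this is not RedPipe's code:

```python
class Pipe:
    """Queues INCR commands; a child pipe hands its queue to its parent."""

    def __init__(self, store=None, parent=None):
        self.store = store
        self.parent = parent
        self.queue = []
        self.flushes = 0         # batches actually sent to the store

    def incr(self, key):
        result = {}              # filled in when the command finally runs
        self.queue.append((key, result))
        return result

    def execute(self):
        if self.parent is not None:
            # Pass the commands upstream instead of sending them.
            self.parent.queue.extend(self.queue)
        elif self.queue:
            self.flushes += 1
            for key, result in self.queue:
                self.store[key] = self.store.get(key, 0) + 1
                result["value"] = self.store[key]
        self.queue = []


STORE = {}


def record_visit(user, pipe=None):
    """Works standalone or as part of a caller's larger pipeline."""
    inner = Pipe(parent=pipe) if pipe is not None else Pipe(store=STORE)
    count = inner.incr("visits:" + user)
    inner.execute()
    return count


outer = Pipe(store=STORE)
a = record_visit("ann", pipe=outer)
b = record_visit("bob", pipe=outer)
outer.execute()                  # both commands are sent in one batch
```

Each function stays self-contained, yet two calls composed under one outer pipe still cost a single batch.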

WHAT ELSE?
RedPipe has some other fun things …

DEFINE YOUR DATA TYPE
• strings
• lists
• sets
• hashes
• sorted sets
• hyperloglog
• geo (in progress)

SPECIFY A CONNECTION
can talk to multiple backends transparently

CHARACTER ENCODING
translate fields to a consistent data-type

SUPPORT COMPLEX TYPES
hash fields support python primitives
• bool
• int
• float
• list
• dict

WRITE YOUR OWN
fields just need an encode/decode method
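
A field with an encode/decode pair might look like the sketch below. The class names and the use of classmethods are assumptions made for illustration; check the RedPipe documentation for the interface it actually expects:

```python
import json


class JsonField:
    """Hypothetical field: stores any JSON-serializable value as bytes."""

    @classmethod
    def encode(cls, value):
        # sort_keys keeps the stored bytes stable for equal dicts
        return json.dumps(value, sort_keys=True).encode("utf-8")

    @classmethod
    def decode(cls, raw):
        return json.loads(raw.decode("utf-8"))


class BooleanField:
    """Hypothetical field: stores a bool as b"1" or b"0"."""

    @classmethod
    def encode(cls, value):
        return b"1" if value else b"0"

    @classmethod
    def decode(cls, raw):
        return raw == b"1"


raw = JsonField.encode({"tags": ["a", "b"], "level": 3})
restored = JsonField.decode(raw)
```

Encoding on write and decoding on read keeps what is stored in Redis consistent while application code works with native Python values.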

STRUCTS
ability to manipulate redis like a dictionary …
And still pipeline it all


DevNation Atlanta

boorad

This document provides an overview of NoSQL databases and CouchDB. It discusses how NoSQL databases are a better fit than relational databases for large datasets and real-time applications. It then describes CouchDB, an open-source document-oriented NoSQL database, covering its features like schema-free documents, robustness, concurrency, REST API, views, replication, and deployment in the cloud. The document concludes with a discussion of Erlang and eventually demos CouchDB.


