An insight into NoSQL solutions implemented at RTV Slovenia and elsewhere, what problems we are trying to solve and an introduction to solving them with Redis.
Talk given at #wwwh @ Ljubljana, 30.1.2013 by me, Tit Petric
2. NoSQL: why it’s here
SQL:
- Slow query performance
- Concurrency / locking
- Hard to scale (even harder for writes, storage)
Typical problems
- Session storage
- Statistics (high write to read ratio)
- Modifying schema on large data sets
Tit Petric / Twitter @titpetric
3. NoSQL: memcache
Memcache (RTV 2008-Present)
- Pro: stability, speed
- Pro: simple text protocol (added binary to fuck with us)
- Love/Hate: Scaling out reads/writes
- Con: Persistence
- Con: Replication
- Con: Key eviction follows slab allocation (LRU per slab class), not a global LRU/TTL
4. NoSQL: sharedance
Sharedance (2009-2011)
- Pro: Persistent KV storage
- Pro: Simple text protocol (wrote a client in Lua)
- Con: Had to patch the daemon to handle eviction load (1 key = 1 file, filesystems can’t handle this)
- Con: Had to use special ReiserFS filesystem on deployment
6. NoSQL: uses at RTV Slovenia
To-do (2013-?)
- Memcached protocol translator to Redis
- Look at twemcached, avoid client based sharding
- Implement webdis deployment
- Redis scripting with Lua
7. NoSQL: Redis data types
Redis is still a Key/Value store!
Only we can have different values:
- Strings (essentially the same as memcache)
- Hashes (nested k/v pairs)
- Lists (simple arrays)
- Sets (unique arrays)
- Sorted sets (weighted unique arrays)
8. NoSQL: Redis data types
LET’S SEE SOME EXAMPLES!
9. Redis data types: Strings
Limiting Google bot crawl rate
“A 503 (Service Unavailable) error will result in fairly frequent retrying. To temporarily suspend crawling, it is recommended to serve a 503 HTTP result code.”
10. Redis data types: Strings
Limiting Google bot crawl rate
SETNX – set a key if it doesn’t exist
EXPIRE – expire a key after TTL seconds
INCR – increment value by one
11. Redis data types: Strings
Limiting Google bot crawl rate
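The code from this slide is not part of the transcript. A minimal sketch of how the three commands could be combined into a crawl-rate limiter follows. It assumes a client object `r` with redis-py style `setnx`, `expire` and `incr` methods; the key name, limit and window are made up for illustration.

```python
def allow_request(r, key="crawl:googlebot", limit=10, window=1):
    """Return True if the request fits the crawl budget, False to answer 503.

    `r` is any client exposing redis-py style setnx/expire/incr methods.
    The key name, limit and window are illustrative assumptions.
    """
    # SETNX creates the counter only when it does not exist yet, so the
    # window starts on the first request and is not reset by later ones.
    if r.setnx(key, 0):
        # EXPIRE makes the counter disappear after `window` seconds.
        r.expire(key, window)
    # INCR is atomic, so concurrent web workers never lose a hit.
    hits = r.incr(key)
    return hits <= limit
```

When `allow_request` returns False the application would serve the 503. If a worker dies between SETNX and EXPIRE the counter never expires; Redis 2.6.12 and later can close that gap with a single `SET key 0 NX EX seconds`.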
12. Redis data types: Hashes
News ratings
HMSET – set multiple hash fields / values
HGETALL – get all fields and values
HINCRBY – increment integer value of a hash field
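A rough sketch of a news rating counter built on these commands, again assuming a redis-py style client `r`. The key layout `news:rating:<id>` and the field names `votes` and `score` are hypothetical, and the sketch relies only on HINCRBY and HGETALL, skipping any HMSET seeding step.

```python
def rate_news(r, news_id, rating):
    """Record one vote and return the updated totals for a news item."""
    key = f"news:rating:{news_id}"  # hypothetical key layout
    # HINCRBY creates the hash and the field on first use and is atomic,
    # so the application needs no read-modify-write cycle.
    r.hincrby(key, "votes", 1)
    r.hincrby(key, "score", rating)
    # HGETALL returns every field; names and values may come back as
    # bytes, so normalise them to str and int.
    totals = {}
    for field, value in r.hgetall(key).items():
        name = field.decode() if isinstance(field, bytes) else field
        totals[name] = int(value)
    return totals
```

The average rating is then `totals["score"] / totals["votes"]`, computed in the application.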
14. Redis data types: Hashes
News ratings
Vote data. Why EXPIRE?
Race condition: we need an HMSETNX, but Redis only offers HSETNX (a single field).
Could be better.
15. Redis data types: Hashes
Other use cases include:
- User data, partial data retrieval
- select username, realname, birthday from users where id=?
- HMGET users:$id username realname birthday
- Using SORT (list|set) BY hash values
- Don’t use hashes to store sessions. The eviction policy works on keys, not on hash fields!
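The partial retrieval from the list above can be sketched as a small helper. It assumes a redis-py style client `r` and the `users:<id>` key layout from the HMGET line; the function name is an illustration only.

```python
def get_user_fields(r, user_id, fields=("username", "realname", "birthday")):
    """Fetch only the requested fields of one user hash."""
    key = f"users:{user_id}"
    # HMGET returns the values in the order the fields were requested,
    # with None for fields that are not set.
    values = r.hmget(key, list(fields))
    return dict(zip(fields, values))
```

This mirrors the `select username, realname, birthday` query: the rest of the hash never leaves the server.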
16. Redis data types: Lists
Any kind of information log (statistics,…)
LPUSH – push values to the beginning of the list
RPUSH – push values to the end of the list
LRANGE – get a range of values
LTRIM – trim a list to the specified range
LLEN – get the length of the list
17. Redis data types: Lists
Collecting statistics
We can skip the database completely
18. Redis data types: Lists
Process data into SQL database
19. Redis data types: Lists
Well, it’s a way to scale writes to SQL
Processing job can DIE for ages, because:
- Back of the envelope calculation for Redis memory use: 100M keys use 16 GB RAM (roughly 160 bytes per key)
- Logs get processed in small chunks (200 items), avoiding memory limits. Could increase this by a lot.
- We also use sharding so writes are distributed per $table
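The processing job is not shown in the transcript. The sketch below illustrates the chunked approach under a few assumptions: a redis-py style client `r`, one list per table named `log:<table>`, JSON-encoded rows, a single consumer per list, and a caller-supplied `write_rows` function that does the actual SQL insert.

```python
import json


def log_hit(r, table, row):
    """Append one row to the log list of a table (RPUSH adds at the end)."""
    r.rpush(f"log:{table}", json.dumps(row))


def drain_log(r, table, write_rows, chunk=200):
    """Move logged rows into SQL in small batches; returns rows written."""
    key = f"log:{table}"
    total = 0
    while True:
        # LRANGE bounds are inclusive: this reads the oldest `chunk` items.
        batch = r.lrange(key, 0, chunk - 1)
        if not batch:
            return total
        write_rows([json.loads(item) for item in batch])
        # Drop the processed items only after the write succeeded. Rows
        # pushed in the meantime sit at the tail and are kept.
        r.ltrim(key, len(batch), -1)
        total += len(batch)
```

If the job dies, nothing is lost: the list simply keeps growing until the next run, which is why it can stay down for a long time.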
20. Redis data types: Sets
Set values are UNIQUE
SADD – Add one or more members to a set
Perfect use case: set intersection with SINTERSTORE, find duplicates.
MySQL is too slow for this, even with indexes…
21. Redis data types: Sets
SET Intersection in MySQL
List1 = first table of data
List2 = second table of data
22. Redis data types: Sets
Bulk transfer MySQL data to Redis
Via: http://dcw.ca/blog/2013/01/02/mysql-to-redis-in-one-step/
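The slide's own code is not in the transcript. One common way to bulk load data is to write commands in the Redis wire protocol and pipe them into `redis-cli --pipe`. The sketch below shows only the encoding step; the function names are illustrative, and fetching the rows from MySQL is left out.

```python
def resp_command(*args):
    """Encode one command in the Redis wire protocol (RESP)."""
    # A command is an array: "*<argc>", then each argument as a bulk
    # string "$<byte length>" followed by the data, all CRLF-terminated.
    parts = [b"*%d\r\n" % len(args)]
    for arg in args:
        data = str(arg).encode("utf-8")
        parts.append(b"$%d\r\n%s\r\n" % (len(data), data))
    return b"".join(parts)


def rows_to_sadd(key, values):
    """Encode one SADD per value, ready to be piped into redis-cli."""
    return b"".join(resp_command("SADD", key, value) for value in values)
```

A script would write `rows_to_sadd("list1", ids)` to `sys.stdout.buffer` and be run as `python export.py | redis-cli --pipe`, where `export.py` is a hypothetical file name.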
23. Redis data types: Sets
SET Intersection in Redis
Much faster, without indexes!
0.118 seconds vs. MySQL 1.35 seconds (+0.36 for the index)
15x speed increase!
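The Redis side of the comparison can be sketched as follows, assuming a redis-py style client `r` and two non-empty lists of values. The key names `list1` and `list2` follow the naming of the MySQL slide, while the destination key is hypothetical. This is not the benchmark code behind the timings above.

```python
def find_duplicates(r, first, second, dest="list:duplicates"):
    """Load two lists as sets and return (count, members) of the overlap."""
    # SADD ignores values that are already present, so each set ends up
    # holding unique members only.
    r.sadd("list1", *first)
    r.sadd("list2", *second)
    # SINTERSTORE computes the intersection on the server, stores it
    # under `dest` and returns the number of members.
    count = r.sinterstore(dest, ["list1", "list2"])
    return count, r.smembers(dest)
```

The same call pattern covers the "common friends" case: one set per user, intersected on demand.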
24. Redis data types: Sets
Other possible uses for sets:
• Common friends between A and B
• Friend suggestions (You might know…)
• People currently online …
25. Redis data types: Sets vs. Sorted sets
OK, a typical use case in SQL:
select title, content from news order by stamp desc limit 0,10
#1) Use SORT from redis + HMGET
#2) Use sorted sets (ZSET type)
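Option #2 could look roughly like this: one hash per news item plus a sorted set that uses the timestamp as score. The key names `news:<id>` and `news:by_stamp` are assumptions, and so is the redis-py style client `r` (`hset` with `mapping=` needs redis-py 3.5 or newer).

```python
def _text(value):
    """Member names may come back as bytes; normalise them to str."""
    return value.decode() if isinstance(value, bytes) else value


def publish_news(r, news_id, title, content, stamp):
    """Store the item in a hash and index it by timestamp."""
    r.hset(f"news:{news_id}", mapping={"title": title, "content": content})
    # ZADD keeps the set ordered by score, here the publication time.
    r.zadd("news:by_stamp", {str(news_id): stamp})


def latest_news(r, limit=10):
    """Equivalent of: order by stamp desc limit 0,<limit>."""
    # ZREVRANGE walks the set from the highest score down; bounds are
    # inclusive, hence limit - 1.
    ids = r.zrevrange("news:by_stamp", 0, limit - 1)
    fields = ["title", "content"]
    return [dict(zip(fields, r.hmget(f"news:{_text(i)}", fields))) for i in ids]
```

Each HMGET is a separate round trip here; a pipeline would batch them, which is left out to keep the sketch short.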
26. Redis data types: Sorted sets
Sorted sets by time with a PK auto_increment? NO!
• Most read news items (sort by views)
• Order comments by comment rating
• Friends by most friends in common
27. Redis data types: Sorted sets
Order comments by rating
ZINCRBY – increase/decrease score of item
ZRANGE – return portion of sorted set, ASC
ZREVRANGE – portion of sorted set, DESC
28. Redis data types: Sorted sets
Sort comments by rating! With pagination!
ZRANGE – return portion of sorted set, ASC
ZREVRANGE – portion of sorted set, DESC
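A sketch of rating-ordered comments with pagination, assuming a redis-py style client `r` (version 3 or newer, where `zincrby` takes `name, amount, value`) and a hypothetical key `comments:rating:<news id>` holding comment ids scored by rating.

```python
def vote_comment(r, news_id, comment_id, delta=1):
    """Add `delta` (may be negative) to a comment's rating."""
    # ZINCRBY creates the member with score `delta` if it is missing.
    return r.zincrby(f"comments:rating:{news_id}", delta, comment_id)


def top_comments(r, news_id, page=1, per_page=10):
    """Return one page of comment ids, best rated first."""
    start = (page - 1) * per_page
    stop = start + per_page - 1  # ZREVRANGE bounds are inclusive
    return r.zrevrange(f"comments:rating:{news_id}", start, stop)
```

Swapping `zrevrange` for `zrange` gives the ascending order, worst rated first.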
29. Scaling Redis deployment
SLAVEOF [host] [port]
Starts replicating from [host]:[port], making this instance a slave
SLAVEOF NO ONE
Promote instance to MASTER role
30. Scaling Redis deployment
Phpredis client does not implement
sharding by itself! But …
- Master / Multi-slave scaling is easy to do
- Failover for reads is easy, node ejection possible
- Client deploys still take time – twemproxy is an option
- Twemproxy also provides sharding support, for Memcached as well
- Want to see what Redis is doing? Issue “MONITOR” command.
- Stale data is better than no data, we still consider Redis volatile
- FLUSHDB = rebuild cache, we tolerate data loss
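Read failover with node ejection can be sketched in a few lines. This is an illustration, not the phpredis setup described above: the class name is made up, the nodes are any objects with `get` and `set` methods, and the exception types that count as "node down" are passed in (with redis-py that would be `redis.exceptions.ConnectionError`).

```python
import random


class ReplicatedReader:
    """Send writes to the master and reads to any healthy replica."""

    def __init__(self, master, replicas, errors=(ConnectionError, OSError)):
        self.master = master
        self.replicas = list(replicas)
        self.errors = errors  # exception types treated as "node is down"

    def set(self, key, value):
        # Writes always go to the master; replicas copy them over.
        return self.master.set(key, value)

    def get(self, key):
        # Try the replicas in random order to spread the read load.
        for node in random.sample(self.replicas, len(self.replicas)):
            try:
                return node.get(key)
            except self.errors:
                # Eject the failed node so later reads skip it.
                self.replicas.remove(node)
        # No replica answered: fall back to the master.
        return self.master.get(key)
```

A replica may lag behind the master, so a read can return slightly stale data, in line with the "stale data is better than no data" stance.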
31. Redis: Q & A section
Questions and answers!
Follow me on Twitter: @titpetric
Read our tech blog: http://foreach.org