This document discusses using work-stealing techniques to perform background tasks in Redis. It begins by outlining some limitations of Redis, such as volatile keys being evicted before their TTL passes and the inability to expire individual hash fields. It then introduces the concept of work stealing, where spare cycles are "stolen" from workers that exit early, recruiting them to perform small background tasks. Two examples show how work stealing can implement background cleanup of expired hash fields and keep a task-queue tracking set in sync with a database. The key aspects are performing small, isolated slices of work in Redis transactions and never retrying on failure, to avoid overloading the system.
Work Stealing For Fun & Profit: Jim Nelson
1. PRESENTED BY
Work stealing for fun & profit: Making Redis do things you thought it couldn't do
Jim Nelson
Internet Archive, Cluster Operations Engineer
2. Internet Archive / archive.org
Universal Access to All Knowledge
● Nonprofit, founded 1996, based in San Francisco
● Archive of digital and physical media
● Includes Web, books, music, film, software & more
● One million visitors per day
● 22+ years of the Web archived for replay: 300 billion pages
● 30+ million media items
Over 45 petabytes of unique data stored on a bespoke archival-centric computing cluster located in San Francisco & Richmond
3. Internet Archive ♡ Redis
● IA metadata directory service backed by 10-node sharded Redis cluster
● Sharding performed client-side via consistent hashing (PHP, Predis)
● Each node supported by two replicated mirrors (fail-over)
● Specialized Redis instances also used throughout IA’s services, including
○ Wayback Machine
○ Search (item metadata and full-text)
○ Administrative & usage tracking
○ …and more
4. Before we get started
● I’m assuming you’ve written Redis client code already
● You don’t need to know the exact details of the Redis protocol…
● …but you need some understanding of how Redis “works”
○ single-threaded
○ different data types (strings, hashes, sorted sets, etc.)
○ basic commands (GET, SET, HGET, etc.)
○ transactions (MULTI/EXEC)
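(For reference, here is a minimal MULTI/EXEC transaction as seen from a client. This and the later sketches use Python's redis-py purely for illustration — the deck's own client code at IA is PHP/Predis.)

import redis

r = redis.Redis()

pipe = r.pipeline(transaction=True)  # commands below are queued inside MULTI
pipe.set('greeting', 'hello')
pipe.incr('counter')
pipe.execute()                       # EXEC runs both commands atomically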
6. Redis wishlist
Expire keys only after their time-to-live passes
● Redis may evict volatile keys at any time when under memory pressure (MAXMEMORY policy)
● EXPIRE marks the key as “volatile”
● If Redis runs out of memory, random volatile keys are freed
7. Redis wishlist
Expire keys only after their time-to-live passes
There are some situations where it's not merely important, but vital, that the key remains available until the time-to-live expires (locks, expensive recomputes, leases, etc.)
8. Redis wishlist
● Our Redis cluster is constantly under memory pressure
● In heavy load, up to 1,000 keys / minute are evicted
● 30+ million media items means 30+ million content identifiers
● All of their metadata are competing for precious bytes on our Redis cluster
9. Redis wishlist
Deterministic expiration
Examples of keys with important TTLs:
● Distributed locks (Redlock, etc.)
● Leases / licenses
● Expensive recomputes held in cache for extended periods of time
For these applications we need deterministic expiration
For other keys, we're happy to allow Redis to evict them as necessary to make room for more frequently used values
10. Redis wishlist
Expire individual fields in a hash map
● In Redis, only keys can be set to expire
● The entire hash map is evicted when the key expires
● Would be nice if individual fields in the hash map had a time-to-live
13. cron jobs, daemons, etc.
Use automated background tasks to monitor Redis
However:
● cron jobs may fail to run (or, worse, too many run at the same time)
● daemons can be tricky to write
● single points of failure / lack of failover
● dedicated hardware to run background jobs
If you've ever had a machine go down over the weekend and your cleanup tasks stopped running, you know what I'm talking about
15. Work stealing
Term & concept borrowed from multiprocessor/multithreaded computing
Basic idea:
● “Steal” spare cycles from worker pools (“mugging” or “recruiting”)
● Have recruited workers perform a small slice of work
● If unable to finish work, don't sweat it—the next recruit will pick up where the current one left off
That’s it—a very simple idea that turns out to be quite powerful.
What’s more, it works well with Redis.
16. Finding workers with spare cycles to steal
● Web server pools
● Task / job / priority queue workers
● Cache fetches (cache hits in particular)
17. Finding workers with spare cycles to steal
In general, look for
● A pool or group of workers constantly servicing requests…
● …where the worker code may have a "fast exit" (such as a cache hit, a no-op, or a "little work to do" signal)
18. Finding workers with spare cycles to steal
In our distributed cluster, we steal workers from a content directory service
The directory service is constantly accessed by our Web servers
If a page paint worker has a cache hit, it’s a candidate for recruiting
We only need to recruit ~0.3% of the candidate workers to get all our background work completed, with workers to spare
19. Example #1: Volatile hash map fields
To go back to my wish for a hash map with expiring fields, here's how I might implement it:
To store a value:
● In a Redis hash map, store fields + their values
● In a Redis sorted set, store the field names with an expiration time
20. Example #1: Volatile hash map fields
Set a value (“HSETEX”):
; start transaction
MULTI
; set hash field
HSET hash:user:1 <field-name> <value>
; add field name to sorted set (sorted by expiration)
ZADD zset:user:1 <time_t> <field-name>
; commit transaction
EXEC
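(A runnable sketch of the same "HSETEX" transaction, continuing the redis-py setup from the earlier sketch; key and field names follow the slides.)

import time

def hsetex(hash_key, zset_key, field, value, ttl_seconds):
    # store the value and its expiration deadline in one atomic transaction
    pipe = r.pipeline(transaction=True)   # MULTI
    pipe.hset(hash_key, field, value)
    pipe.zadd(zset_key, {field: time.time() + ttl_seconds})
    pipe.execute()                        # EXEC

hsetex('hash:user:1', 'zset:user:1', 'likes', 42, ttl_seconds=300)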
21. Example #1: Volatile hash map fields
To read a value:
● Read the value from the hash map and the expiration time from the sorted set
● If the field has expired, return NULL, undefined, etc. to the caller
22. Example #1: Volatile hash map fields
Get a value:
; start transaction
MULTI
; get hash field
<value> = HGET hash:user:1 <field-name>
; get field’s expiration time
<expiration> = ZSCORE zset:user:1 <field-name>
; commit transaction
EXEC
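(The matching read, with the expired-field check from the previous slide made explicit; again an illustrative redis-py sketch.)

def hgetex(hash_key, zset_key, field):
    pipe = r.pipeline(transaction=True)   # MULTI
    pipe.hget(hash_key, field)
    pipe.zscore(zset_key, field)
    value, expires_at = pipe.execute()    # EXEC
    # a missing or past-deadline field is reported as absent
    if expires_at is None or expires_at <= time.time():
        return None
    return value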
23. Example #1: Volatile hash map fields
When are the expired fields deleted?
Think about those “30+ million content identifiers” mentioned earlier—
If a user accesses a piece of content once and never asks for it again…
… the client code never has a chance to delete expired hash fields.
Those fields will remain stored in Redis permanently, wasting memory that could be used for "active" records.
24. Example #1: Volatile hash map fields
field            value                  expiration
content-id       hey-diddle-diddle      TTL: 12 hours
likes            42                     TTL: 5 minutes
related-content  cat,fiddle,dish,spoon  TTL: 1 hour
25. Example #1: Volatile hash map fields
To expire those fields in the background:
● A worker is “recruited” from the Web pool when it gets a cache hit
● The recruit randomly selects a single hash map
● It reads the sorted set to determine which fields have expired
● It deletes those fields from the hash map
26. Example #1: Volatile hash map fields
Work stealing allows for garbage collection to occur "in the background" without creating specialized scripts or bots that may fail or not run often enough to keep up with demand
27. Example #1: Volatile hash map fields
hash map (hash:user:1):
  content-id     hey-diddle-diddle
  likes          42
  related        spoon,dish,fork
  last-referrer  /details/spoon

sorted set (zset:user:1), field → expiration:
  related        9:05am
  likes          10:00am
  last-referrer  9:00pm
28. Example #1: Volatile hash map fields
Background expiration (work stealing):
WATCH hash:user:1 zset:user:1
; read five oldest hash fields which have expired
<expired-fields> = ZRANGEBYSCORE zset:user:1 -inf time() LIMIT 0 5
; start transaction
MULTI
; delete expired hash fields
HDEL hash:user:1 <expired-fields>
; remove fields from sorted set of expiration times
ZREM zset:user:1 <expired-fields>
; commit transaction or abort if hash:user:1 or zset:user:1 modified
EXEC
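(The same routine as a runnable redis-py sketch. WatchError is redis-py's signal that EXEC was aborted because a watched key changed.)

def expire_fields(hash_key='hash:user:1', zset_key='zset:user:1', limit=5):
    with r.pipeline() as pipe:
        try:
            pipe.watch(hash_key, zset_key)     # WATCH both keys
            # after WATCH the pipeline executes immediately:
            # read the five oldest fields which have expired
            expired = pipe.zrangebyscore(zset_key, '-inf', time.time(),
                                         start=0, num=limit)
            if not expired:
                return
            pipe.multi()                       # MULTI
            pipe.hdel(hash_key, *expired)
            pipe.zrem(zset_key, *expired)
            pipe.execute()                     # EXEC: aborts if either key changed
        except redis.WatchError:
            pass  # never retry; the next recruit will make an attempt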
29. Things to notice
WATCH is used to prevent our transaction from modifying the hash map or sorted set if another writer has modified them first
WATCH hash:user:1 zset:user:1
WATCH will cause the EXEC command to abort if either of the above keys has been modified
30. Work stealing: things to notice
Any failure or outlying condition is ignored:
● hash map is empty
● no fields have expired
● failure to connect to server
● network error
● WATCH triggers an abort
● fail to acquire a distributed lock
The whole idea with work stealing is to “steal” slivers of spare cycles from workers
If there’s an error, don’t retry—give up and let the next recruit make an attempt
31. Work stealing: things to notice
Only a small amount of work is done in any stolen cycle:
ZRANGEBYSCORE zset:user:1 -inf time() LIMIT 0 5
Although this could be coded to fetch all expired fields, instead only the five (5) oldest fields are retrieved
A large number of workers, each performing a small amount of work, means a lot can be done in aggregate
33. Example #2: Task queue follower
At Internet Archive, we have a custom task queuing system that performs all content transformations and mutations
Some subsystems need to know quickly if a particular content object has work queued or running
(We have 30+ million content identifiers, but only a small percentage of them are being worked on at any one time)
34. Example #2: Task queue follower
We use Redis to track when a content identifier has queued tasks, with two work stealing jobs
The first adds content identifiers by walking the database table and adding identifiers to a Redis set
(Again, each recruited worker is only responsible for adding a small number of tasks to Redis: 50 rows at most)
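(A sketch of that first job. fetch_queued_ids is a hypothetical stand-in for IA's task-database query, and keeping the walk offset in a Redis key is likewise my own illustration.)

def add_queued_ids(batch=50):
    # resume the table walk where the last recruit left off
    offset = int(r.get('tasks:walk:offset') or 0)
    ids = fetch_queued_ids(offset=offset, limit=batch)  # hypothetical DB helper
    if not ids:
        r.set('tasks:walk:offset', 0)   # reached end of table: restart next time
        return
    r.sadd('set:tasks:queued', *ids)
    r.incrby('tasks:walk:offset', len(ids))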
35. Example #2: Task queue follower
The other work stealing job is to remove content identifiers when they no longer have work queued
It randomly samples a small number of content identifiers in the Redis set and queries the database to see if each identifier still has any tasks queued
Any content identifiers with no queued work are removed from the Redis set
36. Example #2: Task queue follower
WATCH set:tasks:queued
; sample 5 content identifiers
<ids> = SRANDMEMBER set:tasks:queued 5
<quiesced> = <database query asking which of those 5 have no pending tasks>
; start transaction
MULTI
; remove all with no pending work
SREM set:tasks:queued <quiesced>
; commit transaction or abort if set:tasks:queued was modified
EXEC
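(A runnable redis-py version; has_pending_tasks is a hypothetical stand-in for the database query above.)

def remove_quiesced(set_key='set:tasks:queued', sample=5):
    with r.pipeline() as pipe:
        try:
            pipe.watch(set_key)                      # WATCH
            ids = pipe.srandmember(set_key, sample)  # sample 5 content identifiers
            quiesced = [i for i in ids if not has_pending_tasks(i)]  # hypothetical
            if not quiesced:
                return
            pipe.multi()                             # MULTI
            pipe.srem(set_key, *quiesced)
            pipe.execute()                           # EXEC: aborts if the set changed
        except redis.WatchError:
            pass  # set changed underneath us: give up, let the next recruit try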
37. Writing work stealing routines for Redis
● Use WATCH and transactions to abort when other writers modify your keys
● Random sampling is your friend
● Log failures and outlying conditions…
● …but don’t retry, just exit and let the next worker make an attempt
● Don’t retry locks either, let the next worker make the attempt
● Each recruited worker performs only a small slice of work
39. Recruiting workers (“mugging”) for work stealing
The approach I like (and has worked well for us):
#1: Workers are “enlisted” if they fast-exit in a particular bit of work:
● get a cache hit
● paint a simple Web page
● don’t need to query database
● etc.
40. Recruiting workers (“mugging”) for work stealing
The approach I like (and has worked well for us):
#2: Each work stealing job declares a rate—a percentage of enlisted workers it needs for its work
● For example, 0.001 to receive 0.1% of the enlisted workers
● If you have more traffic, use a lower job rate
41. Recruiting workers (“mugging”) for work stealing
The approach I like (and has worked well for us):
#3: Each enlisted worker generates a random number
● If that number is below the job's rate, the enlistee is "recruited" and performs a little bit of work
● Otherwise, the enlistee is dismissed—no extra work needs to be done at this time
42. Recruiting workers (pseudo-code)
method enlist() {
rand = Random::float(); // from 0.0 to 1.0
target = 0.0;
foreach (self.registered_jobs as job) {
target += job.rate;
if (rand <= target)
return job.perform();
}
}
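(The same selector as runnable Python, reusing the job sketches from earlier; the rates shown are illustrative.)

import random

# (rate, callable) pairs; each rate is the fraction of enlisted workers a job wants
REGISTERED_JOBS = [
    (0.001, expire_fields),     # 0.1% of enlisted workers
    (0.002, remove_quiesced),   # 0.2%
]

def enlist():
    roll = random.random()      # uniform in [0.0, 1.0)
    target = 0.0
    for rate, perform in REGISTERED_JOBS:
        target += rate
        if roll < target:
            return perform()    # recruited: do one small slice of work
    # roll landed above every job's slice: dismissed, no extra work this request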
43. Recruiting workers
It’s deceptively simple
It works surprisingly well
It requires no extra state (tracking the number of workers, distributed locks, job queues, etc.)
We’re doing this in production right now
44. Recruiting workers
One caveat:
Instrument your code!
(StatsD, Prometheus, etc.)
It’s important to know
● How many workers are selected for background work
● How many finish their work, how many stop due to errors/aborts
● Fine-tune your job rates (percentages) so enough workers are being recruited
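(For example, counters around each job. This assumes the statsd PyPI client and a StatsD daemon on localhost; a Prometheus client would work just as well.)

import statsd

stats = statsd.StatsClient('localhost', 8125)

def instrumented(job_name, perform):
    stats.incr('worksteal.%s.recruited' % job_name)
    try:
        perform()
        stats.incr('worksteal.%s.completed' % job_name)
    except Exception:
        stats.incr('worksteal.%s.aborted' % job_name)  # log and give up; never retry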
45. “Show me the code.”
https://github.com/internetarchive/work-stealing