Streaming API clients perform many of the same operations as our Streaming API servers. We'll discuss our streaming server's internal architecture, stream processing algorithms and how they relate to a typical client implementation. The focus will be on techniques for sorting and de-duplicating infinite roughly-sorted at-least-once delivery streams, data loss prevention, scaling for the Firehose, and practical operational experience.
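As an illustration of the sorting and de-duplicating idea mentioned above, here is a minimal sketch. It is not the talk's actual algorithm. It assumes each event carries a numeric id that grows roughly with time and that no event arrives more than `window` positions out of place; the names `sort_and_dedupe` and `window` are hypothetical.

```python
import heapq
from typing import Iterable, Iterator, Tuple

Event = Tuple[int, str]  # (id, payload); assumption: ids grow roughly with time


def sort_and_dedupe(events: Iterable[Event], window: int) -> Iterator[Event]:
    """Yield events in id order and drop redelivered ones.

    Assumption: no event arrives more than `window` positions late.
    Events that arrive later than that are dropped, not reordered.
    """
    heap = []            # min-heap holding at most `window` + 1 pending events
    buffered = set()     # ids currently sitting in the heap
    last_emitted = None  # highest id already handed to the caller

    for event in events:
        event_id = event[0]
        # At-least-once delivery: skip ids that are buffered or already emitted.
        if event_id in buffered:
            continue
        if last_emitted is not None and event_id <= last_emitted:
            continue
        heapq.heappush(heap, event)
        buffered.add(event_id)
        # Once the buffer exceeds the window, the smallest id is safe to emit.
        if len(heap) > window:
            out = heapq.heappop(heap)
            buffered.discard(out[0])
            last_emitted = out[0]
            yield out

    # The input ended (a real stream never does): flush what is left, in order.
    while heap:
        out = heapq.heappop(heap)
        buffered.discard(out[0])
        last_emitted = out[0]
        yield out
```

Memory stays bounded by `window`, which is what makes this usable on an unbounded stream. The trade-off is that a larger window tolerates more disorder but adds latency before each event is emitted.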
This presentation describes the challenges we faced building, scaling and operating a Kubernetes cluster of more than 1000 nodes to host the Datadog applications.
Running large Kubernetes clusters is challenging. This talk focuses on how you can optimize your network setup in clusters with 1000-2000 nodes. It discusses standard ingress solutions and their drawbacks, as well as potential solutions.
ZooKeeper - wait free protocol for coordinating processes (Julia Proskurnia)
ZooKeeper is a service for coordinating processes within distributed systems. The tool was stress-tested. Reliable Multicast and Dynamic LogBack system Configuration management were implemented with ZooKeeper.
More details: http://proskurnia.in.ua/wiki/zookeeper_research
Docker is all the rage these days. While one doesn't hear much about Solr on Docker, we're here to tell you not only that it can be done, but also share how it's done.
We'll quickly go over the basic Docker ideas - containers are lighter than VMs, they solve "but it worked on my laptop" issues - so we can dive into the specifics of running Solr on Docker.
We'll do a live demo showing you how to run Solr master-slave as well as SolrCloud using containers, how to manage CPU assignments, constrain memory and use Docker data volumes when running Solr in containers. We will also show you how to create your own containers with custom configurations.
Finally, we'll address one of the core Solr questions - which deployment type should I use? We will demonstrate performance differences between the following deployment types:
- Single Solr instance running on a bare metal machine
- Multiple Solr instances running on a single bare metal machine
- Solr running in containers
- Solr running on virtual machine
- Solr running on virtual machine using unikernel
For each deployment type we'll address how it impacts performance, operational flexibility and all other key pros and cons you ought to keep in mind.
My personal highlights from the Reactive Summit 2017. I loved the conference from the beginning till the end and I shared some of that with my Reactive Amsterdam meetup. All content belongs to the respective speakers.
Running Kubernetes at scale is challenging and you can often end up in situations where you have to debug complex and unexpected issues. This requires understanding in detail how the different components work and interact with each other. Over the last 3 years, Datadog migrated most of its workloads to Kubernetes and now manages dozens of clusters consisting of thousands of nodes each. During this journey, engineers have debugged complex issues with root causes that were sometimes very surprising. In this talk Laurent and Tabitha will share some of these stories, including a favorite: how a complex interaction between familiar Kubernetes components allowed an OOM-killer invocation to trigger the deletion of a namespace.
Kube-proxy enables access to Kubernetes services (virtual IPs backed by pods) by configuring client-side load-balancing on nodes. The first implementation relied on a userspace proxy which was not very performant. The second implementation used iptables and is still the one used in most Kubernetes clusters. Recently, the community introduced an alternative based on IPVS. This talk will start with a description of the different modes and how they work. It will then focus on the IPVS implementation, the improvements it brings, the issues we encountered and how we fixed them as well as the remaining challenges and how they could be addressed. Finally, the talk will present alternative solutions based on eBPF such as Cilium.
10 ways to shoot yourself in the foot with kubernetes, #9 will surprise you! ... (Laurent Bernaille)
Kubernetes is a very powerful and complicated system, and many users don’t understand the underlying systems. Come learn how your users can abuse container runtimes, overwhelm your control plane, and cause outages - it’s actually quite easy!
In the last year, we have containerized hundreds of applications and deployed them in large scale clusters (more than 1000 nodes). The journey was eventful and we learned a lot along the way. We’ll share stories of our ten favorite Kubernetes foot guns, including the dangers of cargo culting, rolling updates gone wrong, the pitfalls of initContainers, and nightmarish daemonset upgrades. The talk will present solutions we adopted to avoid or work around some of these problems and will finally show several improvements we plan to deploy in the future.
Similar to the KubeCon talk with the same title, with a few new incidents.
The Kubernetes audit logs are a rich source of information: all of the calls made to the API server are stored, along with additional metadata such as usernames, timings, and source IPs. They help to answer questions such as “What is overloading my control plane?” or “Which sequence of events led to this problematic situation?”. These questions are hard to answer otherwise—especially in large clusters. At Datadog, we have been running clusters with 1000+ nodes for more than a year and during that time, the audit logs have proved invaluable.
In this presentation, we will first introduce the audit logs, explain how they are configured, and review the type of data they store. Finally, we will describe in detail several scenarios where they have helped us to diagnose complex problems.
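To make the "What is overloading my control plane?" question concrete, here is a minimal sketch, not taken from the talk. It assumes the audit log is available as one JSON event per line and relies on the `stage`, `verb`, `user.username` and `objectRef.resource` fields of the Kubernetes audit `Event` format. The function name `top_callers` is hypothetical.

```python
import json
from collections import Counter
from typing import Iterable, List, Tuple


def top_callers(lines: Iterable[str], n: int = 3) -> List[Tuple[tuple, int]]:
    """Count API requests per (username, verb, resource) in JSON-lines audit events."""
    counts = Counter()
    for line in lines:
        line = line.strip()
        if not line:
            continue
        event = json.loads(line)
        # A request can be logged at several stages; count it once, when it completes.
        if event.get("stage") != "ResponseComplete":
            continue
        user = (event.get("user") or {}).get("username", "<unknown>")
        verb = event.get("verb", "<unknown>")
        # objectRef is absent for non-resource requests such as /healthz.
        resource = (event.get("objectRef") or {}).get("resource", "<none>")
        counts[(user, verb, resource)] += 1
    return counts.most_common(n)
```

Feeding it the lines of an audit log file returns the heaviest (user, verb, resource) combinations first, which is often enough to spot a client that lists or watches too aggressively.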
Docker and Maestro for fun, development and profit (Maxime Petazzoni)
Presentation on MaestroNG, an orchestration and management tool for multi-host container deployments with Docker.
#lspe meetup, February 20th, 2014 at Yahoo!'s URL café.
SaltConf14 - Anita Kuno, HP & OpenStack - Using SaltStack for event-driven or... (SaltStack)
This talk will highlight how the OpenStack Infrastructure team uses SaltStack for event-driven orchestration of its various cloud infrastructure components. The speakers will review the flexibility of Salt in a complex automation environment. Salt plays very well with other tools, including Puppet, which is especially critical in the OpenStack Infrastructure environment which requires the event-driven orchestration functions of Salt to synchronize workflow timing of OpenStack Infrastructure components and events.
To learn when and where the next SaltConf will be, subscribe to our newsletter here: http://www.saltstack.com/salt-ink-newsletter or follow us on Twitter: http://www.twitter.com/saltstackinc
DNS is one of the Kubernetes core systems and can quickly become a source of issues when you’re running clusters at scale. For over a year at Datadog, we’ve run Kubernetes clusters with thousands of nodes that host workloads generating tens of thousands of DNS queries per second. It wasn’t easy to build an architecture able to handle this load, and we’ve had our share of problems along the way.
This talk starts with a presentation of how Kubernetes DNS works. It then dives into the challenges we’ve faced, which span a variety of topics related to load, connection tracking, upstream servers, rolling updates, resolver implementations, and performance. We then show how our DNS architecture evolved over time to address or mitigate these problems. Finally, we share our solutions for detecting these problems before they happen—and identifying misbehaving clients.
Scaling an invoicing SaaS from zero to over 350k customers (Speck&Tech)
ABSTRACT: Fatture in Cloud was born in late 2013 on a single-server machine and scaled from zero to 35k customers at the end of 2018. Then, we faced the mandatory electronic invoicing which came into effect in Italy on 1st January 2019, and we experienced a huge growth to 350k customers in a few months. In these 5 years, I've learned a lot about cloud architecture, scalability, optimization, DevOps, and we eventually achieved a 99.99% uptime even in the huge growth period.
BIO: Daniele Ratti is the Founder and CEO of Fatture in Cloud, which is currently the leading invoicing platform in Italy, counting more than 350k customers.
Integration testing for salt states using aws ec2 container service (SaltStack)
A SaltConf16 use case talk by Steven Braverman of Dun & Bradstreet. Testing configuration changes for multiple server roles can be time consuming when real instances or legacy container systems are used. Applying configuration changes to each role in parallel can be difficult. So what's the best way to test configuration changes efficiently, quickly, and securely prior to applying them? See how an integrated test setup using AWS EC2 Container Service (ECS), AWS AutoScaling Group, and SaltStack simplifies the application of configuration changes and allows you to test configuration changes in parallel to reduce the time spent testing.
Ara Pulido, Datadog
Container technologies, although not new, have increased their popularity in the past few years, with container orchestrators allowing companies around the world to adopt these technologies to help them ship and scale microservices with precision and velocity. Kubernetes is currently the most popular container orchestration platform, and while many organizations are migrating their workloads to it, Kubernetes is still relatively immature. New corner cases, errors, and quirks are regularly discovered as users push the boundaries of size and scale. When Datadog adopted Kubernetes we discovered some of these boundaries the hard way, and we continuously challenge and modify our infrastructure decisions in order to fit our use case. Join me in this talk for our story on what we learned while we scaled our Kubernetes clusters, the contributions to Kubernetes we made along the way, and how you can apply those learnings when growing your Kubernetes clusters from a handful to hundreds or thousands of nodes.
In the big data world, our data stores communicate over an asynchronous, unreliable network to provide a facade of consistency. However, to really understand the guarantees of these systems, we must understand the realities of networks and test our data stores against them.
Jepsen is a tool which simulates network partitions in data stores and helps us understand the guarantees of our systems and their failure modes. In this talk, I will help you understand why you should care about network partitions and how we can test data stores against partitions using Jepsen. I will explain what Jepsen is, how it works and the kind of tests it lets you create. We will try to understand the subtleties of distributed consensus and the CAP theorem, and demonstrate how different data stores such as MongoDB, Cassandra, Elastic and Solr behave under network partitions. Finally, I will describe the results of the tests I wrote using Jepsen for Apache Solr and discuss the kinds of rare failures which were found by this excellent tool.
Abstract:
Cassandra is a new kind of database: it is more than a single-machine system. It naturally runs in a High-Availability configuration. All nodes in the system are symmetric; there is no single point of failure. As you add machines, failure becomes routine, and Cassandra is built to tolerate that with no interruptions.
Cassandra is linearly scalable with good performance characteristics for very small and very large data stores. Unlike earlier efforts, Cassandra is more than just a key-value store; it is a structured data store which can facilitate complex use cases and queries. Cassandra allows for random access to your data organized into rows and columns.
Cassandra is different, and exciting. This presentation will discuss the pros and cons of using Cassandra, and why it has seen such amazing adoption in the past year.
Bio:
Ben Coverston is Director of Operations at DataStax (formerly known as Riptano), a provider of software, support, services, training, resources and help for Cassandra. He has been involved in enterprise software his entire career. Working in the airline industry, he helped to build some of the highest volume online booking sites in the world. He saw first hand the consequences of trying to solve real world scalability problems at the limit of what traditional relational databases are capable of.
Generators, Coroutines and Other Brain Unrolling Sweetness. Adi Shavit ➠ Cor... (corehard_by)
C++20 brings us coroutines and with them the power to create generators, iterables and ranges. We'll see how coroutines allow for cleaner, more readable, code, easier abstraction and genericity, composition and avoiding callbacks and inversion of control. We'll discuss the pains of writing iterator types with distributed internal state and old-school co-routines. Then we'll look at C++20 coroutines and how easy they are to write clean linear code. Coroutines prevent inversion of control and reduce callback hell. We'll see how they compose and play with Ranges with examples from math, filtering, rasterization. The talk will focus more on co_yield and less on co_await and async related usages.
How to build a high-load service without knowing how much load to expect / Олег Обле... (Ontico)
There are many architectures and ways to scale systems. Today many companies are migrating to cloud services or using containers. But is that really necessary, and should you follow the trends?
In this talk I would like to describe the architecture that I designed and rolled out at InnoGames: an architecture that needs no administrator intervention when load grows like an avalanche and, even more importantly, that can scale itself down when there is no load, in order to save costs.
You will learn about the experience of building a service with very demanding criteria and see that you do not have to pay three times more for AWS or any similar system.
- What CRM is. Why we need this service.
- Infrastructure.
-- Graphite. Why it has to be reliable and fast.
-- Puppet + gitlab.
-- Load balancing.
-- Our cloud. Why would we need openstack when we have serveradmin!? How a server's role is defined by a few attributes in the web interface.
-- Nagios + aggregators. A different view on how to monitor services through Graphite.
-- Cluster monitoring. Clusterhc and Grafsy.
-- Brassmonkey. How we wrote our own sysadmin in python.
-- Backups.
- CRM3 architecture.
- Autoscaling, or how to analyze a pile of data and make decisions.
When it Absolutely, Positively, Has to be There: Reliability Guarantees in Ka... (confluent)
In the financial industry, losing data is unacceptable. Financial firms are adopting Kafka for their critical applications. Kafka provides the low latency, high throughput, high availability, and scale that these applications require. But can it also provide complete reliability? As a system architect, when asked “Can you guarantee that we will always get every transaction,” you want to be able to say “Yes” with total confidence.
In this session, we will go over everything that happens to a message – from producer to consumer, and pinpoint all the places where data can be lost – if you are not careful. You will learn how developers and operation teams can work together to build a bulletproof data pipeline with Kafka. And if you need proof that you built a reliable system – we’ll show you how you can build the system to prove this too.
Scylla Summit 2016: Outbrain Case Study - Lowering Latency While Doing 20X IO... (ScyllaDB)
Outbrain is the world's largest content discovery program. Learn about their use case with Scylla where they lowered latency while doing 20X IOPS of Cassandra.
It's important that, even under load, Apache Kafka keeps user topics fully replicated and in sync.
Replication is essential to ensure resilience to data loss, so both users and operators care about it.
If a topic partition's replicas fall out of the ISR (in-sync replicas) set, a user experiences unavailability (when producing with the default acknowledgment setting).
Users may use a non-default acks mode to work around it, but the effect on a Kafka cluster is to make the under-replication worse.
Even simple under-replication without Under Min ISR is to be avoided, as a cluster update may then cause the dreaded Under Min ISR.
There are a number of settings that can be used, from quotas to the number of replication threads to more low-level settings.
This session shows how we successfully measured and evolved our Kafka configuration, with the goal of giving users the best possible experience (and resilience for their data).
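The relationship between acks, the ISR and unavailability described above can be sketched as a small model. This is a simplified illustration, not the talk's tooling. It assumes that a produce request with `acks=all` is rejected when the ISR is smaller than `min.insync.replicas`, and that a partition with an empty ISR is offline (unclean leader election disabled). The function names are hypothetical.

```python
def write_accepted(acks: str, isr_size: int, min_insync_replicas: int) -> bool:
    """Return True if a produce request to the partition would be accepted.

    Simplified model: `acks` is "0", "1" or "all", as in the producer setting.
    """
    if isr_size < 1:
        return False  # assumption: no in-sync leader means the partition is offline
    if acks == "all":
        # The write needs at least min.insync.replicas replicas in the ISR.
        return isr_size >= min_insync_replicas
    return True  # acks "0" or "1" only depend on the leader


def partition_state(replication_factor: int, isr_size: int,
                    min_insync_replicas: int) -> str:
    """Classify a partition the way the abstract distinguishes the two conditions."""
    if isr_size < min_insync_replicas:
        return "under-min-isr"      # acks=all producers see unavailability
    if isr_size < replication_factor:
        return "under-replicated"   # still writable, but one failure from trouble
    return "healthy"
```

With a replication factor of 3 and `min.insync.replicas` of 2, losing one replica leaves the partition under-replicated but writable. Taking a second broker down, for example during a rolling update, pushes it under the minimum ISR, which is why plain under-replication is worth avoiding.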
Hofstadter's Law applied!
"It always takes longer than you expect, even when you take into account Hofstadter's Law."
Slides from the presentation "Modern Cryptography" delivered at Devoxx UK 2013. See Parleys.com for the full video https://www.parleys.com/speaker/5148920c0364bc17fc5697a5
We designed a new framework made for microservices. It makes it easier for developers to build microservices-based systems: systems that communicate asynchronously, self-heal, scale elastically and remain responsive no matter what bad stuff is happening.
And all this without the pain of selecting and mixing components from a plethora of libraries that were originally built for other things.
In this presentation, we reveal this new way for Java developers to not only understand and begin building microservices, but also to seamlessly push them into staging and production.
Introduction to Apache ZooKeeper | Big Data Hadoop Spark Tutorial | CloudxLab (CloudxLab)
Big Data with Hadoop & Spark Training: http://bit.ly/2kvXlPd
This CloudxLab Introduction to Apache ZooKeeper tutorial helps you to understand ZooKeeper in detail. Below are the topics covered in this tutorial:
1) Data Model
2) Znode Types
3) Persistent Znode
4) Sequential Znode
5) Architecture
6) Election & Majority Demo
7) Why Do We Need Majority?
8) Guarantees - Sequential consistency, Atomicity, Single system image, Durability, Timeliness
9) ZooKeeper APIs
10) Watches & Triggers
11) ACLs - Access Control Lists
12) Use cases
13) When Not to Use ZooKeeper
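The "Why Do We Need Majority?" topic in the list above comes down to simple arithmetic, sketched here. The function names are hypothetical. The only fact relied on is that a ZooKeeper ensemble needs a strict majority of its servers to make progress.

```python
def quorum_size(ensemble_size: int) -> int:
    """Smallest number of servers that forms a strict majority."""
    return ensemble_size // 2 + 1


def tolerated_failures(ensemble_size: int) -> int:
    """How many servers can fail while a majority is still reachable."""
    return ensemble_size - quorum_size(ensemble_size)


def both_sides_have_quorum(ensemble_size: int, side_a: int) -> bool:
    """Check whether a network split could leave two working halves (split brain)."""
    side_b = ensemble_size - side_a
    needed = quorum_size(ensemble_size)
    return side_a >= needed and side_b >= needed


for n in (3, 4, 5):
    print(n, "servers: quorum", quorum_size(n), "tolerates", tolerated_failures(n))
```

Because two disjoint groups can never both be a strict majority, `both_sides_have_quorum` is false for every split, which is the point of requiring a majority. The numbers also show why even-sized ensembles buy nothing: 4 servers tolerate one failure, the same as 3.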
Writing concurrent programs is hard; maintaining them is even more of a nightmare. Fortunately, there is a pattern that helps us write good concurrent code: using “channels” to communicate.
This talk applies the channel concept to common libraries, such as threading and multiprocessing, to make concurrent code elegant.
This talk was given at PyCon TW 2017 [1] and PyCon APAC/MY 2017 [2].
[1]: https://tw.pycon.org/2017
[2]: https://pycon.my/pycon-apac-2017-program-schedule/
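The channel pattern described in the PyCon abstract above can be sketched with the standard library alone. This is an illustration under assumptions, not the speaker's code. `queue.Queue` stands in for a channel, and `_DONE`, `producer`, `consumer` and `run_pipeline` are hypothetical names.

```python
import queue
import threading

_DONE = object()  # sentinel telling the consumer that no more items will come


def producer(channel: queue.Queue, items) -> None:
    """Send every item through the channel, then signal completion."""
    for item in items:
        channel.put(item)  # blocks while the channel is full (back-pressure)
    channel.put(_DONE)


def consumer(channel: queue.Queue, results: list) -> None:
    """Receive items until the sentinel arrives and store their squares."""
    while True:
        item = channel.get()  # blocks until an item is available
        if item is _DONE:
            break
        results.append(item * item)


def run_pipeline(items) -> list:
    channel = queue.Queue(maxsize=2)  # small bound keeps the producer in step
    results = []
    threads = [
        threading.Thread(target=producer, args=(channel, items)),
        threading.Thread(target=consumer, args=(channel, results)),
    ]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return results
```

The two threads share no state other than the channel, so there are no locks to reason about. With a single consumer, results come out in the order the items were sent.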
Similar to Thinking in Streaming - Twitter Streaming API (20)
GraphRAG is All You need? LLM & Knowledge Graph (Guy Korland)
Guy Korland, CEO and Co-founder of FalkorDB, will review two articles on the integration of language models with knowledge graphs.
1. Unifying Large Language Models and Knowledge Graphs: A Roadmap.
https://arxiv.org/abs/2306.08302
2. Microsoft Research's GraphRAG paper and a review paper on various uses of knowledge graphs:
https://www.microsoft.com/en-us/research/blog/graphrag-unlocking-llm-discovery-on-narrative-private-data/
Communications Mining Series - Zero to Hero - Session 1 (DianaGray10)
This session provides an introduction to UiPath Communication Mining, its importance, and a platform overview. You will acquire a good understanding of the phases in Communication Mining as we go over the platform with you. Topics covered:
• Communication Mining Overview
• Why is it important?
• How can it help today’s business and the benefits
• Phases in Communication Mining
• Demo on Platform overview
• Q/A
Maruthi Prithivirajan, Head of ASEAN & IN Solution Architecture, Neo4j
Get an inside look at the latest Neo4j innovations that enable relationship-driven intelligence at scale. Learn more about the newest cloud integrations and product enhancements that make Neo4j an essential choice for developers building apps with interconnected data and generative AI.
Dr. Sean Tan, Head of Data Science, Changi Airport Group
Discover how Changi Airport Group (CAG) leverages graph technologies and generative AI to revolutionize their search capabilities. This session delves into the unique search needs of CAG’s diverse passengers and customers, showcasing how graph data structures enhance the accuracy and relevance of AI-generated search results, mitigating the risk of “hallucinations” and improving the overall customer journey.
The Art of the Pitch: WordPress Relationships and Sales (Laura Byrne)
Clients don’t know what they don’t know. What web solutions are right for them? How does WordPress come into the picture? How do you make sure you understand scope and timeline? What do you do if something changes?
All these questions and more will be explored as we talk about matching clients’ needs with what your agency offers without pulling teeth or pulling your hair out. Expect practical tips and strategies for successful relationship building that leads to closing the deal.
A tale of scale & speed: How the US Navy is enabling software delivery from l...sonjaschweigert1
Rapid and secure feature delivery is a goal across every application team and every branch of the DoD. The Navy’s DevSecOps platform, Party Barge, has achieved:
- Reduction in onboarding time from 5 weeks to 1 day
- Improved developer experience and productivity through actionable findings and reduction of false positives
- Maintenance of superior security standards and inherent policy enforcement with Authorization to Operate (ATO)
Development teams can ship efficiently and ensure applications are cyber ready for Navy Authorizing Officials (AOs). In this webinar, Sigma Defense and Anchore will give attendees a look behind the scenes and demo secure pipeline automation and security artifacts that speed up application ATO and time to production.
We will cover:
- How to remove silos in DevSecOps
- How to build efficient development pipeline roles and component templates
- How to deliver security artifacts that matter for ATO’s (SBOMs, vulnerability reports, and policy evidence)
- How to streamline operations with automated policy checks on container images
Climate Impact of Software Testing at Nordic Testing DaysKari Kakkonen
My slides at Nordic Testing Days 6.6.2024
Climate impact / sustainability of software testing discussed on the talk. ICT and testing must carry their part of global responsibility to help with the climat warming. We can minimize the carbon footprint but we can also have a carbon handprint, a positive impact on the climate. Quality characteristics can be added with sustainability, and then measured continuously. Test environments can be used less, and in smaller scale and on demand. Test techniques can be used in optimizing or minimizing number of tests. Test automation can be used to speed up testing.
GraphSummit Singapore | The Art of the Possible with Graph - Q2 2024Neo4j
Neha Bajwa, Vice President of Product Marketing, Neo4j
Join us as we explore breakthrough innovations enabled by interconnected data and AI. Discover firsthand how organizations use relationships in data to uncover contextual insights and solve our most pressing challenges – from optimizing supply chains, detecting fraud, and improving customer experiences to accelerating drug discoveries.
Pushing the limits of ePRTC: 100ns holdover for 100 daysAdtran
At WSTS 2024, Alon Stern explored the topic of parametric holdover and explained how recent research findings can be implemented in real-world PNT networks to achieve 100 nanoseconds of accuracy for up to 100 days.
zkStudyClub - Reef: Fast Succinct Non-Interactive Zero-Knowledge Regex ProofsAlex Pruden
This paper presents Reef, a system for generating publicly verifiable succinct non-interactive zero-knowledge proofs that a committed document matches or does not match a regular expression. We describe applications such as proving the strength of passwords, the provenance of email despite redactions, the validity of oblivious DNS queries, and the existence of mutations in DNA. Reef supports the Perl Compatible Regular Expression syntax, including wildcards, alternation, ranges, capture groups, Kleene star, negations, and lookarounds. Reef introduces a new type of automata, Skipping Alternating Finite Automata (SAFA), that skips irrelevant parts of a document when producing proofs without undermining soundness, and instantiates SAFA with a lookup argument. Our experimental evaluation confirms that Reef can generate proofs for documents with 32M characters; the proofs are small and cheap to verify (under a second).
Paper: https://eprint.iacr.org/2023/1886
GridMate - End to end testing is a critical piece to ensure quality and avoid...ThomasParaiso2
End to end testing is a critical piece to ensure quality and avoid regressions. In this session, we share our journey building an E2E testing pipeline for GridMate components (LWC and Aura) using Cypress, JSForce, FakerJS…
Threats to mobile devices are more prevalent and increasing in scope and complexity. Users of mobile devices desire to take full advantage of the features
available on those devices, but many of the features provide convenience and capability but sacrifice security. This best practices guide outlines steps the users can take to better protect personal devices and information.
3. Turtles All The Way Down
• Your client ≅ Our server
• Gather Events
• Parse JSON
• Match on Predicates
• Route to Consumers
4. Properties
• Offered
• At Least Once
• Roughly Sorted (K-Sorted)
• Desired
• Exactly Once
• Sorted
5. Plan
• Over-deliver
Ensure At Least Once
• De-duplicate
Unordered Exactly Once
• Sort
Ordered Exactly Once
6. Why At Least Once?
• Exactly Once impractical across streams
• Clients must handle reconnect over-delivery
• Reuse this capability to:
• Mask upstream failures
• Relax server restart issues
10. Startup
• Prefetch from peer to populate circular buffer
• Go multi-user
• Consume Kestrel backlog - duplicates between:
• Buffer and backlog
• Previous connection and backlog
• Steady State: Exactly Once Delivery
12. Upstream Failure
• Cascaded source fails
• Fail over to next peer
• Over-request to avoid loss
• Steady State: Exactly Once Delivery
13. Client Over-delivery
• Use Count Parameter after fast reconnect
• Deep backfill from REST API
• Client offline for a while
• User first issues new query
• Overlap connections slightly
15. Infinite Streams
• De-duplicating a randomly ordered infinite stream requires infinite time and storage
• Sorting? Ditto
• I have neither infinite time nor storage
16. Roughly Sorted
• A sequence α is k-sorted iff ∀ i, r with 1 ≤ i ≤ r ≤ n, i ≤ r - k implies aᵢ ≤ aᵣ
• Strictly sorted is 0-sorted.
• Transposing two adjacent values in a 0-sorted sequence makes it 1-sorted.
• K for the Firehose?
19. Pessimist’s K
• In theory could be hours & millions of events
• Practically, if current and stale queues exist:
• We’ll flush the stale queues before exposing
• You’ll never know this happened
• If all queues stale:
• We’ll deliver the backlog
• K remains reasonable
20. Unordered De-duplication
• Create two HashSets, Primary and Secondary, each preallocated to size K
• New event is duplicate if ID exists in Primary
• Add new ID to both HashSets
• When Primary.size > K / 2
Primary.clear
Swap Primary & Secondary
21. Unordered De-duplication
• Bounded memory consumption
• O(n) behavior
• Low latency
• Emit first tweet
• Discard subsequent duplicates
• Cheaper than de-duplication by sorting? Probably depends on K
22. Ordered & De-duplicated
• Insertion sort and de-duplicate by ID into a decreasing order list
• While length > K, remove sorted tail
23. Ordered & De-duplicated
• Between O(n) and O(n × K)
• Bounded memory consumption
• Induces latency of K
• Assumes average items not very unsorted
• K is usually large to handle the outliers
24. Routing Events
• By Keyword or by UserId
• Add predicates to HashMap
• Apply events to Map
• Query holds private predicate set for later Map removal
• O(n)
26. Monitoring
• What to look at?
• Latency
• Throughput
• Errors
• Alerting
27. Horizontal Scale
• Firehose keeps growing.
• Eventually a single Firehose stream will become impractical.
• Partition the Firehose into N streams.
Editor's Notes
There is a lot of symmetry in what the Streaming API servers do and what your streaming clients do.
In both cases we’re gathering events, parsing them, and farming them out to various consumers.
The issues are similar at all processing points in the stream.
We present a stream of events that is roughly sorted by created-at time.
This means that the events are mostly in created-at time order, but not exactly so.
We’ve designed our system to publish each event at least once --
which means none are lost, but there may, at times, be duplicates.
I’ll discuss why our streams have these properties.
Also, you’ll probably want to display or process tweets exactly once -- none missing and none duplicated.
You might also want to present them sorted, or you might be OK with a rough sorting.
I’ll go over two algorithms for converting what the API offers into the stream that you want.
The basic plan is to over-deliver events and then de-duplicate them to provide an exactly once quality of service.
One technique is to just de-duplicate with set logic; the other is to sort and de-duplicate.
There are trade-offs with each.
First, let’s see why the Streaming API offers events at least once.
It would be nice if we could offer everything transactionally, that is, exactly once.
But, it’s impractical to synchronize this state across client reconnections.
For example, it’s unlikely that you’ll reconnect to the same server.
Also, event streams aren’t strictly ordered, so we wouldn’t know what to deliver.
We’d have to coordinate a large vector of sent events between servers.
And, clients would have to transactionally acknowledge all events received.
This is quite impractical at scale unless we sorted streams, but this would introduce latency.
We’ll see why sorting induces latency later.
Yet, first and foremost, we want a very low latency experience.
And, we want a simple programming model for clients.
So, we assume that clients can over-request when reconnecting, and post process to get the required stream properties.
Once we make this fundamental assumption, we can reuse it to handle the internal data loss risk as well.
Our Streaming API server is called Hosebird.
Hosebird receives events from the rest of the Twitter system through Kestrel message queues.
Two Hosebird processes in each cluster read transactionally from Kestrel.
The rest of the servers in a cluster cascade via Streaming HTTP.
When a Hosebird server starts, it prefetches events from a peer to pre-populate its circular buffers.
These buffers are used to support the count parameter, which allows some historical backfill on streaming queries.
Count allows your stream to start back a few minutes, then catch up and transition to real time streaming.
This startup prefetching creates a window where you might see the same event twice, if you are unlucky enough to connect to a very recently restarted server.
The backlog read from Kestrel will contain some of the same events that were prefetched into the buffer.
The backlog may also have events that you read on your last connection.
You might have to suffer through a minute or so of duplicates as the backlog is processed and displaces the prefetched events in the circular buffer.
Outside of this restart case, during steady state processing, we deliver each event exactly once on fanout servers.
When a cascaded server has its source Hosebird restart, say during a deploy, the server needs to quickly fail over to another source.
A gap in the stream would be introduced during the failure, detection and reconnection window.
We cover this gap by requesting some back-fill from the new source.
This causes a short period of duplicated events.
During steady state processing, however, we deliver each event exactly once on cascaded servers.
Your client should use these same techniques on reconnect.
Over-request with the count parameter if the connection was momentarily lost.
If the client has been disconnected for an extended period, you’ll have to backfill from the REST API.
When you need to make a predicate change, you can create a new connection, wait for the first event to arrive, then disconnect the old connection.
This should generally produce an at least once stream.
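As a rough illustration of the reconnect advice above, here is a minimal sketch of how a client might choose between over-requesting with the count parameter and backfilling from the REST API. The function name `plan_reconnect`, the `buffer_capacity` limit, and the `safety_factor` are all assumptions made for this example; they are not part of the Streaming API.

```python
def plan_reconnect(offline_seconds, events_per_second, buffer_capacity, safety_factor=2.0):
    """Pick a reconnect strategy (hypothetical helper, not a real API call).

    offline_seconds   -- how long the client estimates it was disconnected
    events_per_second -- the client's own estimate of the stream rate
    buffer_capacity   -- assumed upper limit on how far back `count` can reach
    safety_factor     -- over-request multiplier, so the gap is covered with overlap
    """
    missed = offline_seconds * events_per_second
    # Over-request on purpose: duplicates are removed later, lost events are not recoverable.
    requested = int(missed * safety_factor) + 1
    if requested <= buffer_capacity:
        return ("count", requested)
    # Too far behind for the streaming backfill: fall back to the REST API.
    return ("rest_backfill", None)
```

Either branch over-delivers on purpose, so the result still needs to pass through a de-duplication step.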
Let’s talk about de-duplication on your end.
A finite stream looks a lot like a relational database table -- a finite relation.
We’re used to thinking about finite relations.
But a stream appears as an infinite relation: you can’t ever read to the end.
Also, since we want very low latency, we can’t wait to read to the end.
We have to present results immediately.
A roughly sorted sequence is mostly sorted, where no element is more than K positions away from its strictly sorted position.
At Twitter, we talk about K sorted things all the time.
K this, K that. Nothing is strictly ordered.
We have relaxed various legs of the CAP theorem to make our distributed system feasible.
We’ve never had strictly ordered event processing. Tweets are applied to your timelines in a rough ordering.
On the REST API, we sort the vector before we present it to you, but it’s very loose behind the scenes.
Likewise, events show up in the Streaming API roughly sorted by created-at time.
Here are two samples from the status firehose.
I took five hundred thousand status ids, and did an insertion sort into a reverse sorted list.
The most recent id at the head, the oldest status at the tail.
These distributions show the number of list elements traversed before finding the sorted insertion point.
So, the average and median number of hops are pretty small.
The hundred percent case, the worst case, shows a much larger K.
Assuming about 600 events per second on this stream,
back when I took this sample,
we can see that events show up as much as 5 seconds out of order.
Close comparison of the distributions shows that they’re very noisy.
If you took many samples, they’d all have a different shape.
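The measurement described above can be sketched roughly as follows: insert each ID into a newest-first list and record how many elements were traversed before the insertion point. This is a minimal reconstruction under stated assumptions, not the original script; the function name `insertion_hops` and the tiny sample are invented for illustration.

```python
from collections import Counter

def insertion_hops(ids):
    """Insertion-sort IDs into a newest-first (descending) list.

    Returns, for each ID, the number of list elements traversed from the
    head before its sorted insertion point was found.
    """
    newest_first = []
    hops = []
    for event_id in ids:
        pos = 0
        # Walk from the head (largest ID) past every element that is larger.
        while pos < len(newest_first) and newest_first[pos] > event_id:
            pos += 1
        hops.append(pos)
        newest_first.insert(pos, event_id)
    return hops

# Hypothetical sample: ID 3 arrives after 4 and 5, so it traverses two elements.
sample = [1, 2, 4, 5, 3, 6]
hops = insertion_hops(sample)
distribution = Counter(hops)   # hop count -> number of events
observed_k = max(hops)         # worst case seen in this sample
```

Dividing the worst-case hop count by the event rate gives a rough time bound. For instance, at about 600 events per second, a worst case of 3,000 hops would correspond to about 5 seconds (3000 / 600 = 5).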
Having an idea of K helps us tune our de-duplication algorithms.
Daily operational issues cause K to grow beyond 5 seconds now and then.
It’s hard to say what a good upper bound for a display client should be.
Something around a few minutes would cover most issues we’ve had over the last six months.
A long-term storage client might want to assume a K of a few hours or a day or so.
In the unlikely event that something goes really wrong with the system, we’ll make a judgement call on recovery.
We’ll probably bias towards delivering the backlog, but, if there’s a partial failure, we’ll keep your K in mind.
Now that we have a handle on K we can think about de-duplication.
An infinite, but roughly sorted, stream can be de-duplicated with some set logic.
The key is efficiently aging out irrelevant set members.
One way is to keep two hashes, and alternately clear them.
You don’t have to do any fancy tracking of items, and off the shelf HashSets will work just fine.
The union of the two sets contains at least K items and allows de-duplication of a K-sorted sequence.
Given the Firehose K, you don’t even need all that much space to de-duplicate.
Please don’t resort to using MySQL primary keys to de-duplicate streams. It’s unnecessary.
The nice thing here is that we can emit events as they arrive and throw away late arriving dups.
We don’t need to add any latency.
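Here is one way the two-set idea might look in code. This sketch is a variant rather than a literal transcription of the slide: it adds each new ID to one set and checks both, rotating the sets every K new IDs, so that the union always remembers at least the K most recently accepted IDs once the first rotation has happened. The class name `RotatingDeduplicator` is an assumption for illustration.

```python
class RotatingDeduplicator:
    """Bounded-memory de-duplication for a roughly sorted stream.

    Holds at most 2 * k - 1 IDs across two sets. After the first rotation,
    the K most recently accepted IDs are always remembered.
    """

    def __init__(self, k):
        self.k = k
        self.current = set()    # generation being filled
        self.previous = set()   # last full generation

    def is_new(self, event_id):
        """Return True the first time an ID is seen within the window."""
        if event_id in self.current or event_id in self.previous:
            return False
        self.current.add(event_id)
        if len(self.current) >= self.k:
            # Age out the oldest generation by dropping it wholesale.
            self.previous = self.current
            self.current = set()
        return True

dedupe = RotatingDeduplicator(k=3)
stream = [1, 2, 2, 3, 1, 4, 5, 3, 6]
unique = [event_id for event_id in stream if dedupe.is_new(event_id)]
```

Events are emitted as they arrive, so no latency is added. An ID that reappears after it has aged out of both sets is treated as new, which is why K should be chosen generously.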
On the other hand, if we want a sorted and deduplicated stream, we have to do a little more work.
Given the Firehose K distribution, doing an insertion sort isn’t the worst thing.
Most events don’t need to traverse too deeply into the list.
Elements dequeued from the tail of the list are sorted and deduplicated.
This algorithm does, however, introduce a latency of K.
We can’t emit a sorted event unless we have at least K elements to examine.
Still, this is quite practical to do in memory.
You can plow through a lot of ids per second even in a scripting language like Ruby.
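A minimal sketch of the sorted variant, assuming plain integer IDs that sort in creation order: insert each ID into a newest-first list, skip it if it is already there, and emit from the tail whenever the list holds more than K elements. The function name `sort_and_dedupe` is hypothetical, and the final flush exists only so that a finite test input drains completely.

```python
def sort_and_dedupe(event_ids, k):
    """Yield IDs in ascending order, each once, from a roughly sorted input.

    Assumes no ID arrives more than k insertions out of place; later
    stragglers or duplicates are not caught by this sketch.
    """
    window = []  # newest first (descending order)
    for event_id in event_ids:
        pos = 0
        # Most events stop after very few steps on a roughly sorted stream.
        while pos < len(window) and window[pos] > event_id:
            pos += 1
        if pos < len(window) and window[pos] == event_id:
            continue  # duplicate still inside the window: discard
        window.insert(pos, event_id)
        while len(window) > k:
            yield window.pop()  # the tail is the oldest ID and is now final
    while window:
        yield window.pop()  # flush for finite inputs only

ordered = list(sort_and_dedupe([1, 2, 4, 5, 3, 3, 6, 5, 7], k=3))
```

Nothing is emitted until the window holds more than K elements, which is where the latency of K comes from.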
Now that we have a de-duplicated stream, we need to route it to consumers.
This can be done very cheaply by registering every consumer’s predicates in a HashMap.
If, say, you are displaying columns of search results, like TweetDeck,
you can have each column register its keywords in the HashMap.
Each new event is applied to the HashMap, and routed to all consumers easily.
Duplicates can arise, as a given column may have several OR predicates that match.
Hosebird uses a generational de-duplication scheme to solve this.
This scheme is the degenerate case of the sorted algorithm above.
Each client stream maintains just the primary key of the last event.
If the same id is presented twice in a row, it can be discarded.
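The routing and last-ID scheme described above might be sketched like this. It assumes keyword predicates matched against whitespace-separated words; the class `Router`, its method names, and the column names in the usage lines are invented for illustration and do not reflect Hosebird's actual code.

```python
from collections import defaultdict

class Router:
    """Route events to consumers by keyword, delivering each event once per consumer."""

    def __init__(self):
        self.consumers_by_keyword = defaultdict(set)  # keyword -> consumers
        self.keywords_by_consumer = {}                # consumer -> private predicate set
        self.last_id = {}                             # consumer -> last delivered event ID

    def register(self, consumer, keywords):
        predicates = {keyword.lower() for keyword in keywords}
        self.keywords_by_consumer[consumer] = predicates
        for keyword in predicates:
            self.consumers_by_keyword[keyword].add(consumer)

    def unregister(self, consumer):
        # The private predicate set tells us which map entries to clean up.
        for keyword in self.keywords_by_consumer.pop(consumer, ()):
            self.consumers_by_keyword[keyword].discard(consumer)
            if not self.consumers_by_keyword[keyword]:
                del self.consumers_by_keyword[keyword]
        self.last_id.pop(consumer, None)

    def route(self, event_id, text):
        """Return the consumers that should receive this event."""
        delivered = []
        for word in text.lower().split():
            for consumer in self.consumers_by_keyword.get(word, ()):
                if self.last_id.get(consumer) == event_id:
                    continue  # a second OR predicate matched the same event
                self.last_id[consumer] = event_id
                delivered.append(consumer)
        return delivered

router = Router()
router.register("column_a", ["Kestrel", "Hosebird"])
router.register("column_b", ["Hosebird"])
first = sorted(router.route(1, "Hosebird reads from Kestrel"))
router.unregister("column_a")
second = router.route(2, "kestrel hosebird")
```

Remembering only the last delivered ID per consumer is enough here because all matches for one event are produced back to back while that event is being routed.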
Break things up into components.
Host components in separate processes.
Measure what happens between components.
Use (reliable) queues between components.