Video and slides synchronized, mp3 and slide download available at URL http://bit.ly/2hmKFK1.
John Rizzo introduces Twitch's chat architecture, describing how its engineers investigated and worked through the issues in what turned out to be a make-or-break situation for the company. Filmed at qconsf.com.
John Rizzo is a Senior Software Engineer at Twitch.
Linux Kernel vs DPDK: HTTP Performance Showdown - ScyllaDB
In this session I will use a simple HTTP benchmark to compare the performance of the Linux kernel networking stack with userspace networking powered by DPDK (kernel-bypass).
It is said that kernel-bypass technologies avoid the kernel because it is "slow", but in reality, a lot of the performance advantages that they bring just come from enforcing certain constraints.
As it turns out, many of these constraints can be enforced without bypassing the kernel. If the system is tuned just right, one can achieve performance that approaches kernel-bypass speeds, while still benefiting from the kernel's battle-tested compatibility, and rich ecosystem of tools.
Optimizing Kubernetes Resource Requests/Limits for Cost-Efficiency and Latenc... - Henning Jacobs
Kubernetes has the concept of resource requests and limits. Pods get scheduled on the nodes based on their requests and are optionally limited in how much of the resource they can consume. Understanding and optimizing resource requests/limits is crucial both for reducing resource "slack" and for ensuring application performance/low-latency. This talk shows our approach to monitoring and optimizing Kubernetes resources for 80+ clusters to achieve cost-efficiency and reduce impact for latency-critical applications. All shown tools are Open Source and can be applied to most Kubernetes deployments.
Introduction to Apache NiFi dws19 DWS - DC 2019 - Timothy Spann
A quick introduction to Apache NiFi and its ecosystem. Also a hands-on demo on using processors, examining provenance, ingesting REST feeds, XML, cameras, and files, running TensorFlow, running Apache MXNet, integrating with Spark and Kafka, and storing to HDFS, HBase, Phoenix, Hive and S3.
Communication between Microservices is inherently unreliable. These integration points may produce cascading failures, slow responses, and service outages. We will walk through stability patterns like timeouts, circuit breakers, and bulkheads and discuss how they improve the stability of Microservices.
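To make the circuit breaker pattern named above concrete, here is a minimal sketch in Python. It is not taken from the talk: the class name `CircuitBreaker`, the parameters `max_failures` and `reset_after`, and the injectable `clock` are all assumptions chosen for illustration. The idea is that after a run of consecutive failures the breaker "opens" and rejects calls immediately, then lets a single trial call through once a cool-down has elapsed.

```python
import time


class CircuitOpenError(Exception):
    """Raised when a call is rejected because the circuit is open."""


class CircuitBreaker:
    """Illustrative circuit breaker; names and defaults are hypothetical."""

    def __init__(self, max_failures=3, reset_after=30.0, clock=time.monotonic):
        self.max_failures = max_failures  # consecutive failures before opening
        self.reset_after = reset_after    # cool-down in seconds before a trial call
        self.clock = clock                # injectable so the breaker can be tested
        self.failures = 0
        self.opened_at = None             # None means the circuit is closed

    def call(self, func, *args, **kwargs):
        if self.opened_at is not None:
            if self.clock() - self.opened_at < self.reset_after:
                # Fail fast instead of waiting on an unhealthy dependency.
                raise CircuitOpenError("circuit open; failing fast")
            # Cool-down elapsed: half-open, let this one trial call through.
        try:
            result = func(*args, **kwargs)
        except Exception:
            self.failures += 1
            if self.opened_at is not None or self.failures >= self.max_failures:
                self.opened_at = self.clock()  # open (or re-open) the circuit
            raise
        # A success closes the circuit and clears the failure count.
        self.failures = 0
        self.opened_at = None
        return result
```

A caller would wrap each remote call, for example `breaker.call(fetch_profile, user_id)` with a hypothetical `fetch_profile` function, and treat `CircuitOpenError` as a signal to fall back to a cached or default response. Timeouts and bulkheads would be layered around the same call site.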
Amazon S3 Best Practice and Tuning for Hadoop/Spark in the Cloud - Noritaka Sekiyama
Amazon S3 Best Practice and Tuning for Hadoop/Spark in the Cloud (Hadoop / Spark Conference Japan 2019)
# English version #
http://hadoop.apache.jp/hcj2019-program/
Apache Kafka is an open-source message broker project developed by the Apache Software Foundation written in Scala. The project aims to provide a unified, high-throughput, low-latency platform for handling real-time data feeds.
Presentation at Strata Data Conference 2018, New York
The controller is the brain of Apache Kafka. A big part of what the controller does is to maintain the consistency of the replicas and determine which replica can be used to serve the clients, especially during individual broker failure.
Jun Rao outlines the main data flow in the controller—in particular, when a broker fails, how the controller automatically promotes another replica as the leader to serve the clients, and when a broker is started, how the controller resumes the replication pipeline in the restarted broker.
Jun then describes recent improvements to the controller that allow it to handle certain edge cases correctly and increase its performance, which allows for more partitions in a Kafka cluster.
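The failover flow described above, in which the controller promotes another replica to leader when a broker fails, can be pictured with a deliberately simplified model. This is a sketch for intuition only and not Kafka's actual controller code: the function names, the dictionary layout, and the rule "pick the first assigned replica that is alive and in the in-sync replica set (ISR)" are assumptions made for illustration.

```python
def elect_leader(assigned_replicas, isr, live_brokers):
    """Return the first assigned replica that is alive and in the ISR, else None."""
    for broker_id in assigned_replicas:
        if broker_id in isr and broker_id in live_brokers:
            return broker_id
    return None  # no eligible replica: the partition stays offline


def on_broker_failure(partitions, failed_broker, live_brokers):
    """Update a toy partition table after one broker fails (hypothetical model)."""
    live = set(live_brokers) - {failed_broker}
    for state in partitions.values():
        # The failed broker can no longer be considered in sync.
        state["isr"] = [b for b in state["isr"] if b != failed_broker]
        # Only partitions led by the failed broker need a new leader.
        if state["leader"] == failed_broker:
            state["leader"] = elect_leader(state["replicas"], state["isr"], live)
    return partitions


partitions = {
    "topic-0": {"replicas": [1, 2, 3], "isr": [1, 2, 3], "leader": 1},
    "topic-1": {"replicas": [2, 3, 1], "isr": [2, 3], "leader": 2},
}
on_broker_failure(partitions, failed_broker=1, live_brokers=[1, 2, 3])
```

In this toy run, broker 1 fails, so `topic-0` moves its leader to broker 2 (the next in-sync replica), while `topic-1` keeps its existing leader.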
This presentation provides an overview of the Dell PowerEdge R730xd server performance results with Red Hat Ceph Storage. It covers the advantages of using Red Hat Ceph Storage on Dell servers with their proven hardware components that provide high scalability, enhanced ROI cost benefits, and support of unstructured data.
Kafka's basic terminologies, its architecture, its protocol and how it works.
Kafka at scale, its caveats, guarantees and use cases offered by it.
How we use it @ZaprMediaLabs.
Accelerating Envoy and Istio with Cilium and the Linux Kernel - Thomas Graf
This talk will provide an introduction to injection options of Envoy and then deep dive into ongoing Linux kernel work that enables injecting Envoy while introducing as little latency as possible.
The servicemesh and the sidecar proxy model are on a steep trajectory to redefine many networking and security use cases. This talk explains and demos a new socket redirect Linux kernel technology that allows running Envoy with similar performance as if the sidecar was linked to the application using a UNIX domain socket. The talk will also give an outlook on how Envoy can use the recently merged kernel TLS functionality to gain access to the clear text payload transparently for end to end encrypted applications without requiring to decrypt and re-encrypt any data to further reduce the overhead and latency.
Disaster Recovery and High Availability with Kafka, SRM and MM2 - Abdelkrim Hadjidj
In this talk, we will present Streams Replication Manager, a new open source Kafka mirroring solution designed specifically to provide disaster recovery and high availability for Kafka. We will describe and demo various replication topologies and recovery strategies using SRM and associated tooling. Finally, we will provide an update on the ongoing work to make this engine available for the Apache Kafka community as MirrorMaker2 (KIP-382).
Project Manager - Carolyn Jao
Lead Researcher - Marga Javier
Lead UI Designer - Sean Gabay
We built a walkthrough feature for Twitch to enable users to locate video walkthroughs quickly.
Research includes: Generated user personas, stories, flows. Created mobile prototype on InVision and Responsive prototype from Axure. On-boarding process was a major part of creating this project.
The Livestream Way - Building a Culture of Sales Excellence & Accountability - Sam Jacobs
The Livestream Way is a synthesis of the best teachings and best practices of the Livestream sales organization. At Livestream, we're focused on sales career development and helping professionals become top performing sales account executives. Included in this deck is a survey of sales training techniques, methodologies, and structures for building top sales orgs.
Introduction to Amazon Redshift and What's Next (DAT103) | AWS re:Invent 2013 - Amazon Web Services
Amazon Redshift is a fast, fully-managed, petabyte-scale data warehouse service that costs less than $1,000 per terabyte per year—less than a tenth the price of most traditional data warehousing solutions. In this session, you get an overview of Amazon Redshift, including how Amazon Redshift uses columnar technology, optimized hardware, and massively parallel processing to deliver fast query performance on data sets ranging in size from hundreds of gigabytes to a petabyte or more. Finally, we announce new features that we've been working on over the past few months.
Twitch Plays Pokemon: Collaborative Learning Opportunities with Video Game S... - Eric Sembrat
The idea of Twitch Plays Pokemon is simple. The Twitch stream shows the gameplay (screen), alongside a total timer and list of commands being sent to the game and the author of said command. The social stream (chat window) is where users of Twitch type commands (button presses) that the game should perform given a current action.
This concept is fascinating because I could see this tool being used for instructional experiments in problem-based learning, collaborative learning, and anchored instruction.
eSports: Quite Possibly the Next Big Thing in Media/Entertainment - Colin Sebastian
We estimate eSports-related revenues will increase from ~$200 million in 2015 to roughly $2 billion by 2020 driven by a combination of tournaments, advertising, sponsorships, broadcast rights, and fantasy/wagering, etc.
Please cite Baird as a source.
Presentation about Facebook's chat architecture, which explains how Erlang was used to build a highly scalable chat system. This is an extract from http://www.erlang-factory.com/upload/presentations/31/EugeneLetuchy-ErlangatFacebook.pdf
Note that Facebook later moved away from Erlang because Erlang programmers were difficult to find, and a few other small problems forced a reimplementation in C++ on Thrift.
Breaking Smart Speakers: We are Listening to You. - Priyanka Aash
"In the past two years, smart speakers have become the most popular IoT device; Amazon, Google and Apple have introduced their own smart speaker products. Most of these smart speakers offer natural language recognition, chat, music playback, IoT device control, shopping, and so on. Manufacturers use artificial intelligence technology to give smart speakers human-like capabilities in chat conversation. However, as smart speakers enter more and more homes and their functions become more powerful, their security has been questioned by many people. People are worried that smart speakers will be hacked to leak their private data, and our research proves that this concern is well founded.
In this talk, we will present how to use multiple vulnerabilities to remotely attack some of the most popular smart speakers. Our final attack effects include silent listening, controlling what the speaker says, and other demonstrations. We are also going to talk about how to extract firmware from BGA-packaged flash chips such as EMMC, EMCP, NAND Flash, etc. In addition, the talk covers how to turn on debug interfaces and get root privileges by modifying firmware content and re-soldering flash chips, which can be of great help for subsequent vulnerability analysis and debugging. Finally, we will play several demo videos to demonstrate how we can remotely gain root permissions on some smart speakers and use them for eavesdropping and playing voice."
Let's Get Real (time): Server-Sent Events, WebSockets and WebRTC for the soul - Swanand Pagnis
A guided tour of modern browser networking. We'll look at SSEs, WebSockets and WebRTC and see how to integrate them in your Ruby based app.
(These slides accompanied my talk at RubyConf India, 2014.)
Video and slides synchronized, mp3 and slide download available at URL http://bit.ly/20SN0dP.
Tammer Saleh talks about the mistakes people make when building a microservices architecture. He also talks about: when microservices are appropriate, and where to draw the lines between services, dealing with performance issues, testing and debugging techniques, managing a polyglot landscape and the explosion of platforms, managing failure and graceful degradation. Filmed at qconlondon.com.
Tammer Saleh is a long time developer, leader, and author of the acclaimed book *Rails AntiPatterns*. Saleh is currently building the Cloud Foundry platform at Pivotal.
WebRTC gives us a way to do real-time, peer-to-peer communication on the web. In this talk, we'll go over the current state of WebRTC (both the awesome parts and the parts which need to be improved) as well as what could come in the future. Mostly though, we'll take a look at how to combine WebRTC with other web technologies to create great experiences on the front-end for real-time, p2p web apps.
WebRTC has had a tough 3 or 4 years. But it's gone through a rebirth. Node.js developers are a perfect match for the technology. Come and play with it!
Talk given at Full Stack Toronto in Toronto
Tips from Support: Always Carry a Towel and Don’t Panic! - Perforce
What should you do if you think you’ve got a problem with Perforce Helix? In this session, understand what the common issues are, what to look for, where to find help and how our Support engineers can assist you.
The Windows Communication Foundation (WCF) framework is being used in almost all .NET development platforms: Windows clients, ASP.NET applications, Windows Phone, Server side applications, and in Windows Azure; but have you ever wondered how WCF works? How you can extend it to your organization’s needs? How to monitor its work? How to tune it for better performance and scalability? WCF is the second largest assembly in the .NET Framework and as complex to understand.
In this 1-day workshop we will deep dive into WCF, learn how to monitor WCF services and how to troubleshoot them, how to tweak our services for better performance, how to secure them with transport and message security and discuss the pros and cons of each technique, and how to extend the WCF service pipeline to accommodate our needs.
Have Your Cake and Eat It Too -- Further Dispelling the Myths of the Lambda A... - C4Media
Video and slides synchronized, mp3 and slide download available at URL http://bit.ly/15ACXCw.
Tyler Akidau from Google demonstrates Google's MillWheel, a streaming system that promises low latency, strong consistency, and flexibility without relying on Lambda Architecture. Filmed at qconsf.com.
Tyler Akidau is a Senior Software Engineer at Google. The current Tech Lead for the MillWheel team, he’s spent five years working on massive-scale streaming data processing systems.
[System design] Design a tweeter-like system - Aree Oh
Presented in Simple step system design study.
As an interviewer, what should I do during system design interviews?
There are five steps.
Understanding the requirements of a system is the first step that every interviewer should take.
The next step is to design the interface and data model.
Based on the interface and requirements, the interviewer draws an initial system design.
Next, the interviewer should think about how to improve system qualities by asking the right questions (scalability, availability, load balancing, caching, …).
By the end of the interview, there should be a final architecture of the system.
This presentation will go through the above steps with an example scenario: design a tweeter-like system.
Video: https://www.hashicorp.com/resources/operating-consul-at-scale
With more than 35k machines and as the first external contributor, Criteo is a very large Consul user. This presentation describes how we operate Consul at scale at Criteo.
Presentation at HashiTalks 2019.
WebRTC Standards & Implementation Q&A - Legacy API Support Changes - Amir Zmora
The past few months have seen several discussions regarding the so-called “Legacy APIs”, meaning anything not officially supported in the spec that might have been implemented in the past. Some APIs have had support removed, others retained. This session will briefly review the recent decisions in addition to the normal Q&A.
Streaming a Million Likes/Second: Real-Time Interactions on Live Video - C4Media
Video and slides synchronized, mp3 and slide download available at URL https://bit.ly/39NIjLV.
Akhilesh Gupta does a technical deep-dive into how LinkedIn uses the Play/Akka Framework and a scalable distributed system to enable live interactions like likes/comments at massive scale and at extremely low cost across multiple data centers. Filmed at qconlondon.com.
Akhilesh Gupta is the technical lead for LinkedIn's Real-time delivery infrastructure and LinkedIn Messaging. He has been working on the revamp of LinkedIn’s offerings to instant, real-time experiences. Before this, he was the head of engineering for the Ride Experience program at Uber Technologies in San Francisco.
Next Generation Client APIs in Envoy Mobile - C4Media
Video and slides synchronized, mp3 and slide download available at URL https://bit.ly/2x0Fav8.
Jose Nino guides the audience through the journey of Mobile APIs at Lyft. He focuses on how the team has reaped the benefits of API generation to experiment with the network transport layer. He also discusses recent developments the team has made with Envoy Mobile and the roadmap ahead. Filmed at qconlondon.com.
Jose Nino works as a Software Engineer at Lyft.
Software Teams and Teamwork Trends Report Q1 2020 - C4Media
How do we cope with an environment that has been radically disrupted, where people are suddenly thrust into remote work in a chaotic state? What are the emerging good practices and new ideas that are shaping the way in which software development teams work? What can we do to make the workplace a more secure and diverse one while increasing the productivity of our teams? This report aims to assist technical leaders in making mid- to long-term decisions that will have a positive impact on their organisations and teams and help individual contributors find the practices, approaches, tools, techniques, and frameworks that can help them get a better experience at work - irrespective of where they are working from.
Understand the Trade-offs Using Compilers for Java Applications - C4Media
Video and slides synchronized, mp3 and slide download available at URL https://bit.ly/2QCmmJ0.
Mark Stoodley examines some of the strengths and weaknesses of the different Java compilation technologies, if one was to apply them in isolation. Stoodley discusses how production JVMs are assembling a combination of these tools that work together to provide excellent performance across the large spectrum of applications written in Java and JVM based languages. Filmed at qconsf.com.
Mark Stoodley joined IBM Canada to build Java JIT compilers for production use and led the team that delivered AOT compilation in the IBM SDK for Java 6. He spent the last five years leading the effort to open source nearly 4.3 million lines of source code from the IBM J9 Java Virtual Machine to create the two open source projects Eclipse OMR and Eclipse OpenJ9, and now co-leads both projects.
Video and slides synchronized, mp3 and slide download available at URL https://bit.ly/2y2yPiS.
Colin McCabe talks about the ongoing effort to replace the use of Zookeeper in Kafka: why they want to do it and how it will work. He discusses the limitations they have found and how Kafka benefits both in terms of stability and scalability by bringing consensus in house. He talks about their progress, what work is remaining, and how contributors can help. Filmed at qconsf.com.
Colin McCabe is a Kafka committer at Confluent, working on the scalability and extensibility of Kafka. Previously, he worked on the Hadoop Distributed Filesystem and the Ceph Filesystem.
Video and slides synchronized, mp3 and slide download available at URL https://bit.ly/2SXXXiD.
Katharina Probst talks about what it means to act like an owner and why teams need ownership to be high-performing. When team members, regardless of whether they have a formal leadership role or not, act like owners, magical things can happen. She shares ideas that we can apply to our own work, and talks about how to recognize when we don’t live up to our own expectations of acting like an owner. Filmed at qconsf.com.
Katharina Probst is a Senior Engineering Leader, Kubernetes & SaaS at Google. Before this, she was leading engineering teams at Netflix, being responsible for the Netflix API, which helps bring Netflix streaming to millions of people around the world. Prior to joining Netflix, she was in the cloud computing team at Google, where she saw cloud computing from the provider side.
Does Java Need Inline Types? What Project Valhalla Can Bring to Java - C4Media
Video and slides synchronized, mp3 and slide download available at URL https://bit.ly/2T04Lw4.
Sergey Kuksenko talks about the performance benefits inline types bring to Java and how to exploit them. Inline/value types are the key part of experimental project Valhalla, which should bring new abilities to the Java language. Filmed at qconsf.com.
Sergey Kuksenko is a Java Performance Engineer at Oracle working on a variety of Java and JVM performance enhancements. He started working as Java Engineer in 1996 and as Java Performance Engineer in 2005. He has had a passion for exploring how Java works on modern hardware.
Do you need service meshes in your tech stack?
This on-line guide aims to answer pertinent questions for software architects and technical leaders, such as: what is a service mesh?, do I need a service mesh?, how do I evaluate the different service mesh offerings? In software architecture, a service mesh is a dedicated infrastructure layer for facilitating service-to-service communications between microservices, often using a sidecar proxy.
Video and slides synchronized, mp3 and slide download available at URL https://bit.ly/2UgQ3BU.
Christie Wilson describes what to expect from CI/CD in 2019, and how Tekton is helping bring that to as many tools as possible, such as Jenkins X and Prow. Wilson talks about Tekton itself and performs a live demo that shows how cloud native CI/CD can help debug, surface and fix mistakes faster. Filmed at qconsf.com.
Christie Wilson is a software engineer at Google, currently leading the Tekton project. Over the past decade, she has worked in the mobile, financial and video game industries. Prior to working at Google she led a team of software developers to build load testing tools for AAA video game titles, and founded the Vancouver chapter of PyLadies.
Video and slides synchronized, mp3 and slide download available at URL https://bit.ly/2S7lDiS.
Sasha Rosenbaum shows how a CI/CD pipeline for Machine Learning can greatly improve both productivity and reliability. Filmed at qconsf.com.
Sasha Rosenbaum is a Program Manager on the Azure DevOps engineering team, focused on improving the alignment of the product with open source software. She is a co-organizer of the DevOps Days Chicago and the DeliveryConf conferences, and recently published a book on Serverless computing in Azure with .NET.
Video and slides synchronized, mp3 and slide download available at URL https://bit.ly/36epVKg.
Todd Montgomery discusses the techniques and lessons learned from implementing Aeron Cluster. His focus is on how Raft can be implemented on Aeron, minimizing the network round trip overhead, and comparing single process to a fully distributed cluster. Filmed at qconsf.com.
Todd Montgomery is a networking hacker who has researched, designed, and built numerous protocols, messaging-oriented middleware systems, and real-time data systems, done research for NASA, contributed to the IETF and IEEE, and co-founded two startups. He currently works as an independent consultant and is active in several open source projects.
Architectures That Scale Deep - Regaining Control in Deep Systems - C4Media
Video and slides synchronized, mp3 and slide download available at URL https://bit.ly/2FWc5Sk.
Ben Sigelman talks about "Deep Systems", their common properties and re-introduces the fundamentals of control theory from the 1960s, including the original conceptualizations of Observability & Controllability. He uses examples from Google & other companies to illustrate how deep systems have damaged people's ability to observe software, and what needs to be done in order to regain control. Filmed at qconsf.com.
Ben Sigelman is a co-founder and the CEO at LightStep, a co-creator of Dapper (Google’s distributed tracing system), and co-creator of the OpenTracing and OpenTelemetry projects (both part of the CNCF). His work and interests gravitate towards observability, especially where microservices, high transaction volumes, and large engineering organizations are involved.
ML in the Browser: Interactive Experiences with Tensorflow.js - C4Media
Video and slides synchronized, mp3 and slide download available at URL https://bit.ly/39SddUL.
Victor Dibia provides a friendly introduction to machine learning, covers concrete steps on how front-end developers can create their own ML models and deploy them as part of web applications. He discusses his experience building Handtrack.js - a library for prototyping real time hand tracking interactions in the browser. Filmed at qconsf.com.
Victor Dibia is a Research Engineer with Cloudera’s Fast Forward Labs. Prior to this, he was a Research Staff Member at the IBM TJ Watson Research Center, New York. His research interests are at the intersection of human computer interaction, computational social science, and applied AI.
Video and slides synchronized, mp3 and slide download available at URL https://bit.ly/2s9T3Vl.
Colin Eberhardt looks at some of the internals of WebAssembly, explores how it works “under the hood”, and looks at how to create a (simple) compiler that targets this runtime. Filmed at qconsf.com.
Colin Eberhardt is the Technology Director at Scott Logic, a UK-based software consultancy where they create complex applications for their financial services clients. He is an avid technology enthusiast, spending his evenings contributing to open source projects, writing blog posts and learning as much as he can.
User & Device Identity for Microservices @ Netflix Scale - C4Media
Video and slides synchronized, mp3 and slide download available at URL https://bit.ly/2S9tOgy.
Satyajit Thadeshwar provides useful insights on how Netflix implemented a secure, token-agnostic, identity solution that works with services operating at a massive scale. He shares some of the lessons learned from this process, both from architectural diagrams and code. Filmed at qconsf.com.
Satyajit Thadeshwar is an engineer on the Product Edge Access Services team at Netflix, where he works on some of the most critical services focusing on user and device authentication. He has more than a decade of experience building fault-tolerant and highly scalable, distributed systems.
Video and slides synchronized, mp3 and slide download available at URL https://bit.ly/2Ezs08q.
Justin Ryan talks about Netflix’s scalability issues and some of the ways they addressed them. He shares successes they’ve had from unintuitively partitioning computation into multiple services to get better runtime characteristics. He introduces us to useful probabilistic data structures, innovative bi-directional data passing, and open-source projects available from Netflix that make this all possible. Filmed at qconsf.com.
Justin Ryan works in Playback Edge Engineering at Netflix. He works on some of the most critical services at Netflix, specifically focusing on user and device authentication. Years of building developer tools have also given him a healthy set of opinions on developer productivity.
Make Your Electron App Feel at Home Everywhere - C4Media
Video and slides synchronized, mp3 and slide download available at URL https://bit.ly/2Z4ZJjn.
Kilian Valkhof discusses the process of making an Electron app feel at home on all three platforms: Windows, MacOS and Linux, making devs aware of the pitfalls and how to avoid them. Filmed at qconsf.com.
Kilian Valkhof is a Front-end Developer & User-experience Designer at Firstversionist. He writes about various topics, from design to machine learning, on his personal website, kilianvalkhof.com, and is a frequent contributor to open source software. He is part of the Electron governance team that oversees the development of the Electron framework.
Video and slides synchronized, mp3 and slide download available at URL https://bit.ly/344PnB1.
Steve Klabnik goes over the deep details of how async/await works in Rust, covering concepts like coroutines, generators, stack-less vs stack-ful, "pinning", and more. Filmed at qconsf.com.
Steve Klabnik is on the core team of Rust, leads the documentation team, and is an author of "The Rust Programming Language." He is a frequent speaker at conferences and is a prolific open source contributor, previously working on projects such as Ruby and Ruby on Rails.
Video and slides synchronized, mp3 and slide download available at URL https://bit.ly/2OUz6dt.
Chris Riccomini talks about the current state-of-the-art in data pipelines and data warehousing, and shares some of the solutions to current problems dealing with data streaming and warehousing. Filmed at qconsf.com.
Chris Riccomini works as a Software Engineer at WePay.
Automated Testing for Terraform, Docker, Packer, Kubernetes, and MoreC4Media
Video and slides synchronized, mp3 and slide download available at URL https://bit.ly/2rm4hFD.
Yevgeniy Brikman talks about how to write automated tests for infrastructure code, including the code written for use with tools such as Terraform, Docker, Packer, and Kubernetes. Topics covered include: unit tests, integration tests, end-to-end tests, dependency injection, test parallelism, retries and error handling, static analysis, property testing and CI / CD for infrastructure code. Filmed at qconsf.com.
Yevgeniy Brikman is the co-founder of Gruntwork, a company that provides DevOps as a Service. He is the author of two books published by O'Reilly Media: Hello, Startup and Terraform: Up & Running. Previously, he worked as a software engineer at LinkedIn, TripAdvisor, Cisco Systems, and Thomson Financial.
UiPath Test Automation using UiPath Test Suite series, part 4DianaGray10
Welcome to UiPath Test Automation using UiPath Test Suite series part 4. In this session, we will cover Test Manager overview along with SAP heatmap.
The UiPath Test Manager overview with SAP heatmap webinar offers a concise yet comprehensive exploration of the role of a Test Manager within SAP environments, coupled with the utilization of heatmaps for effective testing strategies.
Participants will gain insights into the responsibilities, challenges, and best practices associated with test management in SAP projects. Additionally, the webinar delves into the significance of heatmaps as a visual aid for identifying testing priorities, areas of risk, and resource allocation within SAP landscapes. Through this session, attendees can expect to enhance their understanding of test management principles while learning practical approaches to optimize testing processes in SAP environments using heatmap visualization techniques.
What will you get from this session?
1. Insights into SAP testing best practices
2. Heatmap utilization for testing
3. Optimization of testing processes
4. Demo
Topics covered:
Execution from the test manager
Orchestrator execution result
Defect reporting
SAP heatmap example with demo
Speaker:
Deepak Rai, Automation Practice Lead, Boundaryless Group and UiPath MVP
LF Energy Webinar: Electrical Grid Modelling and Simulation Through PowSyBl -...DanBrown980551
Do you want to learn how to model and simulate an electrical network from scratch in under an hour?
Then welcome to this PowSyBl workshop, hosted by Rte, the French Transmission System Operator (TSO)!
During the webinar, you will discover the PowSyBl ecosystem as well as handle and study an electrical network through an interactive Python notebook.
PowSyBl is an open source project hosted by LF Energy, which offers a comprehensive set of features for electrical grid modelling and simulation. Among other advanced features, PowSyBl provides:
- A fully editable and extendable library for grid component modelling;
- Visualization tools to display your network;
- Grid simulation tools, such as power flows, security analyses (with or without remedial actions) and sensitivity analyses;
The framework is mostly written in Java, with a Python binding so that Python developers can access PowSyBl functionalities as well.
What you will learn during the webinar:
- For beginners: discover PowSyBl's functionalities through a quick general presentation and the notebook, without needing any expert coding skills;
- For advanced developers: master the skills to efficiently apply PowSyBl functionalities to your real-world scenarios.
Accelerate your Kubernetes clusters with Varnish CachingThijs Feryn
A presentation about the usage and availability of Varnish on Kubernetes. This talk explores the capabilities of Varnish caching and shows how to use the Varnish Helm chart to deploy it to Kubernetes.
This presentation was delivered at K8SUG Singapore. See https://feryn.eu/presentations/accelerate-your-kubernetes-clusters-with-varnish-caching-k8sug-singapore-28-2024 for more details.
Neuro-symbolic is not enough, we need neuro-*semantic*Frank van Harmelen
Neuro-symbolic (NeSy) AI is on the rise. However, simply machine learning on just any symbolic structure is not sufficient to really harvest the gains of NeSy. These will only be gained when the symbolic structures have an actual semantics. I give an operational definition of semantics as “predictable inference”.
All of this illustrated with link prediction over knowledge graphs, but the argument is general.
Slack (or Teams) Automation for Bonterra Impact Management (fka Social Soluti...Jeffrey Haguewood
Sidekick Solutions uses Bonterra Impact Management (fka Social Solutions Apricot) and automation solutions to integrate data for business workflows.
We believe integration and automation are essential to user experience and the promise of efficient work through technology. Automation is the critical ingredient to realizing that full vision. We develop integration products and services for Bonterra Case Management software to support the deployment of automations for a variety of use cases.
This video focuses on the notifications, alerts, and approval requests using Slack for Bonterra Impact Management. The solutions covered in this webinar can also be deployed for Microsoft Teams.
Interested in deploying notification automations for Bonterra Impact Management? Contact us at sales@sidekicksolutionsllc.com to discuss next steps.
Elevating Tactical DDD Patterns Through Object CalisthenicsDorra BARTAGUIZ
After immersing yourself in the blue book and its red counterpart, attending DDD-focused conferences, and applying tactical patterns, you're left with a crucial question: How do I ensure my design is effective? Tactical patterns within Domain-Driven Design (DDD) serve as guiding principles for creating clear and manageable domain models. However, achieving success with these patterns requires additional guidance. Interestingly, we've observed that a set of constraints initially designed for training purposes remarkably aligns with effective pattern implementation, offering a more ‘mechanical’ approach. Let's explore together how Object Calisthenics can elevate the design of your tactical DDD patterns, offering concrete help for those venturing into DDD for the first time!
Encryption in Microsoft 365 - ExpertsLive Netherlands 2024Albert Hoitingh
In this session I delve into the encryption technology used in Microsoft 365 and Microsoft Purview. Including the concepts of Customer Key and Double Key Encryption.
Kubernetes & AI - Beauty and the Beast !?! @KCD Istanbul 2024Tobias Schneck
As AI technology pushes into IT, I was wondering, as an “infrastructure container Kubernetes guy”, how this fancy AI technology gets managed from an infrastructure operations point of view. Is it possible to apply our lovely cloud native principles as well? What benefits could both technologies bring to each other?
Let me take these questions and provide you with a short journey through existing deployment models and use cases for AI software. Using practical examples, we discuss what cloud/on-premise strategy we may need to apply to our own infrastructure to get it to work from an enterprise perspective. I want to give an overview of infrastructure requirements and technologies, and what could be beneficial or limiting for your AI use cases in an enterprise environment. An interactive demo will give you some insights into which approaches I already have working for real.
Generating a custom Ruby SDK for your web service or Rails API using Smithyg2nightmarescribd
Have you ever wanted a Ruby client API to communicate with your web service? Smithy is a protocol-agnostic language for defining services and SDKs. Smithy Ruby is an implementation of Smithy that generates a Ruby SDK using a Smithy model. In this talk, we will explore Smithy and Smithy Ruby to learn how to generate custom feature-rich SDKs that can communicate with any web service, such as a Rails JSON API.
Smart TV Buyer Insights Survey 2024 by 91mobiles.pdf91mobiles
91mobiles recently conducted a Smart TV Buyer Insights Survey in which we asked over 3,000 respondents about the TV they own, aspects they look at on a new TV, and their TV buying preferences.
Dev Dives: Train smarter, not harder – active learning and UiPath LLMs for do...UiPathCommunity
💥 Speed, accuracy, and scaling – discover the superpowers of GenAI in action with UiPath Document Understanding and Communications Mining™:
See how to accelerate model training and optimize model performance with active learning
Learn about the latest enhancements to out-of-the-box document processing – with little to no training required
Get an exclusive demo of the new family of UiPath LLMs – GenAI models specialized for processing different types of documents and messages
This is a hands-on session specifically designed for automation developers and AI enthusiasts seeking to enhance their knowledge in leveraging the latest intelligent document processing capabilities offered by UiPath.
Speakers:
👨🏫 Andras Palfi, Senior Product Manager, UiPath
👩🏫 Lenka Dulovicova, Product Program Manager, UiPath
Builder.ai Founder Sachin Dev Duggal's Strategic Approach to Create an Innova...Ramesh Iyer
In today's fast-changing business world, companies that adapt and embrace new ideas often need help to keep up with the competition. However, fostering a culture of innovation takes much work. It takes vision, leadership and willingness to take risks in the right proportion. Sachin Dev Duggal, co-founder of Builder.ai, has perfected the art of this balance, creating a company culture where creativity and growth are nurtured at each stage.
2. InfoQ.com: News & Community Site
• 750,000 unique visitors/month
• Published in 4 languages (English, Chinese, Japanese and Brazilian Portuguese)
• Post content from our QCon conferences
• News 15-20 / week
• Articles 3-4 / week
• Presentations (videos) 12-15 / week
• Interviews 2-3 / week
• Books 1 / month
Watch the video with slide synchronization on InfoQ.com!
https://www.infoq.com/presentations/twitch-pokemon
3. Purpose of QCon
- to empower software development by facilitating the spread of knowledge and innovation
Strategy
- practitioner-driven conference designed for YOU: influencers of change and innovation in your teams
- speakers and topics driving the evolution and innovation
- connecting and catalyzing the influencers and innovators
Highlights
- attended by more than 12,000 delegates since 2007
- held in 9 cities worldwide
Presented at QCon San Francisco
www.qconsf.com
7. • Over 800k concurrent users
• Tens of BILLIONS of daily messages
• ~10 Servers
• 2 Engineers
• 19 Amp Energy drinks per day
Twitch Introduction
www.twitch.tv
10. Public Reaction
Media outlets have described the proceedings of the game as being "mesmerizing", "miraculous" and "beautiful chaos", with one viewer comparing it to "watching a car crash in slow motion" - Wikipedia
“Some users are hailing the development as part of internet history…” – BBC
–TwitchChatTeam
11. 10:33pm, February 14: TPP hits 5k concurrent viewers
9:42am, February 15: TPP hits 20k
8:21pm, February 16: Time to take preemptive action
Twitch Plays Pokémon Strikes!
12. We separate servers into several clusters
• Robust in the face of failure
• Can tune each cluster for its specific purpose
Chat Server Organization
[Diagram: Main Cluster, Event Cluster, Group Cluster, Test Cluster]
13. 8:21pm, February 16: Move TPP onto the event cluster
Twitch Plays Pokémon Strikes!
14. 5:43pm, February 16: TPP hits 50k users, chat system starts to show signs of stress (but…it’s on the event cluster!)
8:01am, February 17: Twitch chat engineers panic, rush to the office
9:35am, February 17: Investigation begins
Twitch Plays Pokémon Strikes!
15. Debugging principle: Start investigating upstream, move downstream as necessary
Solving Twitch Plays Pokémon
16. Edge Server: Sits at the “edge” of the public internet and Twitch’s internal network
Chat Architecture Overview
[Diagram labels: User, Chat Edge; Public Internet, Internal Twitch Network; Flash socket; Ingestion; Distribution]
17. Any user:room:edge tuple is valid
Solving Twitch Plays Pokémon
user9
user1:lirik
user2:degentp
user3:witwix
…
user4:witwix
user5:armorra
user6:degentp
…
user2:witwix
user7:cep21
user8:lirik
…
18. A Note on Instrumentation
[Diagram: Chat Service, Remote statsd collector, Graphite cluster; connections labeled UDP, UDP, HTTP]
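The diagram labels suggest that the chat service emits metrics over UDP to a remote statsd collector, which then feeds Graphite. A minimal sketch of what such an emitter might look like in Python, using the plain-text statsd line format (`name:value|type`). The metric names, the collector address, and the default port 8125 are assumptions for illustration and do not come from the talk.

```python
import socket


def format_counter(name: str, value: int = 1) -> bytes:
    """Encode a counter increment in the statsd text format: <name>:<value>|c"""
    return f"{name}:{value}|c".encode("ascii")


def format_timer(name: str, millis: float) -> bytes:
    """Encode a timing sample in milliseconds: <name>:<ms>|ms"""
    return f"{name}:{int(millis)}|ms".encode("ascii")


def send_metric(payload: bytes, host: str = "127.0.0.1", port: int = 8125) -> None:
    """Fire-and-forget send over UDP.

    UDP means the chat path never waits for the collector to answer;
    the cost is that metrics can be dropped silently.
    """
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.sendto(payload, (host, port))
```

A caller would wrap hot paths with something like `send_metric(format_counter("chat.edge.privmsg.timeout"))` (a hypothetical metric name), so that a spike in timeouts shows up on a graph rather than only in logs.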
19. Solving Twitch Plays Pokémon
So let’s take a look at our edge server logs…
Feb 18 06:54:04 tmi_edge: [clue] Timed out (after 5s) while writing request for: privmsg
Feb 18 06:54:04 tmi_edge: [clue] Message successfully sent to #degentp
Feb 18 06:54:04 tmi_edge: [clue] Timed out (after 5s) while writing request for: privmsg
Feb 18 06:54:04 tmi_edge: [clue] Timed out (after 5s) while writing request for: privmsg
Feb 18 06:54:04 tmi_edge: [clue] Timed out (after 5s) while writing request for: privmsg
Feb 18 06:54:04 tmi_edge: [clue] Timed out (after 5s) while writing request for: privmsg
Feb 18 06:54:04 tmi_edge: [clue] Timed out (after 5s) while writing request for: privmsg
Feb 18 06:54:05 tmi_edge: [clue] Timed out (after 5s) while writing request for: privmsg
Feb 18 06:54:04 tmi_edge: [clue] Message successfully sent to #paragusrants
20. Solving Twitch Plays Pokémon
Let’s dissect one of these log lines
Feb 18 06:54:04 chat_edge: [clue] Timed out (after 5s) while writing request for: privmsg
21. Timestamp - when this action was recorded on the server
22. Server - the name of the server that is generating this log file
23. Remote service - the name of the service that edge is connecting to
24. Event detail
Why did the clue service take so long to respond? Also, what is the clue service?
26. Feb 18 06:54:04 tmi_edge: [clue] Timed out (after 5s) while writing request for: privmsg
Clue server took longer than 5 seconds to process this message. Why?
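The four parts named on these slides (timestamp, server, remote service, event detail) can be pulled apart mechanically. The following is a rough sketch in Python, assuming each log record sits on a single line in the shape shown on the slides. The regular expression and the `EdgeLogEntry` and `count_timeouts` names are illustrative and not part of Twitch's tooling.

```python
import re
from collections import Counter
from typing import Iterable, NamedTuple, Optional

# Shape assumed from the slides:
#   "<Mon DD HH:MM:SS> <server>: [<remote service>] <event detail>"
LOG_LINE = re.compile(
    r"^(?P<timestamp>[A-Z][a-z]{2} +\d{1,2} \d{2}:\d{2}:\d{2}) "
    r"(?P<server>[\w.-]+): "
    r"\[(?P<remote>[^\]]+)\] "
    r"(?P<detail>.*)$"
)


class EdgeLogEntry(NamedTuple):
    timestamp: str       # when the action was recorded on the server
    server: str          # server that generated the log line
    remote_service: str  # service the edge was connecting to
    detail: str          # event detail


def parse_line(line: str) -> Optional[EdgeLogEntry]:
    """Return the parsed entry, or None if the line does not match the shape."""
    match = LOG_LINE.match(line.strip())
    if match is None:
        return None
    return EdgeLogEntry(
        match["timestamp"], match["server"], match["remote"], match["detail"]
    )


def count_timeouts(lines: Iterable[str]) -> Counter:
    """Count 'Timed out' events per remote service."""
    counts: Counter = Counter()
    for line in lines:
        entry = parse_line(line)
        if entry is not None and entry.detail.startswith("Timed out"):
            counts[entry.remote_service] += 1
    return counts
```

Counting timeouts per remote service in this way gives a quick view of which downstream dependency to look at next, in line with the upstream-to-downstream debugging principle.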
27. Solving Twitch Plays Pokémon
Clue logs…
Feb 18 06:54:04 chat_clue: WARNING:tornado.general:Connect error on fd 10: ECONNREFUSED
Feb 18 06:54:04 chat_clue: WARNING:tornado.general:Connect error on fd 15: ECONNREFUSED
Feb 18 06:54:05 chat_clue: WARNING:tornado.general:Connect error on fd 9: ECONNREFUSED
Feb 18 06:54:05 chat_clue: WARNING:tornado.general:Connect error on fd 18: ECONNREFUSED
Not very useful…but we get some info. Clue’s connections are being refused.
Which machine is clue failing to connect to? Why?
28. Let’s take a step back…these errors are happening on both main AND the event clusters. Why?
Are there common services or dependencies?
• Databases (store chat color, badges, bans, etc)
• Cache (to speed up database access)
• Services (authentication, user data, etc)
Investigating Clue
30. Can rule out databases and services – rest of site is functional
Let’s look closer at our cache – this one is specific to chat servers
Investigating Clue
31. Redis: where do we start investigating?
Strategy: start high-level then drill down
Investigating Redis
Redis server isn’t being stressed very hard
38. Redis Configuration
Are we load balancing across many Redis instances?
DB_SERVER=db.internal.twitch.tv
DB_NAME=twitch_db
DB_TIMEOUT=1s
CACHE_SERVER=localhost
CACHE_PORT=2000
CACHE_MAX_CONNS=10
CACHE_TIMEOUT=100ms
…
…
39. Redis Configuration
Are we load balancing across many Redis instances?
class twitch::haproxy::listeners::chat_redis (
$haproxy_instance = 'chat-backend',
$proxy_name = 'chat-redis',
$servers = [
'redis2.internal.twitch.tv:6379',
],
...
...
...
40. We are not load balancing across several instances
41. Investigating Redis
Let’s take a look at the Redis box…
$ top
Tasks: 281 total, 1 running, 311 sleeping, 0 stopped, 0 zombie
Cpu(s): 10.3%us, 10.5%sy, 0.0%ni, 95.4%id, 0.0%wa, 0.0%hi,
Mem: 24682328k total,6962336k used, 17719992k free, 13644k buffers
Swap: 7999484k total, 0k used, 7999484k free, 4151420k cached
PID PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
26109 20 0 76048 128m 1340 S 99 0.2 6133:28 redis-server
3342 20 0 9040 1320 844 R 2 0.0 0:00.01 top
1 20 0 90412 3920 2576 S 0 0.0 103:45.82 init
2 20 0 0 0 0 S 0 0.0 0:05.17 kthreadd
42. Redis is unsurprisingly maxing out the CPU
46. • 2:23pm: There are some challenges (cache coherence, implementation difficulty)
• Is there any low-hanging fruit?
• Yes! Rate limiting code!
• 2:33pm: Change has little effect...
Redis Optimization Options?
48. 2:56pm: Yes! Sharding by key!
Redis Optimization Options?
[Diagram: the commands set "effinga.chat_color", get "degentp.chat_color", get "effinga.chat_color" and set "degentp.chat_color" pass through a Distribution Function to four Redis instances]
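The "distribution function" in this slide maps each key to one Redis instance, so that every get and set for the same key lands on the same shard. One simple way to sketch that idea in Python is hash-modulo sharding. The shard addresses below are hypothetical, and the talk does not say which hash or scheme Twitch actually used.

```python
import zlib
from typing import Sequence

# Hypothetical shard addresses, for illustration only.
SHARDS = (
    "redis-shard-0:6379",
    "redis-shard-1:6379",
    "redis-shard-2:6379",
    "redis-shard-3:6379",
)


def shard_for(key: str, shards: Sequence[str] = SHARDS) -> str:
    """Pick a shard for a key.

    crc32 is stable across processes and machines (unlike Python's
    built-in hash(), which is randomized per process for strings),
    so every chat server agrees on where a key lives.
    """
    index = zlib.crc32(key.encode("utf-8")) % len(shards)
    return shards[index]
```

With this scheme, `shard_for("effinga.chat_color")` always returns the same address, so a set followed by a get for that key reaches the same instance. A known trade-off of plain modulo sharding is that changing the number of shards remaps most keys; for a cache that mostly means a burst of misses, and consistent hashing is a common alternative when that matters.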
49. 3:03pm: How do we implement this?
• Puppet configuration changes
• HAProxy changes
• Redis deploy changes (copy/paste!)
• Do we need to modify any code?
• Can we let HAProxy load balance for us?
• No – LB needs to be aware of Redis protocol
• Changes required at the application level
Distributing Redis
57. Solving Twitch Plays Pokémon
Let’s use some tools to dig deeper…
$ redis-cli -h redis2.internal.twitch.tv -p 6379 INFO
# Clients
connected_clients:3021
client_longest_output_list:0
client_biggest_input_buf:0
blocked_clients:0
$ redis-cli -h redis2.internal.twitch.tv -p 6379 CLIENT LIST | grep idle | wc -l
2311
58. Lots of bad connections
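Each line of Redis `CLIENT LIST` output is a space-separated run of `key=value` fields, including `idle=<seconds>`, so connections can also be filtered by how long they have been idle rather than by a text match. A small sketch in Python, run against made-up sample output; the addresses, field values, and the 300-second threshold are assumptions, and real output carries more fields than shown here.

```python
from typing import Dict, List

# Made-up sample in the key=value shape that CLIENT LIST prints.
SAMPLE = """\
id=3 addr=10.0.0.5:52555 fd=8 name= age=855 idle=0 flags=N db=0 cmd=get
id=4 addr=10.0.0.6:52556 fd=9 name= age=900 idle=840 flags=N db=0 cmd=get
id=5 addr=10.0.0.7:52557 fd=10 name= age=3600 idle=3590 flags=N db=0 cmd=set
"""


def parse_client_list(text: str) -> List[Dict[str, str]]:
    """Turn CLIENT LIST text into one dict of fields per connection."""
    clients = []
    for line in text.splitlines():
        if not line.strip():
            continue
        fields = dict(
            pair.split("=", 1) for pair in line.split() if "=" in pair
        )
        clients.append(fields)
    return clients


def count_idle(clients: List[Dict[str, str]], min_idle_seconds: int) -> int:
    """Count connections idle for at least min_idle_seconds."""
    return sum(
        1 for c in clients if int(c.get("idle", "0")) >= min_idle_seconds
    )
```

For the sample above, `count_idle(parse_client_list(SAMPLE), 300)` counts the two connections that have been idle for more than five minutes.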
59. Solving Twitch Plays Pokémon
Let’s grab the pid of one Redis instance
$ sudo svstat /etc/service/redis_*
/etc/service/redis_6379: up (pid 26109) 3543 seconds
/etc/service/redis_6380: up (pid 26111) 3543 seconds
/etc/service/redis_6381: up (pid 26113) 3543 seconds
/etc/service/redis_6382: up (pid 26114) 3544 seconds
64. Solving Twitch Plays Pokémon
11:12pm: Shut off testing cluster completely
65. Users can connect to chat again!
Users can send messages again!
Chat team leaves the office at 11:31pm
Twitch Plays Pokémon is Solved(?)
70. • Edge/Clue instrumentation shows no errors
• Exchange isn’t even instrumented!
• Let’s fix that…
Solving the Distribution Problem
71. Solving the Distribution Problem
Let’s look at our new exchange logs
Feb 19 14:11:06 exchange: [exchange] i/o timeout
Feb 19 14:11:06 exchange: [exchange] Exchange success
Feb 19 14:11:06 exchange: [exchange] i/o timeout
Feb 19 14:11:06 exchange: [exchange] i/o timeout
Feb 19 14:11:06 exchange: [exchange] i/o timeout
Feb 19 14:11:06 exchange: [exchange] i/o timeout
Feb 19 14:11:06 exchange: [exchange] i/o timeout
Feb 19 14:11:06 exchange: [exchange] i/o timeout
Feb 19 14:11:06 exchange: [exchange] Exchange success
Ideas?
72. Solving the Distribution Problem
• These are extremely short and simple requests, but there are many of them
• We aren’t using HTTP keepalives
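Without keepalives, every short request pays for a fresh TCP connection; with them, one connection carries many requests. The sketch below illustrates that difference in Python with a toy local server that counts accepted connections. It is not a model of the exchange service: the handler, helper names, and request counts are all assumptions for illustration.

```python
import http.client
import threading
from http.server import BaseHTTPRequestHandler, ThreadingHTTPServer


class CountingHandler(BaseHTTPRequestHandler):
    protocol_version = "HTTP/1.1"  # lets the server keep connections open
    connections = 0
    lock = threading.Lock()

    def setup(self):
        # setup() runs once per accepted TCP connection, not per request.
        super().setup()
        with CountingHandler.lock:
            CountingHandler.connections += 1

    def do_GET(self):
        body = b"ok"
        self.send_response(200)
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, *args):
        pass  # keep the example quiet


def fetch_without_keepalive(host: str, port: int, n: int) -> None:
    """Open a new connection for every request."""
    for _ in range(n):
        conn = http.client.HTTPConnection(host, port, timeout=5)
        conn.request("GET", "/", headers={"Connection": "close"})
        conn.getresponse().read()
        conn.close()


def fetch_with_keepalive(host: str, port: int, n: int) -> None:
    """Reuse one persistent connection for all requests."""
    conn = http.client.HTTPConnection(host, port, timeout=5)
    for _ in range(n):
        conn.request("GET", "/")
        conn.getresponse().read()
    conn.close()


def count_connections(fetch, n: int = 20) -> int:
    """Run fetch against a throwaway local server; return connections seen."""
    CountingHandler.connections = 0
    server = ThreadingHTTPServer(("127.0.0.1", 0), CountingHandler)
    thread = threading.Thread(target=server.serve_forever, daemon=True)
    thread.start()
    try:
        fetch("127.0.0.1", server.server_address[1], n)
    finally:
        server.shutdown()
        server.server_close()
    return CountingHandler.connections
```

Calling `count_connections(fetch_without_keepalive)` and `count_connections(fetch_with_keepalive)` shows how many connections the server had to accept for the same number of requests in each mode.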
74. Lessons Learned
• Better logs/instrumentation to make debugging easier
• Generate fake traffic to force us to handle more load than we need
• Make sure we use and configure our infrastructure wisely!
75. What have we done since?
• We now use AWS servers exclusively
• Better Redis infrastructure
• Python -> Go
• Lots of other big and small changes to support new products and better QoS