Video and slides synchronized, mp3 and slide download available at URL http://bit.ly/1eGbVJv.
Chris Riccomini discusses: Samza's feature set, how Samza integrates with YARN and Kafka, how it's used at LinkedIn, and what's next on the roadmap. Filmed at qconsf.com.
Chris Riccomini is a Staff Software Engineer at LinkedIn, where he is currently working as a committer and PMC member for Apache Samza. He's been involved in a wide range of projects at LinkedIn, including "People You May Know", REST.li, Hadoop, engineering tooling, and OLAP systems. Prior to LinkedIn, he worked on data visualization and fraud modeling at PayPal.
Apache Incubator Samza: Stream Processing at LinkedIn - Chris Riccomini
This is the slide deck that was presented at the Hadoop Users Group at LinkedIn on November 5, 2013.
The presentation covers what Samza is, why we built it, and how it works.
Event Sourcing, Stream Processing and Serverless (Ben Stopford, Confluent) K... - confluent
In this talk we'll look at the relationship between three of the most disruptive software engineering paradigms: event sourcing, stream processing and serverless. We'll debunk some of the myths around event sourcing. We'll look at the inevitability of event-driven programming in the serverless space, and we'll see how stream processing links these two concepts together with a single 'database for events'. As the story unfolds we'll dive into some use cases, examine the practicalities of each approach, particularly the stateful elements, and finally extrapolate how their future relationship is likely to unfold.
Key takeaways include:
The different flavors of event sourcing and where their value lies.
The difference between stream processing at application and infrastructure levels.
The relationship between stream processors and serverless functions.
The practical limits of storing data in Kafka and stream processors like KSQL.
Extending the Stream/Table Duality into a Trinity, with Graphs (David Allen &... - confluent
The stream/table duality in Kafka lets us look at our data in two different ways, whichever is more convenient for our use. But what about when the connections between the data points add much more value to our data? For this, we need to look at our data as a graph. Graphs help drive financial fraud investigations, social media analyses, network & IT management use cases, recommendation engines, and knowledge management. These are all cases where patterns of interaction in your data (for example, a pattern of structured financial transactions) matter more than the individual data points (a single transfer). We'll cover how to easily transform Kafka streams or tables into graphs, and query them declaratively using Cypher or GraphQL. In graph shape, we can enrich our social network streams with powerful graph algorithms that tell us about user and event influence through graph centrality, then stream the results back to Kafka. Stream/table duality becomes the stream/table/graph trinity.
We will demonstrate the trinity by:
- Getting started with regular Kafka streams
- Using Confluent Hub's Neo4j sink
- Exposing queryable graphs with Cypher & GraphQL
- Analyzing data with Neo4j's graph algorithms
- Transforming graphs back into streams
The trinity means not choosing between representations, but using the best one for your use case. We'll demonstrate how it can be used to tackle social network analysis problems and discuss how the approach can be extended to real-time financial fraud detection and more.
Span Conference: Why your company needs a unified log - Alexander Dean
Apache Kafka and Amazon Kinesis are more than just message queues — they can serve as a unified log which you can put at the heart of your business, effectively creating a "digital nervous system" which your company's applications and processes can be re-structured around.
In this talk, Alex will provide an introduction to unified log technology, highlight some killer use cases and also show how Kinesis is being used "in anger" at Snowplow. Alex's talk will draw on his experiences working with event streams over the last two and a half years at Snowplow; it’s also heavily influenced by Jay Kreps’ unified log monograph, and by Alex's recent work penning Unified Log Processing, a Manning book. Alex's talk will show how event streams inside a unified log are an incredibly powerful primitive for building rich event-centric applications, unbundling local transactional silos and creating a single version of truth for a company.
Alex's talk will conclude with a live demo of Amazon Kinesis in action processing Snowplow events.
Observability for developer (Inny So & Andrew Jones, ThoughtWorks) Kafka Su... - confluent
Have you ever tried to debug a production outage, when your system comprises apps your team has written, third-party apps your team runs, with logs going into some system, application performance metrics going into another system, and cloud platform metrics going somewhere else? Did you find yourself switching tabs, trying to correlate metrics with logs and alerts and finding yourself in a huge tangle? It is a nightmare. In the data world, we talk about aggregating all our data so we can derive new insights quickly, but what about our operational data? Observability is your ability to ask questions of your system without having to write new code or grab new data. When you've got an observable system, it feels like you have debugging superpowers, but it can be challenging to even know where to start. Even if you can convince your colleagues to start, finding the right tools can be challenging. In this talk Inny and Andrew will talk about why monitoring and logging are not sufficient anymore (if they ever were), cover observability basics, and demo an observability platform that you can use to start your observability journey today.
Asynchronous micro-services and the unified log - Alexander Dean
On Friday October 7th 2016 at Crunch Conference in Budapest I gave a talk entitled "Asynchronous micro-services and the unified log".
The unified log enabled by Apache Kafka and Amazon Kinesis has been mostly understood as a better data processing architecture, replacing traditional data warehousing techniques. But the unified log also enables a new way of building transactional software, by enabling asynchronous micro-services. In this talk, I showed how event-driven micro-services designed around Kafka or Kinesis resolve many of the issues associated with traditional monolithic and synchronous micro-service based architectures.
Jay Kreps, Confluent | Kafka Summit SF 2019 Keynote ft. Dev Tagare, Lyft + Pr... - confluent
Jay Kreps, Confluent Co-Founder and Co-Creator of Apache Kafka, delivers the keynote presentation at Kafka Summit San Francisco 2019. He explains modern stream processing, real-time databases, KSQL, and Confluent Cloud's newest offering – a fully managed, serverless Kafka. In an effort to bring event streaming to even more developers, Priya Shivakumar announces Kafka made serverless in Confluent Cloud, with $50 free for the first three months.
Recording includes a Q&A session between Jay Kreps and Devendra Tagare, Engineering Manager at Lyft. They discuss the enhanced features Confluent Cloud offers on top of typical Kafka use cases – from mission-critical reliability and scaling to billions of messages at under 50ms latency, to multicloud data streaming and 24/7 Kafka support.
From Batch to Streaming ET(L) with Apache Apex at Berlin Buzzwords 2017 - Thomas Weise
https://berlinbuzzwords.de/17/session/batch-streaming-etl-apache-apex
Stream data processing is increasingly required to support business needs for faster actionable insight with growing volume of information from more sources. Apache Apex is a true stream processing framework for low-latency, high-throughput and reliable processing of complex analytics pipelines on clusters. Apex is designed for quick time-to-production, and is used in production by large companies for real-time and batch processing at scale.
This session will use an Apex production use case to walk through the incremental transition from a batch pipeline with hours of latency to an end-to-end streaming architecture with billions of events per day which are processed to deliver real-time analytical reports. The example is representative of many similar extract-transform-load (ETL) use cases with other data sets that can use a common library of building blocks. The transform (or analytics) piece of such pipelines varies in complexity and often involves custom components specific to the business logic.
Topics include:
Pipeline functionality from event source through queryable state for real-time insights.
API for application development and development process.
Library of building blocks including connectors for sources and sinks such as Kafka, JMS, Cassandra, HBase, JDBC and how they enable end-to-end exactly-once results.
Stateful processing with event time windowing.
Fault tolerance with exactly-once result semantics, checkpointing, incremental recovery.
Scalability and low-latency, high-throughput processing with advanced engine features for auto-scaling, dynamic changes, compute locality.
Recent project development and roadmap.
Following the session, attendees will have a high-level understanding of Apex and how it can be applied to use cases at their own organizations.
Crossing the Streams: Event Streaming with Kafka Streams
Viktor Gamov, Developer Advocate, Confluent
https://www.meetup.com/Lille-Kafka/events/270888493/
Streams and Tables: Two Sides of the Same Coin (BIRTE 2018) - confluent
Presentation given at BIRTE 2018
Twelfth International Workshop on Real-Time Business Intelligence and Analytics
27 August, 2018, Rio de Janeiro
Speakers:
Matthias Sax, Confluent Inc.
Guozhang Wang, Confluent Inc.
Matthias Weidlich, Humboldt-Universität zu Berlin
Johann-Christoph Freytag, Humboldt-Universität zu Berlin
Introducing Tupilak, Snowplow's unified log fabric - Alexander Dean
In this talk at Snowplow London Meetup #3 I introduced Tupilak, Snowplow’s unified log fabric. Putting a real-time event pipeline into production has many challenges: we need the pipeline to scale automatically based on event volumes, we need constant monitoring to prevent data loss and minimise end-to-end lag, and we need the ability to upgrade and extend the pipeline with zero downtime. We call software which does all this a “unified log fabric”, to distinguish it from the unified logs (e.g. Kafka and Kinesis) and stream processing frameworks (e.g. Spark Streaming and Kafka Streams) which such a fabric monitors and orchestrates.
As part of incorporating Snowplow’s Kinesis-based event pipeline into our Managed Service, we developed our own unified log fabric, called Tupilak. In this talk, I introduced Tupilak, explaining the core monitoring and scaling functions of Tupilak and showing live real-time pipelines visualised in the Tupilak UI. I dived into the architecture of Tupilak, shared its basic scaling algorithm and also took a look at how Tupilak itself is built on a Snowplow event stream. I also talked about the roadmap for Tupilak, including our plans for introducing lag-based auto-scaling and porting Tupilak to Kubernetes.
More info: https://cnfl.io/cloud-native-experience-for-kafka-in-cloud | Neha Narkhede is co-founder and CTO at Confluent, a company backing the popular Apache Kafka messaging system. Prior to founding Confluent, Neha led streams infrastructure at LinkedIn, where she was responsible for LinkedIn’s streaming infrastructure built on top of Apache Kafka and Apache Samza. She is one of the initial authors of Apache Kafka and a committer and PMC member on the project.
Jun Rao, Confluent | Kafka Summit SF 2019 Keynote ft. Chris Kasten, Walmart Labs - confluent
Jun Rao, Confluent Co-Founder, discusses the power of Kafka, why it was created at LinkedIn, and what it's used for in his keynote at Kafka Summit SF 2019. Featuring Chris Kasten, VP Walmart Cloud.
Data Stream Processing - Concepts and Frameworks - Matthias Niehoff
An overview of various concepts used in data stream processing. Most of them are used for solving problems in the field of time, focussing on processing time compared to event time. The techniques shown include the Dataflow API as it was introduced by Google and the concepts of stream and table duality. But I will also cover other problems, like data lookup and deployment of streaming applications, and various strategies for solving these problems.
At the end I will give a brief outline of the implementation status of those strategies in the popular streaming frameworks Apache Spark Streaming, Apache Flink and Kafka Streams.
Detecting Real-Time Financial Fraud with Cloudflow on Kubernetes - Lightbend
Deploying a robust streaming data pipeline can be a daunting task when your company’s financial information is at risk. For starters, how do you ensure proper provisioning of resources? How do you preserve end-to-end application and data consistency? How do you make all of this work in the cloud with Kubernetes and avoid YAML hell? Answer: Cloudflow, a new open-source toolkit for simplifying the development, deployment, and operation of streaming data pipelines.
Using Apache Kafka to Analyze Session Windows - confluent
Speaker: Michael Noll, Product Manager, Confluent
In this online talk, we’ll introduce the concept of a session window, talk about common use cases, and walk through how Apache Kafka can be used for session-oriented use cases.
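To make the idea of a session window concrete, here is a minimal sketch in plain Python. It does not use Kafka or the Kafka Streams API; the function name `sessionize`, the `(key, timestamp)` event shape, and the `gap` parameter are assumptions made for illustration. It assumes the common definition in which events for the same key belong to one session as long as consecutive events are no more than an inactivity gap apart.

```python
from collections import defaultdict


def sessionize(events, gap):
    """Group (key, timestamp) events into per-key sessions.

    A new session starts whenever the time since the previous event
    for the same key is greater than `gap`. Each session is returned
    as a (start, end, event_count) tuple.
    """
    # Collect timestamps per key first; this sketch works on a finite
    # batch rather than an unbounded stream.
    by_key = defaultdict(list)
    for key, ts in events:
        by_key[key].append(ts)

    sessions = {}
    for key, stamps in by_key.items():
        stamps.sort()
        start = prev = stamps[0]
        count = 1
        result = []
        for ts in stamps[1:]:
            if ts - prev > gap:
                # Inactivity gap exceeded: close the current session.
                result.append((start, prev, count))
                start, count = ts, 0
            prev = ts
            count += 1
        result.append((start, prev, count))
        sessions[key] = result
    return sessions


events = [("alice", 0), ("alice", 10), ("bob", 5), ("alice", 100), ("alice", 105)]
result = sessionize(events, gap=30)
```

With a gap of 30, the events for `alice` split into two sessions because of the 90-unit pause between timestamps 10 and 100. A real stream processor also has to handle late and out-of-order events, which this batch sketch sidesteps by sorting.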
Speaker: Joseph Rea, Engineer, Confluent
Joseph will talk about how to visualize topics in Apache Kafka®, the difference between a stream and a table in KSQL, and the lessons he learned tackling this technical challenge with millions of Kafka messages consumed per second. With such functionality, users can understand their data more easily and in a highly performant and scalable way. This talk covers understanding web workers as they relate to webpack, web socket management, debugging browser performance and the future of the applications that can now be built.
Joseph Rea started engineering with the LAMP stack building custom e-commerce checkouts, ERP systems and enterprise water/sewer billing software. He worked at Yahoo as a front end engineer in the media org before doing Android and iOS development for the video SDK. He also worked at LifeLock to build an application that updated PII on various service sites. He likes turtles. He currently works at Confluent building so much UI. He blogs at https://cnfl.io/blog-joseph-rea.
Speaker: Neil Avery, Technologist, Office of the CTO, Confluent
Stream processing is now at the forefront of many company strategies. Over the last couple of years we have seen streaming use cases explode and proliferate across the landscape of modern business.
Use cases including digital transformation, IoT, real-time risk, payments microservices and machine learning are all built on the premise that they need fast data and they need it at scale.
Apache Kafka® has long since left behind its origins as dumb pipes for big data and is now the go-to streaming platform.
Stream processing is the vehicle for driving those streams, and it brings with it a world of real-time semantics surrounding windowing, joining, correctness, elasticity, and accessibility. The ‘current state of stream processing’ walks through the origins of stream processing and applicable use cases, then dives into the challenges currently facing the world of stream processing as it drives the next data revolution.
Neil is a Technologist in the Office of the CTO at Confluent, the company founded by the creators of Apache Kafka. He has over 20 years of experience working on distributed computing, messaging and stream processing. He has built or redesigned commercial messaging platforms and distributed caching products, and has developed large-scale bespoke systems for tier-1 banks. After a period at ThoughtWorks, he went on to build some of the first distributed risk engines in financial services. In 2008 he launched a startup that specialised in distributed data analytics and visualization. Prior to joining Confluent he was the CTO at a fintech consultancy.
Watch the recording: https://videos.confluent.io/watch/rmU6GHrd4EKFaZrRhdTE3s?.
In this talk, Confluent co-founder and CEO, Jay Kreps will cover the rise of two trends:
1. The rise of Apache Kafka and event streams
2. The rise of the public cloud and cloud-native data systems
... and the problems we need to solve as these two trends come together.
Kafka as an Event Store (Guido Schmutz, Trivadis) Kafka Summit NYC 2019 - confluent
Event Sourcing and CQRS are two popular patterns for implementing a microservices architecture. With Event Sourcing we do not store the state of an object, but instead store all the events impacting its state. Then, to retrieve an object's state, we have to read the different events related to that object and apply them one by one. CQRS (Command Query Responsibility Segregation), on the other hand, is a way to dissociate writes (Command) and reads (Query). Event Sourcing and CQRS are frequently grouped and used together to form something bigger. While it is possible to implement CQRS without Event Sourcing, the opposite is not necessarily correct. In order to implement Event Sourcing, an efficient Event Store is needed. But is that also true when combining Event Sourcing and CQRS? And what is an event store in the first place, and what features should it implement? This presentation will first discuss what functionality an event store should offer and then present how Apache Kafka can be used to implement an event store. But is Kafka good enough, or do specific event store solutions such as AxonDB or Event Store provide a better solution?
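The replay step described above (read the events for an object and apply them one by one) can be sketched in a few lines of Python. This is an illustration only, not taken from the talk: the `Event` type, the bank-account domain, the event kinds `deposited` and `withdrawn`, and the in-memory list standing in for an event store are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Event:
    account_id: str
    kind: str      # hypothetical event kinds: "deposited" or "withdrawn"
    amount: int


def apply(balance, event):
    """Return the new state after applying a single event."""
    if event.kind == "deposited":
        return balance + event.amount
    if event.kind == "withdrawn":
        return balance - event.amount
    raise ValueError(f"unknown event kind: {event.kind}")


def current_state(log, account_id):
    """Rebuild an account's balance by replaying its events in order."""
    balance = 0
    for event in log:
        if event.account_id == account_id:
            balance = apply(balance, event)
    return balance


# An append-only list stands in for the event store in this sketch.
log = [
    Event("a1", "deposited", 100),
    Event("a2", "deposited", 50),
    Event("a1", "withdrawn", 30),
]
balance_a1 = current_state(log, "a1")
balance_a2 = current_state(log, "a2")
```

The state is never stored directly; it is always derived from the log. In a CQRS setup, a read-side view would typically keep such derived state up to date incrementally instead of replaying the full log on every query.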
AWS User Group UK: Why your company needs a unified log - Alexander Dean
Apache Kafka and Amazon Kinesis are more than just message queues — they can serve as a unified log which you can put at the heart of your business, effectively creating a "digital nervous system" which your company's applications and processes can be re-structured around.
In this talk, Alex will provide an introduction to unified log technology, highlight some killer use cases and also show how Kinesis is being used "in anger" at Snowplow. Alex's talk will draw on his experiences working with event streams over the last two and a half years at Snowplow; it’s also heavily influenced by Jay Kreps’ unified log monograph, and by Alex's recent work penning Unified Log Processing, a Manning book. Alex's talk will show how event streams inside a unified log are an incredibly powerful primitive for building rich event-centric applications, unbundling local transactional silos and creating a single version of truth for a company.
Leveraging services in stream processor apps at Ticketmaster (Derek Cline, Ti... - confluent
Is your organization adopting Kafka as its messaging bus, but you've found that it will take too long to migrate your existing service-oriented architecture to a log-oriented architecture? One of the biggest challenges in building a new stream processor can be implementing all the business logic again. It has become increasingly common for companies with high-throughput source streams and change-data-capture logs to want to build systems fast. At Ticketmaster, we have found a solution to the problem by leveraging the business logic in our existing services and calling them from our Java-based KafkaStreams processor applications in an efficient manner. In this talk, we will examine the initial challenges we faced in our transition, then we will explore the solutions we built to address the use cases at Ticketmaster. The primary focus will be our workflow around calling services to bring stream processor applications to market fast. We will review our challenges and share tips for success.
Speaker: Matthias J. Sax, Software Engineer, Confluent
KSQL is the Streaming SQL engine for Apache Kafka that allows for continuous data stream processing. While KSQL looks very similar to SQL, it provides quite different semantics. First, KSQL queries can be defined over data streams, not just tables. Second, queries over tables are not snapshot queries, but run forever. And third, time is a core concept in KSQL and data stream processing in general. In this talk, we explore the nature of Streaming SQL and its temporal semantics that apply to both streams and tables. We will explain continuous query semantics and the relationship between streams and tables, and demystify the temporal nature of KSQL tables. Furthermore, we dig into filter, aggregation, and join operations over streams and tables as well as stream-specific operators like windowing. At the end, you will be equipped to query streams and tables using KSQL and understand their temporal relationship to each other.
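The difference between a snapshot query and a continuous query over a table can be illustrated with a small Python sketch. This is not KSQL syntax and does not call any Kafka API; the function name `continuous_count` and the list standing in for a stream are assumptions made for illustration. The sketch shows a count-per-key aggregation that emits an updated result for every input record, so the output is itself a stream of table updates.

```python
def continuous_count(stream):
    """Continuously count records per key.

    Instead of returning one final answer, yield the updated row
    (key, count) each time a record arrives. The sequence of yielded
    rows is the changelog stream of the result table.
    """
    table = {}
    for key in stream:
        table[key] = table.get(key, 0) + 1
        yield key, table[key]


# A finite list stands in for an unbounded stream in this sketch.
changelog = list(continuous_count(["a", "b", "a"]))

# Replaying the changelog and keeping the latest value per key
# recovers the current contents of the table.
table = dict(changelog)
```

Here `changelog` holds one update per input record, and `table` holds only the latest count for each key. Reading the same data either way is the stream/table relationship the abstract refers to.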
Samza at LinkedIn: Taking Stream Processing to the Next Level - Martin Kleppmann
Slides from my talk at Berlin Buzzwords, 27 May 2014. Unfortunately Slideshare has screwed up the fonts. See https://speakerdeck.com/ept/samza-at-linkedin-taking-stream-processing-to-the-next-level for a version of the deck with correct fonts.
Stream processing is an essential part of real-time data systems, such as news feeds, live search indexes, real-time analytics, metrics and monitoring. But writing stream processes is still hard, especially when you're dealing with so much data that you have to distribute it across multiple machines. How can you keep the system running smoothly, even when machines fail and bugs occur?
Apache Samza is a new framework for writing scalable stream processing jobs. Like Hadoop and MapReduce for batch processing, it takes care of the hard parts of running your message-processing code on a distributed infrastructure, so that you can concentrate on writing your application using simple APIs. It is in production use at LinkedIn.
This talk will introduce Samza, and show how to use it to solve a range of different problems. Samza has some unique features that make it especially interesting for large deployments, and in this talk we will dig into how they work under the hood. In particular:
• Samza is built to support many different jobs written by different teams. Isolation between jobs ensures that a single badly behaved job doesn't affect other jobs. It is robust by design.
• Samza can handle jobs that require large amounts of state, for example joining multiple streams, augmenting a stream with data from a database, or aggregating data over long time windows. This makes it a very powerful tool for applications.
Data Stream Processing - Concepts and FrameworksMatthias Niehoff
An overview of various concepts used in data stream processing. Most of them are used for solving problems in the field of time, focussing on processing time compared to event time. The techniques shown include the Dataflow API as it was introduced by Google and the concepts of stream and table duality. I will also bring up other problems, such as data lookup and deployment of streaming applications, and various strategies for solving them.
At the end I will give a brief outline of the implementation status of those strategies in the popular streaming frameworks Apache Spark Streaming, Apache Flink and Kafka Streams.
Detecting Real-Time Financial Fraud with Cloudflow on KubernetesLightbend
Deploying a robust streaming data pipeline can be a daunting task when your company’s financial information is at risk. For starters, how do you ensure proper provisioning of resources? How do you preserve end-to-end application and data consistency? How do you make all of this work in the cloud with Kubernetes and avoid YAML hell? Answer: Cloudflow, a new open-source toolkit for simplifying the development, deployment, and operation of streaming data pipelines.
Using Apache Kafka to Analyze Session Windowsconfluent
Speaker: Michael Noll, Product Manager, Confluent
In this online talk, we’ll introduce the concept of a session window, talk about common use cases, and walk through how Apache Kafka can be used for session-oriented use cases.
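As a rough illustration of the session-window concept, the sketch below groups the event timestamps of a single key into sessions separated by a period of inactivity. It is plain Python and does not use the Kafka Streams API; the function name `sessionize`, the `gap` parameter and the rule that a gap exactly equal to the threshold still belongs to the same session are all assumptions made for this example.

```python
def sessionize(timestamps, gap):
    """Group event timestamps into sessions.

    A new session starts whenever the time since the previous event
    is larger than `gap` (hypothetical inactivity threshold).
    """
    sessions = []
    for ts in sorted(timestamps):
        if sessions and ts - sessions[-1][-1] <= gap:
            # Still within the inactivity gap: extend the current session.
            sessions[-1].append(ts)
        else:
            # Gap exceeded (or first event): open a new session.
            sessions.append([ts])
    return sessions


# Events at t=0 and t=5 form one session, t=40 and t=42 another, t=100 a third.
print(sessionize([0, 5, 40, 42, 100], gap=10))
```

Unlike fixed-size windows, the session boundaries here depend entirely on the data, which is why session windows suit user-activity use cases.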
Speaker: Joseph Rea, Engineer, Confluent
Joseph will talk about how to visualize topics in Apache Kafka®, the difference between a stream and table in KSQL and his lessons learned on tackling this technical challenge with millions of Kafka messages consumed per second. With such functionalities, users can understand their data more easily and in a highly performant and scalable way. This talk covers understanding web workers as they relate to webpack, web socket management, debugging browser performance and the future of the applications that can now be built.
Joseph Rea started engineering with the LAMP stack building custom e-commerce checkouts, ERP systems and enterprise water/sewer billing software. He worked at Yahoo as a front end engineer in the media org before doing Android and iOS development for the video SDK. He also worked at LifeLock to build an application that updated PII on various service sites. He likes turtles. He currently works at Confluent building so much UI. He blogs at https://cnfl.io/blog-joseph-rea.
Speaker: Neil Avery, Technologist, Office of the CTO, Confluent
Stream processing is now at the forefront of many company strategies. Over the last couple of years we have seen streaming use cases explode and now proliferate the landscape of any modern business.
Use cases including digital transformation, IoT, real-time risk, payments microservices and machine learning are all built on the same fundamental: they need fast data and they need it at scale.
Apache Kafka® has long been the streaming platform of choice. Its origins as dumb pipes for big data have long since been left behind, and it is now the go-to streaming platform.
Stream processing beckons as being the vehicle for driving those streams, and along with it brings a world of real-time semantics surrounding windowing, joining, correctness, elasticity, and accessibility. The ‘current state of stream processing’ walks through the origins of stream processing, applicable use cases and then dives into the challenges currently facing the world of stream processing as it drives the next data revolution.
Neil is a Technologist in the Office of the CTO at Confluent, the company founded by the creators of Apache Kafka. He has over 20 years of experience working on distributed computing, messaging and stream processing. He has built or redesigned commercial messaging platforms and distributed caching products, and has developed large-scale bespoke systems for tier-1 banks. After a period at ThoughtWorks, he went on to build some of the first distributed risk engines in financial services. In 2008 he launched a startup that specialised in distributed data analytics and visualization. Prior to joining Confluent he was the CTO at a fintech consultancy.
Watch the recording: https://videos.confluent.io/watch/rmU6GHrd4EKFaZrRhdTE3s?.
In this talk, Confluent co-founder and CEO, Jay Kreps will cover the rise of two trends:
1. The rise of Apache Kafka and event streams
2. The rise of the public cloud and cloud-native data systems
... and the problems we need to solve as these two trends come together.
Kafka as an Event Store (Guido Schmutz, Trivadis) Kafka Summit NYC 2019confluent
Event Sourcing and CQRS are two popular patterns for implementing a microservices architecture. With Event Sourcing we do not store the state of an object, but instead store all the events impacting its state. Then, to retrieve an object's state, we have to read the different events related to that object and apply them one by one. CQRS (Command Query Responsibility Segregation), on the other hand, is a way to dissociate writes (Command) and reads (Query). Event Sourcing and CQRS are frequently grouped and used together to form something bigger. While it is possible to implement CQRS without Event Sourcing, the opposite is not necessarily correct. In order to implement Event Sourcing, an efficient Event Store is needed. But is that also true when combining Event Sourcing and CQRS? And what is an event store in the first place and what features should it implement? This presentation will first discuss what functionalities an event store should offer and then present how Apache Kafka can be used to implement an event store. But is Kafka good enough or do specific event store solutions such as AxonDB or Event Store provide a better solution?
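The replay step described in this abstract can be sketched in a few lines. This is a minimal illustration under stated assumptions, not code from the talk: the bank-account domain, the `Event` class, the event kinds and the in-memory `EVENT_LOG` list (standing in for an event store such as a Kafka topic) are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Event:
    account_id: str
    kind: str    # hypothetical event types: "deposited" or "withdrawn"
    amount: int


def apply(balance, event):
    """Apply one event to the current state and return the new state."""
    if event.kind == "deposited":
        return balance + event.amount
    if event.kind == "withdrawn":
        return balance - event.amount
    raise ValueError(f"unknown event kind: {event.kind}")


def current_balance(log, account_id):
    """Rebuild an account's state by replaying its events in log order."""
    balance = 0
    for event in log:
        if event.account_id == account_id:
            balance = apply(balance, event)
    return balance


# The log is append-only; state is never stored, only derived.
EVENT_LOG = [
    Event("acct-1", "deposited", 100),
    Event("acct-2", "deposited", 50),
    Event("acct-1", "withdrawn", 30),
]

print(current_balance(EVENT_LOG, "acct-1"))
```

Scanning the whole log for every read is what makes a separate, query-optimized read model attractive, which is one reason Event Sourcing is so often paired with CQRS.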
AWS User Group UK: Why your company needs a unified logAlexander Dean
Apache Kafka and Amazon Kinesis are more than just message queues — they can serve as a unified log which you can put at the heart of your business, effectively creating a "digital nervous system" which your company's applications and processes can be re-structured around.
In this talk, Alex will provide an introduction to unified log technology, highlight some killer use cases and also show how Kinesis is being used "in anger" at Snowplow. Alex's talk will draw on his experiences working with event streams over the last two and a half years at Snowplow; it’s also heavily influenced by Jay Kreps’ unified log monograph, and by Alex's recent work penning Unified Log Processing, a Manning book. Alex's talk will show how event streams inside a unified log are an incredibly powerful primitive for building rich event-centric applications, unbundling local transactional silos and creating a single version of truth for a company.
Leveraging services in stream processor apps at Ticketmaster (Derek Cline, Ti...confluent
Is your organization adopting Kafka as their messaging bus but you've found that it will take too long to migrate your existing service-oriented architecture to a log-oriented architecture? Some of the biggest challenges in building a new stream processor can be implementing all the business logic again. It has become increasingly common for companies with high-throughput source streams and change-data-capture logs to want to build systems fast. At Ticketmaster, we have found a solution to the problem by leveraging the business logic in our existing services and calling them from our Java based KafkaStreams processor applications in an efficient manner. In this talk, we will examine the initial challenges we faced in our transition, then we will explore the solutions we built to address the use cases at Ticketmaster. The primary focus will address our workflow around calling services to bring stream processor applications to market fast. We will review our challenges and share tips for success.
Speaker: Matthias J. Sax, Software Engineer, Confluent
KSQL is the Streaming SQL engine for Apache Kafka that allows for continuous data stream processing. While KSQL looks very similar to SQL, it provides quite different semantics. First, KSQL queries can be defined over data streams, not just tables. Second, queries over tables are not snapshot queries, but run forever. And third, time is a core concept in KSQL and data stream processing in general. In this talk, we explore the nature of Streaming SQL and its temporal semantics that apply to both streams and tables. We will explain continuous query semantics, the relationship between streams and tables, and demystify the temporal nature of KSQL tables. Furthermore, we dig into filter, aggregation, and join operations over streams and tables as well as stream-specific operators like windowing. At the end, you will be equipped to query streams and tables using KSQL and understand their temporal relationship to each other.
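To make the idea of a query that "runs forever" more concrete, the sketch below models a continuous count per key in plain Python. It is not KSQL and does not use any Kafka API; the names `continuous_count`, `materialize` and `PAGE_VIEWS` are assumptions for this example. Each input record produces an updated result row, and replaying those update records yields the table's latest state.

```python
from collections import defaultdict


def continuous_count(stream):
    """Continuous aggregation: emit an updated (key, count) row per input record."""
    table = defaultdict(int)
    for key in stream:
        table[key] += 1
        # The result is itself a stream of updates (a changelog).
        yield key, table[key]


def materialize(changelog):
    """Turn a changelog stream back into a table by keeping the latest value per key."""
    table = {}
    for key, value in changelog:
        table[key] = value
    return table


PAGE_VIEWS = ["alice", "bob", "alice"]  # hypothetical stream of user ids

updates = list(continuous_count(PAGE_VIEWS))
print(updates)
print(materialize(updates))
```

A snapshot query would return only the final dictionary; the continuous version keeps emitting a new row whenever another record arrives.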
Samza at LinkedIn: Taking Stream Processing to the Next LevelMartin Kleppmann
Slides from my talk at Berlin Buzzwords, 27 May 2014. Unfortunately Slideshare has screwed up the fonts. See https://speakerdeck.com/ept/samza-at-linkedin-taking-stream-processing-to-the-next-level for a version of the deck with correct fonts.
Stream processing is an essential part of real-time data systems, such as news feeds, live search indexes, real-time analytics, metrics and monitoring. But writing stream processes is still hard, especially when you're dealing with so much data that you have to distribute it across multiple machines. How can you keep the system running smoothly, even when machines fail and bugs occur?
Apache Samza is a new framework for writing scalable stream processing jobs. Like Hadoop and MapReduce for batch processing, it takes care of the hard parts of running your message-processing code on a distributed infrastructure, so that you can concentrate on writing your application using simple APIs. It is in production use at LinkedIn.
This talk will introduce Samza, and show how to use it to solve a range of different problems. Samza has some unique features that make it especially interesting for large deployments, and in this talk we will dig into how they work under the hood. In particular:
• Samza is built to support many different jobs written by different teams. Isolation between jobs ensures that a single badly behaved job doesn't affect other jobs. It is robust by design.
• Samza can handle jobs that require large amounts of state, for example joining multiple streams, augmenting a stream with data from a database, or aggregating data over long time windows. This makes it a very powerful tool for applications.
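The stateful case in the last bullet, augmenting a stream with data from a database, can be sketched as a join against locally held state. This is plain Python rather than the Samza API; the topic names, the message tuple layout and the `enrich` function are hypothetical, and a real job would keep the state in a durable local store rather than a dictionary.

```python
def enrich(messages):
    """Join page-view events with the latest known profile for the same user.

    `messages` is an ordered iterable of (topic, key, value) tuples that
    interleaves a stream of profile updates with a stream of page views.
    """
    profiles = {}  # local state: user id -> latest profile value
    enriched = []
    for topic, key, value in messages:
        if topic == "profiles":
            # A database change event: update the local copy.
            profiles[key] = value
        elif topic == "views":
            # Look up state locally instead of querying a remote database.
            enriched.append((key, value, profiles.get(key)))
    return enriched


MESSAGES = [
    ("profiles", "u1", "Berlin"),
    ("views", "u1", "/home"),
    ("views", "u2", "/jobs"),       # no profile seen yet for u2
    ("profiles", "u1", "London"),
    ("views", "u1", "/feed"),
]

print(enrich(MESSAGES))
```

Because every lookup hits local state, the job avoids a network round trip per message, at the cost of having to maintain and recover that state.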
Building Real-time Data Products at LinkedIn with Apache SamzaTrieu Nguyen
The world is going real-time. MapReduce, SQL-on-Hadoop and similar batch processing tools are fine for analyzing and processing data after the fact — but sometimes you need to process data continuously as it comes in, and react to it within a few seconds or less. How do you do that at Hadoop scale?
Apache Samza is an open source stream processing framework designed to solve these kinds of problems. It is built upon YARN/Hadoop 2.0 and Apache Kafka. You can think of Samza as a real-time, continuously running version of MapReduce.
Samza has some unique features that make it powerful. It provides high performance for stateful processing jobs, including aggregation and joins between many input streams. It is designed to support an ecosystem of many different jobs written by different teams, and it isolates them from each other, so that one badly behaved job can’t affect the others.
Cassandra's Sweet Spot - an introduction to Apache CassandraDave Gardner
Slides from my NoSQL Exchange 2011 talk introducing Apache Cassandra. This talk explained the fundamental concepts of Cassandra and then demonstrated how to build a simple ad-targeting application using PHP, with a focus on data modeling.
Video of talk: http://skillsmatter.com/podcast/home/cassandra/js-2880
Independent of the source of data, the integration of event streams into an Enterprise Architecture is becoming more and more important in the world of sensors, social media streams and the Internet of Things. Events have to be accepted quickly and reliably, and they have to be distributed and analysed, often with many consumers or systems interested in all or part of the events. Storing such huge event streams in HDFS or a NoSQL datastore is feasible and not such a challenge anymore. But if you want to be able to react fast, with minimal latency, you cannot afford to first store the data and do the analysis/analytics later. You have to be able to include part of your analytics right after you consume the event streams. Products for doing event processing, such as Oracle Event Processing or Esper, have been available for quite a long time and used to be called Complex Event Processing (CEP). In the last 3 years, another family of products has appeared, mostly out of the Big Data technology space, called Stream Processing or Streaming Analytics. These are mostly open source products/frameworks such as Apache Storm, Spark Streaming and Apache Samza, as well as supporting infrastructure such as Apache Kafka. In this talk I will present the theoretical foundations for Event and Stream Processing, describe the differences you might find between the more traditional CEP and the more modern Stream Processing solutions, and show that a combination will bring the most value.
Agenda:
• Background for the development: From commodity to experience
• Indirect use of experiences: Experience as value adding
• Experience process
• Selling pure experiences: Using the experience realm model
• How to develop experiences
• Creating the experience settings
Customer Event Hub - the modern Customer 360° viewGuido Schmutz
Today, companies are using various channels to communicate with their customers. As a consequence, a lot of data is created, more and more of it outside the traditional IT infrastructure of an enterprise. This data often does not have a common format and is continuously created in ever-increasing volume. With the Internet of Things (IoT) and its sensors, the volume as well as the velocity of data just gets more extreme.
To achieve a complete and consistent view of a customer, all this customer-related information has to be included in a 360 degree view in a real-time or near-real-time fashion. In this way, the Customer Hub becomes the Customer Event Hub. It constantly shows the current view of a customer across all their interaction channels and gives an enterprise the basis for a substantial and effective customer relationship.
In this presentation the value of such a platform is shown and how it can be implemented.
Real-time Streaming Analytics: Business Value, Use Cases and Architectural Co...Impetus Technologies
Impetus webcast ‘Real-time Streaming Analytics: Business Value, Use Cases and Architectural Considerations’ available at http://bit.ly/1i6OrwR
The webinar covers:
• How business value is preserved and enhanced using Real-time Streaming Analytics with numerous use-cases in different industry verticals
• Technical considerations for IT leaders and implementation teams looking to integrate Real-time Streaming Analytics into enterprise architecture roadmap
• Recommendations for making Real-time Streaming Analytics – real – in your enterprise
• Impetus StreamAnalytix – an enterprise ready platform for Real-time Streaming Analytics
Lambda Architecture with Spark, Spark Streaming, Kafka, Cassandra, Akka and S...Helena Edelson
Regardless of the meaning we are searching for over our vast amounts of data, whether we are in science, finance, technology, energy, health care…, we all share the same problems that must be solved: How do we achieve that? What technologies best support the requirements? This talk is about how to leverage fast access to historical data with real time streaming data for predictive modeling for lambda architecture with Spark Streaming, Kafka, Cassandra, Akka and Scala. Efficient Stream Computation, Composable Data Pipelines, Data Locality, Cassandra data model and low latency, Kafka producers and HTTP endpoints as akka actors...
Slides of a QCon London 2016 talk on how stream processing is used within Uber's Marketplace system to solve a wide range of problems, including but not limited to realtime indexing and querying of geospatial time series, aggregation and computing of streaming data, and extracting patterns from data streams. In addition, it touches upon various time series analyses and predictions. The underlying systems utilize many open source technologies such as Apache Kafka, Samza and Spark Streaming.
Cassandra Data Modeling - Practical Considerations @ Netflixnkorla1share
The Cassandra community has consistently requested that we cover C* schema design concepts. This presentation goes in depth on the following topics:
- Schema design
- Best Practices
- Capacity Planning
- Real World Examples
No REST - Architecting Real-time Bulk Async APIsC4Media
Video and slides synchronized, mp3 and slide download available at URL http://bit.ly/2eapFFq.
Michael Uzquiano talks about how to scale an API to accept many items. He examines the evolution of ReST over HTTP into transactional, asynchronous bulk operations. He covers job descriptors, workers, the job queue and scaling workers across an API cluster elastically. He also talks about polling methods for job completion including HTTP long polling and WebSockets. Filmed at qconnewyork.com.
Michael Uzquiano is Founder and CTO of CloudCMS and Alpaca.js Committer.
Video and slides synchronized, mp3 and slide download available at URL http://bit.ly/2M35wCo.
Jamund Ferguson talks about some of the challenges PayPal faced with their Node.js application servers and why they think the JAMStack approach improves performance for both their apps and their developers. He includes discussions around performance, security, development experience and deploy speed. Filmed at qconlondon.com.
Jamund Ferguson is a JavaScript architect at PayPal. He loves to look at how following patterns consistently can prevent bugs in applications. He’s previously contributed to the ESLint and StandardJS open-source projects and has as of late become a fan of FlowType and TypeScript.
Streaming Auto-scaling in Google Cloud DataflowC4Media
Video and slides synchronized, mp3 and slide download available at URL http://bit.ly/1Z2JXhs.
Manuel Fahndrich describes how they tackled one particular resource allocation aspect of Google Cloud Dataflow pipelines, namely, horizontal scaling of worker pools as a function of pipeline input rate. Managing the redistribution of key ranges across new pool sizes and the associated persistent data storage was particularly challenging. Filmed at qconlondon.com.
Manuel Fahndrich earned his Ph.D. in C.S. from UC Berkeley in 1999. He spent the next 15 years as a Research Scientist at Microsoft, working on static and dynamic verification tools for object-oriented programs and system software. After joining Google in 2014 he has been working on data-parallel infrastructure, in particular auto-scaling for batch and streaming pipelines.
Video and slides synchronized, mp3 and slide download available at URL http://bit.ly/2QbJBsd.
Trisha Gee talks about Java 8, wondering whether we should move to a later version, which one to choose, what sorts of issues we might run into if we do choose to upgrade, and how the support and license changes that came in with Java 11 might impact us. Filmed at qconlondon.com.
Trisha Gee has developed Java applications for a range of industries, including finance, manufacturing, software and non-profit. She has expertise in Java high performance systems, is passionate about enabling developer productivity, and dabbles with Open Source development. As a Developer Advocate for JetBrains, she gets to share all the interesting things she’s constantly discovering.
Opportunities and Pitfalls of Event-Driven UtopiaC4Media
Video and slides synchronized, mp3 and slide download available at URL http://bit.ly/2B24UoY.
Bernd Rücker goes over the concepts, the advantages, and the pitfalls of event-driven utopia. He shares real-life stories or points to source code examples. Filmed at qconnewyork.com.
Bernd Rücker is co-founder and developer advocate at Camunda. Previously, he helped automate highly scalable core workflows at global companies including T-Mobile, Lufthansa and Zalando. He is currently focused on new workflow automation paradigms that fit into modern architectures around distributed systems, microservices, domain-driven design, event-driven architecture and reactive systems.
Video and slides synchronized, mp3 and slide download available at URL https://bit.ly/2UkZRIC.
Monal Daxini presents a blueprint for streaming data architectures and a review of desirable features of a streaming engine. He also talks about streaming application patterns and anti-patterns, and use cases and concrete examples using Apache Flink. Filmed at qconsf.com.
Monal Daxini is the Tech Lead for Stream Processing platform for business insights at Netflix. He helped build the petabyte scale Keystone pipeline running on the Flink powered platform. He introduced Flink to Netflix, and also helped define the vision for this platform. He has over 17 years of experience building scalable distributed systems.
Video and slides synchronized, mp3 and slide download available at URL http://bit.ly/1DXFg0h.
Ben Christensen summarizes why the Rx programming model was chosen and demonstrates how it is applied to a variety of use cases. Filmed at qconsf.com.
Ben Christensen is a software engineer on the Netflix Edge Services Platform team responsible for fault tolerance, performance, architecture and scale while enabling millions of customers to access the Netflix experience across more than 1,000 different device types.
Building Codealike: a journey into the developers analytics worldOren Eini
Codealike plugins in Visual Studio, Eclipse and Chrome track developers while they code and perform analytic calculations at the millisecond level. Handling such write-heavy workloads with RavenDB as the main and only database was not without challenges. In this talk, we will reveal how we built and scaled such a solution, how we were able to improve performance with Voron, and glance at our own mistakes and architectural choices down the line.
Video and slides synchronized, mp3 and slide download available at URL https://bit.ly/2s9T3Vl.
Colin Eberhardt looks at some of the internals of WebAssembly, explores how it works “under the hood”, and looks at how to create a (simple) compiler that targets this runtime. Filmed at qconsf.com.
Colin Eberhardt is the Technology Director at Scott Logic, a UK-based software consultancy where they create complex applications for their financial services clients. He is an avid technology enthusiast, spending his evenings contributing to open source projects, writing blog posts and learning as much as he can.
Introducing BinarySortedMultiMap - A new Flink state primitive to boost your ...Flink Forward
Flink Forward San Francisco 2022.
Probably everyone who has written stateful Apache Flink applications has used one of the fault-tolerant keyed state primitives ValueState, ListState, and MapState. With RocksDB, however, retrieving and updating items comes at an increased cost that you should be aware of. Sometimes, these costs may not be avoidable with the current API, e.g., for efficient event-time stream-sorting or streaming joins where you need to iterate one or two buffered streams in the right order. With FLIP-220, we are introducing a new state primitive: BinarySortedMultiMapState. This new form of state lets you (a) efficiently store lists of values for a user-provided key, and (b) iterate keyed state in a well-defined sort order. Both features can be backed efficiently by RocksDB with a 2x performance improvement over the current workarounds. This talk will go into the details of the new API and its implementation, present how to use it in your application, and talk about the process of getting it into Flink.
by Nico Kruber
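The two capabilities named in the abstract, storing a list of values per key and iterating keys in sorted order, can be illustrated with a small in-memory structure. This sketch is not the Flink API and says nothing about how BinarySortedMultiMapState is implemented or backed by RocksDB; the class `SortedMultiMap` and its methods are hypothetical names chosen for this example.

```python
import bisect


class SortedMultiMap:
    """In-memory multimap that iterates its keys in ascending order."""

    def __init__(self):
        self._keys = []    # distinct keys, kept sorted
        self._values = {}  # key -> list of values in insertion order

    def add(self, key, value):
        """Append a value to the list stored under `key`."""
        if key not in self._values:
            bisect.insort(self._keys, key)
            self._values[key] = []
        self._values[key].append(value)

    def items(self):
        """Yield (key, values) pairs in ascending key order."""
        for key in self._keys:
            yield key, list(self._values[key])


# Buffer out-of-order events by timestamp, then read them back sorted.
buffer = SortedMultiMap()
for timestamp, payload in [(30, "c"), (10, "a"), (30, "d"), (20, "b")]:
    buffer.add(timestamp, payload)

print(list(buffer.items()))
```

Using event timestamps as keys is the event-time sorting use case mentioned above: records arrive out of order but are read back in timestamp order.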
Streaming a Million Likes/Second: Real-Time Interactions on Live VideoC4Media
Video and slides synchronized, mp3 and slide download available at URL https://bit.ly/39NIjLV.
Akhilesh Gupta does a technical deep-dive into how LinkedIn uses the Play/Akka Framework and a scalable distributed system to enable live interactions like likes/comments at massive scale at extremely low costs across multiple data centers. Filmed at qconlondon.com.
Akhilesh Gupta is the technical lead for LinkedIn's Real-time delivery infrastructure and LinkedIn Messaging. He has been working on the revamp of LinkedIn’s offerings to instant, real-time experiences. Before this, he was the head of engineering for the Ride Experience program at Uber Technologies in San Francisco.
Next Generation Client APIs in Envoy MobileC4Media
Video and slides synchronized, mp3 and slide download available at URL https://bit.ly/2x0Fav8.
Jose Nino guides the audience through the journey of Mobile APIs at Lyft. He focuses on how the team has reaped the benefits of API generation to experiment with the network transport layer. He also discusses recent developments the team has made with Envoy Mobile and the roadmap ahead. Filmed at qconlondon.com.
Jose Nino works as a Software Engineer at Lyft.
Software Teams and Teamwork Trends Report Q1 2020C4Media
How do we cope with an environment that has been radically disrupted, where people are suddenly thrust into remote work in a chaotic state? What are the emerging good practices and new ideas that are shaping the way in which software development teams work? What can we do to make the workplace a more secure and diverse one while increasing the productivity of our teams? This report aims to assist technical leaders in making mid- to long-term decisions that will have a positive impact on their organisations and teams and help individual contributors find the practices, approaches, tools, techniques, and frameworks that can help them get a better experience at work - irrespective of where they are working from.
Understand the Trade-offs Using Compilers for Java ApplicationsC4Media
Video and slides synchronized, mp3 and slide download available at URL https://bit.ly/2QCmmJ0.
Mark Stoodley examines some of the strengths and weaknesses of the different Java compilation technologies, if one were to apply them in isolation. Stoodley discusses how production JVMs are assembling a combination of these tools that work together to provide excellent performance across the large spectrum of applications written in Java and JVM-based languages. Filmed at qconsf.com.
Mark Stoodley joined IBM Canada to build Java JIT compilers for production use and led the team that delivered AOT compilation in the IBM SDK for Java 6. He spent the last five years leading the effort to open source nearly 4.3 million lines of source code from the IBM J9 Java Virtual Machine to create the two open source projects Eclipse OMR and Eclipse OpenJ9, and now co-leads both projects.
Video and slides synchronized, mp3 and slide download available at URL https://bit.ly/2y2yPiS.
Colin McCabe talks about the ongoing effort to replace the use of Zookeeper in Kafka: why they want to do it and how it will work. He discusses the limitations they have found and how Kafka benefits both in terms of stability and scalability by bringing consensus in house. He talks about their progress, what work is remaining, and how contributors can help. Filmed at qconsf.com.
Colin McCabe is a Kafka committer at Confluent, working on the scalability and extensibility of Kafka. Previously, he worked on the Hadoop Distributed Filesystem and the Ceph Filesystem.
Video and slides synchronized, mp3 and slide download available at URL https://bit.ly/2SXXXiD.
Katharina Probst talks about what it means to act like an owner and why teams need ownership to be high-performing. When team members, regardless of whether they have a formal leadership role or not, act like owners, magical things can happen. She shares ideas that we can apply to our own work, and talks about how to recognize when we don’t live up to our own expectations of acting like an owner. Filmed at qconsf.com.
Katharina Probst is a Senior Engineering Leader, Kubernetes & SaaS at Google. Before this, she was leading engineering teams at Netflix, being responsible for the Netflix API, which helps bring Netflix streaming to millions of people around the world. Prior to joining Netflix, she was in the cloud computing team at Google, where she saw cloud computing from the provider side.
Does Java Need Inline Types? What Project Valhalla Can Bring to JavaC4Media
Video and slides synchronized, mp3 and slide download available at URL https://bit.ly/2T04Lw4.
Sergey Kuksenko talks about the performance benefits inline types bring to Java and how to exploit them. Inline/value types are the key part of experimental project Valhalla, which should bring new abilities to the Java language. Filmed at qconsf.com.
Sergey Kuksenko is a Java Performance Engineer at Oracle working on a variety of Java and JVM performance enhancements. He started working as Java Engineer in 1996 and as Java Performance Engineer in 2005. He has had a passion for exploring how Java works on modern hardware.
Do you need service meshes in your tech stack?
This on-line guide aims to answer pertinent questions for software architects and technical leaders, such as: what is a service mesh?, do I need a service mesh?, how do I evaluate the different service mesh offerings? In software architecture, a service mesh is a dedicated infrastructure layer for facilitating service-to-service communications between microservices, often using a sidecar proxy.
Video and slides synchronized, mp3 and slide download available at URL https://bit.ly/2UgQ3BU.
Christie Wilson describes what to expect from CI/CD in 2019, and how Tekton is helping bring that to as many tools as possible, such as Jenkins X and Prow. Wilson talks about Tekton itself and performs a live demo that shows how cloud native CI/CD can help debug, surface and fix mistakes faster. Filmed at qconsf.com.
Christie Wilson is a software engineer at Google, currently leading the Tekton project. Over the past decade, she has worked in the mobile, financial and video game industries. Prior to working at Google she led a team of software developers to build load testing tools for AAA video game titles, and founded the Vancouver chapter of PyLadies.
Video and slides synchronized, mp3 and slide download available at URL https://bit.ly/2S7lDiS.
Sasha Rosenbaum shows how a CI/CD pipeline for Machine Learning can greatly improve both productivity and reliability. Filmed at qconsf.com.
Sasha Rosenbaum is a Program Manager on the Azure DevOps engineering team, focused on improving the alignment of the product with open source software. She is a co-organizer of the DevOps Days Chicago and the DeliveryConf conferences, and recently published a book on Serverless computing in Azure with .NET.
Video and slides synchronized, mp3 and slide download available at URL https://bit.ly/36epVKg.
Todd Montgomery discusses the techniques and lessons learned from implementing Aeron Cluster. His focus is on how Raft can be implemented on Aeron, minimizing the network round trip overhead, and comparing single process to a fully distributed cluster. Filmed at qconsf.com.
Todd Montgomery is a networking hacker who has researched, designed, and built numerous protocols, messaging-oriented middleware systems, and real-time data systems, done research for NASA, contributed to the IETF and IEEE, and co-founded two startups. He currently works as an independent consultant and is active in several open source projects.
Architectures That Scale Deep - Regaining Control in Deep Systems (C4Media)
Video and slides synchronized, mp3 and slide download available at URL https://bit.ly/2FWc5Sk.
Ben Sigelman talks about "Deep Systems", their common properties and re-introduces the fundamentals of control theory from the 1960s, including the original conceptualizations of Observability & Controllability. He uses examples from Google & other companies to illustrate how deep systems have damaged people's ability to observe software, and what needs to be done in order to regain control. Filmed at qconsf.com.
Ben Sigelman is a co-founder and the CEO at LightStep, a co-creator of Dapper (Google’s distributed tracing system), and co-creator of the OpenTracing and OpenTelemetry projects (both part of the CNCF). His work and interests gravitate towards observability, especially where microservices, high transaction volumes, and large engineering organizations are involved.
ML in the Browser: Interactive Experiences with Tensorflow.js (C4Media)
Video and slides synchronized, mp3 and slide download available at URL https://bit.ly/39SddUL.
Victor Dibia provides a friendly introduction to machine learning and covers concrete steps on how front-end developers can create their own ML models and deploy them as part of web applications. He discusses his experience building Handtrack.js, a library for prototyping real-time hand tracking interactions in the browser. Filmed at qconsf.com.
Victor Dibia is a Research Engineer with Cloudera’s Fast Forward Labs. Prior to this, he was a Research Staff Member at the IBM TJ Watson Research Center, New York. His research interests are at the intersection of human computer interaction, computational social science, and applied AI.
User & Device Identity for Microservices @ Netflix Scale (C4Media)
Video and slides synchronized, mp3 and slide download available at URL https://bit.ly/2S9tOgy.
Satyajit Thadeshwar provides useful insights on how Netflix implemented a secure, token-agnostic, identity solution that works with services operating at a massive scale. He shares some of the lessons learned from this process, both from architectural diagrams and code. Filmed at qconsf.com.
Satyajit Thadeshwar is an engineer on the Product Edge Access Services team at Netflix, where he works on some of the most critical services focusing on user and device authentication. He has more than a decade of experience building fault-tolerant and highly scalable, distributed systems.
Video and slides synchronized, mp3 and slide download available at URL https://bit.ly/2Ezs08q.
Justin Ryan talks about Netflix’s scalability issues and some of the ways they addressed them. He shares successes they’ve had from unintuitively partitioning computation into multiple services to get better runtime characteristics. He introduces us to useful probabilistic data structures, innovative bi-directional data passing, and the open-source projects available from Netflix that make this all possible. Filmed at qconsf.com.
Justin Ryan is part of Playback Edge Engineering at Netflix. He works on some of the most critical services at Netflix, specifically focusing on user and device authentication. Years of building developer tools have also given him a healthy set of opinions on developer productivity.
Make Your Electron App Feel at Home Everywhere (C4Media)
Video and slides synchronized, mp3 and slide download available at URL https://bit.ly/2Z4ZJjn.
Kilian Valkhof discusses the process of making an Electron app feel at home on all three platforms (Windows, macOS and Linux), making devs aware of the pitfalls and how to avoid them. Filmed at qconsf.com.
Kilian Valkhof is a Front-end Developer & User-experience Designer at Firstversionist. He writes about various topics, from design to machine learning, on his personal website, kilianvalkhof.com, and is a frequent contributor to open source software. He is part of the Electron governance team that oversees the development of the Electron framework.
Video and slides synchronized, mp3 and slide download available at URL https://bit.ly/344PnB1.
Steve Klabnik goes over the deep details of how async/await works in Rust, covering concepts like coroutines, generators, stack-less vs stack-ful, "pinning", and more. Filmed at qconsf.com.
Steve Klabnik is on the core team of Rust, leads the documentation team, and is an author of "The Rust Programming Language." He is a frequent speaker at conferences and is a prolific open source contributor, previously working on projects such as Ruby and Ruby on Rails.
Video and slides synchronized, mp3 and slide download available at URL https://bit.ly/2OUz6dt.
Chris Riccomini talks about the current state-of-the-art in data pipelines and data warehousing, and shares some of the solutions to current problems dealing with data streaming and warehousing. Filmed at qconsf.com.
Chris Riccomini works as a Software Engineer at WePay.
Automated Testing for Terraform, Docker, Packer, Kubernetes, and More (C4Media)
Video and slides synchronized, mp3 and slide download available at URL https://bit.ly/2rm4hFD.
Yevgeniy Brikman talks about how to write automated tests for infrastructure code, including the code written for use with tools such as Terraform, Docker, Packer, and Kubernetes. Topics covered include: unit tests, integration tests, end-to-end tests, dependency injection, test parallelism, retries and error handling, static analysis, property testing and CI / CD for infrastructure code. Filmed at qconsf.com.
Yevgeniy Brikman is the co-founder of Gruntwork, a company that provides DevOps as a Service. He is the author of two books published by O'Reilly Media: Hello, Startup and Terraform: Up & Running. Previously, he worked as a software engineer at LinkedIn, TripAdvisor, Cisco Systems, and Thomson Financial.
Navigating Complexity: High-performance Delivery and Discovery Teams (C4Media)
Video and slides synchronized, mp3 and slide download available at URL http://bit.ly/2NzKMjY.
Conal Scanlon talks about why traditional tactics - ones that have been around since the 19th century - don't always help build a better product, and explores what characteristics are common to both delivery and discovery teams. He talks about his experiences in fledgling startups, successful companies, and larger enterprises, and uses these examples to introduce dual-track development. Filmed at qconnewyork.com.
Conal Scanlon is a Product Manager at Handshake, helping B2B distributors and manufacturers get their products to every shelf, everywhere. He has several years of experience in software development (mainly Ruby on Rails) and Agile leadership roles in private sector and government contracting, as well as a brief foray into management consulting.
GDG Cloud Southlake #33: Boule & Rebala: Effective AppSec in SDLC using Deplo... (James Anderson)
Effective Application Security in Software Delivery lifecycle using Deployment Firewall and DBOM
The modern software delivery process (or the CI/CD process) includes many tools, distributed teams, open-source code, and cloud platforms. Constant focus on speed to release software to market, along with the traditional slow and manual security checks has caused gaps in continuous security as an important piece in the software supply chain. Today organizations feel more susceptible to external and internal cyber threats due to the vast attack surface in their applications supply chain and the lack of end-to-end governance and risk management.
The software team must secure its software delivery process to avoid vulnerability and security breaches. This needs to be achieved with existing tool chains and without extensive rework of the delivery processes. This talk will present strategies and techniques for providing visibility into the true risk of the existing vulnerabilities, preventing the introduction of security issues in the software, resolving vulnerabilities in production environments quickly, and capturing the deployment bill of materials (DBOM).
Speakers:
Bob Boule
Robert Boule is a technology enthusiast with a passion for technology and making things work, along with a knack for helping others understand how things work. He comes with around 20 years of solution engineering experience in application security, software continuous delivery, and SaaS platforms. He is known for his dynamic presentations on CI/CD and application security integrated into the software delivery lifecycle.
Gopinath Rebala
Gopinath Rebala is the CTO of OpsMx, where he has overall responsibility for the machine learning and data processing architectures for Secure Software Delivery. Gopi also has a strong connection with our customers, leading design and architecture for strategic implementations. Gopi is a frequent speaker and well-known leader in continuous delivery and integrating security into software delivery.
Builder.ai Founder Sachin Dev Duggal's Strategic Approach to Create an Innova... (Ramesh Iyer)
In today's fast-changing business world, even companies that adapt and embrace new ideas often need help to keep up with the competition. However, fostering a culture of innovation takes a lot of work. It takes vision, leadership and a willingness to take risks in the right proportion. Sachin Dev Duggal, co-founder of Builder.ai, has perfected the art of this balance, creating a company culture where creativity and growth are nurtured at each stage.
Accelerate your Kubernetes clusters with Varnish Caching (Thijs Feryn)
A presentation about the usage and availability of Varnish on Kubernetes. This talk explores the capabilities of Varnish caching and shows how to use the Varnish Helm chart to deploy it to Kubernetes.
This presentation was delivered at K8SUG Singapore. See https://feryn.eu/presentations/accelerate-your-kubernetes-clusters-with-varnish-caching-k8sug-singapore-28-2024 for more details.
Key Trends Shaping the Future of Infrastructure.pdf (Cheryl Hung)
Keynote at DIGIT West Expo, Glasgow on 29 May 2024.
Cheryl Hung, ochery.com
Sr Director, Infrastructure Ecosystem, Arm.
The key trends across hardware, cloud and open-source; exploring how these areas are likely to mature and develop over the short and long-term, and then considering how organisations can position themselves to adapt and thrive.
Epistemic Interaction - tuning interfaces to provide information for AI support (Alan Dix)
Paper presented at the SYNERGY workshop at AVI 2024, Genoa, Italy, 3rd June 2024.
https://alandix.com/academic/papers/synergy2024-epistemic/
As machine learning integrates deeper into human-computer interactions, the concept of epistemic interaction emerges, aiming to refine these interactions to enhance system adaptability. This approach encourages minor, intentional adjustments in user behaviour to enrich the data available for system learning. This paper introduces epistemic interaction within the context of human-system communication, illustrating how deliberate interaction design can improve system understanding and adaptation. Through concrete examples, we demonstrate the potential of epistemic interaction to significantly advance human-computer interaction by leveraging intuitive human communication strategies to inform system design and functionality, offering a novel pathway for enriching user-system engagements.
Transcript: Selling digital books in 2024: Insights from industry leaders - T... (BookNet Canada)
The publishing industry has been selling digital audiobooks and ebooks for over a decade and has found its groove. What’s changed? What has stayed the same? Where do we go from here? Join a group of leading sales peers from across the industry for a conversation about the lessons learned since the popularization of digital books, best practices, digital book supply chain management, and more.
Link to video recording: https://bnctechforum.ca/sessions/selling-digital-books-in-2024-insights-from-industry-leaders/
Presented by BookNet Canada on May 28, 2024, with support from the Department of Canadian Heritage.
DevOps and Testing slides at DASA Connect (Kari Kakkonen)
Slides by Rik Marselis and me from the DASA Connect conference on 30.5.2024. We discuss what testing is, then what agile testing is, and finally what testing in DevOps is. We ended with a lovely workshop in which the participants explored different ways to think about quality and testing in different parts of the DevOps infinity loop.
Encryption in Microsoft 365 - ExpertsLive Netherlands 2024 (Albert Hoitingh)
In this session I delve into the encryption technology used in Microsoft 365 and Microsoft Purview, including the concepts of Customer Key and Double Key Encryption.
Neuro-symbolic is not enough, we need neuro-*semantic* (Frank van Harmelen)
Neuro-symbolic (NeSy) AI is on the rise. However, simply applying machine learning to just any symbolic structure is not sufficient to really harvest the gains of NeSy. These will only be gained when the symbolic structures have an actual semantics. I give an operational definition of semantics as “predictable inference”.
All of this is illustrated with link prediction over knowledge graphs, but the argument is general.
Software Delivery At the Speed of AI: Inflectra Invests In AI-Powered Quality (Inflectra)
In this insightful webinar, Inflectra explores how artificial intelligence (AI) is transforming software development and testing. Discover how AI-powered tools are revolutionizing every stage of the software development lifecycle (SDLC), from design and prototyping to testing, deployment, and monitoring.
Learn about:
• The Future of Testing: How AI is shifting testing towards verification, analysis, and higher-level skills, while reducing repetitive tasks.
• Test Automation: How AI-powered test case generation, optimization, and self-healing tests are making testing more efficient and effective.
• Visual Testing: Explore the emerging capabilities of AI in visual testing and how it's set to revolutionize UI verification.
• Inflectra's AI Solutions: See demonstrations of Inflectra's cutting-edge AI tools like the ChatGPT plugin and Azure Open AI platform, designed to streamline your testing process.
Whether you're a developer, tester, or QA professional, this webinar will give you valuable insights into how AI is shaping the future of software delivery.
2. Watch the video with slide synchronization on InfoQ.com!
http://www.infoq.com/presentations/samza-linkedin
InfoQ.com: News & Community Site
• 750,000 unique visitors/month
• Published in 4 languages (English, Chinese, Japanese and Brazilian Portuguese)
• Post content from our QCon conferences
• News 15-20 / week
• Articles 3-4 / week
• Presentations (videos) 12-15 / week
• Interviews 2-3 / week
• Books 1 / month
3. Presented at QCon San Francisco
www.qconsf.com
Purpose of QCon
- to empower software development by facilitating the spread of knowledge and innovation
Strategy
- practitioner-driven conference designed for YOU: influencers of change and innovation in your teams
- speakers and topics driving the evolution and innovation
- connecting and catalyzing the influencers and innovators
Highlights
- attended by more than 12,000 delegates since 2007
- held in 9 cities worldwide
70. YARN
You: I want to run command X on two machines with 512M of memory.
YARN: Cool, where’s your code?
You: http://some-host/jobs/download/my.tgz
YARN: I’ve run your command on grid-node-2 and grid-node-7.
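The exchange on this slide can be modelled as a small toy scheduler. The sketch below is not the YARN API: `ContainerRequest`, `ToyResourceManager`, and the free-memory figures per node are hypothetical, chosen only to mirror the dialogue. A request names a command, a container count, a memory size, and a package URL, and the manager answers with the nodes it picked.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ContainerRequest:
    """What "You" ask for in the dialogue (hypothetical model, not YARN's API)."""
    command: str
    containers: int
    memory_mb: int
    package_url: str


class ToyResourceManager:
    """Illustrative stand-in for a resource manager; tracks free memory per node."""

    def __init__(self, free_memory_mb):
        # Copy so the caller's dict is not mutated.
        self.free_memory_mb = dict(free_memory_mb)

    def allocate(self, request):
        """Reserve memory on distinct nodes and return the chosen node names."""
        candidates = [
            node for node, free in sorted(self.free_memory_mb.items())
            if free >= request.memory_mb
        ]
        if len(candidates) < request.containers:
            raise RuntimeError("not enough nodes with free memory")
        chosen = candidates[:request.containers]
        for node in chosen:
            self.free_memory_mb[node] -= request.memory_mb
        return chosen


request = ContainerRequest(
    command="X",
    containers=2,
    memory_mb=512,
    package_url="http://some-host/jobs/download/my.tgz",
)
# Assumed cluster state: only two nodes have 512M or more free.
manager = ToyResourceManager(
    {"grid-node-1": 256, "grid-node-2": 1024, "grid-node-7": 512}
)
nodes = manager.allocate(request)
print(f"I've run your command on {' and '.join(nodes)}.")
```

In this toy model the caller never names a machine. It states only what it needs and where the code lives, and placement is left to the manager, which is the point the dialogue makes.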
134. Let’s be Friends!
• We are incubating, and you can help!
• Get up and running in 5 minutes
http://bit.ly/hello-samza
• Grab some newbie JIRAs
http://bit.ly/samza_newbie_issues