06/18/2014 - Billing & Payments Engineering Meetup @ Netflix
For this Meetup, we have invited speakers from several tech companies to give a series of lightning talks on challenges related to billing & payments systems.
This event is for engineers who are interested in learning more about billing & payments systems. No previous experience with this kind of system is required to attend.
Presenters:
- Mathieu Chauvin - Engineering Manager for Payments @ Netflix
- Taylor Wicksell - Sr. Software Engineer for Billing @ Netflix
- Jean-Denis Greze - Engineer @ Dropbox
- Alec Holmes - Software Engineer @ Square
- Emmanuel Cron - Software Engineer III, Google Wallet @ Google
- Paul Huang - Engineering Manager @ Survey Monkey
- Anthony Zacharakis - Lead Engineer @ Lumos Labs
- Shengyong Li / Feifeng Yang - Dir. Engineering Commerce / Tech Lead Payment @ Electronic Arts
You’ve decided to develop in Azure and need to make a decision on the messaging technology. Storage Queues, Service Bus, Event Grid, Event Hubs, etc. Which technology should you use? How do you pick the right one if they all deal with messages? This session will help you answer these questions.
Why @Loggly Loves Apache Kafka, and How We Use Its Unbreakable Messaging for ... (SolarWinds Loggly)
Agenda for this Presentation
• The challenges of Log Management at scale
• Overview of Loggly’s processing pipeline
• Alternative technologies considered
• Why we love Apache Kafka
• How Kafka has added flexibility to our pipeline

The Challenges of Log Management at Scale
• Big data
– >750 billion events logged to date
– Sustained bursts of 100,000+ events per second
– Data space measured in petabytes
• Need for high fault tolerance
• Near real-time indexing requirements
• Time-series index management
AWS re:Invent presentation: Unmeltable Infrastructure at Scale by Loggly (SolarWinds Loggly)
April 2014 update to this presentation: Loggly removed Storm from its architecture. Details here: https://www.loggly.com/blog/what-we-learned-about-scaling-with-apache-storm/
This is a technical architect's case study of how Loggly has employed the latest social-media-scale technologies as the backbone ingestion processing for our multi-tenant, geo-distributed, and real-time log management system. Given by Jim Nisbet and Philip O'Toole, this presentation describes design details of how we built a second-generation system fully leveraging AWS services including Amazon Route 53 DNS with heartbeat and latency-based routing, multi-region VPCs, Elastic Load Balancing, Amazon Relational Database Service, and a number of pro-active and re-active approaches to scaling computational and indexing capacity.
The talk includes lessons learned in our first generation release, validated by thousands of customers; speed bumps and the mistakes we made along the way; various data models and architectures previously considered; and success at scale: speeds, feeds, and an unmeltable log processing engine.
The Problem is Data: Gwen Shapira, Confluent, Serverless NYC 2018 (iguazio)
Real-world architectures are different from code samples and small examples, and as we build more complex and mature Serverless architectures, we often encounter unexpected challenges.
This talk will start by discussing the challenges involved in building data processing architectures using stateless infrastructures. We'll review patterns from event-driven architectures, see how they apply to Serverless architectures and propose practical solutions to common pain-points.
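One recurring pain point in stateless Serverless data processing, hinted at above, is at-least-once event delivery: handlers must be idempotent because the same event can arrive more than once. A minimal Python sketch of the pattern (the in-memory dict standing in for an external store such as DynamoDB or Redis is an illustrative assumption, not something from the talk):

```python
# Stand-in for an external key-value store; a real stateless function
# would need durable state outside the function instance.
processed_ids = {}

def handle_event(event):
    """Process an event safely under at-least-once delivery by
    deduplicating on its ID (idempotent handler)."""
    event_id = event["id"]
    if event_id in processed_ids:
        # Duplicate delivery: return the cached result, do no new work.
        return processed_ids[event_id]
    result = event["value"] * 2          # placeholder business logic
    processed_ids[event_id] = result     # record result keyed by event ID
    return result
```

Redelivering the same event then produces the same result with no double processing.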
URP? Excuse You! The Three Metrics You Have to Know (Confluent)
(Todd Palino, LinkedIn) Kafka Summit SF 2018
What do you really know about how to monitor a Kafka cluster for problems? Is your most reliable monitoring your users telling you there’s something broken? Are you capturing more metrics than the actual data being produced? Sure, we all know how to monitor disk and network, but when it comes to the state of the brokers, many of us are still unsure of which metrics we should be watching, and what their patterns mean for the state of the cluster. Kafka has hundreds of measurements, from the high-level numbers that are often meaningless to the per-partition metrics that stack up by the thousands as our data grows.
We will thoroughly explore three key monitoring concepts in the broker that will leave you an expert in identifying problems with the least amount of pain:
-Under-replicated Partitions: The mother of all metrics
-Request Latencies: Why your users complain
-Thread pool utilization: How could 80% be a problem?
We will also discuss the necessity of availability monitoring and how to use it to get a true picture of what your users see, before they come beating down your door!
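As a rough illustration of the first metric: a partition is under-replicated when its in-sync replica (ISR) set is smaller than its assigned replica set. The sketch below counts URPs from `kafka-topics.sh --describe` output; production monitoring would normally read the broker's `UnderReplicatedPartitions` JMX metric instead, and the parsing here assumes the standard describe output layout:

```python
import re

def count_under_replicated(describe_output):
    """Count partitions whose ISR set is smaller than the assigned
    replica set, i.e. under-replicated partitions (URPs)."""
    urp = 0
    for line in describe_output.splitlines():
        # Lines look like: Topic: t  Partition: 0  Leader: 1  Replicas: 1,2,3  Isr: 1,2
        m = re.search(r"Replicas:\s*([\d,]+)\s+Isr:\s*([\d,]+)", line)
        if not m:
            continue
        replicas = m.group(1).split(",")
        isr = m.group(2).split(",")
        if len(isr) < len(replicas):
            urp += 1
    return urp
```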
Building an Event-oriented Data Platform with Kafka, Eric Sammer (Confluent)
While we frequently talk about how to build interesting products on top of machine and event data, the reality is that collecting, organizing, providing access to, and managing this data is where most people get stuck. Many organizations understand the use cases around their data – fraud detection, quality of service and technical operations, user behavior analysis, for example – but are not necessarily data infrastructure experts. In this session, we’ll follow the flow of data through an end to end system built to handle tens of terabytes an hour of event-oriented data, providing real time streaming, in-memory, SQL, and batch access to this data. We’ll go into detail on how open source systems such as Hadoop, Kafka, Solr, and Impala/Hive are actually stitched together; describe how and where to perform data transformation and aggregation; provide a simple and pragmatic way of managing event metadata; and talk about how applications built on top of this platform get access to data and extend its functionality.
Attendees will leave this session knowing not just which open source projects go into a system such as this, but how they work together, what tradeoffs and decisions need to be addressed, and how to present a single general purpose data platform to multiple applications. This session should be attended by data infrastructure engineers and architects planning, building, or maintaining similar systems.
Engineering Leader opportunity @ Netflix - Playback Data Systems (Philip Fisher-Ogden)
Across the globe, 75M Netflix members love watching 125M hours per day of TV shows and movies. They love the ease of starting on one device and resuming on another, and the Playback Data Systems team makes that happen. We’re looking for a senior engineering manager to lead this high-impact team at Netflix.
Attributions for images:
https://www.flickr.com/photos/theholyllama/5738164504/ and https://www.flickr.com/photos/brewbooks/7780990192/, no changes made, https://creativecommons.org/licenses/by-sa/2.0/
https://www.flickr.com/photos/crschmidt/2956721498/, no changes made, https://creativecommons.org/licenses/by/2.0/
Kafka makes so many things easier to do, from managing metrics to processing streams of data. Yet it seems that so many of the things we have done to this point in configuring and managing it have been object lessons in how to make our lives, as the plumbers who keep the data flowing, more difficult than they have to be. What are some of our favorites?
* Kafka without access controls
* Multitenant clusters with no capacity controls
* Worrying about message schemas
* MirrorMaker inefficiencies
* Hope and pray log compaction
* Configurations as shared secrets
* One-way upgrades
We’ve made a lot of progress over the last few years improving the situation, in part by focusing some of this incredibly talented community towards operational concerns. We’ll talk about the big mistakes you can avoid when setting up multi-tenant Kafka, and some that you still can’t. And we will talk about how to continue down the path of marrying the hot, new features with operational stability so we can all continue to come back here every year to talk about it.
Beaming Flink to the Cloud @ Netflix FF 2016 (Monal Daxini)
Netflix is a data-driven company, and we process over 700 billion streaming events per day with at-least-once processing semantics in the cloud. To make it easy to extract intelligence from this unbounded stream, we are building Stream Processing as a Service (SPaaS) infrastructure so that users can focus on extracting value and not have to worry about boilerplate infrastructure and scale.
We will share our experience in building a scalable SPaaS using Flink, Apache Beam and Kafka as the foundation layer to process over 1.3 PB of event data without service disruption.
Netflix viewing data architecture evolution - EBJUG Nov 2014 (Philip Fisher-Ogden)
Netflix's architecture for viewing data has evolved as streaming usage has grown. Each generation was designed for the next order of magnitude, and was informed by learnings from the previous. From SQL to NoSQL, from data center to cloud, from proprietary to open source, look inside to learn how this system has evolved. (slides from a talk given at the East Bay Java Users Group MeetUp in Nov 2014)
Building Information Systems using Event Modeling (Bobby Calderwood, Evident ...) (Confluent)
"Event Modeling is a fairly new information system modeling discipline created by Adam Dymitruk that is heavily influenced by CQRS and Event Sourcing. Its lineage follows from Event Storming, Design Thinking, and other modeling practices from the Agile and Domain-Driven Design communities. The methodology emphasizes simplicity (there are only four model ingredients) and inclusion of non-developer participants.
Like other modeling disciplines, Event Modeling is sufficiently general to enable collaborative learning and knowledge exchange among UI/UX designers, software engineers and architects, and business domain experts. But it's also sufficiently expressive and specific to be directly actionable by the implementors of the information system described by the model.
During this talk, we'll:
* Build an Event Model of a simple information system, including wire-framing the UI/UX experience
* Explore how to proceed from model to implementation using Kafka, its Streams and Connect APIs, and KSQL
* Jump-start the implementation by generating code directly from the Event Model
* Track and measure the work of implementation by generating tasks directly from the Event Model"
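The command/event split at the heart of Event Sourcing, and thus of the models this talk builds, can be sketched in a few lines: commands are decided into events, and state is a left fold over the event log, so any read model can be rebuilt by replay. A minimal illustration under assumed names (a toy account domain, not taken from the talk):

```python
def decide(state, command):
    """Turn a command into zero or more events, given current state."""
    if command["type"] == "Deposit":
        return [{"type": "Deposited", "amount": command["amount"]}]
    if command["type"] == "Withdraw" and state >= command["amount"]:
        return [{"type": "Withdrawn", "amount": command["amount"]}]
    return []  # command rejected: nothing is recorded

def evolve(state, event):
    """Apply one event to the state (the fold step)."""
    if event["type"] == "Deposited":
        return state + event["amount"]
    if event["type"] == "Withdrawn":
        return state - event["amount"]
    return state

def replay(events, initial=0):
    """Rebuild state by folding evolve over the event log."""
    state = initial
    for e in events:
        state = evolve(state, e)
    return state
```

In a Kafka-based implementation, the event log would live in a topic and `replay` would correspond to materializing a state store with the Streams API.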
Scalable and Reliable Logging at Pinterest (Krishna Gade)
At Pinterest, hundreds of services and third-party tools implemented in various programming languages generate billions of events every day. Achieving scalable, reliable, low-latency logging poses several challenges: (1) uploading logs generated in various formats from tens of thousands of hosts to Kafka in a timely manner; (2) running Kafka reliably on Amazon Web Services, where virtual instances are less reliable than on-premises hardware; and (3) moving tens of terabytes of data per day from Kafka to cloud storage reliably and efficiently while guaranteeing exactly-once persistence per message.
In this talk, we will present Pinterest’s logging pipeline, and share our experience addressing these challenges. We will dive deep into the three components we developed: data uploading from service hosts to Kafka, data transportation from Kafka to S3, and data sanitization. We will also share our experience in operating Kafka at scale in the cloud.
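One widely used way to approximate exactly-once persistence from Kafka to object storage is to derive the object key deterministically from (topic, partition, starting offset), so a retried upload overwrites the same object instead of duplicating data. A sketch under that assumption; the naming scheme is illustrative, not Pinterest's actual one:

```python
def s3_key(topic, partition, first_offset, date):
    """Deterministic object key for a batch of Kafka records.

    Because the key depends only on (topic, partition, first offset),
    re-uploading the same batch after a failure targets the same key,
    and an S3 PUT overwrite makes the write idempotent.
    """
    return f"{topic}/dt={date}/part-{partition:05d}-offset-{first_offset:012d}.log"
```

Zero-padding the partition and offset keeps keys lexicographically sorted, which simplifies downstream listing and gap detection.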
In this talk, Confluent co-founder and CEO, Jay Kreps will cover the rise of two trends:
1. The rise of Apache Kafka and event streams
2. The rise of the public cloud and cloud-native data systems
... and the problems we need to solve as these two trends come together.
Netflix Keystone streaming data pipeline @scale in the cloud - dbtb-2016 (Monal Daxini)
Keystone processes over 700 billion events per day (1 petabyte) with at-least-once processing semantics in the cloud. We will explore in detail how we leverage Kafka, Samza, Docker, and Linux at scale to implement a multi-tenant pipeline in the AWS cloud within a year. We will also share our plans for offering Stream Processing as a Service for all of Netflix.
Data Streaming with Apache Kafka & MongoDB (Confluent)
Explore the use-cases and architecture for Apache Kafka, and how it integrates with MongoDB to build sophisticated data-driven applications that exploit new sources of data.
Using Machine Learning to Understand Kafka Runtime Behavior (Shivanath Babu, ...) (Confluent)
Apache Kafka is now nearly ubiquitous in modern data pipelines and use cases. While the Kafka development model is elegantly simple, operating Kafka clusters in production environments is a challenge. It’s hard to troubleshoot misbehaving Kafka clusters, especially when there are potentially hundreds or thousands of topics, producers and consumers and billions of messages.
When a real-time application lags, the root cause may be an application problem, like poor data partitioning or load imbalance, or a Kafka problem, like resource exhaustion or suboptimal configuration. Getting the best performance, predictability, and reliability for Kafka-based applications can therefore be difficult. In the end, the operation of your Kafka-powered analytics pipelines could itself benefit from machine learning (ML).
Presentation "From Local to Global" by Tobias Heintz at the AWS E-Business Web Day for Windows applications. All videos and presentations can be found here: http://amzn.to/2ds3aMX
Leveraging services in stream processor apps at Ticketmaster (Derek Cline, Ti...) (Confluent)
Is your organization adopting Kafka as their messaging bus but you've found that it will take too long to migrate your existing service-oriented architecture to a log-oriented architecture? Some of the biggest challenges in building a new stream processor can be implementing all the business logic again. It has become increasingly common for companies with high-throughput source streams and change-data-capture logs to want to build systems fast. At Ticketmaster, we have found a solution to the problem by leveraging the business logic in our existing services and calling them from our Java based KafkaStreams processor applications in an efficient manner. In this talk, we will examine the initial challenges we faced in our transition, then we will explore the solutions we built to address the use cases at Ticketmaster. The primary focus will address our workflow around calling services to bring stream processor applications to market fast. We will review our challenges and share tips for success.
High Performance Software Engineering Teams (Lars Thorup)
Based on my experience building high-performance engineering teams, this presentation focuses on the technical practices required. These practices center on automation (build, test, and deployment) and increased collaboration between Engineering and QA (TDD, exploratory testing, prioritization, feedback cycles).
Building Great Software Engineering Teams (Brian Link)
Being an effective software engineering manager is a tricky job. Whether you’re hiring the engineering manager, are already one or report to one, in this session you’ll learn what makes the best engineering managers and how to build, participate in and manage great engineering teams. I provide tips and advice in five areas of focus: people, process, technology, product and execution.
Topics include: hiring, building a team to complement your strengths, management style, effective communication, mentoring, virtual teams, career guidance, technical leadership, team size/structure, agile development, strategic roadmap building and delivering on-time.
Ads personalization / Netflix Ad Tech Event Nov 2017 (Liviu Tudor)
A look at Netflix online advertising personalization platform which supports our digital marketing efforts and delivers billions of highly targeted ad impressions monthly.
2017-10-10 (Netflix ML Platform Meetup): Learning item and user representation... (Ed Chi)
Learning item and user representations with sparse data in recommender systems
Ed H. Chi
Google Inc.
Recommenders match users in a particular context with the best personalized items that they will engage with. The problem is that users have shifting item and topic preferences, and give sparse feedback over time (or no feedback at all). Contexts shift from interaction to interaction at various time scales (seconds to minutes to days). Learning about users and items is hard because of noisy and sparse labels, and the user/item set changes rapidly and is large and long-tailed. Given the enormity of the problem, it is a wonder that we learn anything at all about our items and users.
In this talk, I will outline some research at Google to tackle the sparsity problem. First, I will summarize some work on focused learning, which suggests that learning about subsets of the data requires tuning the parameters for estimating the missing unobserved entries. Second, we utilize joint feature factorization to impute possible user affinity to freshly-uploaded items, and employ hashing-based techniques to perform extremely fast similarity scoring on a large item catalog, while controlling variance. This approach is currently serving a ~1TB model on production traffic using distributed TensorFlow Serving, demonstrating that our techniques work in practice. I will conclude with some remarks on possible future directions.
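The "hashing-based techniques to perform extremely fast similarity scoring" mentioned above can be illustrated with random-hyperplane LSH: each item vector is mapped to a short bit signature, and candidate retrieval over a large catalog reduces to cheap Hamming comparisons on the signatures. A toy sketch, not Google's production system:

```python
import random

def make_hyperplanes(dim, n_bits, seed=0):
    """Draw n_bits random hyperplanes (Gaussian normals) in R^dim."""
    rng = random.Random(seed)
    return [[rng.gauss(0, 1) for _ in range(dim)] for _ in range(n_bits)]

def signature(vec, planes):
    """Bit signature: which side of each hyperplane the vector falls on.
    Vectors pointing in similar directions share most bits."""
    return tuple(1 if sum(p * x for p, x in zip(plane, vec)) >= 0 else 0
                 for plane in planes)

def hamming(a, b):
    """Number of differing bits; a cheap proxy for angular distance."""
    return sum(x != y for x, y in zip(a, b))
```

Because the signature depends only on the sign of each projection, positively scaled copies of a vector hash identically, which matches cosine-style similarity.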
Event-Driven Architectures Done Right | Tim Berglund, ConfluentHostedbyConfluent
Far from a controversial choice, Kafka is now a technology developers and architects are adopting with enthusiasm. And it’s often not just a good choice, but a technology enabling meaningful improvements in complex, evolvable systems that need to respond to the world in real time. But it’s certainly possible to do it wrong! In this talk, we'll look at common mistakes in event-driven systems built on top of Kafka:
- Deploying Kafka when an event-driven architecture is not the best choice.
- Ignoring schema management. Events are the APIs of event-driven systems!
- Writing bespoke consumers when stream processing is a better fit.
- Using stream processing when you really need a database.
- Trivializing the task of elastic scaling in all parts of the system.
It's highly likely for medium- and large-scale systems that an event-first perspective is the most helpful one to take, but it's early days, and it's still possible to get this wrong. Come to this talk for a survey of mistakes not to make.
Enterprise Data World 2018 - Building Cloud Self-Service Analytical SolutionDmitry Anoshin
This session will cover building the modern Data Warehouse by migration from the traditional DW platform into the cloud, using Amazon Redshift and Cloud ETL Matillion in order to provide Self-Service BI for the business audience. This topic will cover the technical migration path of DW with PL/SQL ETL to the Amazon Redshift via Matillion ETL, with a detailed comparison of modern ETL tools. Moreover, this talk will be focusing on working backward through the process, i.e. starting from the business audience and their needs that drive changes in the old DW. Finally, this talk will cover the idea of self-service BI, and the author will share a step-by-step plan for building an efficient self-service environment using modern BI platform Tableau.
An experience sharing of the OpenStack deployment at Suning.com, a large online retailer in China. The talk presents the challenges and opportunities on orchestrating the enterprise workloads using Heat.
This talk focuses on how we used Amazon Kinesis to build the pub-sub infra at Lyft, which ingests more than 100 billion events per day. We'll review the strengths and weaknesses of Kinesis as a choice for streaming events in real time at Lyft's scale, as well as the best practices and lessons learned over time.
Speaker: Hafiz Hamid (Lyft)
Hafiz Hamid is a software engineer on the Pub-Sub/Streaming Platform team at Lyft. He has built some of the key pieces in the messaging & streaming infrastructure at Lyft. Previously, Hafiz was a technical lead at Bing Search where he worked on data pipelines, relevance and web crawlers.
JavaOne: Efficiently building and deploying microservicesBart Blommaerts
Since Martin Fowler’s article on microservices in the beginning of 2014, there has been a lot of controversy about the topic. Although most articles talk about microservices from an architectural perspective, this session intends to go further and also provide examples of and best practices for building and deploying polyglot applications in an enterprise Java environment. In the session, the build process focuses on efficiency and shows that microservices don’t necessarily cause overhead for a project. Microservices don't imply copying and pasting the same boilerplate code over and over. The deployment process in the presentation is, of course, automated but also demonstrates best practices for integration testing between different active services.
A talk through the journey we've been through at Snowplow thinking about event data, starting with our focus on web and then mobile analytics, and exploring our current and future technical and analytic approaches
2015 nov 27_thug_paytm_rt_ingest_brief_finalAdam Muise
Paytm Labs provides a quick overview of their Hadoop data ingest platform. We cover our journey from a batch focused ingest system with SQOOP to a streaming ingest supported by Kafka, Confluent.io, Hadoop, Cassandra, and Spark Streaming. This presentation also provides an overview of our complete data platform including our feature creation template
This is a short introduction to microservices covering the differences between microservices and monolithic applications, the pros and cons of microservices, and the business and technical challenges you may face while implementing them.
This webinar by Orkhan Gasimov (Senior Solution Architect, Consultant, GlobalLogic) was delivered at Java Community Webinar #3 on October 16, 2020.
During the webinar, we gave a simplified overview of the classical and modern architecture patterns and concepts used to develop distributed applications over the last decade.
More details and presentation: https://www.globallogic.com/ua/about/events/java-community-webinar-3/
Eric Proegler Oredev Performance Testing in New ContextsEric Proegler
Virtualization, Cloud Deployments, and Cloud-Based Tools have challenged and changed performance testing practices. Today’s performance tester can summon tens of thousands of virtual users from the cloud in a few minutes at a cost far lower than the expensive on-premise installations of yesteryear.
Meanwhile, systems under test have changed more. Updated software stacks have increased the complexity of scripting and performance measurement, but the biggest changes are in the nature and quantities of resources powering the systems. Interpreting resource usage when resources are shared on a private virtualization platform is exceedingly difficult. Understanding resources when they live in a large public cloud is impossible.
Dev Lakhani, Data Scientist at Batch Insights "Real Time Big Data Applicatio...Dataconomy Media
Dev Lakhani, Data Scientist at Batch Insights talks on "Real Time Big Data Applications for Investment Banks and Financial Institutions" at the first Big Data Frankfurt event that took place at Die Zentrale, organised by Dataconomy Media
0x01 - Newton's Third Law: Static vs. Dynamic AbusersOWASP Beja
If you offer a service on the web, odds are that someone will abuse it. Be it an API, a SaaS, a PaaS, or even a static website, someone somewhere will try to figure out a way to use it to their own needs. In this talk we'll compare measures that are effective against static attackers and how to battle a dynamic attacker who adapts to your counter-measures.
About the Speaker
===============
Diogo Sousa, Engineering Manager @ Canonical
An opinionated individual with an interest in cryptography and its intersection with secure software development.
This presentation by Morris Kleiner (University of Minnesota), was made during the discussion “Competition and Regulation in Professions and Occupations” held at the Working Party No. 2 on Competition and Regulation on 10 June 2024. More papers and presentations on the topic can be found out at oe.cd/crps.
This presentation was uploaded with the author’s consent.
Acorn Recovery: Restore IT infra within minutesIP ServerOne
Introducing Acorn Recovery as a Service, a simple, fast, and secure managed disaster recovery (DRaaS) by IP ServerOne. A DR solution that helps restore your IT infra within minutes.
Have you ever wondered how search works while visiting an e-commerce site, internal website, or searching through other types of online resources? Look no further than this informative session on the ways that taxonomies help end-users navigate the internet! Hear from taxonomists and other information professionals who have first-hand experience creating and working with taxonomies that aid in navigation, search, and discovery across a range of disciplines.
Sharpen existing tools or get a new toolbox? Contemporary cluster initiatives...Orkestra
UIIN Conference, Madrid, 27-29 May 2024
James Wilson, Orkestra and Deusto Business School
Emily Wise, Lund University
Madeline Smith, The Glasgow School of Art
2. • Mathieu Chauvin – Netflix
Validating Payment Information
• Taylor Wicksell – Netflix
Flexible Billing Integrations Using Events
• Jean-Denis Greze – Dropbox
Mistakes (and Wins) Building Payments at Dropbox
• Anthony Zacharakis – Lumosity
Billing Migrations: Maintaining Your (Data) Sanity
• Alec Holmes – Square
Scaling Merchant Payouts at Square
=== Break ===
• Paul Huang – SurveyMonkey
Billing Globalization and Auto Renewal Optimization
• Emmanuel Cron – Google Wallet
Moving from SE to Host Card Emulation
• Feifeng Yang / Michael Chen – Electronic Arts
Ecommerce at EA
• Krishnan Sridhar – LinkedIn
Real-Time Analytics and Smart Routing
24. Common Event Pipeline
Current Implementation
• SQS Entry Point
• Custom routing service
• Custom code for each endpoint integration
• Weak ordering of events
Future Implementation
• Suro / Kafka Entry Point
• Configurable Routing and Transformation
• Pluggable endpoint integration
• Option for strong ordering of events
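The "configurable routing and transformation" idea above can be sketched as routes that are data rather than custom code per endpoint. This is a minimal, hypothetical sketch; the event types, endpoint names, and transform shapes are all illustrative, not the actual Netflix implementation:

```python
# Routes are configuration: event type -> list of endpoints.
ROUTES = {
    "payment.settled": ["billing_ledger", "email_service"],
    "payment.failed":  ["billing_ledger", "retry_queue"],
}

# Optional per-endpoint transformation, also configuration.
TRANSFORMS = {
    "email_service": lambda e: {"user": e["user_id"], "template": e["type"]},
}

def route(event):
    """Fan an event out to its configured endpoints, applying any
    per-endpoint transformation before delivery."""
    deliveries = []
    for endpoint in ROUTES.get(event["type"], []):
        transform = TRANSFORMS.get(endpoint, lambda e: e)
        deliveries.append((endpoint, transform(event)))
    return deliveries

deliveries = route({"type": "payment.failed", "user_id": 42})
```

Adding a new endpoint then means adding a route entry (and perhaps a transform), not writing custom integration code.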
29. Mistakes (and Wins) Building Payments At Dropbox
Jean-Denis Grèze & Dan Wheeler
June 18th, 2014
30. Backend Tips
• Not about increasing conversion
• Not about pricing
• Not about plan and feature optimizations
• Not about upselling
• Not about consumer SaaS at scale
• Not about self-serve in SMB/SME/Enterprise
31. Pains of Scaling Payments
• Thousands of customers to millions of customers
• SMB to Enterprise
– Custom flows!
• International expansion
– Fraud
– New payment methods (delayed settlements)
– Different price points
32. Out Of The Dark Ages
• For a long time, only 0.5 engineers worked on payments and billing
• March 2013: consult w/ leading payment engineers, PMs and executives on how to build an amazing payments team
– 15+ in 1 year
33. Advice
• Build a payments + billing backend that is:
– Flexible
• Migrations (sadface)
• Requirements will change – often
– Auditable
• Always know why any state changed
– Append Only
• Never lose data
37. Migrations
• You will have to migrate
– 3rd-party vaulting to self-vaulting
– New markets = new processors
– If you are a growing company, your internals will require migrations
• Stakes are high
– Double-billing? Forgetting to charge some users? Inadvertently moving users from one pricing category to another?
• Old way:
– Ad hoc
– Tons of tests
38. I. Leverage Existing Code: Equivalence
• Write equivalence between old and new implementations (database, API, 3rd party providers, tests, etc.)
• Run everything through both systems at once, with equivalence being tested
• Every step of the way, check that equivalence relations hold (e.g., old-style invoice has a new-style invoice equivalent)
• Turn off old system when everything works for X amount of time
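The equivalence approach above can be sketched as a dual-write wrapper: run every operation through both billing systems and assert that the comparable fields agree, while the old system stays authoritative. The function names and record shapes here are hypothetical stand-ins, not Dropbox's actual code:

```python
def charge_old(user_id, cents):
    # Stand-in for the legacy billing implementation.
    return {"user": user_id, "amount": cents, "invoice": f"old-{user_id}"}

def charge_new(user_id, cents):
    # Stand-in for the new billing implementation.
    return {"user": user_id, "amount": cents, "invoice": f"new-{user_id}"}

def equivalent(old, new):
    # Compare only the fields that must match; internal IDs differ by design.
    return (old["user"], old["amount"]) == (new["user"], new["amount"])

def dual_write(user_id, cents):
    """Run the charge through both systems and verify equivalence."""
    old = charge_old(user_id, cents)
    new = charge_new(user_id, cents)
    if not equivalent(old, new):
        raise RuntimeError(f"equivalence violated for user {user_id}")
    return old  # the old system stays authoritative until cutover

result = dual_write(7, 500)
```

Once equivalence has held in production for long enough, the cutover is just returning `new` instead of `old`.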
39. Migration Pro-Tip
• If you can migrate in both directions at will on a per-endpoint basis, your life will be awesome and people will love you.
41. Logging
• Dark Ages = tons of logging
• Very comprehensive, but ad-hoc = too much effort to re-create state
• Human error
42. II. Automated Logging
• Automatically log
– Any DB write (graph, relational, etc.)
– Any 3rd party API call (and some internal calls like email)
• Pre-log
– Any incoming 3rd party payload
• Can recreate past actions if we introduced a regression
• LOG A REASON (and code version)
– 1 year is a long time
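Automatic logging with a reason and code version can be sketched as a decorator around write paths, so no one has to remember to log by hand. Everything here (the decorator name, log shape, and version constant) is a hypothetical illustration of the idea, not the described system:

```python
import functools

CODE_VERSION = "deadbeef"  # hypothetical: e.g. the deployed git SHA
AUDIT_LOG = []             # stand-in for a durable audit store

def audited(reason):
    """Log every call to the wrapped write, including a human-readable
    reason and the code version, so state can be reconstructed later."""
    def wrap(fn):
        @functools.wraps(fn)
        def inner(*args, **kwargs):
            AUDIT_LOG.append({
                "call": fn.__name__,
                "args": args,
                "reason": reason,
                "version": CODE_VERSION,
            })
            return fn(*args, **kwargs)
        return inner
    return wrap

@audited("apply yearly-plan upgrade")
def write_subscription(user_id, plan):
    # Stand-in for an actual DB write.
    return {"user": user_id, "plan": plan}

write_subscription(42, "yearly")
```

A year later, the audit entry tells you not just what changed, but why and under which code version.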
44. Not too much data
• Dropbox = large scale for SaaS
• Hundreds of millions of users (provisioning is hard)
• Tons of data, but still payments data << file storage data
– Although we do have the benefits of amazing infrastructure
45. III. States, Not Deltas
• Generally # states << # deltas
• States
– Pro 100 + Packrat
– DfB + Harmony Enabled
• Deltas
– Add 5 licenses, 6 licenses, etc.
– Add 20 licenses and switch from monthly to yearly
• Use states and let the system figure out how to get from start to end
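The "states, not deltas" idea above can be sketched as a planner: callers declare the desired end state, and the system derives the changes needed to get there. The state keys below are illustrative, borrowed loosely from the slide's examples:

```python
def plan_transition(current, desired):
    """Return the changes needed to move between two account states,
    each a dict of settings, as {key: (old_value, new_value)}."""
    changes = {}
    for key in set(current) | set(desired):
        if current.get(key) != desired.get(key):
            changes[key] = (current.get(key), desired.get(key))
    return changes

current = {"plan": "Pro100", "schedule": "monthly", "packrat": True}
desired = {"plan": "Pro100", "schedule": "yearly", "packrat": True}
changes = plan_transition(current, desired)
```

However many deltas a user accumulates, the system only ever has to reason about the difference between two states.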
46. IV. Possibility & Transitions Use Same Code Paths
• No difference between:
– entity.is_valid_transition(end_state)
– entity.perform_transition(end_state)
• Except that writes are turned off for the former.
• No chance for “is something possible” logic to diverge from “do the thing” logic.
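One way to realize the shared-code-path idea is a single private transition method with a dry-run flag, so validity checks and real transitions literally execute the same logic. This is a minimal sketch; the class, states, and transition table are all hypothetical:

```python
# Hypothetical transition table: state -> set of allowed next states.
ALLOWED = {"trial": {"paid", "cancelled"}, "paid": {"cancelled"}}

class Entity:
    def __init__(self, state):
        self.state = state

    def _transition(self, end_state, dry_run):
        """One code path for both checking and performing a transition."""
        if end_state not in ALLOWED.get(self.state, set()):
            return False
        if not dry_run:          # writes only happen in the real run
            self.state = end_state
        return True

    def is_valid_transition(self, end_state):
        return self._transition(end_state, dry_run=True)

    def perform_transition(self, end_state):
        return self._transition(end_state, dry_run=False)

e = Entity("trial")
ok = e.is_valid_transition("paid")   # checks only; state is unchanged
e.perform_transition("paid")         # same logic, now with the write
```

Because both entry points call `_transition`, the "is it possible" answer can never drift from what "do it" would actually do.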
47. States Are Nice
transition_space = MoneytreeTransitionsSpace.build_cross_product(
entity=me,
gateways=ALL_GATEWAYS,
plans=[Pro100, Pro200, Pro500, DfB, DfBTrial],
schedules=[Monthly, Yearly],
currencies=ALL_CURRENCIES,
features=ALL_FEATURES,
tax_profile=[NoTax, SimpleTax, ComplexTax],
)
# …
if transition_space.supports(FEATURE_PACKRAT):
# …
48. Write Protection
• Seems dumb, but need to be careful not to accidentally change values in payments world. Have clearly-defined code paths that can touch state, talk to 3rd party components, etc.
49. V. One More Lesson
• Payments + Billing != Finance
• Business requirements don’t always translate to what’s best/easiest in the world of accounting. You need to flexibly work in both worlds – can’t risk the user experience to make your finance dep’t happy (and vice-versa)
• Get a great PM
50. VI. Why Payments Are Cool?
• Infrastructure?
– Payments service
– Provisioning service
• Product?
– Upsells? (+50% increase in revenue per user)
– Gating features?
• Product Infrastructure!
– Build a successful structure by emphasizing hard problems in both worlds!
52. Now + The Future
• Other Cool Projects
– ML for risk/anomaly detection (e.g., for payment methods that don’t settle immediately)
– Price AB testing (*)
– Cross-platform upsell framework
• Questions?
– dan@dropbox.com
– jeandenis@dropbox.com
• Hiring
– Get in touch!
58. Lumos Labs, Inc.
Payment system limitations
• Hard to add additional gateways
• Models don’t reflect business reality
• Not built for reporting
• Code is brittle
60. Lumos Labs, Inc.
New payment system features
• Trivial to add new payment methods
• Subscriptions are the core model
• Built with reporting in mind
• Separate, well encapsulated library
67. Lumos Labs, Inc.
Just deprecate the old one
• Don't migrate anyone, let old users churn out naturally (will take forever)
• Migrate everyone, but only most critical/current subscription info (loses history)
• Migrate everyone + full history (tricky, lots of edge cases)
73. Lumos Labs, Inc.
enter the sanity check
Just run a sanity check before and after the migration that asks the same questions
74. Lumos Labs, Inc.
Both models should answer certain questions the same way, e.g.:
• How much did the user pay for a subscription?
• How many total transactions did the user make?
• Was auto-renewal enabled on X date?
75. Lumos Labs, Inc.
class SanityCheck
Methods = [:subscriber?, :transaction_count] # etc.
def initialize(record)
@record = record
@before_values = SanityCheck.values(record)
end
def self.values(record)
Methods.map { |m| [m, record.send(m)] }.to_h
end
end
76. Lumos Labs, Inc.
class SanityCheck
...
def check
@after_values = SanityCheck.values(@record)
@diff = diff(@before_values, @after_values)
end
def diff(a, b)
a.reject { |k, v| b[k] == v }
.merge(b.reject { |k, _| a.key?(k) })
end
end
77. Lumos Labs, Inc.
User.each do |user|
sanity_check = SanityCheck.new(user)
user.migrate!
sanity_check.check
if sanity_check.diff.any?
# sanity check failed -- log an error, rollback, etc.
raise ActiveRecord::Rollback
else
# woo hoo, success!
user.update_attributes(:migrated => true)
end
end
82. Lumos Labs, Inc.
Did not work in all cases
● Payment system behavior changed over time
● Some concepts did not map between systems
83. Lumos Labs, Inc.
Handling the edge cases
● Replayed history as it would play out in the new system
● Still kept most critical information the same (e.g. transaction timestamps)
84. Lumos Labs, Inc.
Skip ahead to today
Migrated 99.9994%* of all our users successfully to the new system
*The remaining 0.0006% live on for business, not technical reasons
99. SurveyMonkey is the world’s largest survey company
• 90M unique visitors every month
• 17K new sign-ups daily
• 2.4M survey responses generated daily
“SurveyMonkey turns online surveys into a hot business.”
“Start-up companies using ‘freemium’ business models, including SurveyMonkey, are thriving as the cost of computer power and storage falls.”
One of the hottest startups to watch.
101. Billing Globalization
In 2014...
Currencies: 39 international currencies.
Payment Methods: Credit Card, PayPal, Debit Card, Bank Transfer, iTunes, Invoice.
Pricing: each package with different price per currency, multiple prices allowed for price testing.
Revenue: $$$$ MM
102. Auto-Renewal Optimization
Retry Logic: retry failed payments at different intervals, ~60% retry success rate
Account Updater: get users’ updated credit card data (number / expiration date) before next renewal date, ~4% of users at 96% success rate
Immediate Renewal: charge pending invoices when users update payment accounts, ~4% retry success rate improvement
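The retry logic described above can be sketched as a schedule of widening intervals. The slide doesn't give the actual intervals or attempt count, so the values below are purely illustrative assumptions:

```python
from datetime import datetime, timedelta

# Hypothetical retry schedule: the real intervals are not stated in the talk.
RETRY_INTERVALS = [timedelta(days=1), timedelta(days=3), timedelta(days=7)]

def next_retry(failed_at, attempt):
    """Return when to retry a failed payment, or None once the
    schedule is exhausted (hand off to dunning / cancellation)."""
    if attempt >= len(RETRY_INTERVALS):
        return None
    return failed_at + RETRY_INTERVALS[attempt]

failed = datetime(2014, 6, 18)
first = next_retry(failed, 0)   # first retry, one day later
done = next_retry(failed, 3)    # schedule exhausted
```

Making the schedule data rather than code also makes "increase retry frequencies" (slide 104) a configuration change.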
104. Upcoming Billing Projects in 2014
VAT 2015: preparations for VAT rule changes in Europe in 2015
Brazil Payment Processing: set up local entity, integrate with a new Payment Service Provider in Brazil
Continue Improving Retry Logic: increase retry frequencies