Streaming at Lyft - Greg Fee - Seattle Apache Flink Meetup


Streaming at Lyft - Greg Fee - Seattle Apache Flink Meetup 06/28/2018

Published in: Engineering

  1. Streaming@Lyft
     Flink Meetup, June 2018
     Gregory Fee
  2. About Me
     ● Engineer @ Lyft
     ● Teams - ETA, Data Science Platform, Data Platform
     ● Accomplishments
        ○ ETA model training interval cut from 4 months to 10 minutes
        ○ Real-time traffic updates
        ○ Flyte - Large Scale Orchestration and Batch Compute
        ○ Lyftlearn - Custom Machine Learning Library
        ○ Dryft - Real-time Feature Generation for Machine Learning
  3. Streaming Use Cases
     ● Firehose -> S3 for Hive and Presto
     ● Real-time Traffic
     ● Primetime + Heatmaps
     ● Fraud Detection
     ● Ride Receipts
     ● Driver incentives
     ● Passenger coupons
     ● Anomaly Detection
     ● Map Correction
     ● Location Smoothing
     ● Bad Experience Detection
     ● Accident Detection
     ● ...and so much more
  4. A Brief History of Lyft
     ● <2015 - monolithic PHP app, Redshift
     ● 2015 - Python services, “workers”, Fanner, Spark, Job Scheduler
     ● 2016 - Go services, Hive
     ● 2017 - Presto
     ● 2018 - Druid, Kafka, Flink, Beam
  5. Streaming Architecture Overview
     [Diagram; component labels: Mobile, Services, Ingest/Enrich, Fanner, KCL, Job Scheduler, S3]
  6. Fanner
     ● Pub-sub layer on Kinesis
     ● Curates output streams based on event type and simple value filters
     ● Pros - easy to use, integrated with Job Scheduler
     ● Cons - no ordering guarantees, limited scaling
     [Diagram: Kinesis (All Events) -> Fanner -> Kinesis (Event A), Kinesis (Event B), Kinesis (Event A + B)]
  7. Job Scheduler
     ● Register data to call webhook
     ● Fire and forget
     ● Immediate or scheduled
     ● Pros - easy asynchronous programming, scales well
     ● Cons - no ordering guarantees, suboptimal outage handling, no replay
     [Diagram: Service -> SQS -> Workers -> Target Service]
  8. “Workers”
     ● Simple asynchronous/stream programming
     ● KCL worker to grab data from Kinesis, transform, store in Redis
     ● Additional workers read from Redis, transform, store in Redis
     ● Pros - familiar programming paradigm
     ● Cons - failure handling, checkpointing, joins, replay, etc. are roll-your-own (and therefore error-prone)
  9. Next Generation Goals
     ● Build Community
     ● Lower support burden
     ● Enhance developer ergonomics
     ● Scale with the business
     ● Support additional use cases
  10. Next Generation Overview
     ● Pub-Sub -> Kafka
        ○ Large community, mature technology
     ● Stream Processing -> Flink
        ○ Growing community, event time processing
     ● Apache Beam for cross language support
     ● SPaaS
        ○ Kubernetes
        ○ Dryft
  11. Dryft
     ● Need - Consistent Feature Generation
        ○ The value of your machine learning results is only as good as the data
        ○ Subtle changes to how a feature value is generated can significantly impact results
     ● Solution - Unify feature generation
        ○ Batch processing for bulk creation of features for training ML models
        ○ Stream processing for real-time creation of features for scoring ML models
     ● How - Flink SQL
        ○ Use Flink as the processing engine using streaming or bulk data
        ○ Add automation to make it super simple to launch and maintain feature generation programs at scale
  12. Dryft Program
     Config:
        {
          "source": "dryft",
          "query_file": "decl_ride_completed.sql",
          "kinesis": {
            "stream": "declridecompleted"
          },
          "features": {
            "n_total_rides": {
              "description": "All time ride count per user",
              "type": "int",
              "version": 1
            }
          }
        }
     Query:
        SELECT
          COALESCE(user_lyft_id, passenger_lyft_id, passenger_id, -1) AS user_id,
          COUNT(ride_id) AS n_total_rides
        FROM event_ride_completed
        GROUP BY COALESCE(user_lyft_id, passenger_lyft_id, passenger_id, -1)
  13. Dryft Program Execution
     ● Backfill - read historic data from S3, process, sink to S3
     ● Real-time - read stream data from Kinesis/Kafka, process, sink to DynamoDB
     [Diagram: S3 Source -> SQL -> Sink; Kinesis/Kafka Source -> SQL -> Sink]
  14. Bootstrapping
     ● Read historic data from S3
     ● Transition to reading real-time data
     ● g-state-in-apache-flink
     [Diagram: S3 Source (< Target Time) and Kinesis/Kafka Source (>= Target Time) -> Business Logic -> Sink]
  15. Continuous Window Semantics
     ● Continuous window counts go up and down
     ● SELECT user_id, COUNT(ride_id) OVER (PARTITION BY user_id ORDER BY rowtime RANGE INTERVAL '1' HOUR PRECEDING) FROM event_ride_completed
     ● Example: rides 1p, 1:30p, 2:15p, 3:45p
     ● Flink default is one message in, one message out
        ○ Output = 1@1p, 2@1:30p, 2@2:15p, 1@3:45p
     ● Dryft is one message in, two messages out
        ○ Output = 1@1p, 2@1:30p, 1@2p, 2@2:15p, 1@2:30p, 0@3:15p, 1@3:45p, 0@4:45p
     ● Other fancy SQL analysis tricks too
  16. Beyond Feature Generation
     ● Reactive Programming
        ○ When an event occurs, execute some logic
     ● Asynchronous Programming
        ○ Perform more processing asynchronously
     ● Geotriggering
        ○ Emit an event when someone enters or leaves an area
     ● Change Data Capture
        ○ Mirror production data in analytical data stores
  17. The End of the Stream
     ● Real-time Visualization w/ Druid and Superset
     ● Presto for <10s queries over large data sets
     ● Hive and Spark for extreme data sets
     ● Flyte - extreme data set workflows
  18. Q&A
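
The difference between the two output modes on the "Continuous Window Semantics" slide can be illustrated with a small simulation. This is only a sketch in plain Python, not Flink or Dryft code: the function names `per_event_counts` and `continuous_counts` are hypothetical, and it assumes a ride counts toward the one-hour window from the moment it occurs until exactly one hour later, which is what the expiry times in the slide's example suggest.

```python
from datetime import datetime, timedelta

WINDOW = timedelta(hours=1)  # window length from the slide's SQL example


def per_event_counts(ride_times):
    """One message in, one message out: emit the window count only when a ride arrives.

    Assumption: a ride at time s is counted at time t if 0 <= t - s < WINDOW.
    """
    out = []
    for i, t in enumerate(ride_times):
        count = sum(1 for s in ride_times[: i + 1] if t - s < WINDOW)
        out.append((t, count))
    return out


def continuous_counts(ride_times):
    """One message in, two messages out: each ride yields +1 on arrival and -1 on expiry."""
    events = []
    for s in ride_times:
        events.append((s, +1))           # ride enters the window
        events.append((s + WINDOW, -1))  # ride leaves the window one hour later
    events.sort()  # at equal timestamps, the -1 (expiry) sorts before the +1 (arrival)
    out = []
    count = 0
    for t, delta in events:
        count += delta
        out.append((t, count))
    return out


def at(hour, minute=0):
    """Helper: a timestamp on an arbitrary fixed day (the date itself is irrelevant)."""
    return datetime(2018, 6, 28, hour, minute)


if __name__ == "__main__":
    rides = [at(13), at(13, 30), at(14, 15), at(15, 45)]
    for label, result in (("per-event", per_event_counts(rides)),
                          ("continuous", continuous_counts(rides))):
        print(label, ", ".join(f"{c}@{t:%H:%M}" for t, c in result))
```

Run on the slide's rides (1p, 1:30p, 2:15p, 3:45p), the first function yields the four-message sequence listed under "Flink default" and the second yields the eight-message sequence listed under "Dryft", so a downstream feature store would see the count return to zero once the last ride ages out of the window.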