
Siphon - Near Real Time Databus Using Kafka, Eric Boyd, Nitin Kumar



Siphon is a highly available and reliable distributed pub/sub system built using Apache Kafka. It is used to publish, discover and subscribe to near real-time data streams for operational and product intelligence. Siphon is used as a “Databus” by a variety of producers and subscribers in Microsoft, and is compliant with security and privacy requirements. It has a built-in Auditing and Quality control. This session will provide an overview of the use of Kafka at Microsoft, and then deep dive into Siphon. We will describe an important business scenario and talk about the technical details of the system in the context of that scenario. We will also cover the design and implementation of the service, the scale, and real world production experiences from operating the service in the Microsoft cloud environment.

Published in: Engineering


  1. Thursday, April 14, 2016. Siphon – Near Real Time Databus Using Kafka. Eric Boyd, CVP Engineering, Microsoft; Nitin Kumar, Principal Engineering Manager, Microsoft
  2. "Linux is a cancer"
  3. Thursday, April 14, 2016
  4. Ads Oslo Schedule
  5. Ads Oslo Feature List
  6. Bing Ads Execution
     • Shipped once every 6 months.
     • Averaged 3 marketplace experiments per month.
     • Big bets on marketplace features that didn't work.
     • Focused teams on 6 tracks with independent metrics.
     • Pushed teams to ship as quickly as they could, focusing only on moving their metric.
     • Built or borrowed infrastructure to enable much more rapid experimentation.
     • Over 3 years, reached a rate of more than 1,000 experiments a month.
  7. Profitability!! Eric joins MSFT
  8. What drove the turnaround?
     • Focus on small teams, each with clear metrics the team was driving.
     • Pushing each team to experiment and iterate as fast as possible; data alone determines what gets shipped.
     • Iterated on key metrics until we found the ones with the most impact.
     • Commitment to get 1.5–2% better each month, and to ship a package of experimentally tested improvements each month.
  9. Relationship with Open Source
     • From "Linux is a cancer…" to contributing to open source:
       • Storm with C# – SCP.NET (http://www.nuget.org/packages/Microsoft.SCP.Net.SDK/)
       • Spark with C# – Mobius (https://github.com/Microsoft/Mobius)
       • Kafka with C# – C# client for Kafka (https://github.com/Microsoft/Kafkanet)
       • Bond (https://github.com/Microsoft/bond)
     • Across MSFT: C#, VSCode, Hyper-V drivers for Linux.
     • https://github.com/Microsoft/ with 18 pages of repositories!
  10. Microsoft Big Data History
     • Massive batch-oriented systems.
     • Hundreds of thousands of machines.
     • Exabytes of storage.
     • SQL-like language with C# extensions.
  11. Moving to streaming
  12. Data Bus (architecture diagram): devices and services feed a scalable pub/sub for NRT data streams, which in turn serves streaming processing, batch processing, interactive analytics, and applications.
  13. Vision
     • A Databus for all Near Real Time (NRT) data in an organization.
     • Quick and easy publication, discovery, and subscription of NRT datasets.
     • Compatibility with various stream processing systems such as Storm, Spark, and Splunk.
  14. Siphon Adoption: 15 months since launch; adopters include Excel, Word, Outlook, and Windows 10.
  15. Usage (charts: Siphon data volume in GBps, ingress and egress; Siphon events per second in millions, ingress and egress) for Bing Ads campaign performance, Bing live-site telemetry, Cortana, and Office 365. Headline numbers:
     • 1.3 million events per second ingress at peak
     • ~1 trillion events per day processed at peak
     • 3.5 petabytes processed per day
     • 100 thousand unique devices and machines
     • 1,300 production Kafka brokers
  16. Scale: Kafka at Microsoft (Ads, Bing, Office)
     • Kafka brokers: 1,300+ across 5 datacenters
     • Operating system: Windows Server 2012 R2
     • Hardware spec: 12 cores, 32 GB RAM, 4 × 2 TB HDD (JBOD), 10 Gb network
     • Incoming events: 1.3 million per sec (112 billion per day, 500 TB per day)
     • Outgoing events: 5 million per sec (~1 trillion per day, 3.5 PB per day)
     • Kafka topics/partitions: 50+/5,000+
     • Kafka version: 0.8.1.1 (3-way replication)
  17. Siphon Architecture (diagram): three datacenters (Asia, Europe, US), each running ZooKeeper, Canary, and Kafka. Producers (device proxy services, data-push services, and data-pull agents) publish through the Collector; consumers read via the Consumer API (push/pull) into streaming and batch systems, with an Audit Trail alongside. Components are a mix of open source and Microsoft-internal.
  18. Multiple sources and schemas: incoming CSV, XML, and JSON data is wrapped in the Siphon Bond schema, which has three parts: PartA (main header: MessageId, AuditId, TimeStamp), PartB (extended header: key-value pairs), and PartC (the payload). Bond (https://github.com/Microsoft/bond) is a cross-platform framework for working with schematized data, with cross-language (de)serialization; it is similar to Protobuf, Thrift, and Avro.
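The three-part envelope on this slide can be sketched as a small data structure. This is an illustration only: the field names come from the slide, but the class, defaults, and the use of JSON in place of real Bond serialization are assumptions.

```python
import json
import time
import uuid
from dataclasses import dataclass, field
from typing import Dict

@dataclass
class SiphonMessage:
    """Hypothetical sketch of the Siphon Bond envelope; JSON stands in for Bond."""
    # PartA: main header
    message_id: str = field(default_factory=lambda: str(uuid.uuid4()))
    audit_id: str = ""
    timestamp: float = field(default_factory=time.time)
    # PartB: extended header (free-form key-value pairs)
    extended: Dict[str, str] = field(default_factory=dict)
    # PartC: payload, opaque to Siphon (CSV, XML, or JSON from the producer)
    payload: bytes = b""

    def serialize(self) -> bytes:
        # Flatten the envelope into the PartA/PartB/PartC shape from the slide.
        return json.dumps({
            "PartA": {"MessageId": self.message_id,
                      "AuditId": self.audit_id,
                      "TimeStamp": self.timestamp},
            "PartB": self.extended,
            "PartC": self.payload.decode("utf-8"),
        }).encode("utf-8")

msg = SiphonMessage(audit_id="batch-42",
                    extended={"Source": "BingAds"},
                    payload=b'{"click": 1}')
envelope = json.loads(msg.serialize())
```

Keeping the payload opaque in PartC is what lets one bus carry CSV, XML, and JSON side by side while auditing on the PartA header alone.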
  19. Collector – Data Ingestion (Producer)
     • HTTP(S) server with a RESTful API and SSL support.
     • Abstracts Kafka internals (partitions, Kafka version) from producers.
     • Throttling and QPS monitoring.
     • PII scrubbing.
     • Load balancing and failover across multiple DCs.
     • Supported on both Windows and Linux servers.
     (Diagram: device proxy services, data-push services, and data-pull agents send through a load balancer to Collector instances, which write to Kafka broker partitions.)
     URL: http://localhost/produce/<version>?topic=<topic>, Method: POST
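A producer's call against the endpoint pattern on this slide can be sketched as follows. The URL pattern and POST method are from the slide; the version string "v1", the topic name, and the JSON body are illustrative assumptions, and the request is only constructed (sending it would require a running Collector).

```python
from urllib.parse import quote, urlencode
from urllib.request import Request

def build_produce_url(base: str, version: str, topic: str) -> str:
    """Build the Collector's produce endpoint from the slide's pattern:
    http://<host>/produce/<version>?topic=<topic>."""
    return f"{base}/produce/{quote(version)}?{urlencode({'topic': topic})}"

url = build_produce_url("http://localhost", "v1", "bingads.clicks")

# The Collector hides partitioning and the Kafka version behind this POST,
# so the producer never sees broker or partition details.
req = Request(url, data=b'{"click": 1}', method="POST",
              headers={"Content-Type": "application/json"})
```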
  20. Pull & Push Consumers
     Pull:
     • RESTful API with SSL support.
     • Works for out-of-network consumers.
     • Supports metadata and data operations.
     • Implements the simple consumer APIs.
     • Spark Streaming receiver for Kafka REST.
     Push:
     • Configurable push to destinations like HDFS, Cosmos, and Kafka.
     • Utilizes KafkaNet, a .NET high-level consumer (https://github.com/Microsoft/Kafkanet).
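The pull side amounts to a fetch-and-advance loop over the REST API. A minimal sketch, assuming a `fetch` callable that stands in for an HTTP GET against the pull endpoint and returns a message batch plus the next offset to request; the function names and the in-memory "partition" are hypothetical.

```python
from typing import Callable, List, Tuple

def drain(fetch: Callable[[int], Tuple[List[bytes], int]],
          start: int, max_batches: int) -> Tuple[List[bytes], int]:
    """Pull batches from `start` until caught up or `max_batches` is hit;
    return the messages and the offset to commit."""
    offset, out = start, []
    for _ in range(max_batches):
        batch, nxt = fetch(offset)
        if not batch:          # caught up: nothing more to pull
            break
        out.extend(batch)
        offset = nxt           # advance only after the batch is consumed
    return out, offset

# Fake fetch simulating a 5-message partition, 2 messages per request.
log = [f"m{i}".encode() for i in range(5)]
fake_fetch = lambda o: (log[o:o + 2], min(o + 2, len(log)))

msgs, committed = drain(fake_fetch, 0, 10)
```

Advancing the offset only after a batch is handled gives at-least-once delivery, which is the usual trade-off for this style of consumer.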
  21. Monitoring using Canary: a synthetic message is injected through the production path (load balancer, Collector, Kafka brokers) and read back by the high-level consumer, feeding the Audit Trail. Canary: https://github.com/Microsoft/Availability-Monitor-for-Kafka
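The canary idea reduces to timestamping a synthetic message on publish and measuring it again on consume. A sketch under stated assumptions: the message shape and function names are invented, and a plain list stands in for the produce-through-Kafka-to-consume path.

```python
def make_canary(topic: str, now: float) -> dict:
    """Synthetic message carrying its own send time."""
    return {"topic": topic, "canary": True, "sent_at": now}

def e2e_latency_ms(msg: dict, received_at: float) -> float:
    """End-to-end latency observed for one canary, in milliseconds."""
    return (received_at - msg["sent_at"]) * 1000.0

pipeline = []  # stands in for produce -> load balancer -> Kafka -> consume
pipeline.append(make_canary("heartbeat", now=100.0))

msg = pipeline.pop(0)
latency = e2e_latency_ms(msg, received_at=100.25)  # 250 ms end to end
```

Because the canary rides the same path as real traffic, a missing or slow canary signals a pipeline problem before user data is visibly affected.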
  22. Data completeness – Audit Trail: the high-level consumer reports into the Audit Trail, with support for both sampled and full auditing.
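One way to picture the completeness check: producers and consumers each report per-window event counts to the audit trail, and completeness is the ratio between them. The window keys, counts, and 99.5% threshold below are illustrative assumptions, not values from the talk.

```python
def completeness(produced: dict, consumed: dict) -> dict:
    """Per-window ratio of consumed to produced events (1.0 = complete)."""
    report = {}
    for window, sent in produced.items():
        got = consumed.get(window, 0)
        report[window] = got / sent if sent else 1.0
    return report

# Per-minute counts reported by producers and consumers (illustrative).
produced = {"2016-04-14T10:00": 1000, "2016-04-14T10:01": 1200}
consumed = {"2016-04-14T10:00": 1000, "2016-04-14T10:01": 1188}

report = completeness(produced, consumed)
# Windows below a threshold (here 99.5%) would raise a data-loss alert.
alerts = [w for w, r in report.items() if r < 0.995]
```

Sampled auditing would apply the same arithmetic to a deterministic subset of messages, trading precision for lower audit overhead.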
  23. Production Experience – Telemetry Charts
     • Monitoring using ELK.
     • E2E latency.
     • Data completeness.
     • Processing lag.
     • EPS breakdown by data center.
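Of the metrics above, processing lag has a standard Kafka definition worth spelling out: per partition, it is the broker's log-end offset minus the consumer group's committed offset. The partition counts below are made up for illustration.

```python
def partition_lag(log_end: dict, committed: dict) -> dict:
    """Per-partition lag: messages written but not yet consumed."""
    return {p: log_end[p] - committed.get(p, 0) for p in log_end}

log_end   = {0: 5000, 1: 7200, 2: 6100}   # latest offset per partition
committed = {0: 5000, 1: 7000, 2: 6090}   # consumer group's committed offsets

lag = partition_lag(log_end, committed)
total_lag = sum(lag.values())
```

Charting total lag over time distinguishes a consumer that is merely busy (lag flat or shrinking) from one that is falling behind (lag growing).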
  24. Key Takeaways
     • Scale out with Kafka (50K -> 1M -> multi-million events per sec).
     • Ability to build tunable auditing/monitoring.
     • Producer/consumer RESTful APIs provide a nice abstraction.
     • Config-driven pub/sub system.
