The document describes a log collection and processing system. Logs are collected from clients by log collection daemons and sent to log processors. The log processors aggregate the logs by category, clean and convert the data, then store it in HDFS. The data is then replicated to analytics clusters where it can be accessed for analysis, reporting and model training.
30. ● Scribe’s contract amounts to sending a binary blob to a port
● Scribe used newline characters to delimit records in a binary blob batch of records
● Valid records may include newline characters
● Scribe Base64-encoded received binary blobs to avoid confusion with the record delimiter (see the sketch after this list)
● Base64 encoding is no longer necessary because we have moved to one serialized Thrift object per binary blob
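A minimal sketch of the delimiter collision and the Base64 workaround, in Java. The record payload and framing here are made up for illustration and are not Scribe's actual wire format:

    import java.nio.charset.StandardCharsets;
    import java.util.Arrays;
    import java.util.Base64;

    public class ScribeFramingSketch {
        // Hypothetical record whose payload legitimately contains a newline.
        static final byte[] RECORD =
            "user=42\naction=click".getBytes(StandardCharsets.UTF_8);

        public static void main(String[] args) {
            // Naive framing: delimit the batch with '\n'. The embedded
            // newline makes one record look like two.
            String naive = new String(RECORD, StandardCharsets.UTF_8) + "\n";
            System.out.println(naive.split("\n").length);   // prints 2, not 1

            // Base64 framing: the Base64 alphabet never contains '\n',
            // so the delimiter is unambiguous and the blob round-trips.
            String framed = Base64.getEncoder().encodeToString(RECORD) + "\n";
            byte[] decoded = Base64.getDecoder().decode(framed.split("\n")[0]);
            System.out.println(Arrays.equals(decoded, RECORD)); // prints true
        }
    }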
35. ● Some categories are significantly larger than others (KBs vs. TBs)
● MapReduce demux? Each reducer handles a single category
● Streaming demux? Each spout or channel handles a single category
● Massive skew when partitioning by category causes long-running tasks, which slow down job completion time (a partitioner sketch follows this list)
● Relatively well-understood fault-tolerance semantics, similar to MapReduce, Spark, etc.
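To make the per-category routing concrete, here is a sketch of what a MapReduce demux partitioner could look like; the class and key layout are hypothetical, and only the Hadoop Partitioner interface is real. Because the map output key is the category, a TB-scale category and a KB-scale category each get exactly one reducer, and the big one becomes the straggler:

    import org.apache.hadoop.io.BytesWritable;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.mapreduce.Partitioner;

    // Hypothetical demux partitioner: the map output key is the log
    // category, so every event for a category lands on one reducer.
    public class CategoryPartitioner extends Partitioner<Text, BytesWritable> {
        @Override
        public int getPartition(Text category, BytesWritable event,
                                int numReducers) {
            return (category.hashCode() & Integer.MAX_VALUE) % numReducers;
        }
    }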
36. ● Tez’s dynamic hash partitioner adjusts partitions at runtime if necessary, allowing large partitions to be further split so that multiple tasks, rather than one, process events for a single category (a conceptual sketch follows this list)
○ More info at TEZ-3209.
○ Thanks to team member Ming Ma for the contribution!
● Easier horizontal scaling while simultaneously providing more predictable processing times
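The sketch below illustrates the idea of runtime sub-partitioning only; it is not the Tez implementation (see TEZ-3209 for that), and the threshold, names, and routing scheme are all assumptions:

    import java.util.HashMap;
    import java.util.Map;

    // Conceptual sketch only, not the Tez API: once a category's observed
    // volume crosses a threshold, spread its events over several tasks
    // instead of pinning the whole category to a single task.
    public class DynamicDemuxRouter {
        static final long SPLIT_THRESHOLD_BYTES = 1L << 30; // assumed: 1 GiB

        private final Map<String, Long> bytesSeen = new HashMap<>();
        private final int maxSubTasks;

        public DynamicDemuxRouter(int maxSubTasks) {
            this.maxSubTasks = maxSubTasks;
        }

        /** Returns a task index for this event. Small categories keep one
         *  task; categories past the threshold fan out by event hash. */
        public int route(String category, byte[] event, int numTasks) {
            long seen = bytesSeen.merge(category, (long) event.length, Long::sum);
            int base = (category.hashCode() & Integer.MAX_VALUE) % numTasks;
            if (seen < SPLIT_THRESHOLD_BYTES) {
                return base;                  // one task per small category
            }
            // Fan a hot category out across up to maxSubTasks tasks.
            int salt = (java.util.Arrays.hashCode(event) & Integer.MAX_VALUE)
                       % maxSubTasks;
            return (base + salt) % numTasks;
        }
    }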