LinkedIn processes user-event data at global scale: roughly 2.3 trillion messages per day, totaling about 2.5 PB, flow through highly reliable, fault-tolerant batch and stream processing pipelines. The data is persisted durably across 120 PB of HDFS storage and made searchable and available to online services. Their analytics infrastructure includes data ingestion with Gobblin, dataset management with Dali, storage on HDFS and Voldemort, and compute engines such as YARN. To scale storage, cluster management, and computation across tens of thousands of nodes, LinkedIn relies on solutions such as federated HDFS, Dali, Hadoop OrgQueue, and elasticity tuning.
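A quick back-of-envelope calculation from the figures above (2.3 trillion messages and 2.5 PB per day) gives a sense of the per-message and per-second load this infrastructure sustains; the derived numbers below are estimates, not figures from LinkedIn:

```python
# Rough scale estimates derived from the stated daily totals.
PB = 10**15                      # petabyte, decimal bytes
messages_per_day = 2.3e12        # 2.3 trillion messages/day (stated)
bytes_per_day = 2.5 * PB         # 2.5 PB/day (stated)

avg_msg_bytes = bytes_per_day / messages_per_day   # average message size
msgs_per_sec = messages_per_day / 86_400           # average ingest rate

print(f"avg message size: {avg_msg_bytes:.0f} bytes")   # ~1.1 KB/message
print(f"avg ingest rate:  {msgs_per_sec / 1e6:.1f} M messages/s")
```

Even averaged over a full day (ignoring peak traffic), that is on the order of 27 million messages per second, which is why durable, horizontally scalable ingestion and storage layers are central to the design.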