This document describes the architecture of themidgame-tube, a data platform for analyzing YouTube videos and influencers. It covers scraping over 19 million videos and 500 GB of data from YouTube, and processing that data with a batch-only pipeline built on AWS EMR, YARN, and MapReduce. It also presents the results of scalability tests run with Spot Instances on EMR, which show that the platform can process large volumes of data quickly and cost-effectively.
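The source does not show any of the platform's actual MapReduce jobs. To illustrate what a batch MapReduce step over scraped video records can look like, here is a minimal sketch in plain Python. It assumes a hypothetical tab-separated record format (`video_id`, `channel_id`, `view_count`) and a hypothetical job that sums views per channel. The `mapper` and `reducer` functions mirror the map and reduce phases. `run_local` stands in for the shuffle/sort step that Hadoop performs on YARN between the two phases, so the logic can be checked without a cluster.

```python
from itertools import groupby
from operator import itemgetter
from typing import Dict, Iterable, Iterator, Tuple

# Hypothetical record format (not from the source): video_id \t channel_id \t view_count


def mapper(lines: Iterable[str]) -> Iterator[Tuple[str, int]]:
    """Map phase: emit (channel_id, view_count) for each well-formed record."""
    for line in lines:
        fields = line.rstrip("\n").split("\t")
        if len(fields) != 3:
            continue  # skip malformed rows instead of failing the whole batch
        _video_id, channel_id, views = fields
        try:
            yield channel_id, int(views)
        except ValueError:
            continue  # skip rows whose view count is not an integer


def reducer(pairs: Iterable[Tuple[str, int]]) -> Iterator[Tuple[str, int]]:
    """Reduce phase: sum view counts per channel. Input must be sorted by key."""
    for channel_id, group in groupby(pairs, key=itemgetter(0)):
        yield channel_id, sum(views for _, views in group)


def run_local(lines: Iterable[str]) -> Dict[str, int]:
    """Run map -> shuffle/sort -> reduce in one process (a stand-in for Hadoop)."""
    shuffled = sorted(mapper(lines), key=itemgetter(0))
    return dict(reducer(shuffled))


if __name__ == "__main__":
    sample = [
        "vid1\tchanA\t100",
        "vid2\tchanB\t50",
        "vid3\tchanA\t25",
        "bad line",
    ]
    print(run_local(sample))  # {'chanA': 125, 'chanB': 50}
```

On a real cluster, the map and reduce logic would run as separate tasks (for example, as Hadoop Streaming scripts that read stdin and write tab-separated stdout), and the framework would do the sorting and grouping. Keeping the phases as pure functions makes them easy to test locally before submitting a job to EMR.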