Data Pipeline team at Demonware (Activision) has to deal with routing large amounts of data from various sources to many destinations every day.
Our team always wanted to be able to query processed data for debugging and analytical purposes, but creating large data warehouses was never our priority, since it usually happens downstream.
AWS Athena is completely serverless query service that doesn't require any infrastructure setup or complex provisioning. We just needed to save some of our data streams to AWS S3 and define a schema. Just a few simple steps, but in the end we were able to write complex SQL queries against gigabytes of data and get results in seconds.
In this presentation I want to show multiple ways to stream your data to AWS S3, explain some underlying tech, show how to define a schema and finally share some of the best practices we applied.