At the end of the day, data scientists want one thing: tabular data for their analysis. They do not want to spend hours or days preparing it. So how does a data engineer handle the massive amount of data being streamed at them from IoT devices and apps, and at the same time add structure to it so that data scientists can focus on finding insights rather than preparing data? By the way, you need to do this within minutes (sometimes seconds). Oh... and there are a bunch more data sources waiting to be ingested, and the current providers of data keep changing their structure.

At GoPro, we have massive amounts of heterogeneous data being streamed at us from our consumer devices and applications, and we have developed a concept of "dynamic DDL" to structure our streamed data on the fly using Spark Streaming, Kafka, HBase, Hive, and S3. The idea is simple: add structure (schema) to the data as soon as possible, let the providers of the data dictate that structure, and automatically create event-based and state-based tables (DDL) for all data sources, so that data scientists can access the data via their lingua franca, SQL, within minutes.
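To make the idea concrete, here is a minimal sketch in Scala of the table-creation step: infer a schema from a batch of raw JSON events, turn it into a Hive CREATE TABLE statement, and append the data. The database and table names and the S3 paths are hypothetical, and a real pipeline would run this per source and handle schema evolution, which this sketch omits.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.types.StructType

object DynamicDDLSketch {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("dynamic-ddl-sketch")
      .enableHiveSupport()
      .getOrCreate()

    // Hypothetical landing zone: raw JSON events as delivered by a device.
    val raw = spark.read.json("s3a://events-landing/gps_telemetry/")

    // Let the provider's payload dictate the structure: Spark infers the
    // schema directly from the JSON documents.
    val schema: StructType = raw.schema

    // Translate the inferred schema into Hive column definitions
    // (simplified: nested structs are kept as-is, no flattening).
    val columns = schema.fields
      .map(f => s"`${f.name}` ${f.dataType.catalogString}")
      .mkString(",\n  ")

    // "Dynamic DDL": create the table on the fly if this source has
    // never been seen before.
    spark.sql("CREATE DATABASE IF NOT EXISTS events")
    spark.sql(
      s"""CREATE TABLE IF NOT EXISTS events.gps_telemetry (
         |  $columns
         |) STORED AS PARQUET
         |LOCATION 's3a://events-warehouse/gps_telemetry/'""".stripMargin)

    // Event-based table: an append-only log of everything received.
    raw.write.mode("append").insertInto("events.gps_telemetry")

    spark.stop()
  }
}
```

A state-based counterpart would keep only the latest record per key (for example, by upserting into HBase) rather than appending every event, so the same raw stream yields both a history table and a current-state table.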