This session will cover a series of use cases where you can store your data cheaply in files and analyze the data with Apache Spark, as well as use cases where you want to store your data into a different data source to access with Spark DataFrames. Here’s an example outline of some of the topics that will be covered in the talk:
Use cases to store in file systems for use with Apache Spark:
- Analyzing a large set of data files.
- Doing ETL of a large amount of data.
- Applying Machine Learning & Data Science to a large dataset.
- Connecting BI/Visualization tools to Apache Spark to analyze large datasets internally.