This talk highlights the Data Sources API, which plugs external data sources into the Spark SQL DataFrame Catalyst optimizer. We dive deep into the advanced open source Cassandra implementation at github.com/datastax/spark-cassandra-connector. We discuss data locality and cluster deployment, as well as the pros and cons of mixing OLAP and OLTP workloads.
We also implement SimpleDataSource, a basic implementation of the Data Sources API.
All analysis is done with Apache Zeppelin.
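
To give a feel for what a basic data source has to do, here is a minimal, dependency-free sketch in Python. It is not the SimpleDataSource from the talk and it does not use the Spark API. The names SimpleRelation, EqualTo, and build_scan are hypothetical, loosely modeled on the idea behind Spark's pruned and filtered scans, where the optimizer hands the source the columns it needs and the filters it would like applied at the source.

```python
# Conceptual model only: this is NOT the Spark Data Sources API.
# SimpleRelation, EqualTo, and build_scan are hypothetical names.
from dataclasses import dataclass
from typing import Any, Dict, Iterator, List, Sequence, Tuple


@dataclass(frozen=True)
class EqualTo:
    """A pushed-down predicate meaning: column == value."""
    column: str
    value: Any


class SimpleRelation:
    """A toy in-memory source that accepts column pruning and equality filters."""

    def __init__(self, rows: List[Dict[str, Any]]) -> None:
        self._rows = rows

    def build_scan(
        self,
        required_columns: Sequence[str],
        filters: Sequence[EqualTo],
    ) -> Iterator[Tuple[Any, ...]]:
        for row in self._rows:
            # Apply the filters at the source so non-matching rows are never emitted.
            if all(row[f.column] == f.value for f in filters):
                # Emit only the requested columns, in the requested order.
                yield tuple(row[c] for c in required_columns)


relation = SimpleRelation([
    {"id": 1, "city": "Austin", "temp": 31},
    {"id": 2, "city": "Boston", "temp": 18},
    {"id": 3, "city": "Austin", "temp": 29},
])

# Roughly what an optimizer would request for:
#   SELECT id, temp FROM t WHERE city = 'Austin'
result = list(relation.build_scan(["id", "temp"], [EqualTo("city", "Austin")]))
print(result)
```

The point of the design is that filtering and column selection happen inside the source rather than after all rows have been loaded, which is what makes pushdown to a store such as Cassandra worthwhile.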