This document provides an introduction to Apache Spark and Scala. It describes Apache Spark as a general-purpose cluster computing system that is faster than Hadoop MapReduce and runs both locally and in the cloud, with high-level APIs for Scala, Python, Java, and R. The document outlines Spark's core components, including Spark SQL, MLlib, GraphX, and Spark Streaming, and describes Spark's two main data abstractions: RDDs for unstructured data and DataFrames/Datasets for structured data. Finally, it previews the demonstrations to be covered, including the Spark shell, a notebook, streaming, and deploying a mini project to Google Cloud Dataproc.
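The distinction between RDDs and DataFrames can be sketched as a short Spark shell session. This is an illustrative example, not code from the document; the sample data and column names are invented, and it assumes the `spark-shell` environment, where a `SparkSession` (`spark`), a `SparkContext` (`sc`), and the `spark.implicits._` conversions are predefined.

```scala
// RDD: a low-level distributed collection, suited to unstructured data.
// Transformations are expressed as plain Scala functions.
val lines = sc.parallelize(Seq("spark is fast", "spark runs anywhere"))
val wordCounts = lines
  .flatMap(_.split(" "))       // split each line into words
  .map(word => (word, 1))      // pair each word with a count of 1
  .reduceByKey(_ + _)          // sum counts per word

// DataFrame: a structured, schema-aware collection that the
// Spark SQL engine can optimize. Data and columns are hypothetical.
val people = spark
  .createDataFrame(Seq(("Ada", 36), ("Alan", 41)))
  .toDF("name", "age")
people.filter($"age" > 40).show()
```

The practical difference is that RDD transformations are opaque functions Spark must run as-is, while DataFrame operations expose a schema and a query plan that Spark SQL's optimizer can rearrange before execution.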