In this course, students learned what output is expected of a data scientist and how to use PySpark, the Python API for Apache Spark, to meet those expectations. The course assignments included Log Mining, Textual Entity Recognition, and Collaborative Filtering exercises, which taught students how to manipulate data sets through parallel processing with PySpark.