In this course, students learned what output is expected of a Data Scientist and how to use PySpark (part of Apache Spark) to deliver against these expectations. The course assignments included Log Mining, Textual Entity Recognition, and Collaborative Filtering exercises that taught students how to manipulate datasets using parallel processing with PySpark.
Certificate of Accomplishment: Introduction to Big Data with Apache Spark
Folco Bombardieri
Certificate of Accomplishment for the "Introduction to Big Data with Apache Spark" course from edX (offered by The University of California, Berkeley).
Part of the Big Data XSeries courses.
Duration: 5 weeks
Authenticity of the certificate can be verified at: https://verify.edx.org/cert/1a54a3e6ffa44c8d8aa24986adda4efd
Organizations use their data for decision support and to build data-intensive products and services, such as recommendation, prediction, and diagnostic systems. The collection of skills required by organizations to support these functions has been grouped under the term Data Science. This course will attempt to articulate the expected output of Data Scientists and then teach students how to use PySpark (part of Apache Spark) to deliver against these expectations. The course assignments include Log Mining, Textual Entity Recognition, and Collaborative Filtering exercises that teach students how to manipulate datasets using parallel processing with PySpark.
Verified Certificate of Achievement
This is to certify that Salih Oztop successfully completed and received a passing grade in CS100.1x: Introduction to Big Data with Apache Spark, a course of study offered by BerkeleyX, an online learning initiative of The University of California, Berkeley through edX.
Issued July 10, 2015
Verify the authenticity of this certificate at https://verify.edx.org/cert/80ef9807dea74e5e87ae7a4dd3e345b4
Anthony D. Joseph
Professor in Electrical Engineering and Computer Science, University of California, Berkeley
Technical Advisor, Databricks