In this course, students learned what output is expected of a Data Scientist and how to use PySpark (part of Apache Spark) to deliver against these expectations. The course assignments included Log Mining, Textual Entity Recognition, and Collaborative Filtering exercises that taught students how to manipulate datasets using parallel processing with PySpark.
Certificate of Accomplishment: Introduction to Big Data with Apache Spark
Folco Bombardieri
Certificate of Accomplishment for the "Introduction to Big Data with Apache Spark" course from edX (offered by the University of California, Berkeley).
Part of the Big Data XSeries courses.
Duration: 5 weeks
Authenticity of the certificate can be verified at: https://verify.edx.org/cert/1a54a3e6ffa44c8d8aa24986adda4efd
Organizations use their data for decision support and to build data-intensive products and services, such as recommendation, prediction, and diagnostic systems. The collection of skills required by organizations to support these functions has been grouped under the term Data Science. This course will attempt to articulate the expected output of Data Scientists and then teach students how to use PySpark (part of Apache Spark) to deliver against these expectations. The course assignments include Log Mining, Textual Entity Recognition, and Collaborative Filtering exercises that teach students how to manipulate datasets using parallel processing with PySpark.
Berkeley
Ameet Talwalkar
Assistant Professor of Computer Science, University of California, Los Angeles
Visiting Assistant Professor in Electrical Engineering and Computer Science, University of California, Berkeley
Technical Advisor, Databricks
Anthony D. Joseph
Professor in Electrical Engineering and Computer Science, University of California, Berkeley
Technical Advisor, Databricks
Diana Wu
Executive Director, Berkeley Resource Center for Online Education, University of California, Berkeley
XSeries Certificate of Achievement
Yi JIN successfully completed all courses in the XSeries Big Data, a series of two courses offered by BerkeleyX through edX.
Issued August 12, 2015
Verify the authenticity of this certificate at https://verify.edx.org/cert/abca385cc6444070b252d309392ba443