https://www.learntek.org/machine-learning-using-spark/
Learntek is a global online training provider for Big Data Analytics, Hadoop, Machine Learning, Deep Learning, IoT, AI, Cloud Technology, DevOps, Digital Marketing, and other IT and management courses.
3. Copyright @ 2019 Learntek. All Rights Reserved. 3
What is Machine Learning?
Machine learning is an application of artificial intelligence (AI) that gives
systems the ability to automatically learn and improve from experience without
being explicitly programmed. Machine learning focuses on the development of
computer programs that can access data and use it to learn for themselves.
The process of learning begins with observations or data, such as examples, direct
experience, or instruction, in order to look for patterns in data and make better
decisions in the future based on the examples that we provide. The primary aim is
to allow the computers to learn automatically without human intervention or
assistance and adjust actions accordingly.
Intro to Machine Learning Using Spark
MLlib is Spark’s machine learning (ML) library. Its goal is to make practical machine
learning scalable and easy. At a high level, it provides tools such as:
ML Algorithms: common learning algorithms such as classification, regression,
clustering, and collaborative filtering
Featurization: feature extraction, transformation, dimensionality reduction, and
selection
Pipelines: tools for constructing, evaluating, and tuning ML Pipelines
Persistence: saving and loading algorithms, models, and Pipelines
Utilities: linear algebra, statistics, data handling, etc.
Tools
This course will be delivered using the Scala and Python APIs. The R language will
also be used to explain statistical concepts. Visualization will be covered using
the Bokeh and ggplot libraries.
Introduction to Apache Spark
Spark Programming model
RDD and Data Frame
Transformation and Action
Broadcast and Accumulator
Running HDP on local machine
Launching Spark Cluster
Basic Statistics
Descriptive Statistics
• Mean, Mode, Median, Range, Variance,
Standard Deviation, Quartiles, Percentiles
Sampling
Sampling Methods
Sampling Errors
Probability Distributions
• Normal distribution, t-distribution, Chi-square, F
Margin of Error, Confidence Interval,
Significance Level, Degrees of Freedom
Hypothesis concept, Type I and Type II error
P-value, t-Test, Chi-square Test
Correlation Coefficient
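The descriptive statistics listed above can be sketched in a few lines. This is a minimal illustration in plain Python using the standard `statistics` module rather than Spark; the sample values are hypothetical and chosen only so the results are easy to check by hand.

```python
import statistics

# Hypothetical sample, used only for illustration.
data = [2, 4, 4, 5, 7, 9, 11]

mean = statistics.mean(data)                 # arithmetic mean
median = statistics.median(data)             # middle value of the sorted data
mode = statistics.mode(data)                 # most frequent value
value_range = max(data) - min(data)          # distance between the extremes
variance = statistics.variance(data)         # sample variance (n - 1 denominator)
std_dev = statistics.stdev(data)             # sample standard deviation
quartiles = statistics.quantiles(data, n=4)  # three cut points: Q1, Q2, Q3

print("mean:", mean, "median:", median, "mode:", mode, "range:", value_range)
print("variance:", variance, "std dev:", round(std_dev, 3))
print("quartiles:", quartiles)
```

Note that `variance` and `stdev` compute the sample versions; `pvariance` and `pstdev` give the population versions.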
Machine Learning Using Spark
Introduction to Spark MLlib
Data types: Vector, Labeled Point
Feature Extraction
Feature Transformation, Normalization
Feature Selectors
Locality Sensitive Hashing (LSH)
Regression Analysis with Spark
Types of Regression Models
Gradient Descent
Linear Regression, Generalized Linear
Regression
MSE, RMSE, MAE, R-squared Coefficient
Transforming the target variable
Tuning Model Parameters
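The regression metrics named above (MSE, RMSE, MAE, and R-squared) follow directly from their definitions. The sketch below is plain Python, not Spark code; the function name `regression_metrics` and the sample values are hypothetical and exist only for illustration.

```python
import math

def regression_metrics(y_true, y_pred):
    """Compute MSE, RMSE, MAE and R-squared for paired actual/predicted values."""
    n = len(y_true)
    errors = [t - p for t, p in zip(y_true, y_pred)]

    ss_res = sum(e * e for e in errors)           # residual sum of squares
    mse = ss_res / n                              # mean squared error
    rmse = math.sqrt(mse)                         # root mean squared error
    mae = sum(abs(e) for e in errors) / n         # mean absolute error

    mean_true = sum(y_true) / n
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)  # total sum of squares
    r2 = 1.0 - ss_res / ss_tot                    # coefficient of determination

    return {"mse": mse, "rmse": rmse, "mae": mae, "r2": r2}

# Hypothetical actual and predicted values.
y_true = [3.0, 5.0, 7.0, 9.0]
y_pred = [2.0, 5.0, 8.0, 9.0]

metrics = regression_metrics(y_true, y_pred)
print(metrics)
```

R-squared is undefined when all actual values are identical (the total sum of squares is zero); this sketch does not handle that case.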
Classification Model with Spark
Types of Classification Models
• Linear Models, Naive Bayes Model, Decision
Tree
Logistic Regression
Linear Support Vector Machine
Random Forest
Gradient-Boosted Trees
Training Classification Models
Accuracy and prediction error
Precision and Recall
ROC curve and AUC
Cross validation
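Accuracy, precision, and recall for a binary classifier can all be derived from the four confusion-matrix counts. The following is a minimal plain-Python sketch, not Spark code; the function name `classification_metrics` and the label lists are hypothetical.

```python
def classification_metrics(y_true, y_pred, positive=1):
    """Compute accuracy, precision and recall from true and predicted labels."""
    tp = fp = fn = tn = 0
    for actual, predicted in zip(y_true, y_pred):
        if predicted == positive and actual == positive:
            tp += 1   # true positive
        elif predicted == positive:
            fp += 1   # false positive
        elif actual == positive:
            fn += 1   # false negative
        else:
            tn += 1   # true negative

    total = tp + fp + fn + tn
    accuracy = (tp + tn) / total if total else 0.0
    # Precision: of everything predicted positive, how much was correct.
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    # Recall: of everything actually positive, how much was found.
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return {"accuracy": accuracy, "precision": precision, "recall": recall}

# Hypothetical labels: 1 is the positive class.
y_true = [1, 0, 1, 1, 0, 0, 1, 0]
y_pred = [1, 0, 0, 1, 0, 1, 1, 0]

scores = classification_metrics(y_true, y_pred)
print(scores)
```

Returning 0.0 when a denominator is zero is a convention chosen for this sketch; other tools may report the value as undefined instead.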
Clustering
Hierarchical clustering
K-means clustering
Dimensionality Reduction
Principal Component Analysis
Singular Value Decomposition
Clustering as dimensionality reduction
Training a dimensionality reduction model
Evaluating dimensionality reduction models
Recommendation Engine
Content based filtering
Collaborative filtering
Overview of MovieLens data
Training a recommendation model
Using the recommendation model
Performance Evaluation
Text Processing
Feature Hashing
TF-IDF model
Tokenization
Stop words
TF-IDF Weightings
Training a TF-IDF model
Usage of TF-IDF model
Evaluating TF-IDF models
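The text-processing steps above (tokenization, stop-word removal, and TF-IDF weighting) can be sketched without Spark. This plain-Python illustration assumes whitespace tokenization, a tiny hypothetical stop-word list, raw term counts as TF, and the smoothed IDF form log((N + 1) / (df + 1)); other TF-IDF variants exist, and the function names are made up for this example.

```python
import math
from collections import Counter

# Tiny hypothetical stop-word list, for illustration only.
STOP_WORDS = {"is", "a", "the", "and", "of"}

def tokenize(text):
    """Lowercase, split on whitespace, and drop stop words."""
    return [word for word in text.lower().split() if word not in STOP_WORDS]

def tf_idf(documents):
    """Return one {term: weight} dictionary per document."""
    tokenized = [tokenize(doc) for doc in documents]
    n_docs = len(tokenized)

    # Document frequency: number of documents containing each term.
    doc_freq = Counter()
    for tokens in tokenized:
        doc_freq.update(set(tokens))

    # Smoothed inverse document frequency (an assumed variant).
    idf = {term: math.log((n_docs + 1) / (df + 1)) for term, df in doc_freq.items()}

    # Weight = raw term count in the document times the term's IDF.
    return [
        {term: count * idf[term] for term, count in Counter(tokens).items()}
        for tokens in tokenized
    ]

# Hypothetical mini corpus.
docs = [
    "Spark is a fast engine",
    "Spark MLlib is a machine learning library",
    "the library is fast",
]

weights = tf_idf(docs)
for doc, doc_weights in zip(docs, weights):
    print(doc, "->", doc_weights)
```

Terms that appear in fewer documents, such as "engine", receive a higher weight than terms shared across documents, such as "spark".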
Prerequisites:
Prior understanding of exploratory data analysis and data visualization will help
immensely in learning machine learning concepts and applications. This includes
basic statistical techniques for data analysis. Some knowledge of R programming
or of Python packages such as scikit-learn and NumPy will be useful. However,
we will cover basic statistical techniques as part of this course before going
deep into machine learning. This will help everyone gain the maximum from this
course.
For more training information, contact us
Email : info@learntek.org
USA : +1734 418 2465
INDIA : +40 4018 1306
+7799713624