1. Data Science and Machine Learning
Course Overview
Dr. Pratishtha Verma
Assistant Professor, NIT Kurukshetra
2. Course Learning Objectives:
1. The major goal of the course is to enable computers to learn (potentially complex) patterns from data, and then make decisions based on these patterns.
2. To provide a strong foundation for data science and its related application areas.
3. To provide the underlying core concepts and emerging technologies in data science.
4. A data scientist requires an integrated skill set spanning mathematics, probability and statistics, optimization, and branches of computer science such as databases and machine learning.
3. Course Overview
Module List:
1. Introduction to Data Science: What is Data Science? Linear algebra for data science:- algebraic and geometric view, Data
Representation & Statistical Inference:- Data objects and attribute types, Types of Data, descriptive statistics, notion of
probability, distributions, mean, variance, covariance, Understanding univariate and multivariate normal distributions.
2. Data Analysis: Probability and Random Variables, Correlation, Regression, Attribute Transformation, Sampling, Feature subset
selection, Similarity measures, High-dimensional Data: - Curse of Dimensionality, Dimensionality reduction: PCA, SVD, etc.
3. Data Visualization, Bayesian Learning & Evaluating Hypotheses: Basic principles, Scalar, Vector, & Tensor Visualization,
Multivariate Data Visualization, Text Data Visualization, Network Data Visualization, Visualization Techniques, Bayesian
Approach, Bayes’ Theorem, Evaluating Hypotheses- Z-test, T-test, Chi-square Test.
4. Machine Learning (Supervised & Unsupervised Learning): Basic concepts of Classification, k-Nearest Neighbor, Decision
Tree classification, Naïve Bayes’ Classifier, Linear Regression Models, Logistic Regression, Basic concepts of Clustering,
K-means, Hierarchical Clustering, DBSCAN.
4. What is Data Science?
Data science is the in-depth study of large amounts of data. It involves extracting meaningful insights from raw,
structured, and unstructured data using the scientific method, different technologies, and algorithms.
It is a multidisciplinary field that uses tools and techniques to manipulate the data so that you can find something new and
meaningful.
Data science uses powerful hardware, programming systems, and efficient algorithms to solve data-related
problems. It is often described as the future of artificial intelligence.
In short, we can say that data science is all about:
5. Example:
Suppose we want to travel from station A to station B by car. We need to make some decisions: which route
will get us there fastest, which route will have no traffic jams, and which route will be the most
cost-effective. All these decision factors act as input data, and analyzing them gives us an appropriate answer.
This analysis of data is called data analysis, which is a part of data science.
6. Linear algebra for data science:- algebraic and geometric view
Vectors and their operations:
1) What are vectors?
2) Operations on vectors: vector addition and scalar multiplication from the geometric,
algebraic, and data science viewpoints.
3) Length of a vector.
4) Dot product.
5) Zero vector, unit vector, orthogonal and orthonormal vectors.
6) Projection: Scalar and Vector.
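The quantities listed in this outline can be sketched in a few lines of plain Python. This is a minimal illustration, assuming vectors are stored as tuples of floats; the function names and the sample vectors u and v are hypothetical and not taken from the slides.

```python
import math

def dot(u, v):
    """Dot product: sum of the component-wise products."""
    return sum(a * b for a, b in zip(u, v))

def length(v):
    """Euclidean length (magnitude) of a vector."""
    return math.sqrt(dot(v, v))

def unit(v):
    """Unit vector in the direction of v (v must be non-zero)."""
    n = length(v)
    return tuple(a / n for a in v)

def is_orthogonal(u, v, tol=1e-9):
    """Two vectors are orthogonal when their dot product is zero."""
    return abs(dot(u, v)) <= tol

def scalar_projection(u, v):
    """Signed length of the component of u along v."""
    return dot(u, v) / length(v)

def vector_projection(u, v):
    """Vector component of u along v."""
    c = dot(u, v) / dot(v, v)
    return tuple(c * a for a in v)

# Hypothetical sample vectors.
u = (3.0, 4.0)
v = (1.0, 0.0)

print(length(u))                 # 5.0
print(dot(u, v))                 # 3.0
print(unit(u))                   # (0.6, 0.8)
print(is_orthogonal((1.0, 0.0), (0.0, 1.0)))  # True
print(scalar_projection(u, v))   # 3.0
print(vector_projection(u, v))   # (3.0, 0.0)
```

Two vectors are orthonormal when they are orthogonal and each has length 1, so is_orthogonal and length together cover that check.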
7. What are vectors?
Quantities that have both direction and magnitude are called vectors. Ex: force and velocity.
Quantities that have only magnitude are called scalars. Ex: mass and distance.
Representation:
Vectors can be represented by an arrow starting at a reference point called the origin.
Two vectors are equal if they have the same direction and length.
[Figure: vectors drawn as arrows, labelled A, B, C, D]
8. Vectors from data science point of view:
A vector is a list of the attributes of an object, i.e. the attribute values describing a specific instance.
Ex:
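As one illustration, assuming a hypothetical object (a house) described by three numeric attributes, the attribute values of a single instance can be stored as a vector. The attribute names and numbers below are made up for this sketch and do not come from the slides.

```python
# Hypothetical attribute names; not taken from the slides.
attributes = ("area_sqft", "bedrooms", "age_years")

# One instance (data object) is one vector of attribute values.
house = (1200.0, 3.0, 10.0)

# Pair each value with its attribute name for readable access.
as_record = dict(zip(attributes, house))

print(as_record["bedrooms"])  # 3.0
print(len(house))             # 3 (the dimension of the vector)
```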
9. Vector operations (Geometrically) :
1. Vector addition (geometric view): vectors are added head to tail: place the start of w at the end of v; the sum v + w runs from the start of v to the end of w.
Properties of vector addition: 1) Commutative
2) Associative
2. Scalar multiplication (geometric view): scalar multiplication scales the
length of the vector; a positive scalar keeps its direction, while a negative scalar reverses it.
Properties of scalar multiplication: 1) Distributive over addition
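The properties named above can be checked numerically. The sketch below assumes vectors are tuples of integers and uses two helper functions, add and scale, which are hypothetical names introduced here; the third vector w and the scalar c are also made up for the check.

```python
def add(u, v):
    """Component-wise vector addition."""
    return tuple(a + b for a, b in zip(u, v))

def scale(c, v):
    """Multiply every component of v by the scalar c."""
    return tuple(c * a for a in v)

# Hypothetical sample values.
u, v, w = (4, 2), (-3, 2), (1, 5)
c = 3

commutative = add(u, v) == add(v, u)
associative = add(add(u, v), w) == add(u, add(v, w))
distributive = scale(c, add(u, v)) == add(scale(c, u), scale(c, v))

print(commutative, associative, distributive)  # True True True
```

A numeric check on one set of vectors is only an illustration; the properties hold in general because they hold for each component separately.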
10. Vector operation (Algebraically)
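Let's define a coordinate system in order to define vector operations algebraically. With the unit basis vectors $i = \begin{bmatrix} 1 \\ 0 \end{bmatrix}$ and $j = \begin{bmatrix} 0 \\ 1 \end{bmatrix}$, a vector is written in terms of its components, e.g. $A = 4i + 2j = \begin{bmatrix} 4 \\ 2 \end{bmatrix}$ and $B = \begin{bmatrix} -3 \\ 2 \end{bmatrix}$.
1) Addition (component-wise):
$A + B = \begin{bmatrix} 4 \\ 2 \end{bmatrix} + \begin{bmatrix} -3 \\ 2 \end{bmatrix} = \begin{bmatrix} 1 \\ 4 \end{bmatrix}$
2) Multiplication by a scalar (every component is scaled), shown on the slide for a vector with components 3 and 2:
$3A = 3 \begin{bmatrix} 3 \\ 2 \end{bmatrix} = \begin{bmatrix} 9 \\ 6 \end{bmatrix}$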
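The algebraic operations on this slide can be reproduced with a short Python sketch. It assumes 2-D vectors stored as tuples and builds A from the basis vectors i and j; the helper names add and scale are hypothetical, and the numbers are the ones that appear on the slide.

```python
def add(u, v):
    """Component-wise vector addition."""
    return tuple(a + b for a, b in zip(u, v))

def scale(c, v):
    """Multiply every component of v by the scalar c."""
    return tuple(c * a for a in v)

# Unit basis vectors of the coordinate system.
i = (1, 0)
j = (0, 1)

# A = 4i + 2j, written through the basis vectors.
A = add(scale(4, i), scale(2, j))
B = (-3, 2)

print(A)                  # (4, 2)
print(add(A, B))          # (1, 4)
print(scale(3, (3, 2)))   # (9, 6)
```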
12. ● Data Objects and Attribute Types
● Basic Statistical Descriptions of Data
● Measuring Data Similarity and Dissimilarity
13. Data-Related Issues
Type of Data:
– Data sets differ in a number of ways.
– The type of data determines which techniques can be used to analyze the data.
Quality of Data:
– Data is often far from perfect.
– Improving data quality improves the quality of the resulting analysis.
Preprocessing Steps to Make Data More Suitable:
– Raw data must be processed in order to make it suitable for analysis.
• Improve data quality,
• Modify data so that it better fits a specified data mining technique.
Analyzing Data in Terms of its Relationships:
– Find relationships among data objects and then perform the remaining analysis using these relationships rather than the data objects themselves.
– There are many similarity or distance measures, and the proper choice depends on the type of data and the application.
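To illustrate why the choice of measure depends on the type of data, the sketch below shows three common measures: Euclidean distance and cosine similarity for numeric attributes, and a simple matching similarity for nominal attributes. The slides do not name specific measures at this point, so these three are chosen here as examples, and all sample values are hypothetical.

```python
import math

def euclidean_distance(x, y):
    """Straight-line distance between two numeric vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))

def cosine_similarity(x, y):
    """Cosine of the angle between two non-zero numeric vectors."""
    dot = sum(a * b for a, b in zip(x, y))
    norm_x = math.sqrt(sum(a * a for a in x))
    norm_y = math.sqrt(sum(b * b for b in y))
    return dot / (norm_x * norm_y)

def simple_matching_similarity(x, y):
    """Fraction of nominal attributes on which two objects agree."""
    matches = sum(1 for a, b in zip(x, y) if a == b)
    return matches / len(x)

# Numeric attributes (hypothetical values).
print(euclidean_distance((1.0, 2.0), (4.0, 6.0)))   # 5.0
print(cosine_similarity((1.0, 0.0), (0.0, 1.0)))    # 0.0

# Nominal attributes (hypothetical values): 2 of 3 attributes match.
print(simple_matching_similarity(("red", "small", "round"),
                                 ("red", "large", "round")))
```

Euclidean distance has no meaning for values such as "red" or "round", which is why nominal data needs a matching-based measure instead.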
14. What is Data?
• Data sets are made up of data objects.
Example: A data object represents an entity. In a sales database, the objects may be customers, store items,
and sales; in a medical database, the objects may be patients; in a university database, the objects may be
students, professors, and courses.
– A data object is also called a sample, example, instance, data point, object, or tuple.
• Data objects are described by attributes.
• An attribute is a property or characteristic of a data object.
– Examples: eye color of a person, temperature, etc.
– An attribute is also known as a variable, field, characteristic, or feature.
• A collection of attributes describes an object.
• Attribute values are numbers or symbols assigned to an attribute.
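The terms on this slide can be mapped onto a small Python sketch. It assumes a hypothetical Student data object with three attributes; the class, the attribute names, and the sample values are invented for illustration and are not from the slides.

```python
from dataclasses import dataclass, fields

@dataclass
class Student:        # hypothetical type of data object
    name: str
    eye_color: str    # attribute whose values are symbols
    age: int          # attribute whose values are numbers

# A data set is a collection of data objects (made-up values).
data_set = [
    Student("Asha", "brown", 20),
    Student("Ravi", "black", 22),
]

# The attributes that describe every object of this type.
attribute_names = [f.name for f in fields(Student)]

print(attribute_names)         # ['name', 'eye_color', 'age']
print(data_set[0].eye_color)   # brown
```

Each Student instance plays the role of one data object, each field is an attribute, and the stored values are the attribute values.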