6. Mathematics
Linear Algebra
Linear algebra is used in machine learning to understand how algorithms work under the hood. It is all about
vector, matrix, and tensor operations.
1. Khan Academy Linear Algebra series (beginner-friendly).
2. Coding the Matrix course (and book).
3. 3Blue1Brown Linear Algebra series.
4. fast.ai Linear Algebra for coders course, highly related to modern ML workflow.
5. The first course in Coursera Mathematics for Machine Learning specialization.
6. “Introduction to Applied Linear Algebra: Vectors, Matrices, and Least Squares” book.
7. MIT Linear Algebra course, highly comprehensive.
8. Stanford CS229 Linear Algebra review.
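As a rough illustration of the vector and matrix operations mentioned above, here is a minimal sketch in plain Python. The function names (`dot`, `matvec`, `matmul`) and the sample numbers are made up for this example; in practice a library such as NumPy would usually provide these operations.

```python
def dot(u, v):
    """Dot product of two equal-length vectors."""
    if len(u) != len(v):
        raise ValueError("vectors must have the same length")
    return sum(a * b for a, b in zip(u, v))


def matvec(matrix, vector):
    """Multiply a matrix (given as a list of rows) by a vector."""
    return [dot(row, vector) for row in matrix]


def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    b_columns = list(zip(*b))  # transpose b so its columns are easy to reach
    return [[dot(row, col) for col in b_columns] for row in a]


# A linear model's prediction is a dot product of weights and features.
weights = [2.0, -1.0, 0.5]
features = [1.0, 3.0, 4.0]
prediction = dot(weights, features)

# Rotating the point (1, 0) by 90 degrees with a rotation matrix.
rotation = [[0, -1], [1, 0]]
rotated = matvec(rotation, [1, 0])

# Applying the rotation twice is the same as multiplying the matrices first.
half_turn = matmul(rotation, rotation)
```

Here `prediction` is a single number, `rotated` is the point (0, 1), and `half_turn` is the matrix for a 180 degree rotation.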
7. Mathematics
Calculus
Calculus is used in machine learning to formulate the functions used to train algorithms to reach their objective,
known as loss/cost/objective functions.
1. Khan Academy Calculus series (beginner-friendly).
2. 3Blue1Brown Calculus series.
3. The second course in Coursera Mathematics for Machine Learning specialization.
4. The Matrix Calculus You Need For Deep Learning paper.
5. MIT Single Variable Calculus.
6. MIT Multivariable Calculus.
7. Stanford CS224n Differential Calculus review.
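To make the link between calculus and loss functions concrete, here is a small sketch of gradient descent on a mean squared error loss for the one-parameter model y = w * x. The model, data, learning rate, and function names are assumptions chosen for illustration, not taken from any of the listed courses.

```python
def mse_loss(w, xs, ys):
    """Mean squared error of the model y = w * x."""
    return sum((w * x - y) ** 2 for x, y in zip(xs, ys)) / len(xs)


def mse_gradient(w, xs, ys):
    """Derivative of the loss with respect to w: (2/n) * sum((w*x - y) * x)."""
    return 2 * sum((w * x - y) * x for x, y in zip(xs, ys)) / len(xs)


def gradient_descent(xs, ys, learning_rate=0.05, steps=200):
    """Start from w = 0 and repeatedly step against the gradient."""
    w = 0.0
    for _ in range(steps):
        w -= learning_rate * mse_gradient(w, xs, ys)
    return w


xs = [1.0, 2.0, 3.0, 4.0]
ys = [3.0, 6.0, 9.0, 12.0]  # generated by y = 3x
w = gradient_descent(xs, ys)
final_loss = mse_loss(w, xs, ys)
```

The derivative tells the algorithm which direction reduces the loss, so `w` moves toward 3 and `final_loss` shrinks toward zero.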
8. Mathematics
Statistics & Probability
Both are used in machine learning and data science to analyze and understand data, and to discover and infer valuable
insights and hidden patterns.
1. Khan Academy Statistics and probability series (beginner-friendly).
2. Seeing Theory: A visual introduction to probability and statistics.
3. Intro to Descriptive Statistics from Udacity.
4. Intro to Inferential Statistics from Udacity.
5. Statistics with R Specialization from Coursera.
6. Stanford CS229 Probability Theory review.
9. Statistics
For every data professional, statistics and math are must-haves: without a working knowledge of statistics and
probability, one cannot interpret data effectively.
Some of the major topics include descriptive and inferential statistics. If you are a complete beginner, you can spend 2-3
weeks covering these topics and working through some problems for hands-on experience. The time you invest in
acquiring this knowledge is well worth it.
Basic Statistics
Cases, Variables, Types of Variables
Matrix and Frequency Table
Graphs and Shapes of Distributions
Mode, Median and Mean
Range, Interquartile Range and Box Plot
Variance and Standard deviation
Z-scores
Contingency Table, Scatterplot, Pearson’s r
Basics of Regression
Elementary Probability
Random Variables and Probability Distributions
Normal Distribution, Binomial Distribution & Poisson Distribution
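Several of the basic statistics topics listed above can be tried out with Python's built-in `statistics` module. The sketch below uses a made-up list of exam scores; `statistics.quantiles` needs Python 3.8 or later.

```python
import statistics

# Hypothetical exam scores for eight students.
scores = [52, 61, 61, 70, 74, 78, 85, 93]

# Measures of central tendency.
mean = statistics.mean(scores)
median = statistics.median(scores)
mode = statistics.mode(scores)

# Measures of spread.
value_range = max(scores) - min(scores)
variance = statistics.variance(scores)  # sample variance (divides by n - 1)
std_dev = statistics.stdev(scores)      # sample standard deviation

# Quartiles and the interquartile range used in a box plot.
q1, q2, q3 = statistics.quantiles(scores, n=4)
iqr = q3 - q1

# z-score: how many standard deviations each value lies from the mean.
z_scores = [(x - mean) / std_dev for x in scores]
```

A positive z-score means the value is above the mean and a negative one means it is below; the z-scores of a data set always sum to zero.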
Inferential Statistics
Observational Studies and Experiments
Sample and Population
Population Distribution, Sample Distribution and Sampling Distribution
Central Limit Theorem
Point Estimates
Confidence Intervals
Introduction to Hypothesis Testing
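The inferential topics above (sample versus population, sampling distribution, central limit theorem, point estimates, confidence intervals) can be explored with a short simulation. This is a sketch under stated assumptions: the population is synthetic, the sample size of 100 is arbitrary, and the interval uses the normal approximation rather than a t-distribution.

```python
import math
import random
import statistics

random.seed(0)  # fixed seed so the run is repeatable

# Hypothetical population: 10,000 values from a skewed (exponential) distribution.
population = [random.expovariate(1 / 50) for _ in range(10_000)]

# Draw one random sample and compute a point estimate of the population mean.
sample = random.sample(population, 100)
point_estimate = statistics.mean(sample)

# Standard error of the mean and a 95% confidence interval,
# using the normal approximation justified by the central limit theorem.
standard_error = statistics.stdev(sample) / math.sqrt(len(sample))
z = statistics.NormalDist().inv_cdf(0.975)  # roughly 1.96
ci_low = point_estimate - z * standard_error
ci_high = point_estimate + z * standard_error

# Sampling distribution: the means of many repeated samples
# cluster around the population mean with much less spread.
sample_means = [statistics.mean(random.sample(population, 100)) for _ in range(500)]
```

Even though the population is skewed, a histogram of `sample_means` would look approximately bell-shaped, which is what the central limit theorem predicts.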
10. One of the top books you can read is Practical Statistics for Data Scientists.
The first 3 chapters are explained in this video.
Recommended video for a detailed understanding of statistics for data science.
11. MS Excel
This section illustrates the powerful features Excel has to offer to analyze data.
1) Sort: You can sort your Excel data on one column or multiple columns. You can sort in ascending or descending
order.
2) Filter: Filter your Excel data if you only want to display records that meet certain criteria.
3) Conditional Formatting: Conditional formatting in Excel enables you to highlight cells with a certain color,
depending on the cell's value.
4) Charts: A simple Excel chart can say more than a sheet full of numbers. As you'll see, creating charts is very easy.
5) Pivot Tables: Pivot tables are one of Excel's most powerful features. A pivot table allows you to extract the
significance from a large, detailed data set.
6) Tables: Master Excel tables and analyze your data quickly and easily.
7) What-If Analysis: What-If Analysis in Excel allows you to try out different values (scenarios) for formulas.
8) Solver: Excel includes a tool called Solver that uses techniques from operations research to find optimal
solutions for all kinds of decision problems.
9) Analysis ToolPak: The Analysis ToolPak is an Excel add-in program that provides data analysis tools for
financial, statistical and engineering data analysis.
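For readers who think in code, the following sketch shows roughly what a pivot table computes: it groups rows by two fields and sums a value for each combination. It is an illustration in Python, not Excel's own implementation; the sales records and the `pivot_sum` helper are hypothetical.

```python
from collections import defaultdict

# Hypothetical sales records, one dict per spreadsheet row.
rows = [
    {"region": "North", "product": "Apples", "amount": 120},
    {"region": "North", "product": "Pears", "amount": 80},
    {"region": "South", "product": "Apples", "amount": 200},
    {"region": "South", "product": "Apples", "amount": 50},
    {"region": "South", "product": "Pears", "amount": 30},
]


def pivot_sum(records, row_key, col_key, value_key):
    """Sum value_key for every (row_key, col_key) combination."""
    table = defaultdict(lambda: defaultdict(int))
    for record in records:
        table[record[row_key]][record[col_key]] += record[value_key]
    # Convert to plain dicts so the result prints and compares cleanly.
    return {row: dict(cols) for row, cols in table.items()}


pivot = pivot_sum(rows, "region", "product", "amount")
```

In the result, regions play the role of pivot table rows, products the role of columns, and each cell holds the summed amount.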
12. MS Excel
1) Sort: Custom Sort Order | Sort by Color | Reverse List | Randomize List | SORT function
2) Filter: Number and Text Filters | Date Filters | Advanced Filter | Data Form | Remove Duplicates | Outlining
Data | Subtotal | Unique Values | FILTER function
3) Conditional Formatting: Manage Rules | Data Bars | Color Scales | Icon Sets | Find Duplicates | Shade Alternate
Rows | Compare Two Lists | Conflicting Rules | Heat Map
4) Charts: Column Chart | Line Chart | Pie Chart | Bar Chart | Area Chart | Scatter Plot | Data Series | Axes | Chart
Sheet | Trendline | Error Bars | Sparklines | Combination Chart | Gauge Chart | Thermometer Chart | Gantt
Chart | Pareto Chart
5) Pivot Tables: Group Pivot Table Items | Multi-level Pivot Table | Frequency Distribution | Pivot
Chart | Slicers | Update Pivot Table | Calculated Field/Item | GetPivotData
6) Tables: Structured References | Table Styles | Merge Tables | Table as Source Data | Quick Analysis
7) What-If Analysis: Data Tables | Goal Seek | Quadratic Equation
8) Solver: Transportation Problem | Assignment Problem | Capital Investment | Shortest Path Problem | Maximum
Flow Problem | Sensitivity Analysis | System of Linear Equations
13. SQL
Data science is the study and analysis of data. To analyze data, we first need to extract it from a database.
This is where SQL comes into the picture.
Many database platforms are modeled after SQL because it has become a standard for many
database systems. In fact, modern big data systems such as Hadoop and Spark make use of SQL for
maintaining relational database systems and processing structured data.
Important topics in SQL for Data Science
Before jumping into the resources, let’s look at the important topics. Make sure you cover the
following topics, but do not limit yourself to them.
1] Group By Clause: The SQL GROUP BY clause is used together with the SELECT statement
to arrange identical data into groups. It is mostly used with aggregate functions, and the
HAVING clause applies conditions to the resulting groups.
2] Aggregation Functions: An aggregate function performs a calculation on a set of values and returns
a single value. Ex. count, avg, min, max, etc.
3] String Functions and Operations: Used to perform operations on text, such as converting a string to
uppercase or matching a regular expression.
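The first three topics can be tried together in one query. The sketch below runs SQL through Python's built-in `sqlite3` module against an in-memory database; the `orders` table and its rows are invented for this example, and other database engines may differ slightly in syntax.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE orders (id INTEGER PRIMARY KEY, customer TEXT, amount REAL);
    INSERT INTO orders (customer, amount) VALUES
        ('alice', 30.0), ('alice', 70.0), ('bob', 20.0),
        ('carol', 55.0), ('carol', 45.0), ('carol', 10.0);
""")

# GROUP BY with aggregate functions; HAVING filters groups after aggregation.
query = """
    SELECT UPPER(customer) AS customer_upper,   -- string function
           COUNT(*)        AS n_orders,         -- aggregate functions
           AVG(amount)     AS avg_amount,
           MAX(amount)     AS max_amount
    FROM orders
    GROUP BY customer
    HAVING SUM(amount) >= 100
    ORDER BY customer_upper
"""
result = conn.execute(query).fetchall()
```

Only customers whose orders total at least 100 survive the HAVING clause, so `bob` is filtered out while `alice` and `carol` each produce one summary row.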
14. SQL
4] Date & Time operations: When a value contains only a date it is easy to handle, but when a time portion
is also involved things get more complicated. So make sure you practice enough.
5] Output control statements: Used to shape results as required, e.g. the ORDER BY clause and the LIMIT clause to get
a limited number of rows.
6] Various operators: There are three main types of operators: Arithmetic, Logical, and Comparison
operators.
7] Joins: This is one of the most important topics. Joins combine multiple tables to get the desired output. Make sure
you understand the related concepts, such as types of joins, primary key, foreign key, and composite key.
8] Nested Queries: A subquery (nested query) returns data that the main query uses as
a condition to further restrict the data to be retrieved.
(Nested queries can return either a scalar (single) value or a row set, whereas joins
return rows. If an operation can be written both ways, a join is often the better-optimized
choice.)
9] Views & Indexing: Indexes are special lookup tables that the database search engine can use to
speed up data retrieval. In simple words, an index in a database is similar to the index of a book.
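The sketch below touches several of the topics above in one place: primary and foreign keys, inner and left joins, ORDER BY with LIMIT, a nested query, and a date function. It uses Python's built-in `sqlite3` module with invented `customers` and `orders` tables; `strftime` is SQLite's date function, and other engines use different names.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE customers (
        id   INTEGER PRIMARY KEY,                      -- primary key
        name TEXT NOT NULL
    );
    CREATE TABLE orders (
        id          INTEGER PRIMARY KEY,
        customer_id INTEGER REFERENCES customers(id),  -- foreign key
        order_date  TEXT,                              -- ISO 8601 date string
        amount      REAL
    );
    INSERT INTO customers (id, name) VALUES (1, 'alice'), (2, 'bob'), (3, 'carol');
    INSERT INTO orders (customer_id, order_date, amount) VALUES
        (1, '2023-01-15', 30.0), (1, '2023-02-03', 70.0), (2, '2023-02-20', 20.0);
""")

# INNER JOIN with output control: the two largest orders and who placed them.
inner = conn.execute("""
    SELECT c.name, o.amount
    FROM customers AS c
    JOIN orders AS o ON o.customer_id = c.id
    ORDER BY o.amount DESC
    LIMIT 2
""").fetchall()

# LEFT JOIN: every customer is kept, even those without any orders.
left = conn.execute("""
    SELECT c.name, COUNT(o.id) AS n_orders
    FROM customers AS c
    LEFT JOIN orders AS o ON o.customer_id = c.id
    GROUP BY c.id
    ORDER BY c.name
""").fetchall()

# Nested query: customers with at least one order above the average amount.
above_avg = conn.execute("""
    SELECT name FROM customers
    WHERE id IN (
        SELECT customer_id FROM orders
        WHERE amount > (SELECT AVG(amount) FROM orders)
    )
""").fetchall()

# Date function: number of orders per month.
per_month = conn.execute("""
    SELECT strftime('%Y-%m', order_date) AS month, COUNT(*)
    FROM orders GROUP BY month ORDER BY month
""").fetchall()
```

Note how `carol` appears in the left join with a count of 0 but would be missing entirely from an inner join.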
15. SQL – Practice resources
11] Windowing Functions: Window functions operate on a set of rows and return a single value for each row from
the underlying query. They reduce the complexity of queries that analyze partitions (windows) of a data set.
12] Query Optimizations: When we are dealing with larger datasets, it is important to use the most efficient method
for a SQL statement to access requested data.
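A window function and a basic query plan check can be sketched with Python's built-in `sqlite3` module. The `sales` table and index name are hypothetical, window functions need SQLite 3.25 or later, and the exact wording of the query plan varies between SQLite versions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE sales (region TEXT, month TEXT, amount REAL);
    INSERT INTO sales VALUES
        ('north', '2023-01', 100), ('north', '2023-02', 150), ('north', '2023-03', 120),
        ('south', '2023-01', 80),  ('south', '2023-02', 60);
    CREATE INDEX idx_sales_region ON sales(region);
""")

# Window functions keep one output row per input row, unlike GROUP BY.
windowed = conn.execute("""
    SELECT region, month, amount,
           SUM(amount) OVER (PARTITION BY region ORDER BY month) AS running_total,
           RANK()      OVER (PARTITION BY region ORDER BY amount DESC) AS rank_in_region
    FROM sales
    ORDER BY region, month
""").fetchall()

# EXPLAIN QUERY PLAN reports how SQLite intends to run a query,
# including whether it can use an index instead of scanning the table.
plan = conn.execute(
    "EXPLAIN QUERY PLAN SELECT * FROM sales WHERE region = 'north'"
).fetchall()
plan_text = " ".join(row[-1] for row in plan)
```

Each row of `windowed` carries its own running total and rank within its region, and `plan_text` names the index when the filter on `region` can use it.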
Resources to Practice SQL
1] Leetcode:- This is one of the best practice platforms, with a great variety of questions.
2] SQL Zoo:- SQLZoo is a well-established online platform (since 1999) for writing and running SQL queries
against a live database.
3] HackerRank:- This is a good platform to practice on. The questions are divided into three levels:
Easy, Medium, and Hard.
4] SQL Bolt:- A series of interactive lessons and exercises created to help users learn
SQL easily. The lessons and topics on this site are comprehensive and cover all the important details of
using SQL.
16. SQL – Learning resources
Courses:
1] Udacity’s SQL for Data Analysis:- This is one of the best free courses, covering all the above topics with clear
explanations and a practice quiz after each topic.
https://www.udacity.com/course/sql-for-data-analysis--ud198
2] Introduction to SQL Programming for Excel Users: If you are an Excel user and want to learn SQL,
this is a great YouTube playlist for you. It covers the major topics from the above list.
3] Khan Academy: - It is a great platform with quality free courses with a detailed explanation. The entire course
contains 5 parts starting with basics and leading you all the way up to more advanced lessons. In this course, there
are challenges followed by video tutorials and a small project at the end of each topic.
Link:- https://www.khanacademy.org/computing/computer-programming/sql
17. How to learn Python for Data Science
Step 1: Learn Python fundamentals
Everyone starts somewhere. The first step is to learn Python programming basics. You can do this with an
online course, a data science bootcamp, self-directed learning, or a university program. There is no right or
wrong way to learn the Python basics. The key is to choose a path and stay consistent.
Step 2: Practice with hands-on learning
One of the best ways to accelerate your education is through hands-on learning. It may surprise you how
quickly you catch on when you build small Python projects.
Step 3: Learn Python data science libraries
The key libraries are NumPy, pandas, Matplotlib, and scikit-learn.
•NumPy: A library that makes a variety of mathematical and statistical operations easier; it is also the basis
for many features of the pandas library.
•pandas: A Python library created specifically to facilitate working with data. This is the bread and butter of
a lot of Python data science work.
•Matplotlib: A visualization library that makes it quick and easy to generate charts from your data, such as the
graphs you’d find in Excel or Google Sheets.
•Scikit-learn: The most popular library for machine learning work in Python.
18. How to learn Python for Data Science
Step 4: Build a data science portfolio as you learn Python
For aspiring data scientists, a portfolio is a necessity: it is one of the most important things hiring
managers look for in a qualified candidate.
These projects should include work with several different datasets, and each should share interesting insights that
you discovered. Here are some types of projects to consider:
•Data Cleaning Project: Any project that involves dirty or “unstructured” data that you clean up and analyze
will impress potential employers, since most real-world data requires cleaning.
•Data Visualization Project: Making attractive, easy-to-read visualizations is both a programming and a design
challenge, but if you can do it well, your analysis will be considerably more useful. Having great-looking charts in a
project will make your portfolio stand out.
•Machine Learning Project: If you aspire to work as a data scientist, you will definitely need a project that
shows off your ML skills. You may want a few different machine learning projects, with each focused on a different
algorithm. Your analysis should be clear and easy to read, ideally in a format like a Jupyter Notebook so a
technical audience can read your code.
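As a starting point for the data cleaning project type listed above, here is a minimal sketch using only the standard library. The raw data, column names, and cleaning rules (trim whitespace, normalize capitalization, drop rows with a missing age, remove duplicates) are assumptions for illustration; a real project would pick rules that fit its dataset, and would likely use pandas.

```python
import csv
import io

# Hypothetical raw export with stray whitespace, inconsistent case,
# a missing value, and a duplicate row.
raw = """name,city,age
 Alice ,london,34
Bob,Paris,
alice,London,34
Chen,  berlin ,29
"""


def clean(rows):
    """Normalize text fields, drop rows with a missing age, remove duplicates."""
    seen = set()
    cleaned = []
    for row in rows:
        name = row["name"].strip().title()
        city = row["city"].strip().title()
        age_text = row["age"].strip()
        if not age_text:  # skip rows with no age
            continue
        record = (name, city, int(age_text))
        if record not in seen:  # keep only the first copy of each record
            seen.add(record)
            cleaned.append(record)
    return cleaned


records = clean(csv.DictReader(io.StringIO(raw)))
```

Dropping incomplete rows is only one option; depending on the analysis, filling in a default or flagging the row may be the better choice.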
19. Python for Data Science
1) Python Basics for Data Science – edX
This is a beginner-friendly free course to learn Python for data science. In this course, you will learn the Python
basics (how to define variables, sets, conditional statements, and functions), how to operate on files to read and
write data in Python, and how to use pandas for data analysis in Python.
2) Python For Data Science – Udemy
This course teaches the Python basics for data science. It is a good fit for aspiring practitioners in data science, artificial
intelligence, machine learning, and deep learning. It is not an advanced-level course, but it is good for
understanding the Python basics.
3) Foundations of Data Science: K-Means Clustering in Python – Coursera
This is a free course offered on Coursera that teaches the core concepts of data science and covers basic
mathematics, statistics, and programming skills. In this course, you will implement the K-means algorithm in Python.
The course balances theory and practice and is a useful way to learn the
basics of data science.
4) Intro to Data Science – Udacity
This is another completely free course to learn data science with Python. In this course, you will learn the fundamentals
of data science, data wrangling, normal distribution, data visualization, and the basics of MapReduce.
This course is not for beginners: it requires previous Python programming knowledge and intro-level statistics.
20. 10 Best Python Courses According to Data Analysis
1. Learn Python by Codecademy
2. Introduction to Python Programming by Udacity
3. Programming for Everybody (Getting Started with Python) by Coursera
4. Introduction to Python for Data Science by DataCamp
5. Complete Python Bootcamp From Zero to Hero in Python by Udemy
6. Introduction to Computer Science and Programming Using Python by edX
7. An Introduction to Interactive Programming in Python (Part 1) by Coursera
8. Machine Learning with Python by Coursera
9. Intro to TensorFlow for Deep Learning by Udacity
10. Learn to Program: The Fundamentals by Coursera
21. PowerBI
Power BI is a business analytics tool from Microsoft. It provides advanced analytics tools, reports, and visualizations. Power BI
is a versatile, platform-independent tool that users can embed in cloud, mobile, and web apps.
Course | Duration | In Brief
Microsoft Learn Resources (Microsoft) | 12-16 hours | Collection of the best learning resources for Power BI training straight from the creators
Microsoft Power BI Desktop for Business Intelligence (Udemy) | 11 hours | Great hands-on approach to learning Power BI for desktop for PC/Windows users
Power BI Essential Training (LinkedIn Learning) | 3-4 hours | Great training for learning the basic concepts of Power BI quickly with free access to videos
Power BI Data Methods (LinkedIn Learning) | 3-4 hours | Concise Power BI course with a focus on the data end of Power BI: the Power Query
Power BI Examples, Demos & Tutorials (Chandoo) | 12 hours | Free Power BI course from an interactive instructor, comes with downloadable files
Building Your First Power BI Report (Pluralsight) | 1-2 hours | Simple yet good free Power BI course to get you started
Power BI Training (Learnit Training) | 15 hours | Free Power BI training: elaborate and comprehensive with very long, intensive videos
22. Additional Resources
Kaggle :-
Kaggle was acquired by Google, which currently owns it.
Kaggle is an online community platform for data scientists and machine learning enthusiasts.
Kaggle allows users to collaborate with other users, find and publish datasets, and compete with other
data scientists to solve data science challenges.
Everything on Kaggle is completely free: courses, certificates obtained from courses, datasets,
participation in competitions, discussion sections, etc.
Participate in Kaggle competitions: They consist of data science tasks. Some competitions do not have any
prizes (but offer learning and knowledge-sharing opportunities), while others have generous cash prizes.
You can participate in these competitions on your own or with a team. In addition to the prize money for
good scores in the competitions, you can win medals and points.
23. Additional Resources
Colab :-
Colaboratory, or “Colab” for short, is a product from Google Research. Colab allows anybody to write and execute
arbitrary Python code through the browser, and is especially well suited to machine learning, data analysis, and
education. Colab is a free Jupyter notebook environment that runs entirely in the cloud.
•Interactive tutorials to learn machine learning and neural networks.
•Write and execute Python 3 code without having a local setup.
•Execute terminal commands from the Notebook.
•Import datasets from external sources such as Kaggle.
•Save your Notebooks to Google Drive.
•Import Notebooks from Google Drive.
•Free cloud service, GPUs and TPUs.
•Integrate with PyTorch, TensorFlow, OpenCV.
•Import or publish directly from/to GitHub.