Data Analysis – Technical learnings
Mathematics
Linear Algebra
Linear Algebra is used in machine learning to understand how algorithms work under the hood. It’s all about vector/matrix/tensor operations.
1. Khan Academy Linear Algebra series (beginner-friendly).
2. Coding the Matrix course (and book).
3. 3Blue1Brown Linear Algebra series.
4. fast.ai Linear Algebra for coders course, highly related to modern ML workflow.
5. The first course in the Coursera Mathematics for Machine Learning specialization.
6. “Introduction to Applied Linear Algebra: Vectors, Matrices, and Least Squares” book.
7. MIT Linear Algebra course, highly comprehensive.
8. Stanford CS229 Linear Algebra review.
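To make "vector/matrix operations" concrete: the predictions of a simple linear model for a whole batch of samples are one matrix-vector product. The sketch below uses plain Python lists so it runs without dependencies (NumPy's `@` operator does the same job in practice); the feature matrix `X` and weights `w` are hypothetical numbers chosen for illustration.

```python
def dot(u, v):
    """Dot product of two equal-length vectors."""
    if len(u) != len(v):
        raise ValueError("vectors must have the same length")
    return sum(a * b for a, b in zip(u, v))


def matvec(matrix, vector):
    """Multiply a matrix (given as a list of rows) by a column vector."""
    return [dot(row, vector) for row in matrix]


# Hypothetical data: each row is one sample, each column one feature.
X = [
    [1.0, 2.0],
    [3.0, 4.0],
    [5.0, 6.0],
]
w = [0.5, -1.0]  # hypothetical model weights

predictions = matvec(X, w)  # one prediction per sample
print(predictions)  # [-1.5, -2.5, -3.5]
```

Each prediction is simply the dot product of one sample's features with the weight vector, which is why matrix notation describes a whole model in a single line.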
Calculus
Calculus is used in machine learning to formulate the functions that training algorithms optimize to reach their objective, known as loss, cost, or objective functions.
1. Khan Academy Calculus series (beginner-friendly).
2. 3Blue1Brown Calculus series.
3. The second course in the Coursera Mathematics for Machine Learning specialization.
4. The Matrix Calculus You Need For Deep Learning paper.
5. MIT Single Variable Calculus.
6. MIT Multivariable Calculus.
7. Stanford CS224n Differential Calculus review.
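As a minimal sketch of where calculus enters training: the one-parameter model ŷ = w·x has the mean squared error loss L(w) = (1/n) Σ (w·xᵢ − yᵢ)², whose derivative is dL/dw = (2/n) Σ (w·xᵢ − yᵢ)·xᵢ. Gradient descent repeatedly steps w against that derivative. The data, learning rate, and step count below are hypothetical choices for illustration.

```python
def mse_loss(w, xs, ys):
    """Mean squared error of the one-parameter model y_hat = w * x."""
    return sum((w * x - y) ** 2 for x, y in zip(xs, ys)) / len(xs)


def mse_grad(w, xs, ys):
    """Derivative dL/dw of the mean squared error above."""
    return 2 * sum((w * x - y) * x for x, y in zip(xs, ys)) / len(xs)


def gradient_descent(xs, ys, lr=0.01, steps=1000):
    """Start at w = 0 and repeatedly move w against the gradient."""
    w = 0.0
    for _ in range(steps):
        w -= lr * mse_grad(w, xs, ys)
    return w


# Hypothetical data generated from y = 2x, so the loss is minimized at w = 2.
xs = [1.0, 2.0, 3.0, 4.0]
ys = [2.0, 4.0, 6.0, 8.0]
w = gradient_descent(xs, ys)
print(round(w, 4), mse_loss(w, xs, ys))
```

The learning rate matters: with this data, a step size that is too large makes w overshoot and diverge instead of settling at the minimum.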
Statistics & Probability
Both are used in machine learning and data science to analyze and understand data, discover and infer valuable
insights and hidden patterns.
1. Khan Academy Statistics and probability series (beginner-friendly).
2. Seeing Theory: A visual introduction to probability and statistics.
3. Intro to Descriptive Statistics from Udacity.
4. Intro to Inferential Statistics from Udacity.
5. Statistics with R Specialization from Coursera.
6. Stanford CS229 Probability Theory review.
Statistics
For every data professional, statistics and math are must-haves: without a knowledge of statistics and probability, you cannot interpret data effectively.
The major topics include descriptive and inferential statistics. If you are a complete beginner, you can spend 2-3 weeks learning these topics and working through some problems for hands-on experience. The time you invest in acquiring this knowledge is well worth it.
Basic Statistics
• Cases, Variables, Types of Variables
• Matrix and Frequency Table
• Graphs and Shapes of Distributions
• Mode, Median and Mean
• Range, Interquartile Range and Box Plot
• Variance and Standard Deviation
• Z-scores
• Contingency Table, Scatterplot, Pearson’s r
• Basics of Regression
• Elementary Probability
• Random Variables and Probability Distributions
• Normal Distribution, Binomial Distribution & Poisson Distribution
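Several of the descriptive measures listed above can be computed with Python's built-in `statistics` module. The sketch below uses a small hypothetical data set; note that quartiles (and therefore the IQR) depend on the interpolation method, and the module's default ("exclusive") method may differ from what Excel or other tools report.

```python
import statistics

# Hypothetical sample used only for illustration.
data = [2, 4, 4, 4, 5, 5, 7, 9]

mean = statistics.mean(data)            # 5
median = statistics.median(data)        # 4.5 (average of the two middle values)
mode = statistics.mode(data)            # 4 (most frequent value)
data_range = max(data) - min(data)      # 7

# Quartiles with the module's default ("exclusive") method.
q1, q2, q3 = statistics.quantiles(data, n=4)
iqr = q3 - q1                           # 2.5

# Population variance and standard deviation (divide by n); use
# statistics.variance / statistics.stdev for the sample (n - 1) versions.
variance = statistics.pvariance(data)   # 4
std_dev = statistics.pstdev(data)       # 2.0

# Z-score: how many standard deviations each value lies from the mean.
z_scores = [(x - mean) / std_dev for x in data]
print(z_scores)  # first value is (2 - 5) / 2.0 = -1.5
```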
Inferential Statistics
• Observational Studies and Experiments
• Sample and Population
• Population Distribution, Sample Distribution and Sampling Distribution
• Central Limit Theorem
• Point Estimates
• Confidence Intervals
• Introduction to Hypothesis Testing
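The inferential topics above connect directly: the sample mean is a point estimate, and the Central Limit Theorem says that for a reasonably large sample its sampling distribution is approximately normal with standard error s/√n, which yields a confidence interval. Below is a minimal sketch using the normal approximation from Python's `statistics.NormalDist`; for small samples a t-distribution is the more appropriate choice. The sample values are hypothetical.

```python
import math
import statistics
from statistics import NormalDist


def mean_confidence_interval(sample, confidence=0.95):
    """Normal-approximation (CLT-based) confidence interval for the mean."""
    n = len(sample)
    point_estimate = statistics.mean(sample)
    standard_error = statistics.stdev(sample) / math.sqrt(n)
    # Critical value of the standard normal, about 1.96 for 95%.
    z = NormalDist().inv_cdf(0.5 + confidence / 2)
    margin = z * standard_error
    return point_estimate - margin, point_estimate + margin


# Hypothetical sample of 40 measurements, for illustration only.
sample = [50 + (i % 10) - 4.5 for i in range(40)]
low, high = mean_confidence_interval(sample)
print(f"95% CI for the mean: ({low:.2f}, {high:.2f})")
```

Raising the confidence level widens the interval, which is the usual trade-off between confidence and precision.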
• One of the top books you can read is Practical Statistics for Data Scientists.
• The first 3 chapters are explained in this video.
• Recommended video for a detailed understanding of statistics for data science.
MS Excel
This section illustrates the powerful features Excel has to offer to analyze data.
1) Sort: You can sort your Excel data on one column or multiple columns. You can sort in ascending or descending
order.
2) Filter: Filter your Excel data if you only want to display records that meet certain criteria.
3) Conditional Formatting: Conditional formatting in Excel enables you to highlight cells with a certain color,
depending on the cell's value.
4) Charts: A simple Excel chart can say more than a sheet full of numbers. As you'll see, creating charts is very easy.
5) Pivot Tables: Pivot tables are one of Excel's most powerful features. A pivot table allows you to extract the
significance from a large, detailed data set.
6) Tables: Master Excel tables and analyze your data quickly and easily.
7) What-If Analysis: What-If Analysis in Excel allows you to try out different values (scenarios) for formulas.
8) Solver: Excel includes a tool called Solver that uses techniques from operations research to find optimal solutions for all kinds of decision problems.
9) Analysis ToolPak: The Analysis ToolPak is an Excel add-in program that provides data analysis tools for
financial, statistical and engineering data analysis.
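For readers who later move the same analysis into Python, the sketch below mirrors three of the features above (multi-column Sort, Filter, and a Pivot Table style summary) using only the standard library. The sales records are hypothetical; in practice pandas (`DataFrame.sort_values`, boolean indexing, `DataFrame.pivot_table`) is the usual tool.

```python
from collections import defaultdict

# Hypothetical sales records: (region, product, units)
rows = [
    ("East", "Pens", 10),
    ("West", "Pens", 4),
    ("East", "Paper", 7),
    ("West", "Paper", 12),
    ("East", "Pens", 5),
]

# Sort on multiple columns: region ascending, then units descending.
sorted_rows = sorted(rows, key=lambda r: (r[0], -r[2]))

# Filter: keep only records that meet a criterion (units >= 6).
filtered = [r for r in rows if r[2] >= 6]

# Pivot-table-style summary: total units per (region, product).
pivot = defaultdict(int)
for region, product, units in rows:
    pivot[(region, product)] += units

for (region, product), total in sorted(pivot.items()):
    print(f"{region:5} {product:6} {total}")
```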
1) Sort: Custom Sort Order | Sort by Color | Reverse List | Randomize List | SORT function
2) Filter: Number and Text Filters | Date Filters | Advanced Filter | Data Form | Remove Duplicates | Outlining Data | Subtotal | Unique Values | FILTER function
3) Conditional Formatting: Manage Rules | Data Bars | Color Scales | Icon Sets | Find Duplicates | Shade Alternate Rows | Compare Two Lists | Conflicting Rules | Heat Map
4) Charts: Column Chart | Line Chart | Pie Chart | Bar Chart | Area Chart | Scatter Plot | Data Series | Axes | Chart Sheet | Trendline | Error Bars | Sparklines | Combination Chart | Gauge Chart | Thermometer Chart | Gantt Chart | Pareto Chart
5) Pivot Tables: Group Pivot Table Items | Multi-level Pivot Table | Frequency Distribution | Pivot Chart | Slicers | Update Pivot Table | Calculated Field/Item | GetPivotData
6) Tables: Structured References | Table Styles | Merge Tables | Table as Source Data | Quick Analysis
7) What-If Analysis: Data Tables | Goal Seek | Quadratic Equation
8) Solver: Transportation Problem | Assignment Problem | Capital Investment | Shortest Path Problem | Maximum Flow Problem | Sensitivity Analysis | System of Linear Equations
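Goal Seek (under What-If Analysis) works backwards: you state the result a formula should produce, and it searches for the input value that produces it. Excel's internal search method is not assumed here; the sketch below uses bisection, a simple root-finding technique, as a stand-in. The `future_value` formula is a hypothetical worksheet formula, and `goal_seek` is a hypothetical helper name, not an Excel API.

```python
def goal_seek(formula, target, low, high, tol=1e-9, max_iter=200):
    """Find x in [low, high] where formula(x) is close to target (bisection).

    Assumes formula is continuous and that formula(x) - target changes
    sign between low and high.
    """
    f_low = formula(low) - target
    if f_low * (formula(high) - target) > 0:
        raise ValueError("target is not bracketed by [low, high]")
    for _ in range(max_iter):
        mid = (low + high) / 2
        f_mid = formula(mid) - target
        if abs(f_mid) < tol or (high - low) / 2 < tol:
            return mid
        if f_low * f_mid < 0:
            high = mid                 # the answer lies in the lower half
        else:
            low, f_low = mid, f_mid    # the answer lies in the upper half
    return (low + high) / 2


# Hypothetical "worksheet formula": value of 1000 after 10 years of growth.
def future_value(rate):
    return 1000 * (1 + rate) ** 10


# Which annual growth rate makes the investment reach 2000?
rate = goal_seek(future_value, target=2000, low=0.0, high=1.0)
print(f"{rate:.4%}")  # about 7.18%
```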
SQL
Data Science is the study and analysis of data. In order to analyze the data, we need to extract it from the database.
This is where SQL comes into the picture.
Many database platforms use SQL because it has become the standard query language for relational database systems. Even modern big data systems such as Hadoop (through Apache Hive) and Spark (through Spark SQL) use SQL for querying and processing structured data.
Important topics in SQL for Data Science
Before jumping into the resources, let’s look at the important topics. Make sure you cover the following topics, but do not limit yourself to them.
1] Group By Clause: The SQL GROUP BY clause is used together with the SELECT statement to arrange identical data into groups. Aggregate functions are usually applied to each group, and the HAVING clause filters the groups based on a condition after aggregation.
2] Aggregation Functions: An aggregate function performs a calculation on a set of values and returns a single value, e.g., COUNT, AVG, MIN, MAX, etc.
3] String Functions and Operations: Used to perform operations on text, such as converting a string to uppercase or matching a regular expression.
4] Date & Time operations: Values that contain only a date are easy to handle, but things get more complicated once a time portion is involved, so make sure you practice enough.
5] Output control statements: Used to shape results to your requirements, e.g., the ORDER BY clause to sort rows and the LIMIT clause to return a limited number of rows.
6] Various operators: There are three main types of operators: arithmetic, logical, and comparison operators.
7] Joins: One of the most important topics. Joins combine multiple tables to get the desired output. Make sure you understand all the related concepts, such as the types of joins, primary keys, foreign keys, composite keys, etc.
8] Nested Queries: A subquery (nested query) returns data that the main query uses as a condition to further restrict the data to be retrieved.
(Nested queries can return either a scalar (single) value or a row set, whereas joins return rows. If an operation can be written both ways, a join is often the more efficient choice, although many query optimizers can rewrite one form into the other.)
9] Views & Indexing: A view is a stored query that can be used like a virtual table. Indexes are special lookup tables that the database search engine can use to speed up data retrieval. In simple words, an index in a database is similar to the index of a book.
10] Windowing Functions: Window functions operate on a set of rows and return a single value for each row from the underlying query. They reduce the complexity of queries that analyze partitions (windows) of a data set.
11] Query Optimizations: When dealing with larger datasets, it is important to write SQL statements that access the requested data in the most efficient way.
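Several of the topics above (joins, GROUP BY with HAVING, aggregate functions, ORDER BY, nested queries, and window functions) can be tried without installing a database server, using Python's built-in `sqlite3` module with an in-memory database. The `customers`/`orders` schema and its rows are hypothetical; window functions require SQLite 3.25 or newer, which recent Python builds typically bundle.

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # throwaway in-memory database
conn.executescript("""
    CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE orders (
        id INTEGER PRIMARY KEY,
        customer_id INTEGER REFERENCES customers(id),  -- foreign key
        amount REAL
    );
    INSERT INTO customers VALUES (1, 'Asha'), (2, 'Ben'), (3, 'Chen');
    INSERT INTO orders VALUES
        (1, 1, 120.0), (2, 1, 80.0), (3, 2, 40.0), (4, 3, 300.0), (5, 3, 50.0);
""")

# JOIN + GROUP BY + aggregate functions + HAVING + ORDER BY.
totals = conn.execute("""
    SELECT c.name, COUNT(*) AS n_orders, SUM(o.amount) AS total
    FROM customers AS c
    JOIN orders AS o ON o.customer_id = c.id
    GROUP BY c.name
    HAVING COUNT(*) >= 2          -- filters groups after aggregation
    ORDER BY total DESC
""").fetchall()
print(totals)  # [('Chen', 2, 350.0), ('Asha', 2, 200.0)]

# Nested query: orders larger than the average order amount.
big_orders = conn.execute("""
    SELECT id, amount FROM orders
    WHERE amount > (SELECT AVG(amount) FROM orders)
    ORDER BY id
""").fetchall()

# Window function: rank each customer's orders by amount.
ranked = conn.execute("""
    SELECT customer_id, amount,
           ROW_NUMBER() OVER (PARTITION BY customer_id ORDER BY amount DESC) AS rn
    FROM orders
    ORDER BY customer_id, rn
""").fetchall()
conn.close()
```

Note how HAVING drops Ben (only one order) after the groups are formed, whereas a WHERE clause would filter individual rows before grouping.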
Resources to Practice SQL
1] LeetCode: One of the best practice platforms, with a great variety of questions.
2] SQLZoo: A well-established online platform (since 1999) for writing and running SQL queries against a live database.
3] HackerRank: A good platform for practice. Its questions are divided into three levels: Easy, Medium, and Hard.
4] SQLBolt: A series of interactive lessons and exercises designed to help users learn SQL easily. The lessons are comprehensive and cover all the important details of using SQL.
SQL – Learning resources
Courses:
1] Udacity’s SQL for Data Analysis: One of the best free courses, covering all the topics above with clear explanations and a practice quiz after each topic.
https://www.udacity.com/course/sql-for-data-analysis--ud198
2] Introduction to SQL Programming for Excel Users: If you are an Excel user who wants to learn SQL, this YouTube playlist is a great fit. It covers the major topics from the list above.
3] Khan Academy: A great platform with quality free courses and detailed explanations. The course has 5 parts, starting with the basics and leading you all the way up to more advanced lessons. It includes video tutorials, challenges, and a small project at the end of each topic.
Link: https://www.khanacademy.org/computing/computer-programming/sql
How to learn Python for Data Science
Step 1: Learn Python fundamentals
Everyone starts somewhere. The first step is to learn Python programming basics. You can do this with an online course, a data science bootcamp, self-directed learning, or a university program. There is no right or wrong way to learn the Python basics. The key is to choose a path and stay consistent.
Step 2: Practice with hands-on learning
One of the best ways to accelerate your education is through hands-on learning. It may surprise you how quickly you catch on when you build small Python projects.
Step 3: Learn Python data science libraries
The most important libraries are NumPy, pandas, Matplotlib, and Scikit-learn.
•NumPy — A library that makes a variety of mathematical and statistical operations easier; it is also the basis
for many features of the pandas library.
•pandas — A Python library created specifically to facilitate working with data. This is the bread and butter of
a lot of Python data science work.
•Matplotlib — A visualization library that makes it quick and easy to generate charts from your data, similar to the graphs you’d find in Excel or Google Sheets.
•Scikit-learn — The most popular library for machine learning work in Python.
Step 4: Build a data science portfolio as you learn Python
For aspiring data scientists, a portfolio is a necessity — it’s one of the most important things hiring
managers look for in a qualified candidate.
These projects should include work with several different datasets, and each should share interesting insights that
you discovered. Here are some types of projects to consider:
•Data Cleaning Project — Any project that involves dirty or “unstructured” data that you clean up and analyze
will impress potential employers, since most real-world data requires cleaning.
•Data Visualization Project — Making attractive, easy-to-read visualizations is both a programming and a design
challenge, but if you can do it well, your analysis will be considerably more useful. Having great-looking charts in a
project will make your portfolio stand out.
•Machine Learning Project — If you aspire to work as a data scientist, you will definitely need a project that
shows off your ML skills. You may want a few different machine learning projects, with each focused on a different
algorithm. Your analysis should be clear and easy to read, ideally in a format like a Jupyter Notebook so that a technical audience can read your code.
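A data cleaning project usually begins with a handful of repetitive fixes. The sketch below applies a few common ones (trimming whitespace, normalizing case, converting types, dropping incomplete rows, and removing duplicates) to a small hypothetical CSV export, using only Python's standard library; real projects typically do the same with pandas. The `clean_rows` function is a hypothetical helper written for this illustration.

```python
import csv
import io

# Hypothetical messy export: stray spaces, inconsistent case,
# a missing value, and a duplicate record.
raw = """name,city,age
 Alice ,new york,34
BOB,Boston, 29
alice,New York,34
Carol,,41
"""


def clean_rows(text):
    """Return cleaned records as dicts, dropping incomplete and duplicate rows."""
    seen = set()
    cleaned = []
    for row in csv.DictReader(io.StringIO(text)):
        name = row["name"].strip().title()
        city = row["city"].strip().title()
        age = row["age"].strip()
        if not (name and city and age):
            continue  # drop rows with missing values
        record = (name, city, int(age))
        if record in seen:
            continue  # drop duplicates that only differed by spacing/case
        seen.add(record)
        cleaned.append({"name": name, "city": city, "age": int(age)})
    return cleaned


for record in clean_rows(raw):
    print(record)
```

Normalizing before de-duplicating is the key design choice: " Alice " and "alice" only become recognizable as the same person after trimming and case normalization.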
Python for Data Science
1) Python Basics for Data Science– edX
This is a beginner-friendly free course to learn Python for data science. In this course, you will learn the Python basics (how to define variables, sets, conditional statements, and functions), how to read and write data in files, and how to use pandas for data analysis in Python.
2) Python For Data Science– Udemy
This course teaches the Python basics for data science and suits aspiring data scientists as well as aspiring artificial intelligence, machine learning, and deep learning practitioners. It is not an advanced-level course, but it is good for understanding the Python basics.
3) Foundations of Data Science: K-Means Clustering in Python– Coursera
This is a free course offered by Coursera that covers the core concepts of data science, including basic mathematics, statistics, and programming skills. In this course, you will implement the K-means algorithm in Python. The course strikes a good balance between theory and practice and is useful for learning the basics of data science.
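For orientation before taking the course above, here is a minimal sketch of the standard K-means procedure (Lloyd's iteration: assign each point to its nearest centroid, then move each centroid to the mean of its assigned points, until assignments stop changing). It is not the course's code. The points are hypothetical, and initializing with the first k points is a simplification for reproducibility; practical implementations use random or k-means++ initialization.

```python
import math


def kmeans(points, k, max_iter=100):
    """Cluster 2-D points into k groups with Lloyd's algorithm."""
    # Deterministic initialization for the sketch: the first k points.
    centroids = [tuple(p) for p in points[:k]]
    labels = None
    for _ in range(max_iter):
        # Assignment step: each point joins its nearest centroid.
        new_labels = [
            min(range(k), key=lambda c: math.dist(p, centroids[c]))
            for p in points
        ]
        if new_labels == labels:
            break  # assignments stopped changing: converged
        labels = new_labels
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, label in zip(points, labels) if label == c]
            if members:  # keep the old centroid if a cluster is empty
                centroids[c] = tuple(
                    sum(coords) / len(members) for coords in zip(*members)
                )
    return centroids, labels


# Hypothetical 2-D points forming two obvious groups.
points = [(1.0, 1.0), (1.5, 2.0), (8.0, 8.0), (9.0, 9.5), (1.0, 0.5), (8.5, 9.0)]
centroids, labels = kmeans(points, k=2)
print(labels)  # [0, 0, 1, 1, 0, 1]
print(centroids)
```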
4) Intro to Data Science– Udacity
This is another completely free course to learn data science with Python. In this course, you will learn the fundamentals
of data science, data wrangling, normal distribution, data visualization, and the basics of MapReduce.
This course is not for beginners: it requires prior Python programming knowledge and intro-level statistics.
10 Best Python Courses for Data Analysis
1. Learn Python by Codecademy
2. Introduction to Python Programming by Udacity
3. Programming for Everybody (Getting Started with Python) by Coursera
4. Introduction to Python for Data Science by DataCamp
5. Complete Python Bootcamp From Zero to Hero in Python by Udemy
6. Introduction to Computer Science and Programming Using Python by edX
7. An Introduction to Interactive Programming in Python (Part 1) by Coursera
8. Machine Learning with Python by Coursera
9. Intro to TensorFlow for Deep Learning by Udacity
10. Learn to Program: The Fundamentals by Coursera
PowerBI
Power BI is Microsoft's business analytics tool. It provides advanced analytics tools, reports, and visualizations. Power BI is a versatile, platform-independent tool that users can embed in cloud, mobile, and web apps.
Course | Duration | In Brief
Microsoft Learn Resources (Microsoft) | 12-16 hours | Collection of the best learning resources for Power BI training, straight from the creators
Microsoft Power BI Desktop for Business Intelligence (Udemy) | 11 hours | Great hands-on approach to learning Power BI Desktop for PC/Windows users
Power BI Essential Training (LinkedIn Learning) | 3-4 hours | Great training for learning the basic concepts of Power BI quickly, with free access to videos
Power BI Data Methods (LinkedIn Learning) | 3-4 hours | Concise Power BI course focused on the data side of Power BI: Power Query
Power BI Examples, Demos & Tutorials (Chandoo) | 12 hours | Free Power BI course from an interactive instructor; comes with downloadable files
Building Your First Power BI Report (Pluralsight) | 1-2 hours | Simple yet good free Power BI course to get you started
Power BI Training (Learnit Training) | 15 hours | Free Power BI training: elaborate and comprehensive, with very long, intensive videos
Additional Resources
Kaggle:
• Kaggle was acquired by, and is currently owned by, Google.
• Kaggle is an online community platform for data scientists and machine learning enthusiasts.
• Kaggle allows users to collaborate with other users, find and publish datasets, and compete with other data scientists to solve data science challenges.
• Everything on Kaggle is free: courses, the certificates earned from courses, datasets, participation in competitions, discussion sections, etc.
• Participate in Kaggle competitions. Competitions consist of data science tasks; some have no prizes (but offer learning and knowledge-sharing opportunities), while others have generous cash prizes. You can participate on your own or with a team. In addition to prize money for good scores, you can win medals and points.
Colab:
Colaboratory, or “Colab” for short, is a product from Google Research. Colab allows anybody to write and execute arbitrary Python code through the browser, and is especially well suited to machine learning, data analysis, and education. Colab is a free Jupyter notebook environment that runs entirely in the cloud.
•Interactive tutorials to learn machine learning and neural networks.
•Write and execute Python 3 code without having a local setup.
•Execute terminal commands from the Notebook.
•Import datasets from external sources such as Kaggle.
•Save your Notebooks to Google Drive.
•Import Notebooks from Google Drive.
•Free cloud service with access to GPUs and TPUs.
•Integrates with PyTorch, TensorFlow, and OpenCV.
•Import or publish directly from/to GitHub.

More Related Content

Similar to Data Analysis – Technical learnings

Chapter 1 Introduction to Data Structures and Algorithms.pdf
Chapter 1 Introduction to Data Structures and Algorithms.pdfChapter 1 Introduction to Data Structures and Algorithms.pdf
Chapter 1 Introduction to Data Structures and Algorithms.pdfAxmedcarb
 
Data Structure and its Fundamentals
Data Structure and its FundamentalsData Structure and its Fundamentals
Data Structure and its FundamentalsHitesh Mohapatra
 
data structures and its importance
 data structures and its importance  data structures and its importance
data structures and its importance Anaya Zafar
 
What are the key points one must know before learning Advanced Excel.docx
What are the key points one must know before learning Advanced Excel.docxWhat are the key points one must know before learning Advanced Excel.docx
What are the key points one must know before learning Advanced Excel.docxshivanikaale214
 
fdocuments.in_unit-2-ooad.ppt
fdocuments.in_unit-2-ooad.pptfdocuments.in_unit-2-ooad.ppt
fdocuments.in_unit-2-ooad.pptRAJESH S
 
Data Structure.pptx
Data Structure.pptxData Structure.pptx
Data Structure.pptxSajalFayyaz
 
Object oriented methodologies
Object oriented methodologiesObject oriented methodologies
Object oriented methodologiesnaina-rani
 
Application of Excel and SPSS software for statistical analysis- Biostatistic...
Application of Excel and SPSS software for statistical analysis- Biostatistic...Application of Excel and SPSS software for statistical analysis- Biostatistic...
Application of Excel and SPSS software for statistical analysis- Biostatistic...Himanshu Sharma
 
EE-232-LEC-01 Data_structures.pptx
EE-232-LEC-01 Data_structures.pptxEE-232-LEC-01 Data_structures.pptx
EE-232-LEC-01 Data_structures.pptxiamultapromax
 
Report
ReportReport
Reportbutest
 
Lecture-8-The-GIS-Database-Part-1.ppt
Lecture-8-The-GIS-Database-Part-1.pptLecture-8-The-GIS-Database-Part-1.ppt
Lecture-8-The-GIS-Database-Part-1.pptPrabin Pandit
 
Lecture 1 Pandas Basics.pptx machine learning
Lecture 1 Pandas Basics.pptx machine learningLecture 1 Pandas Basics.pptx machine learning
Lecture 1 Pandas Basics.pptx machine learningmy6305874
 

Similar to Data Analysis – Technical learnings (20)

22_presentation.ppt
22_presentation.ppt22_presentation.ppt
22_presentation.ppt
 
8.unit-1-fds-2022-23.pptx
8.unit-1-fds-2022-23.pptx8.unit-1-fds-2022-23.pptx
8.unit-1-fds-2022-23.pptx
 
Chapter 1 Introduction to Data Structures and Algorithms.pdf
Chapter 1 Introduction to Data Structures and Algorithms.pdfChapter 1 Introduction to Data Structures and Algorithms.pdf
Chapter 1 Introduction to Data Structures and Algorithms.pdf
 
Data Structure and its Fundamentals
Data Structure and its FundamentalsData Structure and its Fundamentals
Data Structure and its Fundamentals
 
data structures and its importance
 data structures and its importance  data structures and its importance
data structures and its importance
 
What are the key points one must know before learning Advanced Excel.docx
What are the key points one must know before learning Advanced Excel.docxWhat are the key points one must know before learning Advanced Excel.docx
What are the key points one must know before learning Advanced Excel.docx
 
fdocuments.in_unit-2-ooad.ppt
fdocuments.in_unit-2-ooad.pptfdocuments.in_unit-2-ooad.ppt
fdocuments.in_unit-2-ooad.ppt
 
Intro_2.ppt
Intro_2.pptIntro_2.ppt
Intro_2.ppt
 
Intro.ppt
Intro.pptIntro.ppt
Intro.ppt
 
Intro.ppt
Intro.pptIntro.ppt
Intro.ppt
 
Data Structure.pptx
Data Structure.pptxData Structure.pptx
Data Structure.pptx
 
Object oriented methodologies
Object oriented methodologiesObject oriented methodologies
Object oriented methodologies
 
algo 1.ppt
algo 1.pptalgo 1.ppt
algo 1.ppt
 
Application of Excel and SPSS software for statistical analysis- Biostatistic...
Application of Excel and SPSS software for statistical analysis- Biostatistic...Application of Excel and SPSS software for statistical analysis- Biostatistic...
Application of Excel and SPSS software for statistical analysis- Biostatistic...
 
Ch08
Ch08Ch08
Ch08
 
Ch08
Ch08Ch08
Ch08
 
EE-232-LEC-01 Data_structures.pptx
EE-232-LEC-01 Data_structures.pptxEE-232-LEC-01 Data_structures.pptx
EE-232-LEC-01 Data_structures.pptx
 
Report
ReportReport
Report
 
Lecture-8-The-GIS-Database-Part-1.ppt
Lecture-8-The-GIS-Database-Part-1.pptLecture-8-The-GIS-Database-Part-1.ppt
Lecture-8-The-GIS-Database-Part-1.ppt
 
Lecture 1 Pandas Basics.pptx machine learning
Lecture 1 Pandas Basics.pptx machine learningLecture 1 Pandas Basics.pptx machine learning
Lecture 1 Pandas Basics.pptx machine learning
 

Recently uploaded

Sanyam Choudhary Chemistry practical.pdf
Sanyam Choudhary Chemistry practical.pdfSanyam Choudhary Chemistry practical.pdf
Sanyam Choudhary Chemistry practical.pdfsanyamsingh5019
 
Incoming and Outgoing Shipments in 1 STEP Using Odoo 17
Incoming and Outgoing Shipments in 1 STEP Using Odoo 17Incoming and Outgoing Shipments in 1 STEP Using Odoo 17
Incoming and Outgoing Shipments in 1 STEP Using Odoo 17Celine George
 
How to Make a Pirate ship Primary Education.pptx
How to Make a Pirate ship Primary Education.pptxHow to Make a Pirate ship Primary Education.pptx
How to Make a Pirate ship Primary Education.pptxmanuelaromero2013
 
MENTAL STATUS EXAMINATION format.docx
MENTAL     STATUS EXAMINATION format.docxMENTAL     STATUS EXAMINATION format.docx
MENTAL STATUS EXAMINATION format.docxPoojaSen20
 
BASLIQ CURRENT LOOKBOOK LOOKBOOK(1) (1).pdf
BASLIQ CURRENT LOOKBOOK  LOOKBOOK(1) (1).pdfBASLIQ CURRENT LOOKBOOK  LOOKBOOK(1) (1).pdf
BASLIQ CURRENT LOOKBOOK LOOKBOOK(1) (1).pdfSoniaTolstoy
 
SOCIAL AND HISTORICAL CONTEXT - LFTVD.pptx
SOCIAL AND HISTORICAL CONTEXT - LFTVD.pptxSOCIAL AND HISTORICAL CONTEXT - LFTVD.pptx
SOCIAL AND HISTORICAL CONTEXT - LFTVD.pptxiammrhaywood
 
APM Welcome, APM North West Network Conference, Synergies Across Sectors
APM Welcome, APM North West Network Conference, Synergies Across SectorsAPM Welcome, APM North West Network Conference, Synergies Across Sectors
APM Welcome, APM North West Network Conference, Synergies Across SectorsAssociation for Project Management
 
Solving Puzzles Benefits Everyone (English).pptx
Solving Puzzles Benefits Everyone (English).pptxSolving Puzzles Benefits Everyone (English).pptx
Solving Puzzles Benefits Everyone (English).pptxOH TEIK BIN
 
Software Engineering Methodologies (overview)
Software Engineering Methodologies (overview)Software Engineering Methodologies (overview)
Software Engineering Methodologies (overview)eniolaolutunde
 
Separation of Lanthanides/ Lanthanides and Actinides
Separation of Lanthanides/ Lanthanides and ActinidesSeparation of Lanthanides/ Lanthanides and Actinides
Separation of Lanthanides/ Lanthanides and ActinidesFatimaKhan178732
 
Introduction to ArtificiaI Intelligence in Higher Education
Introduction to ArtificiaI Intelligence in Higher EducationIntroduction to ArtificiaI Intelligence in Higher Education
Introduction to ArtificiaI Intelligence in Higher Educationpboyjonauth
 
Crayon Activity Handout For the Crayon A
Crayon Activity Handout For the Crayon ACrayon Activity Handout For the Crayon A
Crayon Activity Handout For the Crayon AUnboundStockton
 
Measures of Central Tendency: Mean, Median and Mode
Measures of Central Tendency: Mean, Median and ModeMeasures of Central Tendency: Mean, Median and Mode
Measures of Central Tendency: Mean, Median and ModeThiyagu K
 
Presiding Officer Training module 2024 lok sabha elections
Presiding Officer Training module 2024 lok sabha electionsPresiding Officer Training module 2024 lok sabha elections
Presiding Officer Training module 2024 lok sabha electionsanshu789521
 
Alper Gobel In Media Res Media Component
Alper Gobel In Media Res Media ComponentAlper Gobel In Media Res Media Component
Alper Gobel In Media Res Media ComponentInMediaRes1
 
Employee wellbeing at the workplace.pptx
Employee wellbeing at the workplace.pptxEmployee wellbeing at the workplace.pptx
Employee wellbeing at the workplace.pptxNirmalaLoungPoorunde1
 
Call Girls in Dwarka Mor Delhi Contact Us 9654467111
Call Girls in Dwarka Mor Delhi Contact Us 9654467111Call Girls in Dwarka Mor Delhi Contact Us 9654467111
Call Girls in Dwarka Mor Delhi Contact Us 9654467111Sapana Sha
 
Industrial Policy - 1948, 1956, 1973, 1977, 1980, 1991
Industrial Policy - 1948, 1956, 1973, 1977, 1980, 1991Industrial Policy - 1948, 1956, 1973, 1977, 1980, 1991
Industrial Policy - 1948, 1956, 1973, 1977, 1980, 1991RKavithamani
 
Arihant handbook biology for class 11 .pdf
Arihant handbook biology for class 11 .pdfArihant handbook biology for class 11 .pdf
Arihant handbook biology for class 11 .pdfchloefrazer622
 
“Oh GOSH! Reflecting on Hackteria's Collaborative Practices in a Global Do-It...
“Oh GOSH! Reflecting on Hackteria's Collaborative Practices in a Global Do-It...“Oh GOSH! Reflecting on Hackteria's Collaborative Practices in a Global Do-It...
“Oh GOSH! Reflecting on Hackteria's Collaborative Practices in a Global Do-It...Marc Dusseiller Dusjagr
 

Recently uploaded (20)

Sanyam Choudhary Chemistry practical.pdf
Sanyam Choudhary Chemistry practical.pdfSanyam Choudhary Chemistry practical.pdf
Sanyam Choudhary Chemistry practical.pdf
 
Incoming and Outgoing Shipments in 1 STEP Using Odoo 17
Incoming and Outgoing Shipments in 1 STEP Using Odoo 17Incoming and Outgoing Shipments in 1 STEP Using Odoo 17
Incoming and Outgoing Shipments in 1 STEP Using Odoo 17
 
How to Make a Pirate ship Primary Education.pptx
How to Make a Pirate ship Primary Education.pptxHow to Make a Pirate ship Primary Education.pptx
How to Make a Pirate ship Primary Education.pptx
 
MENTAL STATUS EXAMINATION format.docx
MENTAL     STATUS EXAMINATION format.docxMENTAL     STATUS EXAMINATION format.docx
MENTAL STATUS EXAMINATION format.docx
 
BASLIQ CURRENT LOOKBOOK LOOKBOOK(1) (1).pdf
BASLIQ CURRENT LOOKBOOK  LOOKBOOK(1) (1).pdfBASLIQ CURRENT LOOKBOOK  LOOKBOOK(1) (1).pdf
BASLIQ CURRENT LOOKBOOK LOOKBOOK(1) (1).pdf
 
SOCIAL AND HISTORICAL CONTEXT - LFTVD.pptx
SOCIAL AND HISTORICAL CONTEXT - LFTVD.pptxSOCIAL AND HISTORICAL CONTEXT - LFTVD.pptx
SOCIAL AND HISTORICAL CONTEXT - LFTVD.pptx
 
APM Welcome, APM North West Network Conference, Synergies Across Sectors
APM Welcome, APM North West Network Conference, Synergies Across SectorsAPM Welcome, APM North West Network Conference, Synergies Across Sectors
APM Welcome, APM North West Network Conference, Synergies Across Sectors
 
Solving Puzzles Benefits Everyone (English).pptx
Solving Puzzles Benefits Everyone (English).pptxSolving Puzzles Benefits Everyone (English).pptx
Solving Puzzles Benefits Everyone (English).pptx
 
Software Engineering Methodologies (overview)
Software Engineering Methodologies (overview)Software Engineering Methodologies (overview)
Software Engineering Methodologies (overview)
 
Separation of Lanthanides/ Lanthanides and Actinides
Separation of Lanthanides/ Lanthanides and ActinidesSeparation of Lanthanides/ Lanthanides and Actinides
Separation of Lanthanides/ Lanthanides and Actinides
 
Introduction to ArtificiaI Intelligence in Higher Education
Introduction to ArtificiaI Intelligence in Higher EducationIntroduction to ArtificiaI Intelligence in Higher Education
Introduction to ArtificiaI Intelligence in Higher Education
 
Crayon Activity Handout For the Crayon A
Crayon Activity Handout For the Crayon ACrayon Activity Handout For the Crayon A
Crayon Activity Handout For the Crayon A
 
Measures of Central Tendency: Mean, Median and Mode
Measures of Central Tendency: Mean, Median and ModeMeasures of Central Tendency: Mean, Median and Mode
Measures of Central Tendency: Mean, Median and Mode
 
Presiding Officer Training module 2024 lok sabha elections
Presiding Officer Training module 2024 lok sabha electionsPresiding Officer Training module 2024 lok sabha elections
Presiding Officer Training module 2024 lok sabha elections
 
Alper Gobel In Media Res Media Component
Alper Gobel In Media Res Media ComponentAlper Gobel In Media Res Media Component
Alper Gobel In Media Res Media Component
 
Employee wellbeing at the workplace.pptx
Employee wellbeing at the workplace.pptxEmployee wellbeing at the workplace.pptx
Employee wellbeing at the workplace.pptx
 
Call Girls in Dwarka Mor Delhi Contact Us 9654467111
Call Girls in Dwarka Mor Delhi Contact Us 9654467111Call Girls in Dwarka Mor Delhi Contact Us 9654467111
Call Girls in Dwarka Mor Delhi Contact Us 9654467111
 
Industrial Policy - 1948, 1956, 1973, 1977, 1980, 1991
Industrial Policy - 1948, 1956, 1973, 1977, 1980, 1991Industrial Policy - 1948, 1956, 1973, 1977, 1980, 1991
Industrial Policy - 1948, 1956, 1973, 1977, 1980, 1991
 
Arihant handbook biology for class 11 .pdf
Arihant handbook biology for class 11 .pdfArihant handbook biology for class 11 .pdf
Arihant handbook biology for class 11 .pdf
 
“Oh GOSH! Reflecting on Hackteria's Collaborative Practices in a Global Do-It...
“Oh GOSH! Reflecting on Hackteria's Collaborative Practices in a Global Do-It...“Oh GOSH! Reflecting on Hackteria's Collaborative Practices in a Global Do-It...
“Oh GOSH! Reflecting on Hackteria's Collaborative Practices in a Global Do-It...
 

Data Analysis – Technical learnings

  • 1. Data Analysis – Technical learnings
  • 2.
  • 3.
  • 4.
  • 5.
  • 6. Mathematics Linear Algebra Linear Algebra is used in machine learning to understand how algorithms work under the hood. It’s all about vector/matrix/tensor operations . 1.Khan Academy Linear Algebra series (beginner-friendly). 2.Coding the Matrix course (and book). 3.3Blue1Brown Linear Algebra series. 4.fast.ai Linear Algebra for coders course, highly related to modern ML workflow. 5.The first course in Coursera Mathematics for Machine Learning specialization. 6.“Introduction to Applied Linear Algebra — Vectors, Matrices, and Least Squares” book. 7.MIT Linear Algebra course, highly comprehensive. 8.Stanford CS229 Linear Algebra review.
  • 7. Mathematics Calculus Calculus is utilised in machine learning to formulate the functions used to train algorithms to reach their objective, known by loss/cost/objective functions. 1.Khan Academy Calculus series (beginner-friendly). 2.3Blue1Brown Calculus series. 3.The second course in Coursera Mathematics for Machine Learning specialization. 4.The Matrix Calculus You Need For Deep Learning paper. 5.MIT Single Variable Calculus. 6.MIT Multivariable Calculus. 7.Stanford CS224n Differential Calculus review.
  • 8. Mathematics Statistics & Probability Both are used in machine learning and data science to analyze and understand data, discover and infer valuable insights and hidden patterns. 1.Khan Academy Statistics and probability series (beginner-friendly). 2.Seeing Theory: A visual introduction to probability and statistics. 3.Intro to Descriptive Statistics from Udacity. 4.Intro to Inferential Statistics from Udacity. 5.Statistics with R Specialization from Coursera. 6.Stanford CS229 Probability Theory review.
  • 9. Statistics
    For every data professional, statistics and math are must-haves, because without a knowledge of statistics and probability, one cannot interpret data effectively.
    Some of the major topics include descriptive and inferential statistics. If you are a complete beginner, you can spend 2-3 weeks mastering these topics and working through some problems for hands-on experience. The time you invest in acquiring this knowledge is well worth it.
    Basic Statistics
    • Cases, Variables, Types of Variables
    • Matrix and Frequency Table
    • Graphs and Shapes of Distributions
    • Mode, Median and Mean
    • Range, Interquartile Range and Box Plot
    • Variance and Standard Deviation
    • Z-scores
    • Contingency Table, Scatterplot, Pearson’s r
    • Basics of Regression
    • Elementary Probability
    • Random Variables and Probability Distributions
    • Normal Distribution, Binomial Distribution & Poisson Distribution
    Inferential Statistics
    • Observational Studies and Experiments
    • Sample and Population
    • Population Distribution, Sample Distribution and Sampling Distribution
    • Central Limit Theorem
    • Point Estimates
    • Confidence Intervals
    • Introduction to Hypothesis Testing
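Several of the Basic Statistics topics above (mean, median, mode, range, interquartile range, variance, standard deviation, z-scores, Pearson’s r) can be tried directly with Python’s standard-library `statistics` module. This sketch assumes Python 3.8+ (for `statistics.quantiles`), uses made-up scores and study hours rather than real data, and `pearson_r` is a hand-written helper added for illustration.

```python
import statistics

# Hypothetical sample: exam scores and hours studied (not real data).
scores = [55, 61, 61, 68, 72, 75, 80, 84, 90, 94]
hours = [2, 3, 3, 4, 5, 5, 6, 7, 8, 9]

mean = statistics.mean(scores)
median = statistics.median(scores)
mode = statistics.mode(scores)
data_range = max(scores) - min(scores)
q1, q2, q3 = statistics.quantiles(scores, n=4)  # quartile cut points
iqr = q3 - q1                                   # interquartile range
variance = statistics.variance(scores)          # sample variance (n - 1 denominator)
sd = statistics.stdev(scores)                   # sample standard deviation

# z-score: how many standard deviations each value lies from the mean.
z_scores = [(x - mean) / sd for x in scores]

def pearson_r(xs, ys):
    """Pearson's r: covariance divided by the product of the spreads."""
    mx, my = statistics.mean(xs), statistics.mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sum((x - mx) ** 2 for x in xs) ** 0.5
    sy = sum((y - my) ** 2 for y in ys) ** 0.5
    return cov / (sx * sy)

r = pearson_r(hours, scores)
print(mean, median, mode, data_range, iqr, round(sd, 2), round(r, 3))
```

Note that `statistics.quantiles` supports more than one interpolation method, so quartiles (and therefore the IQR) can differ slightly from what a spreadsheet or another library reports for the same data.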
  • 10.
    • One of the top books you can read is Practical Statistics for Data Scientists.
    • The first 3 chapters are explained in this video.
    • Recommended video for a detailed understanding of statistics for data science.
  • 11. MS Excel
    This section illustrates the powerful features Excel offers for analyzing data.
    1) Sort: You can sort your Excel data on one column or multiple columns, in ascending or descending order.
    2) Filter: Filter your Excel data if you only want to display records that meet certain criteria.
    3) Conditional Formatting: Conditional formatting in Excel highlights cells with a certain color, depending on each cell’s value.
    4) Charts: A simple Excel chart can say more than a sheet full of numbers, and creating charts is very easy.
    5) Pivot Tables: Pivot tables are one of Excel’s most powerful features. A pivot table lets you extract the key insights from a large, detailed data set.
    6) Tables: Master Excel tables to analyze your data quickly and easily.
    7) What-If Analysis: What-If Analysis in Excel lets you try out different values (scenarios) for formulas.
    8) Solver: Excel includes a tool called Solver that uses techniques from operations research to find optimal solutions for all kinds of decision problems.
    9) Analysis ToolPak: The Analysis ToolPak is an Excel add-in that provides data analysis tools for financial, statistical, and engineering data analysis.
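Excel performs these operations through its interface. As a hedged illustration of the logic behind three of them (multi-column sort, filter, and pivot table), here is the same idea expressed in plain Python. The records and the column names `region`, `product`, and `units` are hypothetical.

```python
from collections import defaultdict

# Hypothetical sales records standing in for rows of an Excel sheet.
rows = [
    {"region": "East", "product": "Apples",  "units": 30},
    {"region": "West", "product": "Apples",  "units": 12},
    {"region": "East", "product": "Oranges", "units": 8},
    {"region": "West", "product": "Oranges", "units": 25},
    {"region": "East", "product": "Apples",  "units": 15},
]

# Sort on multiple columns: region ascending, then units descending.
sorted_rows = sorted(rows, key=lambda r: (r["region"], -r["units"]))

# Filter: keep only records that meet a criterion.
big_orders = [r for r in rows if r["units"] >= 15]

# Pivot: regions as row labels, products as column labels, sum of units as values.
pivot = defaultdict(lambda: defaultdict(int))
for r in rows:
    pivot[r["region"]][r["product"]] += r["units"]

for region in sorted(pivot):
    print(region, dict(pivot[region]))
```

A pivot table is essentially this grouping-and-summing step with a layout on top, which is why the same idea reappears as GROUP BY in SQL.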
  • 12. MS Excel
    1) Sort: Custom Sort Order | Sort by Color | Reverse List | Randomize List | SORT function
    2) Filter: Number and Text Filters | Date Filters | Advanced Filter | Data Form | Remove Duplicates | Outlining Data | Subtotal | Unique Values | FILTER function
    3) Conditional Formatting: Manage Rules | Data Bars | Color Scales | Icon Sets | Find Duplicates | Shade Alternate Rows | Compare Two Lists | Conflicting Rules | Heat Map
    4) Charts: Column Chart | Line Chart | Pie Chart | Bar Chart | Area Chart | Scatter Plot | Data Series | Axes | Chart Sheet | Trendline | Error Bars | Sparklines | Combination Chart | Gauge Chart | Thermometer Chart | Gantt Chart | Pareto Chart
    5) Pivot Tables: Group Pivot Table Items | Multi-level Pivot Table | Frequency Distribution | Pivot Chart | Slicers | Update Pivot Table | Calculated Field/Item | GetPivotData
    6) Tables: Structured References | Table Styles | Merge Tables | Table as Source Data | Quick Analysis
    7) What-If Analysis: Data Tables | Goal Seek | Quadratic Equation
    8) Solver: Transportation Problem | Assignment Problem | Capital Investment | Shortest Path Problem | Maximum Flow Problem | Sensitivity Analysis | System of Linear Equations
  • 13. SQL
    Data science is the study and analysis of data. To analyze data, we first need to extract it from a database, and this is where SQL comes into the picture. SQL has become the standard query language for relational database systems, and modern big data systems such as Hadoop and Spark also offer SQL interfaces for processing structured data.
    Important topics in SQL for Data Science
    Before jumping into the resources, let’s look at the important topics. Make sure you cover the following topics, but don’t limit yourself to them.
    1] Group By Clause: The GROUP BY clause is used with the SELECT statement to arrange rows with identical values into groups. It is usually combined with aggregate functions, and the HAVING clause applies conditions to the resulting groups.
    2] Aggregate Functions: An aggregate function performs a calculation on a set of values and returns a single value, e.g., COUNT, AVG, MIN, MAX.
    3] String Functions and Operations: These perform operations such as converting a string to uppercase or matching a regular expression.
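The GROUP BY, aggregate-function, and string-function topics can be practiced with Python’s built-in `sqlite3` module and an in-memory database, as in this minimal sketch. The `orders` table and its rows are hypothetical, and function names and behavior vary somewhat between SQL dialects.

```python
import sqlite3

# In-memory SQLite database with a hypothetical "orders" table.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (customer TEXT, amount REAL)")
conn.executemany(
    "INSERT INTO orders VALUES (?, ?)",
    [("alice", 20.0), ("bob", 5.0), ("alice", 35.0), ("carol", 50.0), ("bob", 10.0)],
)

# GROUP BY with aggregate functions; HAVING filters the groups
# (WHERE would filter individual rows before grouping).
query = """
    SELECT UPPER(customer) AS name,       -- string function
           COUNT(*)        AS n_orders,   -- aggregate functions
           SUM(amount)     AS total,
           AVG(amount)     AS average
    FROM orders
    GROUP BY customer
    HAVING SUM(amount) > 20
    ORDER BY total DESC
"""
result = conn.execute(query).fetchall()
for row in result:
    print(row)
```

Here "bob" is grouped and aggregated like the others but then dropped by HAVING, because his total (15.0) does not exceed 20.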
  • 14. SQL
    4] Date & Time Operations: When a value contains only a date, it is easy to handle, but when a time portion is also involved, things get a bit more complicated, so make sure you practice enough.
    5] Output Control Statements: These shape the results to your requirements, e.g., the ORDER BY clause to sort rows and the LIMIT clause to return only a limited number of rows.
    6] Operators: There are three main types of operators: arithmetic, logical, and comparison.
    7] Joins: This is one of the most important topics; joins combine multiple tables to produce the desired output. Make sure you understand all the related concepts, such as the types of joins and primary, foreign, and composite keys.
    8] Nested Queries: A subquery (nested query) returns data that the main query uses as a condition to further restrict the data retrieved. Nested queries can return either a scalar (single) value or a set of rows, whereas joins return rows. If an operation can be written either way, a join is often the more efficient option.
    9] Views & Indexing: Indexes are special lookup tables that the database search engine can use to speed up data retrieval. In simple words, an index in a database is similar to the index of a book.
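A small `sqlite3` sketch of keys, an index, inner and left joins, and a scalar subquery. The `customers`/`orders` schema and rows are invented for illustration. Note that SQLite only enforces `REFERENCES` (foreign key) constraints when `PRAGMA foreign_keys = ON` is set; here the clause mainly documents the relationship.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE customers (
        id   INTEGER PRIMARY KEY,                      -- primary key
        name TEXT NOT NULL
    );
    CREATE TABLE orders (
        id          INTEGER PRIMARY KEY,
        customer_id INTEGER REFERENCES customers(id),  -- foreign key
        amount      REAL
    );
    -- Index to speed up lookups and joins on orders.customer_id.
    CREATE INDEX idx_orders_customer ON orders(customer_id);

    INSERT INTO customers VALUES (1, 'alice'), (2, 'bob'), (3, 'carol');
    INSERT INTO orders VALUES (1, 1, 20.0), (2, 1, 35.0), (3, 2, 5.0);
""")

# INNER JOIN: only customers that have at least one matching order.
inner = conn.execute("""
    SELECT c.name, o.amount
    FROM customers AS c
    JOIN orders AS o ON o.customer_id = c.id
    ORDER BY o.id
""").fetchall()

# LEFT JOIN: every customer, with NULL (None in Python) where no order matches.
left = conn.execute("""
    SELECT c.name, o.amount
    FROM customers AS c
    LEFT JOIN orders AS o ON o.customer_id = c.id
    ORDER BY c.id, o.id
""").fetchall()

# Nested query returning a scalar: orders above the average order amount.
above_avg = conn.execute("""
    SELECT id, amount
    FROM orders
    WHERE amount > (SELECT AVG(amount) FROM orders)
""").fetchall()

print(inner)
print(left)
print(above_avg)
```

The difference between the two joins shows up only for "carol", who has no orders: the inner join omits her, while the left join keeps her with a missing amount.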
  • 15. SQL – Practice resources
    11] Window Functions: Window functions operate on a set of rows and return a single value for each row from the underlying query. They reduce the complexity of queries that analyze partitions (windows) of a data set.
    12] Query Optimization: When dealing with larger datasets, it is important to use the most efficient method for a SQL statement to access the requested data.
    Resources to Practice SQL
    1] LeetCode: One of the best practice platforms, with a great variety of questions.
    2] SQLZoo: A well-established online platform (since 1999) for writing and running SQL queries against a live database.
    3] HackerRank: Another good practice platform. Its questions are divided into three levels: Easy, Medium, and Hard.
    4] SQLBolt: Essentially a series of interactive lessons and exercises designed to help users learn SQL easily. Its lessons and topics are comprehensive and cover all the important details of using SQL.
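Window functions (topic 11) are easiest to understand next to GROUP BY: instead of collapsing each group to one row, they keep every row and attach a value computed over its partition. This sketch assumes an SQLite build of version 3.25 or newer (check `sqlite3.sqlite_version`), which is when window-function support was added; the `sales` table is hypothetical.

```python
import sqlite3

# Requires SQLite 3.25+ for window functions.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (region TEXT, month INTEGER, amount REAL)")
conn.executemany(
    "INSERT INTO sales VALUES (?, ?, ?)",
    [("East", 1, 100.0), ("East", 2, 150.0), ("East", 3, 120.0),
     ("West", 1, 80.0), ("West", 2, 60.0)],
)

# Each row keeps its own values and gains two window-computed columns:
# a running total per region and the row's rank by amount within its region.
rows = conn.execute("""
    SELECT region,
           month,
           amount,
           SUM(amount) OVER (PARTITION BY region ORDER BY month) AS running_total,
           RANK()      OVER (PARTITION BY region ORDER BY amount DESC) AS rank_in_region
    FROM sales
    ORDER BY region, month
""").fetchall()

for row in rows:
    print(row)
```

Writing the running total without a window function would need a correlated subquery or a self-join, which is the kind of complexity window functions are meant to remove.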
  • 16. SQL – Learning resources
    Courses:
    1] Udacity’s SQL for Data Analysis: One of the best free courses, covering all the topics above with clear explanations and a practice quiz after each topic. https://www.udacity.com/course/sql-for-data-analysis--ud198
    2] Introduction to SQL Programming for Excel Users: If you are an Excel user who wants to learn SQL, this YouTube playlist is a great fit. It covers most of the topics listed above.
    3] Khan Academy: A great platform with quality free courses and detailed explanations. The course has 5 parts, starting with the basics and leading you all the way up to more advanced lessons. Each topic has challenges followed by video tutorials, plus a small project at the end. Link: https://www.khanacademy.org/computing/computer-programming/sql
  • 17. How to learn Python for Data Science
    Step 1: Learn Python fundamentals
    Everyone starts somewhere. The first step is to learn the basics of Python programming. You can do this with an online course, a data science bootcamp, self-directed learning, or a university program. There is no right or wrong way to learn the Python basics; the key is to choose a path and stay consistent.
    Step 2: Practice with hands-on learning
    One of the best ways to accelerate your education is hands-on learning. You may be surprised how quickly you catch on when you build small Python projects.
    Step 3: Learn Python data science libraries
    The key libraries are NumPy, pandas, Matplotlib, and scikit-learn.
    • NumPy: A library that makes a variety of mathematical and statistical operations easier; it is also the basis for many features of the pandas library.
    • pandas: A Python library created specifically to facilitate working with data. It is the bread and butter of much Python data science work.
    • Matplotlib: A visualization library that makes it quick and easy to generate charts from your data.
    • scikit-learn: The most popular library for machine learning work in Python.
  • 18. How to learn Python for Data Science
    Step 4: Build a data science portfolio as you learn Python
    For aspiring data scientists, a portfolio is a necessity: it’s one of the most important things hiring managers look for in a qualified candidate. Your projects should work with several different datasets, and each should share interesting insights that you discovered. Here are some types of projects to consider:
    • Data Cleaning Project: Any project that involves dirty or “unstructured” data that you clean up and analyze will impress potential employers, since most real-world data requires cleaning.
    • Data Visualization Project: Making attractive, easy-to-read visualizations is both a programming and a design challenge, but if you do it well, your analysis will be considerably more useful. Great-looking charts will make your portfolio stand out.
    • Machine Learning Project: If you aspire to work as a data scientist, you will definitely need a project that shows off your ML skills. You may want a few different machine learning projects, each focused on a different algorithm.
    Your analysis should be clear and easy to read, ideally in a format like a Jupyter Notebook so that a technical audience can read your code.
    a. Matplotlib is a data visualization library that makes graphs like those you’d find in Excel or Google Sheets.
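For the Data Cleaning Project idea, a typical first pass is trimming whitespace, normalizing text, coercing types, and removing duplicates. The sketch below does this with only the standard `csv` module; the CSV contents and the cleaning rules (for example, treating blank or unparseable ages as missing) are assumptions for illustration. In practice, pandas is the usual tool for this kind of work.

```python
import csv
import io

# Hypothetical "dirty" CSV: stray whitespace, inconsistent casing,
# a blank value, a non-numeric value, and a duplicate row.
raw = """name,city,age
 Alice ,new york,34
Bob,Boston,
alice,New York,34
Carol,  chicago ,n/a
Dave,Denver,41
"""

def clean_age(value):
    """Convert age to int; return None for blank or unparseable values."""
    try:
        return int(value.strip())
    except ValueError:
        return None

def clean(text):
    """Normalize each record and drop rows that become exact duplicates."""
    records, seen = [], set()
    for row in csv.DictReader(io.StringIO(text)):
        record = {
            "name": row["name"].strip().title(),  # trim and normalize casing
            "city": row["city"].strip().title(),
            "age": clean_age(row["age"]),
        }
        key = (record["name"], record["city"], record["age"])
        if key in seen:  # duplicate only after normalization
            continue
        seen.add(key)
        records.append(record)
    return records

cleaned = clean(raw)
for r in cleaned:
    print(r)
```

Deduplicating after normalization is deliberate: " Alice ,new york" and "alice,New York" only match once whitespace and casing are made consistent.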
  • 19. Python for Data Science
    1) Python Basics for Data Science – edX
    This beginner-friendly free course teaches Python for data science. You will learn the Python basics (how to define variables, sets, conditional statements, and functions), how to read and write data in files, and how to use pandas for data analysis in Python.
    2) Python For Data Science – Udemy
    This course teaches the Python basics for data science and suits anyone aspiring to work in data science, artificial intelligence, machine learning, or deep learning. It is not an advanced-level course, but it is good for understanding the Python basics.
    3) Foundations of Data Science: K-Means Clustering in Python – Coursera
    In this free Coursera course, you will learn the core concepts of data science, including basic mathematics, statistics, and programming skills, and you will implement the K-means algorithm in Python. The course strikes a good balance between theory and practice and is useful for learning the basics of data science.
    4) Intro to Data Science – Udacity
    Another completely free course for learning data science with Python. You will learn the fundamentals of data science, data wrangling, the normal distribution, data visualization, and the basics of MapReduce. This course is not for beginners; it requires prior Python programming knowledge and intro-level statistics.
  • 20. 10 Best Python Courses for Data Analysis
    1. Learn Python by Codecademy
    2. Introduction to Python Programming by Udacity
    3. Programming for Everybody (Getting Started with Python) by Coursera
    4. Introduction to Python for Data Science by DataCamp
    5. Complete Python Bootcamp From Zero to Hero in Python by Udemy
    6. Introduction to Computer Science and Programming Using Python by edX
    7. An Introduction to Interactive Programming in Python (Part 1) by Coursera
    8. Machine Learning with Python by Coursera
    9. Intro to TensorFlow for Deep Learning by Udacity
    10. Learn to Program: The Fundamentals by Coursera
  • 21. Power BI
    Power BI is Microsoft’s business analytics tool. It provides advanced analytics tools, reports, and visualizations. Power BI is a versatile, platform-independent tool that users can embed in cloud, mobile, and web apps.
    Course (provider), duration: in brief
    • Microsoft Learn Resources (Microsoft), 12-16 hours: Collection of the best learning resources for Power BI training, straight from the creators.
    • Microsoft Power BI Desktop for Business Intelligence (Udemy), 11 hours: Great hands-on approach to learning Power BI Desktop for PC/Windows users.
    • Power BI Essential Training (LinkedIn Learning), 3-4 hours: Great training for learning the basic concepts of Power BI quickly, with free access to videos.
    • Power BI Data Methods (LinkedIn Learning), 3-4 hours: Concise course focused on the data end of Power BI: Power Query.
    • Power BI Examples, Demos & Tutorials (Chandoo), 12 hours: Free course from an interactive instructor; comes with downloadable files.
    • Building Your First Power BI Report (Pluralsight), 1-2 hours: Simple yet good free course to get you started.
    • Power BI Training (Learnit Training), 15 hours: Free, elaborate, and comprehensive training with very long, intensive videos.
  • 22. Additional Resources
    Kaggle:
    • Kaggle was acquired by, and is currently owned by, Google.
    • Kaggle is an online community platform for data scientists and machine learning enthusiasts.
    • Kaggle allows users to collaborate with other users, find and publish datasets, and compete with other data scientists to solve data science challenges.
    • Everything on Kaggle is completely free: courses, course certificates, datasets, participation in competitions, discussion sections, etc.
    • Participate in Kaggle competitions: competitions consist of data science tasks. Some have no prizes (but offer learning and knowledge-sharing opportunities), while others have generous cash prizes. You can participate on your own or with a team. In addition to prize money for good scores, you win medals and points.
  • 23. Additional Resources
    Colab:
    Colaboratory, or “Colab” for short, is a product from Google Research. Colab allows anybody to write and execute arbitrary Python code through the browser and is especially well suited to machine learning, data analysis, and education. Colab is a free Jupyter notebook environment that runs entirely in the cloud.
    • Interactive tutorials to learn machine learning and neural networks.
    • Write and execute Python 3 code without a local setup.
    • Execute terminal commands from the notebook.
    • Import datasets from external sources such as Kaggle.
    • Save your notebooks to Google Drive.
    • Import notebooks from Google Drive.
    • Free cloud service with access to GPUs and TPUs.
    • Integrates with PyTorch, TensorFlow, and OpenCV.
    • Import or publish directly from/to GitHub.