Data Science
Data science is a field of applied mathematics and
statistics that derives useful information from large
amounts of complex data, or big data. It uses scientific
approaches, procedures, algorithms, and frameworks to
extract knowledge and insight from huge amounts
of data. Data science brings together
statistics, data analysis, machine learning, and their
related methods to understand and analyze real-world
phenomena with data.
KEY TAKEAWAYS
•Data science uses techniques such as machine learning
and artificial intelligence to extract meaningful
information and to predict future patterns and behaviors.
•Advances in technology, the internet, and social media
have all increased access to big data.
•The field of data science is growing as technology
advances and big data collection and analysis techniques
become more sophisticated.
Statistics:-
Mathematics is at the core of almost all advances in technology. The field of data
science would not exist without it.
Machine learning and statistics are the two core skills required to become a data scientist. Statistics is the heart of data
science: it helps to analyze, transform, and predict data. Statistics is a branch of mathematics in which tables of data are
operated upon to calculate metrics such as the mean, median, and standard deviation. These metrics are then used to characterize the
available data so that it can be used in decision-making processes.
7 Basic Statistics Concepts For Data Science:-
1. Descriptive Statistics:-
Descriptive statistics describe the basic features of data by summarizing a given data set, which can represent either the
entire population or a sample of the population. The summary is derived from calculations that include:
Mean: The central value, commonly known as the arithmetic average.
Mode: The value that appears most often in a data set.
Median: The middle value of the ordered data set, which divides it exactly in half.
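The three measures above can be sketched with Python's standard statistics module. The scores list is a made-up sample chosen only for illustration.

```python
import statistics

# Hypothetical sample, invented for this illustration.
scores = [2, 3, 3, 5, 7, 10]

mean_value = statistics.mean(scores)      # sum of values divided by their count
median_value = statistics.median(scores)  # middle of the sorted values
mode_value = statistics.mode(scores)      # most frequent value

print("mean:", mean_value)      # 30 / 6 = 5
print("median:", median_value)  # average of the two middle values, 3 and 5
print("mode:", mode_value)      # 3 appears twice
```

Because the sample has an even number of values, the median is the average of the two middle values rather than a value from the data set.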
2. Variability:-
• Variability is described by the following measures:
• Standard Deviation: A statistic that measures the dispersion of a data set relative to its mean. It is the square root of the
variance.
• Variance: A statistical measure of the spread of the numbers in a data set. It is the average of the squared
differences from the mean. A large variance indicates that the numbers are far from the mean. A small
variance indicates that the numbers are close to the mean. Zero variance indicates that all values in
the set are identical.
• Range: The difference between the largest and smallest values of a data set.
• Percentile: A measure that indicates the value below which a given percentage of
the observations in the data set fall.
• Quartile: One of the three values that divide the ordered data points into four equal parts.
• Interquartile Range: The difference between the third and first quartiles. It measures the spread of the middle 50% of the data set.
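As a minimal sketch, the measures in this list can be computed with the standard statistics module. The data list is hypothetical, and the population formulas (pvariance, pstdev) are an assumption made here; for a sample you would typically use variance and stdev instead.

```python
import statistics

# Hypothetical data set, invented for this illustration.
data = [2, 4, 4, 4, 5, 5, 7, 9]

variance = statistics.pvariance(data)  # average squared difference from the mean
std_dev = statistics.pstdev(data)      # square root of the variance
data_range = max(data) - min(data)     # largest value minus smallest value

# quantiles(n=4) returns the three quartile cut points (Python 3.8+).
q1, q2, q3 = statistics.quantiles(data, n=4)
iqr = q3 - q1                          # spread of the middle 50% of the data

print("variance:", variance)           # 32 / 8 = 4
print("standard deviation:", std_dev)  # 2.0
print("range:", data_range)            # 9 - 2 = 7
print("quartiles:", q1, q2, q3)
print("interquartile range:", iqr)
```

Note that several conventions exist for computing quartiles, so other tools may report slightly different cut points for the same data.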
3. Correlation:-
• Correlation is one of the major statistical techniques. It measures the relationship between two variables. The correlation coefficient
indicates the strength and direction of the linear relationship between two variables and ranges from -1 to 1.
• A correlation coefficient that is more than zero indicates a positive relationship.
• A correlation coefficient that is less than zero indicates a negative relationship.
• A correlation coefficient of zero indicates that there is no linear relationship between the two variables; a nonlinear relationship may still exist.
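The sketch below computes the Pearson correlation coefficient by hand, which is one common way to measure linear correlation. The function name pearson_r and both data sets are hypothetical, chosen to show a perfect positive relationship and a case where the coefficient is zero even though the variables are clearly related.

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of products of deviations (proportional to the covariance).
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    # Sums of squared deviations (proportional to the variances).
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)

# Hypothetical data: scores rise in exact proportion to hours.
hours = [1, 2, 3, 4, 5]
scores = [2, 4, 6, 8, 10]
print(pearson_r(hours, scores))  # 1.0, a perfect positive linear relationship

# Hypothetical data: y = x squared, a relationship that is not linear.
xs = [-2, -1, 0, 1, 2]
squares = [4, 1, 0, 1, 4]
print(pearson_r(xs, squares))    # 0.0, although y depends entirely on x
```

The second case is why a coefficient of zero should be read as "no linear relationship" rather than "no relationship".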
4. Probability Distribution:-
• A probability distribution specifies the likelihood of all possible events. In simple terms, an event is the result of an experiment, such as tossing a
coin. Events are of two types: dependent and independent.
• Independent event: An event is independent when it is not affected by earlier events. For example,
if a coin is tossed and the first outcome is heads, the outcome of the next toss may be
heads or tails, and it is entirely independent of the first toss.
• Dependent event: An event is dependent when its occurrence depends on earlier events. For
example, consider drawing balls without replacement from a bag that contains red and blue balls. If the first ball drawn is red, the probability that the second ball
is red or blue depends on the first draw.
The probability that several independent events all occur is calculated by multiplying the probabilities of the individual events. For dependent
events, it is calculated using conditional probability.
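The two rules can be illustrated with exact fractions. The bag contents (3 red and 2 blue balls) are an assumption made for this example; the source does not give counts.

```python
from fractions import Fraction

# Independent events: two tosses of a fair coin.
p_head = Fraction(1, 2)
p_two_heads = p_head * p_head  # multiply the individual probabilities

# Dependent events: two draws without replacement from a hypothetical
# bag holding 3 red and 2 blue balls.
red, blue = 3, 2
p_first_red = Fraction(red, red + blue)
# Conditional probability: one red ball is already gone.
p_second_red_given_first_red = Fraction(red - 1, red + blue - 1)
p_both_red = p_first_red * p_second_red_given_first_red

print("P(two heads) =", p_two_heads)  # 1/4
print("P(both red)  =", p_both_red)   # 3/5 * 1/2 = 3/10
```

If the first ball were put back before the second draw, the draws would be independent and the result would be 3/5 * 3/5 instead.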
5. Regression:-
Regression is a method used to determine the relationship between one or more independent variables and a dependent variable.
Regression is mainly of two types:
• Linear regression: It is used to fit a regression model that explains the relationship between a numeric response variable
and one or more predictor variables.
• Logistic regression: It is used to fit a regression model that explains the relationship between a binary response variable
and one or more predictor variables.
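As a rough sketch of linear regression with a single predictor, the closed-form least-squares formulas can be written directly in Python. The function names fit_line and logistic and the data points are hypothetical; real projects would normally use a library such as Scikit-learn. The logistic function is included only to show how logistic regression maps a linear score to a value between 0 and 1; fitting a logistic model is not shown.

```python
import math

def fit_line(xs, ys):
    """Least-squares slope and intercept for one predictor variable."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    slope = (sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
             / sum((x - mean_x) ** 2 for x in xs))
    intercept = mean_y - slope * mean_x
    return slope, intercept

def logistic(z):
    """Map a linear score z to a probability between 0 and 1."""
    return 1.0 / (1.0 + math.exp(-z))

# Hypothetical data that lies exactly on the line y = 2x + 1.
xs = [1, 2, 3, 4]
ys = [3, 5, 7, 9]
slope, intercept = fit_line(xs, ys)
print("slope:", slope, "intercept:", intercept)  # 2.0 and 1.0

# A score of 0 corresponds to a probability of 0.5 for the binary response.
print("logistic(0) =", logistic(0))
```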
6. Normal Distribution:-
The normal distribution defines the probability density function for a continuous random variable. It
has two parameters, the mean and the standard deviation, which are discussed above. The standard normal distribution is the special case with
mean 0 and standard deviation 1. When the distribution of a random
variable is unknown, the normal distribution is often used as an approximation. The central limit theorem justifies this in
many cases: the sum or average of many independent random variables tends toward a normal distribution.
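The following sketch uses statistics.NormalDist (Python 3.8+) for the standard normal distribution and a small simulation to illustrate the central limit theorem. The sample size of 30, the 2000 repetitions, and the fixed seed are arbitrary choices made for this example.

```python
import random
import statistics

# Standard normal distribution: mean 0, standard deviation 1.
standard_normal = statistics.NormalDist(mu=0, sigma=1)
# Probability of falling within one standard deviation of the mean.
within_one_sd = standard_normal.cdf(1) - standard_normal.cdf(-1)
print("P(-1 < Z < 1) is roughly", round(within_one_sd, 3))

# Central limit theorem: averages of uniform random numbers.
# A single uniform draw is not normally distributed, but the averages
# of many draws cluster around 0.5 in a bell shape.
rng = random.Random(0)  # fixed seed so the run is repeatable
sample_means = [
    statistics.mean([rng.random() for _ in range(30)])
    for _ in range(2000)
]
print("mean of sample means:", statistics.mean(sample_means))
print("spread of sample means:", statistics.stdev(sample_means))
```

The spread of the sample means should be close to the standard deviation of a single draw divided by the square root of the sample size, which is what the theorem predicts.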
7. Bias:-
• In statistical terms, bias means that a model or sample is not representative of the complete population. It needs to be minimized to get
the desired outcome.
• The three most common types of bias are:
• Selection bias: It occurs when a group of data is selected for statistical analysis in a way that is not properly
randomized, so the data is unrepresentative of the whole population.
• Confirmation bias: It occurs when the person performing the statistical analysis has a predefined assumption and favors results that support it.
• Time interval bias: It is caused by intentionally specifying a certain time range to favor a particular outcome.
Programming tools used in Data Science
A data scientist extracts, manipulates, and pre-processes data and
generates forecasts from it. To do this, a data scientist needs
various statistical tools and programming
languages. In this article, we discuss some data
science tools that data scientists use to carry out data
operations, covering the main features
of the tools, their benefits, and a comparison of different
data science tools.
Top Data Science Tools:-
1. SAS
SAS is a data science tool
designed purely for statistical operations. It is
proprietary, closed-source software used by large companies to analyze
data. It is commonly used
by professionals and businesses that rely on commercial software. SAS
provides numerous statistical libraries and
tools that a data scientist can use to model and organize data. Although SAS is
highly reliable and has strong vendor support, it is expensive
and used mainly by larger organizations. Moreover, several SAS
libraries and packages are not included in the base package and
require a costly upgrade.
2. Apache Spark
Apache Spark, or simply Spark, is a powerful analytics engine and one of the most commonly used data science
tools. Spark is designed specifically for batch and stream processing. It handles streaming data well compared with
many other big data platforms. Spark's most powerful combination is with Scala, a programming
language based on the Java Virtual Machine, which is cross-platform in nature.
Features of Apache Spark:
• Apache Spark is fast.
• It supports advanced analytics.
• It provides real-time stream processing.
• It is dynamic in nature.
• It is fault tolerant.
3. BigML
BigML is another widely used data science tool. It offers an interactive, cloud-based GUI environment for processing machine
learning algorithms. BigML provides standardized cloud-based software for industry use. It allows businesses to use
machine learning algorithms across multiple areas of their enterprise. BigML specializes in predictive modelling. It uses a wide
range of machine learning algorithms, including clustering and classification. You can create a free account or a premium
account based on your data needs, and use BigML through its web interface or its REST APIs. It provides interactive
visualizations of data and lets you export visual charts to your mobile or IoT devices.
4. Excel
Excel was created by Microsoft mainly for spreadsheet calculations and is now commonly used for data processing, visualization, and
complex calculations. Excel is a powerful analytical tool for data science. Excel has various formulas, tables, filters,
slicers and so on. You can also create your own custom functions and formulas in Excel. While Excel is still an ideal
option for powerful data visualization and spreadsheets, it is not intended for calculations on huge quantities of data. You can also connect
SQL to Excel and use it for data management and analysis. Many data scientists use Excel as an interactive graphical tool
for easy pre-processing of data. In general, Excel is a good tool for data analytics at a small, non-enterprise
level.
Features of Excel:
• It is popular for small-scale data analysis.
• Excel is also used for spreadsheet calculation and visualization.
• The Excel Analysis ToolPak is used for complex data analysis.
• It provides an easy connection with SQL.
5. D3.js
6. MATLAB
7. NLTK
8. TensorFlow
9. Weka
10. Jupyter
11. Tableau
12. Scikit-learn

Data Science 1.pdf
