### Data Science 1.pdf

• 1. Data Science
  Data science is a field of applied mathematics and statistics that derives useful information from large amounts of complex data, or big data. It uses scientific approaches, procedures, algorithms, and frameworks to extract knowledge and insight from huge amounts of data. Data science brings together ideas from data examination, machine learning, and related strategies to understand and analyze real-world phenomena with data.
  KEY TAKEAWAYS
  • Data science uses techniques such as machine learning and artificial intelligence to extract meaningful information and to predict future patterns and behaviors.
  • Advances in technology, the internet, and social media have all increased access to big data.
  • The field of data science is growing as technology advances and as big data collection and analysis techniques become more sophisticated.
• 2. Statistics:-
  Mathematics is at the core of almost all advances in technology, and the field of data science would not exist without it. Machine learning and statistics are the two core skills required to become a data scientist. Statistics is the heart of data science: it helps to analyze, transform, and predict data. Statistics is a branch of mathematics in which tables of data are used to calculate metrics such as the mean, median, and standard deviation. These metrics are then used to characterize the available data so that it can be used in decision-making processes.
  7 Basic Statistics Concepts For Data Science:-
  1. Descriptive Statistics:- Descriptive statistics describe the basic features of data by summarizing a given data set, which can represent either the entire population or a sample of the population. They are derived from calculations that include:
  • Mean: the central value, commonly known as the arithmetic average.
  • Mode: the value that appears most often in a data set.
  • Median: the middle value of the ordered set, which divides it exactly in half.
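The three descriptive measures above can be computed with Python's standard `statistics` module. This is a minimal sketch; the `scores` list is a made-up sample used only for illustration.

```python
import statistics

# Hypothetical sample: exam scores for nine students (illustrative data only).
scores = [60, 70, 75, 85, 85, 85, 90, 95, 75]

mean_score = statistics.mean(scores)      # arithmetic average: sum / count
median_score = statistics.median(scores)  # middle value of the sorted data
mode_score = statistics.mode(scores)      # value that appears most often

print("mean:", mean_score)
print("median:", median_score)
print("mode:", mode_score)
```

Here the mean (80) is lower than the median and mode (both 85) because the single low score of 60 pulls the average down, which is one reason the median is often reported alongside the mean.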
• 3. 2. Variability:-
  Variability includes the following parameters:
  • Standard Deviation: a statistic that measures the dispersion of a data set relative to its mean. It is the square root of the variance.
  • Variance: a statistical measure of the spread between the numbers in a data set, calculated as the average squared difference from the mean. A large variance indicates that the numbers are far from the mean; a small variance indicates that they are close to it. Zero variance indicates that all values in the set are identical.
  • Range: the difference between the largest and smallest values of a data set.
  • Percentile: the value below which a given percentage of the observations in the data set falls.
  • Quartile: one of the three values that divide the ordered data points into four quarters.
  • Interquartile Range: the spread of the middle 50% of the data set, that is, the difference between the third and first quartiles.
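A short sketch of these variability measures, again using the standard `statistics` module. The `data` list is an invented example, and the population formulas (`pvariance`, `pstdev`) are an assumption; for a sample drawn from a larger population, `statistics.variance` and `statistics.stdev` would be the usual choice.

```python
import statistics

# Hypothetical data set (illustrative only); its mean is 5.
data = [2, 4, 4, 4, 5, 5, 7, 9]

variance = statistics.pvariance(data)  # average squared difference from the mean
std_dev = statistics.pstdev(data)      # square root of the variance
data_range = max(data) - min(data)     # largest value minus smallest value

# quantiles(n=4) returns the three cut points that split the data into quarters.
q1, q2, q3 = statistics.quantiles(data, n=4)
iqr = q3 - q1                          # spread of the middle 50% of the data

print("variance:", variance, "std dev:", std_dev, "range:", data_range)
print("quartiles:", q1, q2, q3, "IQR:", iqr)
```

Note that different tools use slightly different interpolation rules for quartiles, so the exact cut points can differ between libraries on small data sets.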
• 4. 3. Correlation:-
  Correlation is one of the major statistical techniques for measuring the relationship between two variables. The correlation coefficient indicates the strength and direction of the linear relationship between two variables.
  • A correlation coefficient greater than zero indicates a positive relationship.
  • A correlation coefficient less than zero indicates a negative relationship.
  • A correlation coefficient of zero indicates that there is no linear relationship between the two variables.
  4. Probability Distribution:-
  A probability distribution specifies the likelihood of all possible events. In simple terms, an event is the result of an experiment, such as tossing a coin. Events are of two types: dependent and independent.
  • Independent event: an event is independent when it is not affected by earlier events. For example, if a coin is tossed and the first outcome is heads, the outcome of the next toss may be heads or tails, and it is entirely independent of the first trial.
  • Dependent event: an event is dependent when its occurrence depends on earlier events. For example, suppose balls are drawn without replacement from a bag that contains red and blue balls. If the first ball drawn is red, the probability that the second ball is red or blue depends on the first trial.
  The probability that several independent events all occur is calculated by multiplying the probabilities of the individual events; for dependent events it is calculated using conditional probability.
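The two ideas on this slide can be sketched in a few lines of Python. The helper `pearson_r`, the `hours`/`scores` data, and the bag of three red and two blue balls are all hypothetical and chosen only for illustration; the coefficient shown is the Pearson correlation coefficient, which is one common choice.

```python
import math
from fractions import Fraction


def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of products of deviations from the means (unnormalized covariance).
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    ss_x = sum((x - mean_x) ** 2 for x in xs)
    ss_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(ss_x * ss_y)


# Hypothetical data: hours studied versus exam score.
hours = [1, 2, 3, 4, 5]
scores = [52, 58, 61, 70, 74]

r_pos = pearson_r(hours, scores)        # scores rise with hours: r > 0
r_neg = pearson_r(hours, scores[::-1])  # scores fall with hours: r < 0

# Independent events: two heads in two tosses of a fair coin.
p_two_heads = Fraction(1, 2) * Fraction(1, 2)

# Dependent events: two reds drawn without replacement from 3 red + 2 blue balls.
# P(red, then red) = P(first red) * P(second red | first red)
p_two_reds = Fraction(3, 5) * Fraction(2, 4)

print(round(r_pos, 3), round(r_neg, 3))
print(p_two_heads, p_two_reds)
```

Using `Fraction` keeps the probabilities exact, which makes it easy to see that the second factor for the dependent case changes once a red ball has been removed from the bag.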
• 5. 5. Regression:-
  Regression is a method used to determine the relationship between one or more independent variables and a dependent variable. Regression is mainly of two types:
  • Linear regression: fits a model that explains the relationship between a numeric response variable and one or more predictor variables.
  • Logistic regression: fits a model that explains the relationship between a binary response variable and one or more predictor variables.
  6. Normal Distribution:-
  The normal distribution defines the probability density function of a continuous random variable. It has two parameters: the mean and the standard deviation, both discussed above. In the standard normal distribution, the mean is 0 and the standard deviation is 1. When the distribution of a random variable is unknown, the normal distribution is often assumed. The central limit theorem justifies this choice in many cases, because sums and averages of many independent observations tend toward a normal distribution.
  7. Bias:-
  In statistical terms, bias arises when a model or sample is not representative of the complete population. It needs to be minimized to get the desired outcome. The three most common types of bias are:
  • Selection bias: selecting a group of data for statistical analysis in a way that is not randomized, so that the data is unrepresentative of the whole population.
  • Confirmation bias: occurs when the person performing the statistical analysis has a predefined assumption.
  • Time interval bias: caused intentionally by specifying a certain time range to favor a particular outcome.
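As an illustration of the regression and normal distribution items above, the sketch below fits a straight line by ordinary least squares and queries a standard normal distribution. The helper `fit_line` and the `xs`/`ys` data are hypothetical; `NormalDist` comes from the standard `statistics` module (Python 3.8 or later).

```python
from statistics import NormalDist


def fit_line(xs, ys):
    """Ordinary least squares fit of y = slope * x + intercept (one predictor)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    slope = (
        sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
        / sum((x - mean_x) ** 2 for x in xs)
    )
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Hypothetical data that lies exactly on the line y = 2x + 1.
xs = [1, 2, 3, 4, 5]
ys = [3, 5, 7, 9, 11]
slope, intercept = fit_line(xs, ys)

# Standard normal distribution: mean 0, standard deviation 1.
standard_normal = NormalDist(mu=0, sigma=1)
# Probability of falling within one standard deviation of the mean.
within_one_sd = standard_normal.cdf(1) - standard_normal.cdf(-1)

print("slope:", slope, "intercept:", intercept)
print("P(-1 < Z < 1):", round(within_one_sd, 4))
```

Real data rarely falls exactly on a line, so in practice the fitted slope and intercept are estimates; logistic regression needs an iterative fit and is usually done with a library rather than by hand.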
• 6. Programming tools used in Data Science
  A data scientist extracts, manipulates, and pre-processes data and generates forecasts from it. Doing so requires a variety of statistical tools and programming languages. This section covers some of the tools that data scientists use to work with data, including their main features, their benefits, and how they compare.
  Top Data Science Tools:-
  1. SAS
  SAS is a data science tool designed purely for statistical work. It is proprietary, closed-source software used by large companies to analyze data, and it is commonly used by professionals and businesses working on commercial software. For data scientists, SAS provides numerous statistical libraries and tools for modeling and organizing data. Although SAS is highly reliable and has strong support, it is expensive and used mainly by larger organizations. Moreover, several SAS libraries and packages are not included in the base package and are costly to add.
• 7. 2. Apache Spark
  Apache Spark, or simply Spark, is a powerful analytics engine and one of the most commonly used data science tools. Spark is designed specifically for batch and stream processing, and it can handle streaming data better than many other big data platforms. Spark works particularly well with Scala, a programming language that runs on the Java Virtual Machine and is therefore cross-platform.
  Features of Apache Spark:
  • Apache Spark is very fast.
  • It supports advanced analytics.
  • It supports real-time stream processing.
  • It is dynamic in nature.
  • It is fault tolerant.
  3. BigML
  BigML is another widely used data science tool. It offers an interactive, cloud-based GUI environment for running machine learning algorithms, and it provides standardized cloud-based software for industry use. It allows businesses to use machine learning algorithms across multiple areas of their enterprise. BigML specializes in advanced modeling and uses a wide range of machine learning algorithms, including clustering and classification. Depending on your data needs, you can create a free or premium account and work with BigML through its web interface or its REST APIs. It provides interactive views of data and lets you export visual charts to your mobile or IoT devices.
• 8. 4. Excel
  Excel was created by Microsoft mainly for spreadsheet calculations and is now commonly used for data processing, visualization, and complex calculations. Excel is an efficient analytical tool for data science. It offers formulas, tables, filters, slicers, and more, and you can also create your own custom functions and formulas. While Excel is still a good option for powerful data visualization and tables, it is not intended for calculations on huge quantities of data. You can also connect SQL to Excel and use it for data management and analysis. Many data scientists use Excel as an interactive graphical tool for easy pre-processing of data. In general, Excel is well suited to data analytics at a small, non-enterprise level.
  Features of Excel:
  • It is popular for small-scale data analysis.
  • It is used for spreadsheet calculation and visualization.
  • Its Analysis ToolPak is used for complex data analysis.
  • It connects easily to SQL.
  5. D3.js
  6. MATLAB
  7. NLTK
  8. TensorFlow
  9. Weka
  10. Jupyter
  11. Tableau
  12. Scikit-learn