UNIT I
INTRODUCTION TO DATA SCIENCE
What is Data Science?
► Data science is an interdisciplinary field that uses scientific
methods, processes, algorithms, and systems to extract
knowledge and insights from structured and unstructured data, and
applies that knowledge and actionable insight across a broad
range of application domains.
Data Science Definition
► Data science is the practice of mining large sets of raw data,
both structured and unstructured, to identify patterns and extract
actionable insight. It is an interdisciplinary field whose
foundations include statistics, inference, computer
science, predictive analytics, machine learning algorithm
development, and new technologies for gaining insights from big
data. The data science life cycle begins with acquiring data, extracting
it, and entering it into the system.
► The next stage is maintenance, which includes data warehousing,
data cleaning, data processing, data staging, and data
architecture.
Stages of Data Science Lifecycle
The data science lifecycle has five stages:
► Capture: Data acquisition, data entry, signal reception, data
extraction
► Maintain: Data warehousing, data cleansing, data staging, data
processing, data architecture
► Process: Data mining, clustering/classification, data modeling, data
summarization
► Analyze: Exploratory/confirmatory analysis, predictive analysis, regression,
text mining, qualitative analysis
► Communicate: Data reporting, data visualization, business
intelligence, decision making
Why Do Businesses Need Data Science?
► The amount of data created every day has resulted in a need for
professionals who can tackle it and make sense of it.
► A huge volume of unstructured and semi-structured data
comes from various sources, and traditional business
intelligence tools are not sufficient to make sense of it.
► Data science offers advanced tools for working on large volumes of
data coming from various types of sources such as financial logs,
marketing forms, sensors, instruments, text files, and multimedia files.
Job Roles in Data Science
► Data Analyst
► Data Engineer
► Database Administrator
► Machine Learning Engineer
► Data Scientist
► Data Architect
► Statistician
► Business Analyst
► Data and Analytics Manager
Skill Set Needed for a Data Scientist
► Technical
  ► Statistical analysis and computing
  ► Machine Learning
  ► Deep Learning
  ► Processing large data sets
  ► Data Visualization
  ► Data Wrangling
  ► Mathematics
  ► Programming
  ► Statistics
  ► Big Data
Skill Set Needed for a Data Scientist
► Non-Technical
  ► Critical Thinking
  ► Effective Communication
  ► Proactive Problem Solving
  ► Intellectual Curiosity
  ► Business Sense
Statistical Inference
Statistical inference is the process of using data analysis to infer
properties of an underlying probability distribution, for example
estimating a population mean from a sample.
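As a minimal sketch of what inferring a property of an underlying distribution can look like, the snippet below estimates a population mean from a small sample and attaches an approximate 95% confidence interval. The sample values and the function name `mean_confidence_interval` are hypothetical, and the interval uses a normal approximation as an assumption; for a sample this small, a t-distribution would usually give a somewhat wider interval.

```python
import math
import statistics


def mean_confidence_interval(sample, confidence=0.95):
    """Return (mean, lower, upper) using a normal approximation (an assumption)."""
    n = len(sample)
    if n < 2:
        raise ValueError("need at least two observations")
    mean = statistics.mean(sample)
    # Standard error of the mean, from the sample standard deviation.
    std_err = statistics.stdev(sample) / math.sqrt(n)
    # Two-sided critical value of the standard normal distribution.
    z = statistics.NormalDist().inv_cdf(0.5 + confidence / 2)
    return mean, mean - z * std_err, mean + z * std_err


# Hypothetical measurements drawn from some unknown distribution.
sample = [4.8, 5.1, 5.0, 4.9, 5.3, 5.2, 4.7, 5.0]
mean, low, high = mean_confidence_interval(sample)
print(f"mean={mean:.3f}, 95% CI=({low:.3f}, {high:.3f})")
```

The sample mean is the point estimate, and the interval expresses how uncertain that estimate is given the amount of data available.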
Exploratory Data Analysis (EDA) and the Data Science Process
Basic Tools of EDA
Some of the most common tools used for EDA are:
1. R: An open-source programming language and free software environment
for statistical computing and graphics, supported by the R Foundation for
Statistical Computing. The R language is widely used among statisticians in
developing statistical observations and data analysis.
2. Python: An interpreted, object-oriented programming language with
dynamic semantics. Its high-level, built-in data structures, combined with
dynamic binding, make it very attractive for rapid application development,
as well as for use as a scripting or glue language to connect existing
components. Python is often used in EDA to spot missing
values in a data set, which is important for deciding how to handle them
before machine learning.
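The missing-value check mentioned above can be sketched with the Python standard library alone. The data, the column names, and the set of markers treated as "missing" are all hypothetical choices made for illustration; in practice a library such as pandas is commonly used for this, where `DataFrame.isna().sum()` gives a similar per-column count.

```python
import csv
import io

# Hypothetical data set; empty fields and "NA" stand for missing values.
RAW = """age,income,city
34,52000,Pune
,61000,Delhi
29,NA,
41,48000,Mumbai
"""

# Assumed markers for a missing entry; adjust to the data at hand.
MISSING_MARKERS = {"", "NA", "NaN", "null"}


def count_missing(csv_text):
    """Count missing entries per column in CSV text."""
    reader = csv.DictReader(io.StringIO(csv_text))
    counts = {name: 0 for name in reader.fieldnames}
    for row in reader:
        for name, value in row.items():
            if value is None or value.strip() in MISSING_MARKERS:
                counts[name] += 1
    return counts


missing = count_missing(RAW)
for column, n in missing.items():
    print(f"{column}: {n} missing")
```

Knowing which columns have gaps, and how many, informs the choice between dropping rows, imputing values, or leaving the column out of a model.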
Applications of Data Science
► Anomaly detection (fraud, disease, crime, etc.)
► Automation and decision-making (background checks,
creditworthiness, etc.)
► Classification (in an email server, this could mean classifying emails
as important or junk)
► Forecasting (sales, revenue and customer retention)
► Pattern detection (weather patterns, financial market patterns, etc.)
► Recognition (facial, voice, text, etc.)
► Recommendations (based on learned preferences,
recommendation engines can refer you to movies, restaurants and
books you may like)
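To make the first application in the list concrete, here is a minimal sketch of anomaly detection on a handful of transaction amounts. The slides do not name a technique, so the method here is an assumption: a modified z-score based on the median absolute deviation (MAD), with the commonly used cutoff of 3.5. The amounts and the function name `find_anomalies` are hypothetical.

```python
import statistics


def find_anomalies(values, threshold=3.5):
    """Return values whose modified z-score exceeds the threshold."""
    median = statistics.median(values)
    # Median absolute deviation: a spread measure that outliers barely affect.
    mad = statistics.median([abs(v - median) for v in values])
    if mad == 0:
        return []  # No measurable spread, so nothing can be scored.
    # 0.6745 rescales the MAD to be comparable to a standard deviation
    # for normally distributed data.
    return [v for v in values if 0.6745 * abs(v - median) / mad > threshold]


# Hypothetical transaction amounts with one suspicious entry.
amounts = [52, 48, 50, 51, 49, 47, 53, 50, 250, 49]
print(find_anomalies(amounts))
```

A median-based score is used rather than a mean-based one because a single extreme value inflates the mean and standard deviation, which can hide the very point being looked for.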
Data Science in Business
► Gain Customer Insights
► Increase Security
► Inform Internal Finances
► Streamline Manufacturing
► Predict Future Market Trends
Business Intelligence vs. Data Science
S.No | Factor | Data Science | Business Intelligence
1 | Concept | A field that uses mathematics, statistics, and various other tools to discover hidden patterns in data. | A set of technologies, applications, and processes that enterprises use for business data analysis.
2 | Focus | It focuses on the future. | It focuses on the past and present.
3 | Data | It deals with both structured and unstructured data. | It deals mainly with structured data.
4 | Flexibility | More flexible, as data sources can be added as required. | Less flexible, as data sources need to be pre-planned.
5 | Method | It makes use of the scientific method. | It makes use of the analytic method.
Data Analytics Lifecycle
Machine Learning
Machine learning (ML) is a type of artificial intelligence (AI) that allows software
applications to become more accurate at predicting outcomes without being explicitly
programmed to do so. Machine learning algorithms use historical data as input to
predict new output values.
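A minimal sketch of "historical data in, predicted values out": the snippet fits a straight line to past observations with ordinary least squares and uses it to predict an unseen case. The advertising and sales figures, the variable names, and the choice of a linear model are all assumptions made for illustration; real projects would typically use a library and a model suited to the data.

```python
def fit_line(xs, ys):
    """Fit y = slope * x + intercept by ordinary least squares."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Hypothetical historical data: advertising spend and resulting sales.
ad_spend = [1.0, 2.0, 3.0, 4.0, 5.0]
sales = [12.0, 14.0, 16.0, 18.0, 20.0]

slope, intercept = fit_line(ad_spend, sales)

# Predict the output for an input the model has not seen.
predicted = slope * 6.0 + intercept
print(f"predicted sales at spend 6.0: {predicted:.1f}")
```

Nothing in `fit_line` encodes the relationship between spend and sales; the slope and intercept are learned from the data, which is the sense in which the program is not explicitly programmed with the answer.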
Why is machine learning important?
Machine learning is important because it gives enterprises a view of trends in customer
behavior and business operational patterns, and supports the development of
new products. Many of today's leading companies, such as Facebook, Google, and
Uber, make machine learning a central part of their operations. Machine learning has
become a significant competitive differentiator for many companies.
What are the different types of machine learning?
Classical machine learning is often categorized by how an algorithm learns to become
more accurate in its predictions. There are four basic approaches: supervised learning,
unsupervised learning, semi-supervised learning, and reinforcement learning. The type
of algorithm data scientists choose to use depends on what type of data they want to
predict.
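To illustrate one of these approaches, the sketch below shows unsupervised learning: the algorithm receives values with no labels and groups them by similarity. The slides do not name an algorithm, so this uses a two-cluster, one-dimensional version of k-means as an assumed example; the spending figures and the function name `two_means` are hypothetical.

```python
def two_means(values, iterations=10):
    """Split 1-D values into two groups around two learned centers."""
    # Start the two centers at the extremes of the data.
    low, high = min(values), max(values)
    group_low, group_high = list(values), []
    for _ in range(iterations):
        # Assign each value to the nearer center.
        group_low = [v for v in values if abs(v - low) <= abs(v - high)]
        group_high = [v for v in values if abs(v - low) > abs(v - high)]
        if not group_high:
            break  # All values are identical; there is only one group.
        # Move each center to the mean of its group.
        low = sum(group_low) / len(group_low)
        high = sum(group_high) / len(group_high)
    return group_low, group_high


# Hypothetical monthly spend per customer, with no labels attached.
spend = [12, 15, 14, 90, 95, 13, 88]
small, large = two_means(spend)
print("low spenders:", small)
print("high spenders:", large)
```

In a supervised setting each value would come with a known label to learn from; here the grouping emerges from the data alone.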
