This presentation introduces data science and its basic concepts, including data mining, exploratory data analysis (EDA), the data science lifecycle, and analytics.
2. What is Data Science?
► Data Science is an interdisciplinary field that uses scientific
methods, processes, algorithms and systems to extract
knowledge and insights from structured and unstructured data, and
applies that knowledge and actionable insight across a broad
range of application domains.
3. Data Science Definition
► Data science is the practice of mining large sets of raw data,
both structured and unstructured, to identify patterns and extract
actionable insight from them. It is an interdisciplinary field whose
foundations include statistics, inference, computer
science, predictive analytics, machine learning algorithm
development, and new technologies for gaining insights from big
data. The data science lifecycle begins with acquiring data, extracting
it, and entering it into the system.
► The next stage is maintenance, which includes data warehousing,
data cleaning, data processing, data staging, and data
architecture.
4. Stages of Data Science Lifecycle
The data science lifecycle has five stages:
► Capture: Data acquisition, data entry, signal reception, data
extraction
► Maintain: Data warehousing, data cleansing, data staging, data
processing, data architecture
► Process: Data mining, clustering/classification, data modeling, data
summarization
► Analyze: Exploratory/confirmatory, predictive analysis, regression,
text mining, qualitative analysis
► Communicate: Data reporting, data visualization, business
intelligence, decision making
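The five stages can be sketched as a chain of small functions. This is only a toy illustration, assuming one common ordering in which analysis precedes communication; the function names, the sample readings, and the threshold of 20.0 are all hypothetical and not taken from the slides.

```python
from statistics import mean


def capture():
    """Capture: acquire raw records (hard-coded here; normally read from a source)."""
    return [" 12.5", "13.1", "", "n/a", "12.9 ", "40.0"]


def maintain(raw):
    """Maintain: clean the data by trimming whitespace and dropping unparseable entries."""
    cleaned = []
    for item in raw:
        try:
            cleaned.append(float(item.strip()))
        except ValueError:
            continue  # skip blanks and placeholders such as "n/a"
    return cleaned


def process(values):
    """Process: summarize the cleaned data."""
    return {"count": len(values), "mean": mean(values), "max": max(values)}


def analyze(summary, threshold=20.0):
    """Analyze: flag whether the largest value exceeds a hypothetical threshold."""
    return {**summary, "has_outlier": summary["max"] > threshold}


def communicate(result):
    """Communicate: turn the analysis into a short human-readable report."""
    return (
        f"{result['count']} readings, mean {result['mean']:.2f}, "
        f"outlier present: {result['has_outlier']}"
    )


cleaned_values = maintain(capture())
analysis = analyze(process(cleaned_values))
print(communicate(analysis))
```

In real projects each stage is far larger (databases, warehouses, dashboards), but the flow of data from one stage to the next follows the same shape.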
5. Why Businesses Need Data Science?
► The amount of data created every day has resulted in a need for
professionals who can tackle it and make sense of it.
► There is a huge mine of unstructured and semi-structured data
coming from various sources, and traditional business
intelligence tools are not sufficient to make sense of it.
► Data science offers advanced tools for working on large volumes of
data coming from various types of sources such as financial logs,
marketing forms, sensors, instruments, text files, and multimedia files.
6. Job Roles in Data Science
► Data Analyst
► Data Engineer
► Database Administrator
► Machine Learning Engineer
► Data Scientist
► Data Architect
► Statistician
► Business Analyst
► Data and Analytics Manager
7. Skill Set Needed for a Data Scientist
► Technical
► Statistical analysis and computing
► Machine Learning
► Deep Learning
► Processing large data sets
► Data Visualization
► Data Wrangling
► Mathematics
► Programming
► Statistics
► Big Data
8. Skill Set Needed for a Data Scientist
► Non-Technical
► Critical Thinking
► Effective Communication
► Proactive Problem Solving
► Intellectual Curiosity
► Business Sense
11. Basic Tools of EDA
Some of the most common tools used to perform EDA are:
1. R: An open-source programming language and free software environment
for statistical computing and graphics, supported by the R Foundation for
Statistical Computing. The R language is widely used among statisticians for
developing statistical observations and for data analysis.
2. Python: An interpreted, object-oriented programming language with
dynamic semantics. Its high-level, built-in data structures, combined with
dynamic binding, make it very attractive for rapid application development,
as well as for use as a scripting or glue language to connect existing
components. Python is often used in EDA to spot missing
values in a data set, which is important for deciding how to handle missing
values for machine learning.
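As a rough illustration of spotting missing values in Python, the sketch below counts missing entries per column using only the standard library. The sample data, the `count_missing` helper, and the set of strings treated as "missing" are assumptions made for this example; in practice a library such as pandas is commonly used for the same check.

```python
import csv
import io

# Hypothetical sample data: empty fields and "NA" stand for missing values.
SAMPLE_CSV = """age,income,city
34,52000,Pune
,61000,Delhi
29,NA,
41,48000,Mumbai
"""

# Strings treated as missing (an assumption; real data sets vary).
MISSING_MARKERS = {"", "NA", "N/A", "null"}


def count_missing(csv_text):
    """Return a dict mapping each column name to its number of missing values."""
    reader = csv.DictReader(io.StringIO(csv_text))
    counts = {name: 0 for name in reader.fieldnames}
    for row in reader:
        for name, value in row.items():
            if value is None or value.strip() in MISSING_MARKERS:
                counts[name] += 1
    return counts


missing = count_missing(SAMPLE_CSV)
for column, n in missing.items():
    print(f"{column}: {n} missing")
```

Once the counts are known, the analyst can decide per column whether to drop the affected rows, fill in a default, or estimate the missing values.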
12. Application of Data Science
► Anomaly detection (fraud, disease, crime, etc.)
► Automation and decision-making (background checks, credit
worthiness, etc.)
► Classifications (in an email server, this could mean classifying emails
as important or junk)
► Forecasting (sales, revenue and customer retention)
► Pattern detection (weather patterns, financial market patterns, etc.)
► Recognition (facial, voice, text, etc.)
► Recommendations (based on learned preferences,
recommendation engines can refer you to movies, restaurants and
books you may like)
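To make the first application concrete, here is a minimal sketch of anomaly detection using z-scores, one simple technique among many. The transaction amounts, the `find_anomalies` helper, and the threshold of 2.0 standard deviations are assumptions chosen for illustration.

```python
from statistics import mean, stdev


def find_anomalies(values, z_threshold=2.0):
    """Return the values lying more than z_threshold sample standard deviations from the mean."""
    mu = mean(values)
    sigma = stdev(values)
    if sigma == 0:
        return []  # all values identical, nothing stands out
    return [v for v in values if abs(v - mu) / sigma > z_threshold]


# Hypothetical transaction amounts with one unusually large payment.
amounts = [20, 22, 19, 21, 23, 20, 18, 22, 21, 500]
suspicious = find_anomalies(amounts)
print(suspicious)
```

A z-score check is easy to explain but is itself distorted by extreme values, so production fraud detection systems generally rely on more robust methods.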
13. Data Science in Business
► Gain Customer Insights
► Increase Security
► Inform Internal Finances
► Streamline Manufacturing
► Predict Future Market Trends
14. Business Intelligence Vs Data Science
S.No | Factor | Data Science | Business Intelligence
1 | Concept | It is a field that uses mathematics, statistics and various other tools to discover the hidden patterns in the data. | It is basically a set of technologies, applications and processes that are used by enterprises for business data analysis.
2 | Focus | It focuses on the future. | It focuses on the past and present.
3 | Data | It deals with both structured and unstructured data. | It mainly deals only with structured data.
4 | Flexibility | Data science is much more flexible, as data sources can be added as required. | It is less flexible, as data sources need to be pre-planned.
5 | Method | It makes use of the scientific method. | It makes use of the analytic method.
16. Machine Learning
Machine learning (ML) is a type of artificial intelligence (AI) that allows software
applications to become more accurate at predicting outcomes without being explicitly
programmed to do so. Machine learning algorithms use historical data as input to
predict new output values.
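The idea of using historical data to predict new output values can be sketched with a least-squares straight-line fit, one of the simplest learning methods. The monthly sales figures and the `fit_line` helper below are made up for illustration.

```python
def fit_line(xs, ys):
    """Fit y = slope * x + intercept to the data by ordinary least squares."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Hypothetical historical data: sales for months 1 to 5.
months = [1, 2, 3, 4, 5]
sales = [110, 120, 130, 140, 150]

slope, intercept = fit_line(months, sales)

# Use the fitted relationship to predict a value for an unseen input (month 6).
forecast = slope * 6 + intercept
print(f"Forecast for month 6: {forecast:.1f}")
```

Nothing in the code states a rule such as "sales grow by 10 per month"; the relationship is estimated from the data, which is the sense in which the program is not explicitly programmed.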
Why is machine learning important?
Machine learning is important because it gives enterprises a view of trends in customer
behavior and business operational patterns, and supports the development of
new products. Many of today's leading companies, such as Facebook, Google and
Uber, make machine learning a central part of their operations. Machine learning has
become a significant competitive differentiator for many companies.
What are the different types of machine learning?
Classical machine learning is often categorized by how an algorithm learns to become
more accurate in its predictions. There are four basic approaches: supervised learning,
unsupervised learning, semi-supervised learning and reinforcement learning. The type
of algorithm data scientists choose to use depends on what type of data they want to
predict.
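The difference between the first two approaches can be shown with a small sketch: a supervised method learns from examples that carry labels, while an unsupervised method groups unlabeled values on its own. The helpers `nearest_neighbor_predict` and `two_means`, and all the numbers, are hypothetical choices for this illustration; semi-supervised and reinforcement learning are not shown.

```python
def nearest_neighbor_predict(labeled, x):
    """Supervised: return the label of the closest labeled training example."""
    _, closest_label = min(labeled, key=lambda pair: abs(pair[0] - x))
    return closest_label


def two_means(values, iterations=10):
    """Unsupervised: split unlabeled values into two groups (1-D k-means with k=2)."""
    c_low, c_high = min(values), max(values)
    low, high = list(values), []
    for _ in range(iterations):
        # Assign every value to the nearer of the two current centers.
        low = [v for v in values if abs(v - c_low) <= abs(v - c_high)]
        high = [v for v in values if abs(v - c_low) > abs(v - c_high)]
        if not low or not high:
            break  # cannot form two groups, e.g. all values are identical
        # Move each center to the mean of its group.
        c_low = sum(low) / len(low)
        c_high = sum(high) / len(high)
    return low, high


# Supervised: training examples come with labels.
training = [(1.0, "small"), (1.5, "small"), (8.0, "large"), (9.0, "large")]
print(nearest_neighbor_predict(training, 2.0))
print(nearest_neighbor_predict(training, 7.0))

# Unsupervised: no labels are given; the grouping is discovered from the data.
low_group, high_group = two_means([1.0, 1.2, 0.8, 9.0, 9.5, 8.7])
print(low_group, high_group)
```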