SlideShare a Scribd company logo
INTRODUCTIONTO
DATA SCIENCE
BY: ELSAYED FATHY KHALF
ABOUT DATA SCIENCE
• we can say that it is the interdisciplinary field concerned
with generating business insights and value by processing
large amounts of data. But that does not help us grasp the
complexity and nuanced nature of the discipline. Nor does
it provide us with an understanding of how it is shaping the
technological and business landscapes.
1 -The different data science fields
Traditional data: Data in the form of tables containing numeric or text
values; Data that is structured and stored in databases
Big data: Extremely large data; Humongous in terms of volume. It can
be in various formats:
- structured
- semi-structured
- unstructured
1 -The different data science fields
1 -The different data science fields
1-The different data science fields
1-The different data science fields
1-The different data science fields
2 - Analysis vs analytics
• Analysis – Dividing data into digestible components that
are easier to understand and examining how different
parts relate to each other. Performed on past data,
explaining why the story ended in the way that it did. We
want to explain ‘how’ and ‘why’ something happened.
• Analytics – Explores the future. The application of logical
and computational reasoning to the component parts
obtained in an analysis. In doing this, you are looking for
patterns and exploring what you can do with them in the
future.
3 – Qualitative vs Quantitative analytics
• Qualitative analytics – using intuition and experience in
conjunction with analysis to plan your next business move
• Quantitative analytics – applying formulas and algorithms
to numbers that you have gathered from your analysis.
4 -Business Intelligence
• Business Intelligence : can be seen as the preliminary step
of predictive analytics. First, you analyze past data and
then using these inferences would allow you to create
appropriate models that could predict the future of your
business accurately.
• Business Intelligence (BI) : is the process of analyzing and
reporting historical business data. After reports and
dashboards have been prepared, they can be used to make
informed strategic and business decisions by end-users
such as the general manager. Concisely put, business
intelligence aims to explain past events using business
data.
5 -Machine learning
• AI simulates human knowledge and decision making with
computers. Humans have managed to reach AI through
machine and deep learning.
• Machine learning is the ability of machines to predict
outcomes without being explicitly programmed to do so. It is
about creating and implementing algorithms that let
machines receive data and use this data to:
Make predictions | Analyze patterns | Give recommendations
• Symbolic reasoning is a type of AI that makes an
exception and does not use ML and deep learning.
6 -Traditional data:Techniques
• Class labelling: Labelling the data point to the correct data
type (or arranging data by category).
• Data cleansing: (‘data cleaning’, ‘data scrubbing’): deal with
inconsistent data. For example, working on a dataset
containing US states and finding that some of the names are
misspelled.
• Data shuffling: Shuffling the observations from your dataset
just like shuffling a deck of cards. This will ensure your
dataset is free from unwanted patterns caused by
problematic data collection.
6 -Traditional data:Techniques
• Data balancing: Ensuring that the sample gives equal
priority to each class. For example, if you work with a
dataset that contains 80% male and 20% female data, and
you know that the population contains approximately 50%
men and 50% women, then you need apply a balancing
technique to counteract this problem (using an equal
number of data from each group).
6.1-Traditional data: Examples
• Numerical variable: Numbers that are easily manipulated
(for ex. Added), which gives us useful information
• Categorical variable: Numbers that hold no numerical value
can be considered categorical data. Dates are also
considered categorical data.
7 - Big data:Techniques
• Text data mining: The process of deriving valuable,
unstructured data from text.
• Data masking: As a business, when you work with user
private data, you must be able to preserve confidential
information. However, this doesn’t mean that the data can’t
be touched or used for analysis. Instead, you must apply
some data masking techniques to utilize the information
without compromising private details. In essence, data
masking conceals the original data with random and false
data, allowing you to conduct analysis and keep confidential
information in a secure place.
7.1 - Big data: Examples
• We find big data in increasingly more industries and
companies.
• Probably the most notable example of a company leveraging
the true potential of big data is Facebook. The company
keeps track of its users’ names, personal data, photos,
videos, recorded messages and so on. This means their data
has a lot of variety. And with 2 billion users worldwide, the
volume of data stored on their servers is tremendous.
8 - Business IntelligenceTechniques
• Metric: refers to a value that derives from the measures you
have obtained and aims at gauging business performance or
progress. Has a business meaning attached to it.
• Measure: simple descriptive statistics of past performance
Metric = Measure + Business meaning
• KPIs: It doesn’t make sense to keep track of all metrics. So,
companies choose to focus on the most important ones.
KPIs = metrics + Business objective
Filtering out the boring metrics and turning the interesting and informative
KPIs into easily understood and comparable visualizations is an important part
of the business intelligence analyst job.
8.1 - Business Intelligence Examples
• BI allows you to adjust your strategy to past data as soon
as it is available. If done right, Business Intelligence will
help to efficiently manage your shipment logistics and, in
turn, reduce costs and increase profit.
9 - Traditional methods:Techniques
• (classical statistical methods for forecasting)
• Clustering: grouping the data in neighborhoods to analyze
meaningful patterns
• Time series: used in economics and finance, showing the
development of certain values over time, such as stock
prices or sales volume.
9.1 - Traditional methods : Examples
• The application of traditional methods is extremely broad.
Two relevant examples:
• Forecasting sales data: using time series data to predict a
firm’s future expected sales
• UX: plot customer satisfaction and customer revenue to find
that each cluster represents a different geographical
location
10 - Machine Learning: Techniques
• There are four ingredients for machine learning: data,
model, objective function, optimization algorithm
• Model: the computer uses an algorithm to recognize certain
types of patterns
• Objective function: specification of the machine learning
problem; a function to be maximized or minimized
depending on the task at hand
• Optimization algorithm: A process in which previous
solution of the problem are compared until reaching an
optimal solution
10.1- Machine Learning:Types
• Three main types of machine learning:
• Supervised learning :Training an algorithm resembles a
teacher supervising her students. Provides feedback every
step of the way. Telling students whether they did ‘good’ or
whether they need to improve their performance. When
using supervised learning you use labelled data (every data
point is categorized as ‘good’ performance or as
‘performance that needs improvement’ in our example).
10.1- Machine Learning:Types
• Three main types of machine learning:
• Unsupervised learning : In this case, the algorithm trains
itself. There isn’t a teacher who provides feedback. The
algorithm uses unlabeled data that is not categorized as
‘good’ or as ‘performance that needs improvement’. The
unsupervised ML model simply uses the data and sorts in
different groups. In our example, it will be able to show us
two groups – ‘good performing’ and ‘performance that
needs to be improved’, however the ML model would not be
able to tell us which one is which.
10.1- Machine Learning:Types
• Three main types of machine learning:
• Reinforcement learning: A reward system is introduced.
Every time a student does a task better than it used to in the
past they will receive a reward (and nothing if the task is
not performed better). Instead of minimizing an error, we
maximize a reward, or in other words, maximizing the
objective function.
• Deep learning – the modern state-of-the-art approach to
machine learning – leverages the power of neural networks
and can be placed in both categories – supervised and
unsupervised learning.
11- Common data science tools
• There are two main types of tools
• Programming languages : enable you to devise programs
that can execute specific operations. Moreover, you can
reuse these programs whenever you need to execute the
same action. the most popular programming language for
data science is Python followed by R.
- Python and R have their limitations. They are not able
to address problems specific to some domains. One
example is ‘relational database management systems’.
In these instances, SQL works best.
11- Common data science tools
• There are two main types of tools
• terms of software : Excel plays an important role. It is
able to do relatively complex computations and good
visualizations quickly. SPSS is another popular tool for
working with traditional data and applying statistical
analysis.
-There is a significant amount of software designed for
working with big data – Apache Hadoop, Apache Hbase,
and Mongo.
PowerBI, Qlik, Tableau are top-notch examples of
software designed for business intelligence visualizations.
12- Data science job positions
• Data architect – designs the way data will be retrieved
processed and consumed
• Data engineer – process the obtained data so that it is ready
for analysis
• Database administrator – handles this control of data;
works with traditional data
• BI analyst – performs analyses and reporting of past
historical data
12- Data science job positions
• BI consultant – ‘external BI analyst’
• BI developer – performs analyses specifically designed for
the company
• Data scientist – employs traditional statistical methods or
unconventional machine learning techniques for making
predictions
• Data analyst – prepare advanced analyses
• Machine learning engineer – applies state-of-the-art ML
techniques
Resource : Martin Ganchev

More Related Content

What's hot

Database management systems
Database management systemsDatabase management systems
Database management systems
Joel Briza
 
What is Machine Learning | Introduction to Machine Learning | Machine Learnin...
What is Machine Learning | Introduction to Machine Learning | Machine Learnin...What is Machine Learning | Introduction to Machine Learning | Machine Learnin...
What is Machine Learning | Introduction to Machine Learning | Machine Learnin...
Simplilearn
 
Data Structure Lec #1
Data Structure Lec #1Data Structure Lec #1
Data Structure Lec #1
University of Gujrat, Pakistan
 
CLASS OBJECT AND INHERITANCE IN PYTHON
CLASS OBJECT AND INHERITANCE IN PYTHONCLASS OBJECT AND INHERITANCE IN PYTHON
CLASS OBJECT AND INHERITANCE IN PYTHON
Lalitkumar_98
 
Architecture Design for Deep Neural Networks I
Architecture Design for Deep Neural Networks IArchitecture Design for Deep Neural Networks I
Architecture Design for Deep Neural Networks I
Wanjin Yu
 
Introduction to Machine Learning
Introduction to Machine LearningIntroduction to Machine Learning
Introduction to Machine Learning
Shahar Cohen
 
Processing Arabic Text
Processing Arabic TextProcessing Arabic Text
Processing Arabic Text
Jordan Open Source Association
 
Python - object oriented
Python - object orientedPython - object oriented
Python - object oriented
Learnbay Datascience
 
Pattern recognition UNIT 5
Pattern recognition UNIT 5Pattern recognition UNIT 5
Pattern recognition UNIT 5
SURBHI SAROHA
 
Support Vector Machines ( SVM )
Support Vector Machines ( SVM ) Support Vector Machines ( SVM )
Support Vector Machines ( SVM )
Mohammad Junaid Khan
 
Knowledge Discovery and Data Mining
Knowledge Discovery and Data MiningKnowledge Discovery and Data Mining
Knowledge Discovery and Data Mining
Amritanshu Mehra
 
Fundamentals of Database system
Fundamentals of Database systemFundamentals of Database system
Fundamentals of Database system
philipsinter
 
Handling noisy data
Handling noisy dataHandling noisy data
Handling noisy data
Vivek Gandhi
 
Functions of dbms
Functions  of dbmsFunctions  of dbms
Functions of dbms
Padma Kannan
 
DAY 2 - Starting in Photoshop (Images and Layers)
DAY 2 - Starting in Photoshop (Images and Layers)DAY 2 - Starting in Photoshop (Images and Layers)
DAY 2 - Starting in Photoshop (Images and Layers)
Sef Cambaliza
 
Machine Learning - Splitting Datasets
Machine Learning - Splitting DatasetsMachine Learning - Splitting Datasets
Machine Learning - Splitting Datasets
Andrew Ferlitsch
 
bag-of-words models
bag-of-words models bag-of-words models
bag-of-words models
Xiaotao Zou
 
DBMS 1 | Introduction to DBMS
DBMS 1 | Introduction to DBMSDBMS 1 | Introduction to DBMS
DBMS 1 | Introduction to DBMS
Mohammad Imam Hossain
 
Deep learning
Deep learningDeep learning
Deep learning
Ratnakar Pandey
 
Deep learning for real life applications
Deep learning for real life applicationsDeep learning for real life applications
Deep learning for real life applications
Anas Arram, Ph.D
 

What's hot (20)

Database management systems
Database management systemsDatabase management systems
Database management systems
 
What is Machine Learning | Introduction to Machine Learning | Machine Learnin...
What is Machine Learning | Introduction to Machine Learning | Machine Learnin...What is Machine Learning | Introduction to Machine Learning | Machine Learnin...
What is Machine Learning | Introduction to Machine Learning | Machine Learnin...
 
Data Structure Lec #1
Data Structure Lec #1Data Structure Lec #1
Data Structure Lec #1
 
CLASS OBJECT AND INHERITANCE IN PYTHON
CLASS OBJECT AND INHERITANCE IN PYTHONCLASS OBJECT AND INHERITANCE IN PYTHON
CLASS OBJECT AND INHERITANCE IN PYTHON
 
Architecture Design for Deep Neural Networks I
Architecture Design for Deep Neural Networks IArchitecture Design for Deep Neural Networks I
Architecture Design for Deep Neural Networks I
 
Introduction to Machine Learning
Introduction to Machine LearningIntroduction to Machine Learning
Introduction to Machine Learning
 
Processing Arabic Text
Processing Arabic TextProcessing Arabic Text
Processing Arabic Text
 
Python - object oriented
Python - object orientedPython - object oriented
Python - object oriented
 
Pattern recognition UNIT 5
Pattern recognition UNIT 5Pattern recognition UNIT 5
Pattern recognition UNIT 5
 
Support Vector Machines ( SVM )
Support Vector Machines ( SVM ) Support Vector Machines ( SVM )
Support Vector Machines ( SVM )
 
Knowledge Discovery and Data Mining
Knowledge Discovery and Data MiningKnowledge Discovery and Data Mining
Knowledge Discovery and Data Mining
 
Fundamentals of Database system
Fundamentals of Database systemFundamentals of Database system
Fundamentals of Database system
 
Handling noisy data
Handling noisy dataHandling noisy data
Handling noisy data
 
Functions of dbms
Functions  of dbmsFunctions  of dbms
Functions of dbms
 
DAY 2 - Starting in Photoshop (Images and Layers)
DAY 2 - Starting in Photoshop (Images and Layers)DAY 2 - Starting in Photoshop (Images and Layers)
DAY 2 - Starting in Photoshop (Images and Layers)
 
Machine Learning - Splitting Datasets
Machine Learning - Splitting DatasetsMachine Learning - Splitting Datasets
Machine Learning - Splitting Datasets
 
bag-of-words models
bag-of-words models bag-of-words models
bag-of-words models
 
DBMS 1 | Introduction to DBMS
DBMS 1 | Introduction to DBMSDBMS 1 | Introduction to DBMS
DBMS 1 | Introduction to DBMS
 
Deep learning
Deep learningDeep learning
Deep learning
 
Deep learning for real life applications
Deep learning for real life applicationsDeep learning for real life applications
Deep learning for real life applications
 

Similar to Introduction to data science.pdf

Introductions to Business Analytics
Introductions to Business Analytics Introductions to Business Analytics
Introductions to Business Analytics
Venkat .P
 
Lesson 1 - Overview of Machine Learning and Data Analysis.pptx
Lesson 1 - Overview of Machine Learning and Data Analysis.pptxLesson 1 - Overview of Machine Learning and Data Analysis.pptx
Lesson 1 - Overview of Machine Learning and Data Analysis.pptx
cloudserviceuit
 
Data Science in Python.pptx
Data Science in Python.pptxData Science in Python.pptx
Data Science in Python.pptx
Ramakrishna Reddy Bijjam
 
Data Analytics & Visualization (Introduction)
Data Analytics & Visualization (Introduction)Data Analytics & Visualization (Introduction)
Data Analytics & Visualization (Introduction)
Dolapo Amusat
 
Data Science Job ready #DataScienceInterview Question and Answers 2022 | #Dat...
Data Science Job ready #DataScienceInterview Question and Answers 2022 | #Dat...Data Science Job ready #DataScienceInterview Question and Answers 2022 | #Dat...
Data Science Job ready #DataScienceInterview Question and Answers 2022 | #Dat...
Rohit Dubey
 
Data Scientist By: Professor Lili Saghafi
Data Scientist By: Professor Lili SaghafiData Scientist By: Professor Lili Saghafi
Data Scientist By: Professor Lili Saghafi
Professor Lili Saghafi
 
BA Overview.pptx
BA Overview.pptxBA Overview.pptx
BA Overview.pptx
SuKuTurangi
 
Introducition to Data scinece compiled by hu
Introducition to Data scinece compiled by huIntroducition to Data scinece compiled by hu
Introducition to Data scinece compiled by hu
wekineheshete
 
Analytics
AnalyticsAnalytics
1-210217184339.pptx
1-210217184339.pptx1-210217184339.pptx
1-210217184339.pptx
PrakharMishra961088
 
Self Study Business Approach to DS_01022022.docx
Self Study Business Approach to DS_01022022.docxSelf Study Business Approach to DS_01022022.docx
Self Study Business Approach to DS_01022022.docx
Shanmugasundaram M
 
KIT601 Unit I.pptx
KIT601 Unit I.pptxKIT601 Unit I.pptx
KIT601 Unit I.pptx
LBSIMDS, Lucknow
 
Choosing a Machine Learning technique to solve your need
Choosing a Machine Learning technique to solve your needChoosing a Machine Learning technique to solve your need
Choosing a Machine Learning technique to solve your need
GibDevs
 
data science and business analytics
data science and business analyticsdata science and business analytics
data science and business analytics
sunnypatil1778
 
Data Science Introduction: Concepts, lifecycle, applications.pptx
Data Science Introduction: Concepts, lifecycle, applications.pptxData Science Introduction: Concepts, lifecycle, applications.pptx
Data Science Introduction: Concepts, lifecycle, applications.pptx
sumitkumar600840
 
Data Analytics and Big Data on IoT
Data Analytics and Big Data on IoTData Analytics and Big Data on IoT
Data Analytics and Big Data on IoT
Shivam Singh
 
Data Science Unit1 AMET.pdf
Data Science Unit1 AMET.pdfData Science Unit1 AMET.pdf
Data Science Unit1 AMET.pdf
mustaq4
 
Morden EcoSystem.pptx
Morden EcoSystem.pptxMorden EcoSystem.pptx
Morden EcoSystem.pptx
priti jadhao
 
Data science in business Administration Nagarajan.pptx
Data science in business Administration Nagarajan.pptxData science in business Administration Nagarajan.pptx
Data science in business Administration Nagarajan.pptx
NagarajanG35
 
Data Science.pptx NEW COURICUUMN IN DATA
Data Science.pptx NEW COURICUUMN IN DATAData Science.pptx NEW COURICUUMN IN DATA
Data Science.pptx NEW COURICUUMN IN DATA
javed75
 

Similar to Introduction to data science.pdf (20)

Introductions to Business Analytics
Introductions to Business Analytics Introductions to Business Analytics
Introductions to Business Analytics
 
Lesson 1 - Overview of Machine Learning and Data Analysis.pptx
Lesson 1 - Overview of Machine Learning and Data Analysis.pptxLesson 1 - Overview of Machine Learning and Data Analysis.pptx
Lesson 1 - Overview of Machine Learning and Data Analysis.pptx
 
Data Science in Python.pptx
Data Science in Python.pptxData Science in Python.pptx
Data Science in Python.pptx
 
Data Analytics & Visualization (Introduction)
Data Analytics & Visualization (Introduction)Data Analytics & Visualization (Introduction)
Data Analytics & Visualization (Introduction)
 
Data Science Job ready #DataScienceInterview Question and Answers 2022 | #Dat...
Data Science Job ready #DataScienceInterview Question and Answers 2022 | #Dat...Data Science Job ready #DataScienceInterview Question and Answers 2022 | #Dat...
Data Science Job ready #DataScienceInterview Question and Answers 2022 | #Dat...
 
Data Scientist By: Professor Lili Saghafi
Data Scientist By: Professor Lili SaghafiData Scientist By: Professor Lili Saghafi
Data Scientist By: Professor Lili Saghafi
 
BA Overview.pptx
BA Overview.pptxBA Overview.pptx
BA Overview.pptx
 
Introducition to Data scinece compiled by hu
Introducition to Data scinece compiled by huIntroducition to Data scinece compiled by hu
Introducition to Data scinece compiled by hu
 
Analytics
AnalyticsAnalytics
Analytics
 
1-210217184339.pptx
1-210217184339.pptx1-210217184339.pptx
1-210217184339.pptx
 
Self Study Business Approach to DS_01022022.docx
Self Study Business Approach to DS_01022022.docxSelf Study Business Approach to DS_01022022.docx
Self Study Business Approach to DS_01022022.docx
 
KIT601 Unit I.pptx
KIT601 Unit I.pptxKIT601 Unit I.pptx
KIT601 Unit I.pptx
 
Choosing a Machine Learning technique to solve your need
Choosing a Machine Learning technique to solve your needChoosing a Machine Learning technique to solve your need
Choosing a Machine Learning technique to solve your need
 
data science and business analytics
data science and business analyticsdata science and business analytics
data science and business analytics
 
Data Science Introduction: Concepts, lifecycle, applications.pptx
Data Science Introduction: Concepts, lifecycle, applications.pptxData Science Introduction: Concepts, lifecycle, applications.pptx
Data Science Introduction: Concepts, lifecycle, applications.pptx
 
Data Analytics and Big Data on IoT
Data Analytics and Big Data on IoTData Analytics and Big Data on IoT
Data Analytics and Big Data on IoT
 
Data Science Unit1 AMET.pdf
Data Science Unit1 AMET.pdfData Science Unit1 AMET.pdf
Data Science Unit1 AMET.pdf
 
Morden EcoSystem.pptx
Morden EcoSystem.pptxMorden EcoSystem.pptx
Morden EcoSystem.pptx
 
Data science in business Administration Nagarajan.pptx
Data science in business Administration Nagarajan.pptxData science in business Administration Nagarajan.pptx
Data science in business Administration Nagarajan.pptx
 
Data Science.pptx NEW COURICUUMN IN DATA
Data Science.pptx NEW COURICUUMN IN DATAData Science.pptx NEW COURICUUMN IN DATA
Data Science.pptx NEW COURICUUMN IN DATA
 

Recently uploaded

Opendatabay - Open Data Marketplace.pptx
Opendatabay - Open Data Marketplace.pptxOpendatabay - Open Data Marketplace.pptx
Opendatabay - Open Data Marketplace.pptx
Opendatabay
 
1.Seydhcuxhxyxhccuuxuxyxyxmisolids 2019.pptx
1.Seydhcuxhxyxhccuuxuxyxyxmisolids 2019.pptx1.Seydhcuxhxyxhccuuxuxyxyxmisolids 2019.pptx
1.Seydhcuxhxyxhccuuxuxyxyxmisolids 2019.pptx
Tiktokethiodaily
 
Q1’2024 Update: MYCI’s Leap Year Rebound
Q1’2024 Update: MYCI’s Leap Year ReboundQ1’2024 Update: MYCI’s Leap Year Rebound
Q1’2024 Update: MYCI’s Leap Year Rebound
Oppotus
 
一比一原版(UofM毕业证)明尼苏达大学毕业证成绩单
一比一原版(UofM毕业证)明尼苏达大学毕业证成绩单一比一原版(UofM毕业证)明尼苏达大学毕业证成绩单
一比一原版(UofM毕业证)明尼苏达大学毕业证成绩单
ewymefz
 
做(mqu毕业证书)麦考瑞大学毕业证硕士文凭证书学费发票原版一模一样
做(mqu毕业证书)麦考瑞大学毕业证硕士文凭证书学费发票原版一模一样做(mqu毕业证书)麦考瑞大学毕业证硕士文凭证书学费发票原版一模一样
做(mqu毕业证书)麦考瑞大学毕业证硕士文凭证书学费发票原版一模一样
axoqas
 
Criminal IP - Threat Hunting Webinar.pdf
Criminal IP - Threat Hunting Webinar.pdfCriminal IP - Threat Hunting Webinar.pdf
Criminal IP - Threat Hunting Webinar.pdf
Criminal IP
 
社内勉強会資料_LLM Agents                              .
社内勉強会資料_LLM Agents                              .社内勉強会資料_LLM Agents                              .
社内勉強会資料_LLM Agents                              .
NABLAS株式会社
 
一比一原版(IIT毕业证)伊利诺伊理工大学毕业证成绩单
一比一原版(IIT毕业证)伊利诺伊理工大学毕业证成绩单一比一原版(IIT毕业证)伊利诺伊理工大学毕业证成绩单
一比一原版(IIT毕业证)伊利诺伊理工大学毕业证成绩单
ewymefz
 
Ch03-Managing the Object-Oriented Information Systems Project a.pdf
Ch03-Managing the Object-Oriented Information Systems Project a.pdfCh03-Managing the Object-Oriented Information Systems Project a.pdf
Ch03-Managing the Object-Oriented Information Systems Project a.pdf
haila53
 
Malana- Gimlet Market Analysis (Portfolio 2)
Malana- Gimlet Market Analysis (Portfolio 2)Malana- Gimlet Market Analysis (Portfolio 2)
Malana- Gimlet Market Analysis (Portfolio 2)
TravisMalana
 
Machine learning and optimization techniques for electrical drives.pptx
Machine learning and optimization techniques for electrical drives.pptxMachine learning and optimization techniques for electrical drives.pptx
Machine learning and optimization techniques for electrical drives.pptx
balafet
 
Adjusting primitives for graph : SHORT REPORT / NOTES
Adjusting primitives for graph : SHORT REPORT / NOTESAdjusting primitives for graph : SHORT REPORT / NOTES
Adjusting primitives for graph : SHORT REPORT / NOTES
Subhajit Sahu
 
FP Growth Algorithm and its Applications
FP Growth Algorithm and its ApplicationsFP Growth Algorithm and its Applications
FP Growth Algorithm and its Applications
MaleehaSheikh2
 
Levelwise PageRank with Loop-Based Dead End Handling Strategy : SHORT REPORT ...
Levelwise PageRank with Loop-Based Dead End Handling Strategy : SHORT REPORT ...Levelwise PageRank with Loop-Based Dead End Handling Strategy : SHORT REPORT ...
Levelwise PageRank with Loop-Based Dead End Handling Strategy : SHORT REPORT ...
Subhajit Sahu
 
Criminal IP - Threat Hunting Webinar.pdf
Criminal IP - Threat Hunting Webinar.pdfCriminal IP - Threat Hunting Webinar.pdf
Criminal IP - Threat Hunting Webinar.pdf
Criminal IP
 
一比一原版(UPenn毕业证)宾夕法尼亚大学毕业证成绩单
一比一原版(UPenn毕业证)宾夕法尼亚大学毕业证成绩单一比一原版(UPenn毕业证)宾夕法尼亚大学毕业证成绩单
一比一原版(UPenn毕业证)宾夕法尼亚大学毕业证成绩单
ewymefz
 
一比一原版(QU毕业证)皇后大学毕业证成绩单
一比一原版(QU毕业证)皇后大学毕业证成绩单一比一原版(QU毕业证)皇后大学毕业证成绩单
一比一原版(QU毕业证)皇后大学毕业证成绩单
enxupq
 
Data Centers - Striving Within A Narrow Range - Research Report - MCG - May 2...
Data Centers - Striving Within A Narrow Range - Research Report - MCG - May 2...Data Centers - Striving Within A Narrow Range - Research Report - MCG - May 2...
Data Centers - Striving Within A Narrow Range - Research Report - MCG - May 2...
pchutichetpong
 
Algorithmic optimizations for Dynamic Levelwise PageRank (from STICD) : SHORT...
Algorithmic optimizations for Dynamic Levelwise PageRank (from STICD) : SHORT...Algorithmic optimizations for Dynamic Levelwise PageRank (from STICD) : SHORT...
Algorithmic optimizations for Dynamic Levelwise PageRank (from STICD) : SHORT...
Subhajit Sahu
 
一比一原版(YU毕业证)约克大学毕业证成绩单
一比一原版(YU毕业证)约克大学毕业证成绩单一比一原版(YU毕业证)约克大学毕业证成绩单
一比一原版(YU毕业证)约克大学毕业证成绩单
enxupq
 

Recently uploaded (20)

Opendatabay - Open Data Marketplace.pptx
Opendatabay - Open Data Marketplace.pptxOpendatabay - Open Data Marketplace.pptx
Opendatabay - Open Data Marketplace.pptx
 
1.Seydhcuxhxyxhccuuxuxyxyxmisolids 2019.pptx
1.Seydhcuxhxyxhccuuxuxyxyxmisolids 2019.pptx1.Seydhcuxhxyxhccuuxuxyxyxmisolids 2019.pptx
1.Seydhcuxhxyxhccuuxuxyxyxmisolids 2019.pptx
 
Q1’2024 Update: MYCI’s Leap Year Rebound
Q1’2024 Update: MYCI’s Leap Year ReboundQ1’2024 Update: MYCI’s Leap Year Rebound
Q1’2024 Update: MYCI’s Leap Year Rebound
 
一比一原版(UofM毕业证)明尼苏达大学毕业证成绩单
一比一原版(UofM毕业证)明尼苏达大学毕业证成绩单一比一原版(UofM毕业证)明尼苏达大学毕业证成绩单
一比一原版(UofM毕业证)明尼苏达大学毕业证成绩单
 
做(mqu毕业证书)麦考瑞大学毕业证硕士文凭证书学费发票原版一模一样
做(mqu毕业证书)麦考瑞大学毕业证硕士文凭证书学费发票原版一模一样做(mqu毕业证书)麦考瑞大学毕业证硕士文凭证书学费发票原版一模一样
做(mqu毕业证书)麦考瑞大学毕业证硕士文凭证书学费发票原版一模一样
 
Criminal IP - Threat Hunting Webinar.pdf
Criminal IP - Threat Hunting Webinar.pdfCriminal IP - Threat Hunting Webinar.pdf
Criminal IP - Threat Hunting Webinar.pdf
 
社内勉強会資料_LLM Agents                              .
社内勉強会資料_LLM Agents                              .社内勉強会資料_LLM Agents                              .
社内勉強会資料_LLM Agents                              .
 
一比一原版(IIT毕业证)伊利诺伊理工大学毕业证成绩单
一比一原版(IIT毕业证)伊利诺伊理工大学毕业证成绩单一比一原版(IIT毕业证)伊利诺伊理工大学毕业证成绩单
一比一原版(IIT毕业证)伊利诺伊理工大学毕业证成绩单
 
Ch03-Managing the Object-Oriented Information Systems Project a.pdf
Ch03-Managing the Object-Oriented Information Systems Project a.pdfCh03-Managing the Object-Oriented Information Systems Project a.pdf
Ch03-Managing the Object-Oriented Information Systems Project a.pdf
 
Malana- Gimlet Market Analysis (Portfolio 2)
Malana- Gimlet Market Analysis (Portfolio 2)Malana- Gimlet Market Analysis (Portfolio 2)
Malana- Gimlet Market Analysis (Portfolio 2)
 
Machine learning and optimization techniques for electrical drives.pptx
Machine learning and optimization techniques for electrical drives.pptxMachine learning and optimization techniques for electrical drives.pptx
Machine learning and optimization techniques for electrical drives.pptx
 
Adjusting primitives for graph : SHORT REPORT / NOTES
Adjusting primitives for graph : SHORT REPORT / NOTESAdjusting primitives for graph : SHORT REPORT / NOTES
Adjusting primitives for graph : SHORT REPORT / NOTES
 
FP Growth Algorithm and its Applications
FP Growth Algorithm and its ApplicationsFP Growth Algorithm and its Applications
FP Growth Algorithm and its Applications
 
Levelwise PageRank with Loop-Based Dead End Handling Strategy : SHORT REPORT ...
Levelwise PageRank with Loop-Based Dead End Handling Strategy : SHORT REPORT ...Levelwise PageRank with Loop-Based Dead End Handling Strategy : SHORT REPORT ...
Levelwise PageRank with Loop-Based Dead End Handling Strategy : SHORT REPORT ...
 
Criminal IP - Threat Hunting Webinar.pdf
Criminal IP - Threat Hunting Webinar.pdfCriminal IP - Threat Hunting Webinar.pdf
Criminal IP - Threat Hunting Webinar.pdf
 
一比一原版(UPenn毕业证)宾夕法尼亚大学毕业证成绩单
一比一原版(UPenn毕业证)宾夕法尼亚大学毕业证成绩单一比一原版(UPenn毕业证)宾夕法尼亚大学毕业证成绩单
一比一原版(UPenn毕业证)宾夕法尼亚大学毕业证成绩单
 
一比一原版(QU毕业证)皇后大学毕业证成绩单
一比一原版(QU毕业证)皇后大学毕业证成绩单一比一原版(QU毕业证)皇后大学毕业证成绩单
一比一原版(QU毕业证)皇后大学毕业证成绩单
 
Data Centers - Striving Within A Narrow Range - Research Report - MCG - May 2...
Data Centers - Striving Within A Narrow Range - Research Report - MCG - May 2...Data Centers - Striving Within A Narrow Range - Research Report - MCG - May 2...
Data Centers - Striving Within A Narrow Range - Research Report - MCG - May 2...
 
Algorithmic optimizations for Dynamic Levelwise PageRank (from STICD) : SHORT...
Algorithmic optimizations for Dynamic Levelwise PageRank (from STICD) : SHORT...Algorithmic optimizations for Dynamic Levelwise PageRank (from STICD) : SHORT...
Algorithmic optimizations for Dynamic Levelwise PageRank (from STICD) : SHORT...
 
一比一原版(YU毕业证)约克大学毕业证成绩单
一比一原版(YU毕业证)约克大学毕业证成绩单一比一原版(YU毕业证)约克大学毕业证成绩单
一比一原版(YU毕业证)约克大学毕业证成绩单
 

Introduction to data science.pdf

  • 2. ABOUT DATA SCIENCE • we can say that it is the interdisciplinary field concerned with generating business insights and value by processing large amounts of data. But that does not help us grasp the complexity and nuanced nature of the discipline. Nor does it provide us with an understanding of how it is shaping the technological and business landscapes.
  • 3. 1 -The different data science fields Traditional data: Data in the form of tables containing numeric or text values; Data that is structured and stored in databases Big data: Extremely large data; Humongous in terms of volume. It can be in various formats: - structured - semi-structured - unstructured
  • 4. 1 -The different data science fields
  • 5. 1 -The different data science fields
  • 6. 1-The different data science fields
  • 7. 1-The different data science fields
  • 8. 1-The different data science fields
  • 9. 2 - Analysis vs analytics • Analysis – Dividing data into digestible components that are easier to understand and examining how different parts relate to each other. Performed on past data, explaining why the story ended in the way that it did. We want to explain ‘how’ and ‘why’ something happened. • Analytics – Explores the future. The application of logical and computational reasoning to the component parts obtained in an analysis. In doing this, you are looking for patterns and exploring what you can do with them in the future.
  • 10. 3 – Qualitative vs Quantitative analytics • Qualitative analytics – using intuition and experience in conjunction with analysis to plan your next business move • Quantitative analytics – applying formulas and algorithms to numbers that you have gathered from your analysis.
  • 11. 4 -Business Intelligence • Business Intelligence : can be seen as the preliminary step of predictive analytics. First, you analyze past data and then using these inferences would allow you to create appropriate models that could predict the future of your business accurately. • Business Intelligence (BI) : is the process of analyzing and reporting historical business data. After reports and dashboards have been prepared, they can be used to make informed strategic and business decisions by end-users such as the general manager. Concisely put, business intelligence aims to explain past events using business data.
  • 12. 5 -Machine learning • AI simulates human knowledge and decision making with computers. Humans have managed to reach AI through machine and deep learning. • Machine learning is the ability of machines to predict outcomes without being explicitly programmed to do so. It is about creating and implementing algorithms that let machines receive data and use this data to: Make predictions | Analyze patterns | Give recommendations • Symbolic reasoning is a type of AI that makes an exception and does not use ML and deep learning.
  • 13. 6 -Traditional data:Techniques • Class labelling: Labelling the data point to the correct data type (or arranging data by category). • Data cleansing: (‘data cleaning’, ‘data scrubbing’): deal with inconsistent data. For example, working on a dataset containing US states and finding that some of the names are misspelled. • Data shuffling: Shuffling the observations from your dataset just like shuffling a deck of cards. This will ensure your dataset is free from unwanted patterns caused by problematic data collection.
  • 14. 6 -Traditional data:Techniques • Data balancing: Ensuring that the sample gives equal priority to each class. For example, if you work with a dataset that contains 80% male and 20% female data, and you know that the population contains approximately 50% men and 50% women, then you need apply a balancing technique to counteract this problem (using an equal number of data from each group).
  • 15. 6.1-Traditional data: Examples • Numerical variable: Numbers that are easily manipulated (for ex. Added), which gives us useful information • Categorical variable: Numbers that hold no numerical value can be considered categorical data. Dates are also considered categorical data.
  • 16. 7 - Big data:Techniques • Text data mining: The process of deriving valuable, unstructured data from text. • Data masking: As a business, when you work with user private data, you must be able to preserve confidential information. However, this doesn’t mean that the data can’t be touched or used for analysis. Instead, you must apply some data masking techniques to utilize the information without compromising private details. In essence, data masking conceals the original data with random and false data, allowing you to conduct analysis and keep confidential information in a secure place.
  • 17. 7.1 - Big data: Examples • We find big data in increasingly more industries and companies. • Probably the most notable example of a company leveraging the true potential of big data is Facebook. The company keeps track of its users’ names, personal data, photos, videos, recorded messages and so on. This means their data has a lot of variety. And with 2 billion users worldwide, the volume of data stored on their servers is tremendous.
  • 18. 8 - Business IntelligenceTechniques • Metric: refers to a value that derives from the measures you have obtained and aims at gauging business performance or progress. Has a business meaning attached to it. • Measure: simple descriptive statistics of past performance Metric = Measure + Business meaning • KPIs: It doesn’t make sense to keep track of all metrics. So, companies choose to focus on the most important ones. KPIs = metrics + Business objective Filtering out the boring metrics and turning the interesting and informative KPIs into easily understood and comparable visualizations is an important part of the business intelligence analyst job.
  • 19. 8.1 - Business Intelligence Examples • BI allows you to adjust your strategy to past data as soon as it is available. If done right, Business Intelligence will help to efficiently manage your shipment logistics and, in turn, reduce costs and increase profit.
  • 20. 9 - Traditional methods:Techniques • (classical statistical methods for forecasting) • Clustering: grouping the data in neighborhoods to analyze meaningful patterns • Time series: used in economics and finance, showing the development of certain values over time, such as stock prices or sales volume.
  • 21. 9.1 - Traditional methods : Examples • The application of traditional methods is extremely broad. Two relevant examples: • Forecasting sales data: using time series data to predict a firm’s future expected sales • UX: plot customer satisfaction and customer revenue to find that each cluster represents a different geographical location
  • 22. 10 - Machine Learning: Techniques • There are four ingredients for machine learning: data, model, objective function, optimization algorithm • Model: the computer uses an algorithm to recognize certain types of patterns • Objective function: specification of the machine learning problem; a function to be maximized or minimized depending on the task at hand • Optimization algorithm: A process in which previous solution of the problem are compared until reaching an optimal solution
  • 23. 10.1- Machine Learning:Types • Three main types of machine learning: • Supervised learning :Training an algorithm resembles a teacher supervising her students. Provides feedback every step of the way. Telling students whether they did ‘good’ or whether they need to improve their performance. When using supervised learning you use labelled data (every data point is categorized as ‘good’ performance or as ‘performance that needs improvement’ in our example).
  • 24. 10.1- Machine Learning:Types • Three main types of machine learning: • Unsupervised learning : In this case, the algorithm trains itself. There isn’t a teacher who provides feedback. The algorithm uses unlabeled data that is not categorized as ‘good’ or as ‘performance that needs improvement’. The unsupervised ML model simply uses the data and sorts in different groups. In our example, it will be able to show us two groups – ‘good performing’ and ‘performance that needs to be improved’, however the ML model would not be able to tell us which one is which.
  • 25. 10.1- Machine Learning:Types • Three main types of machine learning: • Reinforcement learning: A reward system is introduced. Every time a student does a task better than it used to in the past they will receive a reward (and nothing if the task is not performed better). Instead of minimizing an error, we maximize a reward, or in other words, maximizing the objective function. • Deep learning – the modern state-of-the-art approach to machine learning – leverages the power of neural networks and can be placed in both categories – supervised and unsupervised learning.
  • 26. 11- Common data science tools • There are two main types of tools • Programming languages : enable you to devise programs that can execute specific operations. Moreover, you can reuse these programs whenever you need to execute the same action. the most popular programming language for data science is Python followed by R. - Python and R have their limitations. They are not able to address problems specific to some domains. One example is ‘relational database management systems’. In these instances, SQL works best.
  • 27. 11- Common data science tools • There are two main types of tools • terms of software : Excel plays an important role. It is able to do relatively complex computations and good visualizations quickly. SPSS is another popular tool for working with traditional data and applying statistical analysis. -There is a significant amount of software designed for working with big data – Apache Hadoop, Apache Hbase, and Mongo. PowerBI, Qlik, Tableau are top-notch examples of software designed for business intelligence visualizations.
  • 28. 12- Data science job positions • Data architect – designs the way data will be retrieved processed and consumed • Data engineer – process the obtained data so that it is ready for analysis • Database administrator – handles this control of data; works with traditional data • BI analyst – performs analyses and reporting of past historical data
  • 29. 12- Data science job positions • BI consultant – ‘external BI analyst’ • BI developer – performs analyses specifically designed for the company • Data scientist – employs traditional statistical methods or unconventional machine learning techniques for making predictions • Data analyst – prepare advanced analyses • Machine learning engineer – applies state-of-the-art ML techniques
  • 30. Resource : Martin Ganchev