Day-2
Types of Data
Two Weeks Online Live Industrial Training on
HANDS ON: MACHINE LEARNING
In association with
THE NATIONAL SMALL INDUSTRIES CORPORATION (A Government of India Enterprise)
Kamalanagar, Kushaiguda, Hyderabad-500062
Date & Time
18-08-2020
2.30pm-3.30pm
Content
• Data
• Types
  • Structured
  • Un-Structured
  • Time Series
• Data Sources
• Creation of Data
Data
• Facts and statistics collected together for reference or analysis.
• Raw facts / observations.
• The quantities, characters, or symbols on which operations are performed by a computer, which may be stored and transmitted in the form of electrical signals and recorded on magnetic, optical, or mechanical recording media.
Information
• What is conveyed or represented by a particular arrangement or sequence of things.
• Processed data.
Data
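• Numerical
  • Exact numbers, e.g. height
• Discrete Data
  • Numerical, countable in whole units, e.g. number of students (there is no half student)
• Continuous Data
  • Numerical, can take any value in a range, e.g. 3.265
• Categorical Data
  • Yes/No, gender, race; labels may be coded as numbers (red = 1, green = 2), but an average of such codes has no meaning
• Ordinal Data
  • Mix of numerical and categorical data: categories with a natural order
  • Scale, e.g. movie ratings of 1-5 stars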
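A minimal sketch of how the data type decides which summary makes sense. All values and variable names here are made up for illustration; only Python's standard `statistics` module is used.

```python
from statistics import mean, median, mode

# Hypothetical sample values, one list per data type.
students_per_class = [30, 28, 32]            # discrete: whole counts only
heights_cm = [162.5, 170.2, 158.9]           # continuous: any value in a range
colours = ["red", "green", "red", "red"]     # categorical: labels, no order
ratings = [5, 3, 4, 4, 1]                    # ordinal: ordered 1-5 star scale

# Numerical data: the mean is meaningful.
average_height = mean(heights_cm)

# Categorical data: use the most frequent label (mode).
most_common_colour = mode(colours)

# Coding labels as numbers does not make them numerical.
colour_codes = {"red": 1, "green": 2}
encoded = [colour_codes[c] for c in colours]
meaningless_average = mean(encoded)          # 1.25 is not a colour

# Ordinal data: the median respects the order without assuming equal spacing.
typical_rating = median(ratings)

print(average_height, most_common_colour, meaningless_average, typical_rating)
```

The mean of the encoded colours runs without error, which is exactly why the data type has to be tracked by the analyst rather than inferred from the numbers.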
Types
• Structured
• Un-Structured
• Time Series
Structured & Un-Structured
Structured
• Organized
• Can be processed easily
• Not Big
  • MySQL
  • RDBMS
• Big
  • Hadoop
Un-Structured
• Not organized
• Difficult to process
• Includes a combination of all forms
• Not Big
  • MongoDB
• Big
  • Hadoop
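As a rough illustration of "easy to process" versus "difficult to process", the sketch below stores the same facts once as structured rows and once as free text. It uses Python's built-in `sqlite3` as a stand-in for a relational database such as MySQL; the table, names, and sentence are invented for the example.

```python
import re
import sqlite3

# Structured: fixed columns, so a query language can answer questions directly.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE students (name TEXT, height_cm REAL)")
conn.executemany(
    "INSERT INTO students VALUES (?, ?)",
    [("Asha", 162.5), ("Ravi", 170.2)],
)
tallest = conn.execute(
    "SELECT name FROM students ORDER BY height_cm DESC LIMIT 1"
).fetchone()[0]

# Unstructured: the same facts in free text need custom extraction logic.
note = "Asha is about 162.5 cm tall, while Ravi measured 170.2 cm last week."
heights = [float(h) for h in re.findall(r"(\d+(?:\.\d+)?)\s*cm", note)]

print(tallest, heights)
conn.close()
```

The regular expression only works for this one phrasing; a differently worded note would need different extraction code, which is the practical cost of unstructured data.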
Time Series
• Time series data is a sequence of numbers collected at regular intervals over some period of time.
• Date & Time
• Finance
• For example, we might measure the average number of home sales for many years.
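A small sketch of a time series in code: each value is paired with the date it was recorded, and the order matters. The monthly sales figures are invented, and the 3-month moving average is just one common way to smooth such a series, not something prescribed by the slides.

```python
from datetime import date

# Hypothetical monthly home-sales counts for January to June 2020.
monthly_sales = [
    (date(2020, month, 1), count)
    for month, count in zip(range(1, 7), [120, 135, 128, 150, 161, 158])
]

def moving_average(values, window):
    """Average each run of `window` consecutive values."""
    return [
        sum(values[i - window + 1 : i + 1]) / window
        for i in range(window - 1, len(values))
    ]

sales = [count for _, count in monthly_sales]
smoothed = moving_average(sales, window=3)

# Each smoothed value belongs to the last date in its window.
for (day, _), value in zip(monthly_sales[2:], smoothed):
    print(day.isoformat(), round(value, 1))
```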
Data Sets
• Training data
  • First stage
  • Adjusting Parameters - Amitabh
  • Put simply, the training set is real-world data gathered to train the model.
• Validation data
  • Second stage
  • Used to evaluate the model's predictions during development, so the model can learn from its mistakes before final testing.
• Test data
  • Third stage
  • The final evaluation a model goes through after the training and validation stages of model development.
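The three stages above can be sketched as a simple shuffled split. The 60/20/20 proportions, the function name `split_dataset`, and the fixed seed are assumptions chosen for illustration; the slides do not specify a ratio.

```python
import random

def split_dataset(rows, train_frac=0.6, val_frac=0.2, seed=42):
    """Shuffle rows and cut them into training, validation, and test sets."""
    rows = list(rows)
    random.Random(seed).shuffle(rows)   # fixed seed keeps the split repeatable
    n_train = round(len(rows) * train_frac)
    n_val = round(len(rows) * val_frac)
    train = rows[:n_train]
    validation = rows[n_train : n_train + n_val]
    test = rows[n_train + n_val :]      # whatever remains is held out
    return train, validation, test

# Stand-in data: 100 numbered records.
train, validation, test = split_dataset(range(100))
print(len(train), len(validation), len(test))
```

Shuffling before cutting suits independent records; for time series data the split is usually made by date instead, so that the model is never tested on the past.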
Data Sources
• Data.gov.in
  • https://data.gov.in/resources-from-web-service/3670701
• Kaggle
  • https://www.kaggle.com/landlord/handwriting-recognition
• UCI
  • https://archive.ics.uci.edu/ml/index.php
  • https://archive.ics.uci.edu/ml/datasets.php
• More
  • https://towardsdatascience.com/top-sources-for-machine-learning-datasets-bb6d0dc3378b
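Datasets from sources like these are often distributed as CSV files. A minimal sketch of reading one with Python's standard `csv` module follows; the column names and values are invented, and an in-memory string stands in for a downloaded file so the example runs on its own.

```python
import csv
import io

# Stand-in for a downloaded file; with a real file, use open("your_file.csv", newline="").
sample_csv = io.StringIO(
    "sepal_length,species\n"
    "5.1,setosa\n"
    "7.0,versicolor\n"
)

rows = list(csv.DictReader(sample_csv))   # one dict per row, keyed by header

# CSV values arrive as text, so numerical columns need converting.
lengths = [float(row["sepal_length"]) for row in rows]
species = [row["species"] for row in rows]

print(lengths, species)
```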
Creation of Data
• IoT – Sensors
• ThingSpeak
• Download
• Analyse
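A sketch of the "download, then analyse" step for sensor data. It assumes the readings were exported as JSON in the shape a ThingSpeak channel feed typically has (a `feeds` list with `created_at` and `field1` entries); the sample values, the channel name, and the meaning of `field1` (temperature) are invented, and no network call is made.

```python
import json
from datetime import datetime

# Invented sample in an assumed ThingSpeak-style feed layout.
sample_feed = """
{"channel": {"id": 0, "name": "demo"},
 "feeds": [
   {"created_at": "2020-08-18T09:00:00Z", "entry_id": 1, "field1": "27.4"},
   {"created_at": "2020-08-18T09:00:15Z", "entry_id": 2, "field1": "27.9"}
 ]}
"""

# Turn each entry into a (timestamp, value) pair, skipping empty readings.
readings = [
    (datetime.strptime(entry["created_at"], "%Y-%m-%dT%H:%M:%SZ"),
     float(entry["field1"]))
    for entry in json.loads(sample_feed)["feeds"]
    if entry["field1"] is not None
]

average = sum(value for _, value in readings) / len(readings)
print(len(readings), round(average, 2))
```

Because each reading carries a timestamp, data created this way is time series data and can be analysed with the same tools.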
Credits
• Harmonizer
• NSIC
• Google