DATA MINING
&
MACHINE LEARNING
DAFFODIL INTERNATIONAL UNIVERSITY
Md. Anisur Rahman
Contents
1) Data Mining & Machine Learning
2) Data
3) Exploring Data - Visualization
4) Data Mining and ML Techniques
5) Applications
6) Summary
DATA MINING
Data mining is the process of extracting useful information from a vast amount
of data. It is used to discover new, accurate, and useful patterns in the data,
and to find meaning and relevant information in them.
MACHINE LEARNING
Machine learning is the study of algorithms that improve through experience
derived from data. It covers the design, study, and development of algorithms
that permit machines to learn without human intervention.
Both data mining and machine learning fall under the aegis of Data
Science, which makes sense since they both use data. Both processes are
used for solving complex problems, so consequently, many people
(erroneously) use the two terms interchangeably.
DATA
A collection of data objects and their attributes.
Object
- a collection of attributes describes an object
- also called: record, point, case, sample, entity, or instance
Attribute
- a property or characteristic of an object
- examples: eye color of a person, temperature
- also called: variable, field, characteristic, or feature
TYPES OF ATTRIBUTES
Nominal: zip codes, employee ID numbers, eye color, sex: {male, female}
Ordinal: hardness of minerals, {good, better, best}, grades, street numbers
Interval: calendar dates, temperature in Celsius or Fahrenheit
Ratio: temperature in Kelvin, monetary quantities, counts, age, mass, length, electrical current
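The four attribute types differ in which operations are meaningful on their values. The following is a minimal Python sketch of that idea; the `Grade` enum, the helper `celsius_to_kelvin`, and all sample values are illustrative assumptions, not part of the slides.

```python
from enum import IntEnum


class Grade(IntEnum):
    """Hypothetical ordinal scale: only the order of the values matters."""
    GOOD = 1
    BETTER = 2
    BEST = 3


def celsius_to_kelvin(celsius: float) -> float:
    """Convert an interval-scale temperature to the ratio-scale Kelvin."""
    return celsius + 273.15


# Nominal: only equality tests are meaningful.
same_eye_color = "brown" == "blue"

# Ordinal: order comparisons are meaningful, differences are not.
best_beats_good = Grade.BEST > Grade.GOOD

# Interval: differences are meaningful (30 C is 15 degrees warmer than 15 C).
celsius_difference = 30.0 - 15.0

# Ratio: ratios need a true zero, so convert to Kelvin before dividing.
kelvin_ratio = celsius_to_kelvin(30.0) / celsius_to_kelvin(15.0)
```

Here `kelvin_ratio` comes out close to 1.05 rather than 2, which is why 30 degrees Celsius is not "twice as hot" as 15 degrees Celsius.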
IMPORTANT CHARACTERISTICS OF STRUCTURED DATA
1) Dimensionality
Dimensionality is the number of attributes (columns) in a dataset. Adding too many dimensions
can make the data very difficult to analyze: the data objects become increasingly dissimilar
to one another, so it is hard to group them in a meaningful way.
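One way to see why many dimensions make grouping hard is to compare how much the nearest and farthest neighbors of a point differ as the number of dimensions grows. The sketch below is only an illustration on uniformly random points; the function name `relative_contrast`, the point count, and the seed are assumptions chosen for the example.

```python
import math
import random


def relative_contrast(dim: int, n_points: int = 200, seed: int = 0) -> float:
    """Return (farthest - nearest) / nearest distance from one random query point.

    A small value means all points are roughly equally far away,
    so "near" and "far" stop being useful for grouping.
    """
    rng = random.Random(seed)
    query = [rng.random() for _ in range(dim)]
    distances = []
    for _ in range(n_points):
        point = [rng.random() for _ in range(dim)]
        distances.append(math.dist(query, point))
    return (max(distances) - min(distances)) / min(distances)


low_dim_contrast = relative_contrast(dim=2)
high_dim_contrast = relative_contrast(dim=500)
```

With random data like this, the contrast in 500 dimensions is typically far smaller than in 2 dimensions, which matches the intuition that distances become less informative as dimensionality grows.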
2) Sparsity
Data sparsity describes how little data we have for a particular dimension or entity of
the model. Data is considered sparse when many expected values in a dataset are missing or
empty, which is common in large-scale data analysis.
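As a rough illustration, sparsity can be measured as the fraction of missing values per attribute. The records, field names, and the helper `missing_fraction` below are hypothetical, and `None` is assumed to mark a missing value.

```python
# Hypothetical records; None marks a value that was expected but is missing.
records = [
    {"age": 34, "income": 52000, "phone": None},
    {"age": None, "income": 48000, "phone": None},
    {"age": 29, "income": None, "phone": "555-0100"},
    {"age": 41, "income": 61000, "phone": None},
]


def missing_fraction(rows, field):
    """Fraction of rows in which `field` is absent or None."""
    missing = sum(1 for row in rows if row.get(field) is None)
    return missing / len(rows)


sparsity = {
    field: missing_fraction(records, field)
    for field in ("age", "income", "phone")
}
```

In this sample the `phone` attribute is the sparsest, with three of four values missing.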
3) Resolution
Data resolution is the number of units or digits to which a measured or calculated value is
expressed and used. The patterns visible in data depend on this scale; for example, rainfall
patterns look different when measured over hours, days, or months.
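The dependence of patterns on scale can be sketched by aggregating the same measurements at a coarser resolution. The rainfall figures and the helper `aggregate` below are made up for illustration.

```python
# Hypothetical daily rainfall in millimetres for two weeks.
daily_rain_mm = [0, 12, 0, 0, 30, 0, 0, 5, 0, 0, 0, 0, 8, 0]


def aggregate(values, window):
    """Sum consecutive, non-overlapping windows of `window` values."""
    return [sum(values[i:i + window]) for i in range(0, len(values), window)]


# Daily resolution shows isolated showers; weekly resolution shows
# one wet week followed by a much drier one.
weekly_rain_mm = aggregate(daily_rain_mm, window=7)
```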
4) Distribution
Data distributions are used often in statistics. A distribution describes how frequently each
value or range of values occurs, and it is usually displayed graphically. There are several
types of data distributions; the most familiar are symmetrical and skewed distributions.
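Whether a distribution is symmetrical or skewed can be checked numerically. The sketch below computes a simple population skewness (the mean of cubed standardized deviations); the function name `skewness` and both sample lists are assumptions for the example.

```python
import statistics


def skewness(values):
    """Population skewness: mean of cubed standardized deviations.

    Near 0 for symmetrical data, positive for a long right tail,
    negative for a long left tail.
    """
    mean = statistics.fmean(values)
    std_dev = statistics.pstdev(values)
    return sum(((x - mean) / std_dev) ** 3 for x in values) / len(values)


symmetric_sample = [1, 2, 3, 4, 5]
right_skewed_sample = [1, 1, 2, 2, 3, 10]

symmetric_skew = skewness(symmetric_sample)
right_skew = skewness(right_skewed_sample)
```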
TYPES OF DATA SETS
Record
• Data Matrix
• Document Data
• Transaction Data
Graph
• World Wide Web
• Molecular Structures
Ordered
• Spatial Data
• Temporal Data
• Genetic Sequence etc.
DATA QUALITY
Noise and Outliers
• Noise refers to modification of original values
• Outliers are data objects whose characteristics are considerably different from those of most other data
objects in the data set.
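One common way to flag outliers is the interquartile range (IQR) rule, shown here as a sketch rather than as the method the slides have in mind. The function name `iqr_outliers`, the multiplier `k = 1.5`, and the sample readings are assumptions.

```python
import statistics


def iqr_outliers(values, k=1.5):
    """Return values lying more than k * IQR outside the quartiles."""
    q1, _, q3 = statistics.quantiles(values, n=4)
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    return [v for v in values if v < low or v > high]


# Hypothetical sensor readings with one suspicious value.
readings = [10, 12, 11, 13, 12, 11, 95]
flagged = iqr_outliers(readings)
```

A flagged value is only a candidate: it may be noise to discard or a genuinely unusual object worth investigating.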
Missing Values
• Information is not collected
• Attributes may not be applicable to all cases
• Missing values can be handled by eliminating the affected data objects or by filling in estimates with a statistical approach, such as the mean
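Both ways of handling missing values can be sketched in a few lines. The temperature list is hypothetical, `None` is assumed to mark a missing value, and mean imputation is just one possible statistical fill.

```python
import statistics

# Hypothetical measurements; None marks a missing value.
temperatures = [21.0, None, 23.0, 22.0, None]

# Option 1: eliminate the missing entries.
observed = [t for t in temperatures if t is not None]

# Option 2: fill each gap with the mean of the observed values.
mean_temperature = statistics.fmean(observed)
imputed = [t if t is not None else mean_temperature for t in temperatures]
```

Dropping values shrinks the data set, while imputation keeps its size but can understate the true variability.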
Duplicate Data
• Data set may include data objects that are duplicates, or almost duplicates of one another.
• Major issue when merging data from heterogeneous sources.
• Data cleaning can detect and remove duplicated data.
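A minimal cleaning step for duplicates and near-duplicates is to normalize each record to a comparison key and keep the first record per key. The records, the `normalize` rule (case and whitespace only), and the function names are assumptions for this sketch; real merges from heterogeneous sources usually need richer matching.

```python
def normalize(record):
    """Build a comparison key that ignores case and stray whitespace."""
    name, email = record
    return (" ".join(name.lower().split()), email.strip().lower())


def deduplicate(records):
    """Keep the first record seen for each normalized key."""
    seen = set()
    unique = []
    for record in records:
        key = normalize(record)
        if key not in seen:
            seen.add(key)
            unique.append(record)
    return unique


# Hypothetical customer records merged from two sources.
customers = [
    ("Rahim Uddin", "rahim@example.com"),
    ("rahim  uddin", "RAHIM@example.com "),
    ("Karim Hasan", "karim@example.com"),
]
cleaned = deduplicate(customers)
```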
DATA PREPROCESSING
DATA VISUALIZATION
Data visualization is the graphical representation of information and data. By using visual elements like charts,
graphs, and maps, data visualization tools provide an accessible way to see and understand trends, outliers, and
patterns in data. Data visualization tools and technologies are essential for analyzing massive amounts of
information and making data-driven decisions.
TECHNIQUES
APPLICATIONS
• Market Basket Analysis
• Education
• Manufacturing Engineering
• Research Analysis
• Fraud Detection
• Digital Media & Entertainment
• Manufacturing & Automobile
• E-Commerce & CRM
• Healthcare
THANK YOU

DATA MINING - CHARACTERISTICS and APPLICATION
