Data Mining: Data processing
Presentation Transcript

  • Data Processing
  • What is the need for Data Processing?
    Extracting the required information from a huge, incomplete, noisy, and inconsistent set of data requires data processing.
  • Steps in Data Processing
    Data Cleaning
    Data Integration
    Data Transformation
    Data Reduction
    Data Summarization
  • What is Data Cleaning?
    Data cleaning is a procedure to “clean” the data by filling in missing values, smoothing noisy data, identifying or removing outliers, and resolving inconsistencies.
  • What is Data Integration?
    Data integration is the process of combining multiple databases, data cubes, or files.
  • What is Data Transformation?
    Data transformation operations, such as normalization and aggregation, are additional data preprocessing procedures that would contribute toward the success of the mining process.
  • What is Data Reduction?
    Data reduction obtains a reduced representation of the data set that is much smaller in volume, yet produces the same (or almost the same) analytical results.
  • What is Data Summarization?
    Data summarization is the process of representing the collected data in an accurate and compact way without losing essential information. It also involves deriving information from the collected data.
    Example: display the data as a graph and compute the mean, median, and mode.
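
    The summary statistics named in the example can be sketched with Python's standard `statistics` module. The sample values and the `summarize` helper are assumptions made for illustration; they do not come from the slides.

```python
import statistics

# Hypothetical sample values, used only for illustration.
values = [4, 8, 15, 16, 21, 21, 24, 25, 28, 34]


def summarize(data):
    """Return the mean, median, and mode of a list of numbers."""
    return {
        "mean": statistics.mean(data),
        "median": statistics.median(data),
        "mode": statistics.mode(data),
    }


summary = summarize(values)
print(summary)
```

    Three numbers stand in for ten values here, which is the compact representation the slide describes.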
  • How to Clean Data?
    Handling Missing Values
    Ignore the tuple
    Fill in the missing value manually
    Use a global constant to fill in the missing value
    Use the attribute mean to fill in the missing value
    Use the attribute mean for all samples belonging to the same class as the given tuple
    Use the most probable value to fill in the missing value
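
    Three of the strategies above (ignoring the tuple, filling with the attribute mean, and filling with the class-wise attribute mean) can be sketched as follows. The records, attribute meaning, and function names are hypothetical, and the class-wise version assumes every class has at least one known value.

```python
from collections import defaultdict

# Hypothetical records: (class label, attribute value); None marks a missing value.
records = [
    ("low", 20.0), ("low", None), ("low", 30.0),
    ("high", 80.0), ("high", 100.0), ("high", None),
]


def drop_missing(rows):
    """Ignore the tuple: keep only rows whose value is present."""
    return [(c, v) for c, v in rows if v is not None]


def fill_with_mean(rows):
    """Fill each missing value with the mean of all known values."""
    known = [v for _, v in rows if v is not None]
    mean = sum(known) / len(known)
    return [(c, mean if v is None else v) for c, v in rows]


def fill_with_class_mean(rows):
    """Fill each missing value with the mean of known values in the same class."""
    by_class = defaultdict(list)
    for c, v in rows:
        if v is not None:
            by_class[c].append(v)
    # Assumes every class has at least one known value.
    means = {c: sum(vs) / len(vs) for c, vs in by_class.items()}
    return [(c, means[c] if v is None else v) for c, v in rows]


print(drop_missing(records))
print(fill_with_mean(records))
print(fill_with_class_mean(records))
```

    In this sample the global mean (57.5) sits far from both classes, while the class-wise means (25.0 and 90.0) stay close to each group's known values.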
  • How to Clean Data?
    Handling Noisy Data
    Binning: Binning methods smooth a sorted data value by consulting its “neighborhood”.
    Regression: Data can be smoothed by fitting the data to a function, such as with regression. 
    Clustering: Outliers may be detected by clustering, where similar values are organized into groups, or “clusters.”
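
    As a minimal sketch of binning, the function below performs smoothing by bin means: the values are sorted, split into bins of equal size, and each value is replaced by the mean of its bin. The function name, the sample prices, and the choice of equal-frequency bins are assumptions for illustration; other variants smooth by bin medians or bin boundaries.

```python
def smooth_by_bin_means(values, bin_size):
    """Sort the values, split them into bins of bin_size, and replace
    each value with the mean of its bin."""
    ordered = sorted(values)
    smoothed = []
    for start in range(0, len(ordered), bin_size):
        bin_values = ordered[start:start + bin_size]
        bin_mean = sum(bin_values) / len(bin_values)
        smoothed.extend([bin_mean] * len(bin_values))
    return smoothed


# Hypothetical price data, used only for illustration.
prices = [4, 8, 15, 21, 21, 24, 25, 28, 34]
print(smooth_by_bin_means(prices, 3))
# [9.0, 9.0, 9.0, 22.0, 22.0, 22.0, 29.0, 29.0, 29.0]
```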
  • Data Integration
    Data integration combines data from multiple sources into a coherent data store, as in data warehousing. These sources may include multiple databases, data cubes, or flat files. Issues that arise during data integration include schema integration and object matching. Redundancy is another important issue.
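
    A small sketch of schema integration and object matching, assuming two hypothetical sources that identify the same customers under different attribute names (`customer_id` and `cust_number`). All names and values are invented for illustration.

```python
# Hypothetical sources that describe the same customers with different schemas.
crm_rows = [
    {"customer_id": 1, "name": "Ann"},
    {"customer_id": 2, "name": "Bob"},
]
billing_rows = [
    {"cust_number": 2, "balance": 40.0},
    {"cust_number": 3, "balance": 15.5},
]


def integrate(crm, billing):
    """Merge both sources into one record per customer under a unified schema."""
    merged = {}
    for row in crm:
        key = row["customer_id"]
        merged[key] = {"id": key, "name": row["name"], "balance": None}
    for row in billing:
        # Schema integration: cust_number is treated as the same attribute
        # as customer_id (an assumption of this sketch).
        key = row["cust_number"]
        record = merged.setdefault(key, {"id": key, "name": None, "balance": None})
        record["balance"] = row["balance"]
    return [merged[key] for key in sorted(merged)]


for record in integrate(crm_rows, billing_rows):
    print(record)
```

    Keying the merged store on the unified identifier keeps one record per customer, so a customer present in both sources is not stored twice.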
  • Data Transformation
    Data transformation can be achieved in the following ways:
    Smoothing: which works to remove noise from the data
    Aggregation: where summary or aggregation operations are applied to the data. For example, the daily sales data may be aggregated so as to compute weekly and annual totals.
    Generalization of the data: where low-level or “primitive” (raw) data are replaced by higher-level concepts through the use of concept hierarchies. For example, categorical attributes, like street, can be generalized to higher-level concepts, like city or country.
    Normalization: where the attribute data are scaled so as to fall within a small specified range, such as −1.0 to 1.0, or 0.0 to 1.0.
    Attribute construction: where new attributes are constructed from the given set of attributes and added to help the mining process.
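
    Two of the transformations above can be sketched briefly: min-max normalization into a target range, and aggregation of daily figures into per-period totals. The function names and sample data are assumptions for illustration, and min-max scaling is only one of several normalization methods.

```python
def min_max_normalize(values, new_min=0.0, new_max=1.0):
    """Linearly rescale values so they fall within [new_min, new_max]."""
    old_min, old_max = min(values), max(values)
    if old_max == old_min:
        raise ValueError("cannot normalize: all values are identical")
    span = old_max - old_min
    return [new_min + (v - old_min) * (new_max - new_min) / span for v in values]


def aggregate_totals(daily, period):
    """Sum consecutive runs of `period` daily values (for example, 7 for weekly totals)."""
    return [sum(daily[i:i + period]) for i in range(0, len(daily), period)]


# Hypothetical attribute values and daily sales, used only for illustration.
scores = [10, 20, 30, 50]
print(min_max_normalize(scores))             # [0.0, 0.25, 0.5, 1.0]
print(min_max_normalize(scores, -1.0, 1.0))  # [-1.0, -0.5, 0.0, 1.0]

daily_sales = [10] * 7 + [20] * 7
print(aggregate_totals(daily_sales, 7))      # [70, 140]
```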
  • Data Reduction Techniques
    These are the techniques that can be applied to obtain a reduced representation of the data set that is much smaller in volume, yet closely maintains the integrity of the original data.
    Data cube aggregation
    Attribute subset selection
    Dimensionality reduction
    Numerosity reduction
    Discretization and concept hierarchy generation
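
    As one illustration of discretization, the sketch below maps numeric values to a small number of equal-width intervals, so that many distinct values are replaced by a few interval indices. The function name, the sample ages, and the equal-width strategy are assumptions; the slides do not prescribe a particular method.

```python
def equal_width_bins(values, n_bins):
    """Map each value to the index of the equal-width interval it falls into."""
    lo, hi = min(values), max(values)
    if hi == lo:
        raise ValueError("cannot discretize: all values are identical")
    width = (hi - lo) / n_bins
    # The maximum value would land in bin n_bins, so clamp it to the last bin.
    return [min(int((v - lo) / width), n_bins - 1) for v in values]


# Hypothetical ages, used only for illustration.
ages = [13, 15, 22, 25, 33, 35, 46, 52, 70]
print(equal_width_bins(ages, 3))
# [0, 0, 0, 0, 1, 1, 1, 2, 2]
```

    The interval indices could then be given higher-level labels (for example, young, middle-aged, senior) to form a simple concept hierarchy.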
  • Visit more self help tutorials
    Pick a tutorial of your choice and browse through it at your own pace.
    The tutorials section is free, self-guiding and will not involve any additional support.
    Visit us at www.dataminingtools.net