INTRODUCTION
Data mining is applied to selected data drawn from a large database. When analysis
and mining are run on a very large volume of data, processing can take so long that it
becomes impractical or infeasible. [2]
Data reduction is a technique used in data mining to reduce the size of a dataset while
still preserving the most important information. This can be beneficial in situations where
the dataset is too large to be processed efficiently, or where the dataset contains a large
amount of irrelevant or redundant information.
DATA REDUCTION
Data reduction is a process that reduces the volume of the original data and represents it
in a much smaller form while maintaining the integrity of the original data. Because mining
then runs on the reduced representation, the data mining process becomes more efficient
while still producing the same, or almost the same, analytical results. [2]
Data reduction is an important step in data mining because a smaller dataset can improve
the efficiency and performance of machine learning algorithms. However, reducing the size
of the data can cost accuracy, so the risks and benefits of this trade-off should be
assessed carefully before data reduction is applied. [2]
DATA REDUCTION TECHNIQUES
1. Data Sampling
2. Dimensionality Reduction
3. Data Compression
4. Data Discretization
5. Feature Selection
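As a rough illustration of the first technique in the list, the sketch below draws a simple random sample without replacement using only the Python standard library. The function name `sample_rows`, the 10% fraction, the fixed seed, and the toy data are assumptions made for this example and do not come from the cited sources.

```python
import random

def sample_rows(rows, fraction, seed=0):
    """Return a simple random sample (without replacement) of `rows`.

    `fraction` is the share of rows to keep, e.g. 0.1 for 10%.
    The seed is fixed only so that the illustration is repeatable.
    """
    if not 0 < fraction <= 1:
        raise ValueError("fraction must be in (0, 1]")
    if not rows:
        return []
    # Keep at least one row so that a tiny dataset is never reduced to nothing.
    k = max(1, round(len(rows) * fraction))
    rng = random.Random(seed)
    return rng.sample(rows, k)

# Toy dataset (hypothetical): 1,000 numeric records.
data = list(range(1000))
reduced = sample_rows(data, 0.1)
print(len(data), "->", len(reduced), "rows")
```

A random sample keeps every row equally likely to be chosen, so summary statistics of the sample tend to approximate those of the full data, at the cost of some accuracy.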
EXAMPLES OF DATABASES THAT NEED DATA REDUCTION
1. Data Sampling
Acquire Valued Shoppers Challenge: predict which shoppers will become repeat buyers.
It is one of the largest problems run on Kaggle.
The Acquire Valued Shoppers Challenge asks participants to predict which shoppers are most likely to repeat purchase. To
aid with algorithmic development, we have provided complete, basket-level, pre-offer shopping history for a large set of
shoppers who were targeted for an acquisition campaign. The incentive offered to that shopper and their post-incentive
behavior is also provided.
This challenge provides almost 350 million rows of completely anonymised transactional data from over 300,000
shoppers.
To get the data down to a more manageable size, only the transactions whose category appeared on at least one of the
offers were extracted. This reduced the transactions from about 22GB to about 1GB. [3]
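The reduction step described above can be sketched as a streaming filter, so that the full transaction file never has to fit in memory. This is a minimal sketch and not the code used in [3]: the column names (`offer`, `category`, `id`, `purchaseamount`), the function names, and the in-memory sample rows are all assumptions made for illustration.

```python
import csv
import io

def offer_categories(offers_file):
    """Collect the set of categories that appear on at least one offer."""
    return {row["category"] for row in csv.DictReader(offers_file)}

def filter_transactions(transactions_file, out_file, categories):
    """Stream transactions and keep only rows whose category is on an offer.

    Rows are read and written one at a time.
    Returns a tuple (rows_read, rows_kept).
    """
    reader = csv.DictReader(transactions_file)
    writer = csv.DictWriter(out_file, fieldnames=reader.fieldnames)
    writer.writeheader()
    rows_read = rows_kept = 0
    for row in reader:
        rows_read += 1
        if row["category"] in categories:
            writer.writerow(row)
            rows_kept += 1
    return rows_read, rows_kept

# Tiny in-memory stand-ins for the real files (made-up values).
offers_csv = io.StringIO("offer,category\n1,706\n2,799\n")
transactions_csv = io.StringIO(
    "id,category,purchaseamount\n"
    "86246,706,2.5\n"
    "86246,9753,1.0\n"
    "86252,799,4.0\n"
    "86252,5555,3.2\n"
)
reduced_csv = io.StringIO()

cats = offer_categories(offers_csv)
rows_read, rows_kept = filter_transactions(transactions_csv, reduced_csv, cats)
print(rows_read, "rows read,", rows_kept, "rows kept")
```

With real files, the `io.StringIO` objects would be replaced by files opened with `open(..., newline="")`, as the `csv` module documentation recommends.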
Editor's Notes
Feature Extraction
Feature extraction is the process of extracting quantitative information from an image, such as color, texture, shape, and contrast features. Here, we used the discrete wavelet transform (DWT) to extract wavelet coefficients and the gray-level co-occurrence matrix (GLCM) to extract statistical features.
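To make the two feature extractors named in this note more concrete, here is a minimal pure-Python sketch. It is not the authors' pipeline. It assumes the Haar wavelet and a single one-dimensional decomposition level for the DWT (an image would normally be transformed along rows and then columns), and a horizontal one-pixel offset for the GLCM. The function names and the 4x4 test image are hypothetical.

```python
import math

def haar_dwt_1d(signal):
    """One level of the Haar DWT on an even-length sequence.

    Returns (approximation, detail) coefficient lists.
    """
    if len(signal) % 2:
        raise ValueError("signal length must be even")
    s = math.sqrt(2)
    pairs = range(0, len(signal), 2)
    approx = [(signal[i] + signal[i + 1]) / s for i in pairs]
    detail = [(signal[i] - signal[i + 1]) / s for i in pairs]
    return approx, detail

def glcm(image, levels, dx=1, dy=0):
    """Normalised gray-level co-occurrence matrix for the offset (dx, dy).

    Entry [i][j] is the share of pixel pairs in which a pixel of level i
    has a neighbour of level j at the given offset (non-symmetric count).
    """
    counts = [[0] * levels for _ in range(levels)]
    rows, cols = len(image), len(image[0])
    total = 0
    for r in range(rows):
        for c in range(cols):
            r2, c2 = r + dy, c + dx
            if 0 <= r2 < rows and 0 <= c2 < cols:
                counts[image[r][c]][image[r2][c2]] += 1
                total += 1
    return [[n / total for n in row] for row in counts]

def glcm_contrast(p):
    """Contrast statistic: the sum of (i - j)^2 * p[i][j] over all entries."""
    n = len(p)
    return sum((i - j) ** 2 * p[i][j] for i in range(n) for j in range(n))

# Hypothetical 4x4 image with four gray levels (0 to 3).
image = [
    [0, 0, 1, 1],
    [0, 0, 1, 1],
    [0, 2, 2, 2],
    [2, 2, 3, 3],
]

approx, detail = haar_dwt_1d(image[0])   # wavelet coefficients of the first row
p = glcm(image, levels=4)                # horizontal neighbours, distance 1
contrast = glcm_contrast(p)
print(approx, detail, contrast)
```

Other statistics such as energy or homogeneity can be computed from the same matrix `p` in the same way as the contrast.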