
Knowledge Discovery in Databases (KDD) is the process of extracting knowledge from structured data sources, unstructured data sources, or both.

 Knowledge Discovery in Databases (KDD)

Data Preprocessing
Data Cleaning: Missing Data, Noisy Data
Data Reduction:
1. Attribute Subset Selection (attributes).
2. Numerosity Reduction (storage).
3. Dimensionality Reduction (compression).
Data Transformation:
1. Smoothing (remove noise).
2. Aggregation (summary).
3. Generalization (low level to high level).
4. Normalization (scaling).
5. Attribute Construction (create attributes).


Data Cleaning (Missing Data, Noisy Data)
Missing Data:
1. Ignore the tuples.
2. Fill the missing values with the attribute mean or the most probable value.
Noisy Data:
1. Binning Method (create bins)
Here the sorted data is divided into segments called bins, and the values in each bin are smoothed, for example by replacing them with the bin mean, the bin median, or the nearest bin boundary.

2. Regression (function)
Here data is smoothed by fitting it to a regression function. The regression may be linear (one independent variable) or multiple (several independent variables).

3. Clustering (groups)
Here data is grouped into clusters, and values that fall outside the clusters are treated as outliers.
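The binning method above can be sketched in a few lines of Python. This is only an illustration, assuming equal-frequency bins and smoothing by bin means; the function name `smooth_by_bin_means` and the sample `prices` list are made up for the example.

```python
def smooth_by_bin_means(values, bin_size):
    """Sort the values, split them into bins of `bin_size` items,
    and replace every value with the mean of its bin."""
    ordered = sorted(values)
    smoothed = []
    for start in range(0, len(ordered), bin_size):
        bin_values = ordered[start:start + bin_size]
        bin_mean = sum(bin_values) / len(bin_values)
        # Every value in the bin is replaced by the same bin mean.
        smoothed.extend([bin_mean] * len(bin_values))
    return smoothed


# Hypothetical sample data, not taken from any real data set.
prices = [4, 8, 15, 21, 21, 24, 25, 28, 34]
print(smooth_by_bin_means(prices, 3))
# → [9.0, 9.0, 9.0, 22.0, 22.0, 22.0, 29.0, 29.0, 29.0]
```

Smoothing by bin medians or bin boundaries would follow the same pattern, with only the replacement value changed.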


Note: Missing or noisy data can be caused by faulty data collection, data entry errors, etc.
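The two ways of handling missing data listed earlier (ignoring the tuple, or filling in the attribute mean) might look like this in Python. It is a minimal sketch, assuming each tuple is a dictionary and a missing value is stored as `None`; the helper names and the `records` sample are hypothetical.

```python
def drop_incomplete(rows):
    """Strategy 1: ignore every tuple that has at least one missing value."""
    return [row for row in rows if None not in row.values()]


def fill_with_mean(rows, attribute):
    """Strategy 2: replace missing values of `attribute` with the mean
    of the values that are present."""
    known = [row[attribute] for row in rows if row[attribute] is not None]
    if not known:
        raise ValueError("no known values to compute a mean from")
    mean = sum(known) / len(known)
    return [
        {**row, attribute: mean if row[attribute] is None else row[attribute]}
        for row in rows
    ]


# Hypothetical sample tuples; None marks a missing value.
records = [
    {"id": 1, "age": 20},
    {"id": 2, "age": None},
    {"id": 3, "age": 40},
]
print(drop_incomplete(records))        # keeps only ids 1 and 3
print(fill_with_mean(records, "age"))  # id 2 gets age 30.0
```

Ignoring tuples is simple but throws information away, so filling in a value is usually the better choice when many tuples are incomplete.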

Data Reduction
Technique | Handles | Working
Data Mining | Huge amounts of data | Hard to carry out on huge data
Data Reduction | Used within data mining | Makes the data smaller and easier to work with; it also increases storage efficiency and reduces data storage and analysis costs


Data Reduction Steps
S. No | Data Reduction Step | Details
1 | Attribute Subset Selection | Keep only the highly relevant attributes and discard the rest.
2 | Numerosity Reduction | Store a model of the data instead of the data itself.
3 | Dimensionality Reduction | Reduce data size using encoding mechanisms (lossy or lossless). If the original data can be retrieved exactly after decompression, the reduction is lossless; otherwise it is lossy. Methods of dimensionality reduction include wavelet transforms and Principal Component Analysis (PCA).
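As an illustration of numerosity reduction, the sketch below stores a model of the data rather than the data itself. It assumes the data is roughly linear, so a least-squares line can stand in for it; the function name `fit_line` and the sample values are invented for the example, and other models could be used just as well.

```python
def fit_line(xs, ys):
    """Least-squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Hypothetical data points that lie on y = 2x + 1.
xs = [1, 2, 3, 4, 5]
ys = [3, 5, 7, 9, 11]

# Only these two numbers are stored instead of all the points.
slope, intercept = fit_line(xs, ys)
print(slope, intercept)

# Values are later estimated from the stored model.
print(slope * 6 + intercept)
```

The saving grows with the size of the data: the model stays at two parameters no matter how many points were used to fit it.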


Data Transformation in Data Mining
Data transformation is the process of converting data from one format into another format that is more appropriate for data mining.

Data Transformation Strategies
Strategy Name | Details
Smoothing | Removes noise from the data.
Aggregation | Applies summary or aggregation operations to the data.
Generalization | Replaces low-level data with high-level data by climbing concept hierarchies.
Normalization | Scales attribute data so that it falls within a small specified range, such as 0.0 to 1.0.
Attribute Construction | Constructs new attributes from the given set of attributes.
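Normalization into the range 0.0 to 1.0 can be sketched with min-max scaling, which is one common way to do it (the text above does not name a specific method). The function name `min_max_normalize`, its parameters, and the `incomes` sample are assumptions made for this example.

```python
def min_max_normalize(values, new_min=0.0, new_max=1.0):
    """Linearly rescale `values` so the smallest maps to `new_min`
    and the largest maps to `new_max`."""
    lo, hi = min(values), max(values)
    if hi == lo:
        # All values are equal, so there is no spread to scale.
        return [new_min for _ in values]
    return [
        (v - lo) / (hi - lo) * (new_max - new_min) + new_min
        for v in values
    ]


# Hypothetical attribute values.
incomes = [20000, 35000, 50000, 80000]
print(min_max_normalize(incomes))
# → [0.0, 0.25, 0.5, 1.0]
```

After scaling, attributes measured in very different units contribute on a comparable scale to distance-based mining methods.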


