Knowledge Discovery in Databases (KDD)


Knowledge Discovery in Databases (KDD) is the process of extracting knowledge from structured data sources, unstructured data sources, or both.

Data Preprocessing

Preprocessing Step	Techniques
Data Cleaning	Missing Data, Noisy Data
Data Reduction	1. Attribute Subset Selection (attributes). 2. Numerosity Reduction (storage). 3. Dimensionality Reduction (compression).
Data Transformation	1. Smoothing (remove noise). 2. Aggregation (summary). 3. Generalization (low level to high level). 4. Normalization (scaling). 5. Attribute Construction (create attributes).

Data Cleaning (Missing Data, Noisy Data)

Missing Data

1. Ignore (discard) the tuples that contain missing values.
2. Fill in the missing values with the attribute mean or the most probable value.
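Both strategies can be sketched in a few lines. This is a minimal illustration, assuming each tuple is a Python dict and that a missing value is represented as None; the record layout and the "income" attribute are hypothetical.

```python
from statistics import mean

# Hypothetical tuples; None marks a missing value (an assumption of this sketch).
records = [
    {"id": 1, "income": 42000},
    {"id": 2, "income": None},
    {"id": 3, "income": 58000},
    {"id": 4, "income": 50000},
]


def ignore_tuples(rows, attr):
    """Strategy 1: drop every tuple whose value for attr is missing."""
    return [r for r in rows if r[attr] is not None]


def fill_with_mean(rows, attr):
    """Strategy 2: replace missing values of attr with the mean of the known values."""
    known = [r[attr] for r in rows if r[attr] is not None]
    fill = mean(known)
    # Copy each tuple so the original records are left unchanged.
    return [{**r, attr: fill if r[attr] is None else r[attr]} for r in rows]


print(len(ignore_tuples(records, "income")))           # 3
print(fill_with_mean(records, "income")[1]["income"])  # 50000
```

Ignoring tuples is simple but throws away the other (valid) attribute values of those tuples; filling with the mean keeps every tuple at the cost of introducing an estimated value.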

Noisy Data

1. Binning Method (create bins)
Here the sorted data is divided into segments called bins, and the values in each bin are then smoothed, for example by replacing them with the bin mean, the bin median, or the nearest bin boundary.
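A minimal sketch of binning, assuming equal-frequency (equal-depth) bins of a fixed size; the function names and the sample prices are illustrative only. It shows smoothing by bin means and smoothing by bin boundaries.

```python
def equal_depth_bins(values, depth):
    """Sort the values and split them into bins holding `depth` values each."""
    ordered = sorted(values)
    return [ordered[i:i + depth] for i in range(0, len(ordered), depth)]


def smooth_by_means(bins):
    """Replace every value in a bin with the bin mean."""
    return [[sum(b) / len(b)] * len(b) for b in bins]


def smooth_by_boundaries(bins):
    """Replace every value with the closer bin boundary (ties go to the lower one)."""
    smoothed = []
    for b in bins:
        lo, hi = b[0], b[-1]  # bins are sorted, so the ends are the boundaries
        smoothed.append([lo if v - lo <= hi - v else hi for v in b])
    return smoothed


prices = [4, 8, 15, 21, 21, 24, 25, 28, 34]
bins = equal_depth_bins(prices, 3)
print(smooth_by_means(bins))       # [[9.0, 9.0, 9.0], [22.0, 22.0, 22.0], [29.0, 29.0, 29.0]]
print(smooth_by_boundaries(bins))  # [[4, 4, 15], [21, 21, 24], [25, 25, 34]]
```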

2. Regression (function)
Here data is smoothed by fitting it to a regression function. The regression used may be simple linear regression (one independent variable) or multiple linear regression (two or more independent variables).
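As a sketch of smoothing with simple linear regression, the code below fits y = a + b*x by ordinary least squares and replaces each observed y with its fitted value. The choice of ordinary least squares and the sample data are assumptions made for illustration.

```python
def fit_line(xs, ys):
    """Ordinary least squares fit of y = a + b*x (one independent variable)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    b = sxy / sxx          # slope
    a = my - b * mx        # intercept: the line passes through (mean x, mean y)
    return a, b


def smooth_by_regression(xs, ys):
    """Replace each noisy y value with the value predicted by the fitted line."""
    a, b = fit_line(xs, ys)
    return [a + b * x for x in xs]


xs = [1, 2, 3, 4, 5]
ys = [2.1, 3.9, 6.2, 7.8, 10.0]  # made-up noisy data, roughly y = 2x
print(fit_line(xs, ys))
print([round(v, 2) for v in smooth_by_regression(xs, ys)])
```

Multiple linear regression follows the same idea with several independent variables, usually solved with a linear-algebra library rather than by hand.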

3. Clustering (groups)
Here similar values are grouped into clusters, and values that fall outside the clusters are detected as outliers.
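One simple way to illustrate clustering-based outlier detection on a single attribute is single-linkage clustering with a distance cutoff: sort the values and start a new cluster wherever the gap between neighbours exceeds a threshold. This is only a sketch; the `max_gap` and `min_size` parameters and the sample ages are hypothetical, and real data usually calls for a proper clustering algorithm on several attributes.

```python
def cluster_1d(values, max_gap):
    """Single-linkage clustering on one attribute: sort the values and start a
    new cluster whenever the gap to the previous value exceeds max_gap."""
    if not values:
        return []
    ordered = sorted(values)
    clusters = [[ordered[0]]]
    for v in ordered[1:]:
        if v - clusters[-1][-1] > max_gap:
            clusters.append([v])      # gap too large: begin a new cluster
        else:
            clusters[-1].append(v)    # close enough: join the current cluster
    return clusters


def find_outliers(values, max_gap, min_size):
    """Report values that end up in clusters smaller than min_size."""
    return [v for c in cluster_1d(values, max_gap) if len(c) < min_size for v in c]


ages = [22, 23, 25, 24, 41, 43, 42, 40, 97]
print(cluster_1d(ages, max_gap=5))                 # [[22, 23, 24, 25], [40, 41, 42, 43], [97]]
print(find_outliers(ages, max_gap=5, min_size=2))  # [97]
```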

Note: Missing or noisy data can result from faulty data collection, data entry errors, and similar causes.

Data Reduction

Technique	Data Handled	Effect
Data Mining	Huge amounts of data	Mining directly on huge data is hard
Data Reduction	Applied as part of data mining to obtain a smaller representation of the data	Working with less data is easier; in addition, it increases storage efficiency and reduces data storage and analysis costs

Data Reduction Steps

S. No.	Data Reduction Step	Details
1	Attribute Subset Selection	Discard all attributes except the highly relevant ones
2	Numerosity Reduction	Store a model of the data instead of the data itself
3	Dimensionality Reduction	Reduce data size using encoding mechanisms (lossy or lossless). If the original data can be retrieved exactly after decompression, the reduction is called lossless; otherwise it is called lossy. Methods of dimensionality reduction include wavelet transforms and Principal Component Analysis (PCA).
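The lossless versus lossy distinction can be seen with a simple wavelet transform. The sketch below uses an unnormalized (averaging) Haar transform, which is a simplification of the discrete wavelet transform: it assumes the input length is a power of two, and it makes the reduction lossy by zeroing coefficients below a hypothetical `threshold` parameter.

```python
def haar_forward(data):
    """Averaging Haar decomposition of a list whose length is a power of two.
    Returns [overall average, coarsest detail, ..., finest details]."""
    coeffs = []
    current = list(data)
    while len(current) > 1:
        pairs = range(0, len(current), 2)
        averages = [(current[i] + current[i + 1]) / 2 for i in pairs]
        details = [(current[i] - current[i + 1]) / 2 for i in pairs]
        coeffs = details + coeffs  # coarser details go in front of finer ones
        current = averages
    return current + coeffs


def haar_inverse(coeffs):
    """Rebuild the data: each (average, detail) pair expands to two values."""
    current = coeffs[:1]
    pos = 1
    while pos < len(coeffs):
        details = coeffs[pos:pos + len(current)]
        current = [v for avg, d in zip(current, details) for v in (avg + d, avg - d)]
        pos += len(details)
    return current


def truncate(coeffs, threshold):
    """Keep the overall average; zero detail coefficients smaller than threshold."""
    return [c if i == 0 or abs(c) >= threshold else 0.0 for i, c in enumerate(coeffs)]


data = [2, 2, 0, 2, 3, 5, 4, 4]
coeffs = haar_forward(data)
print(coeffs)                                 # [2.75, -1.25, 0.5, 0.0, 0.0, -1.0, -1.0, 0.0]
print(haar_inverse(coeffs) == data)           # True  (lossless)
print(haar_inverse(truncate(coeffs, 1.25)))   # [1.5, 1.5, 1.5, 1.5, 4.0, 4.0, 4.0, 4.0]  (lossy)
```

Keeping all coefficients reproduces the original data exactly; keeping only the two largest means fewer values need to be stored, but only an approximation can be recovered.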

Data Transformation in Data Mining
Data transformation is the process of converting data from one format into another format that is more appropriate for data mining.

Data Transformation Strategies

Data Transformation Strategy	Details
Smoothing	It is the process of removing noise from the data.
Aggregation	Here summary or aggregation operations are applied to the data.
Generalization	Here low-level data are replaced with higher-level data by climbing concept hierarchies.
Normalization	Here attribute data are scaled so as to fall within a small specified range, such as 0.0 to 1.0.
Attribute Construction	Here new attributes are constructed from the given set of attributes.
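Several of these strategies can be sketched as small functions. The code below is illustrative only: min-max normalization is one common normalization method (others exist, such as z-score normalization), and the age cutoffs in the concept hierarchy, the record fields, and the constructed "area" attribute are all hypothetical.

```python
from collections import defaultdict


def min_max_normalize(values, new_min=0.0, new_max=1.0):
    """Normalization: linearly rescale values into [new_min, new_max]."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [new_min for _ in values]  # all values equal: map to the lower bound
    return [(v - lo) / (hi - lo) * (new_max - new_min) + new_min for v in values]


def aggregate(rows, key, value):
    """Aggregation: sum `value` per `key`, e.g. sales records into monthly totals."""
    totals = defaultdict(float)
    for row in rows:
        totals[row[key]] += row[value]
    return dict(totals)


def generalize_age(age):
    """Generalization: climb a hypothetical concept hierarchy from age to age group."""
    if age < 30:
        return "young"
    if age < 60:
        return "middle_aged"
    return "senior"


def construct_area(rows):
    """Attribute construction: derive a new 'area' attribute from width and height."""
    return [{**r, "area": r["width"] * r["height"]} for r in rows]


print(min_max_normalize([10, 20, 30, 50]))  # [0.0, 0.25, 0.5, 1.0]
sales = [{"month": "Jan", "sales": 100}, {"month": "Jan", "sales": 50},
         {"month": "Feb", "sales": 70}]
print(aggregate(sales, "month", "sales"))    # {'Jan': 150.0, 'Feb': 70.0}
print([generalize_age(a) for a in (25, 45, 70)])  # ['young', 'middle_aged', 'senior']
print(construct_area([{"width": 3, "height": 4}]))  # [{'width': 3, 'height': 4, 'area': 12}]
```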
