Classification


Classification is the process of assigning each record to one of a set of predefined categories (class labels).

General Approach to Solving a Classification Problem
1. Divide the input data into 2 sets:
a. Training set. b. Test set.
2. The training set is used to build a classification model.
3. The trained classification model is applied to the test set to predict the class label of each test record.
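
The three steps can be sketched in Python. This is a minimal illustration, not a prescribed implementation: the toy records, the 1/3 test fraction, and the choice of a 1-nearest-neighbour model are assumptions made only for this example (any classifier could take the place of the hypothetical `build_model` and `predict` helpers).

```python
import random

def split_train_test(records, test_fraction=1/3, seed=0):
    """Step 1: randomly divide the input into a training set and a test set."""
    shuffled = records[:]
    random.Random(seed).shuffle(shuffled)
    n_test = round(len(shuffled) * test_fraction)
    return shuffled[n_test:], shuffled[:n_test]

def build_model(training_set):
    """Step 2: build a model from the training set.
    Hypothetical model: 1-nearest-neighbour, which simply stores the training records."""
    return list(training_set)

def predict(model, features):
    """Step 3: predict a class label (the label of the closest training record)."""
    def sq_dist(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))
    nearest = min(model, key=lambda rec: sq_dist(rec[0], features))
    return nearest[1]

# Hypothetical toy records: ((feature1, feature2), class_label)
records = [((1.0, 1.0), "A"), ((1.2, 0.8), "A"), ((0.9, 1.1), "A"),
           ((5.0, 5.0), "B"), ((5.2, 4.9), "B"), ((4.8, 5.1), "B")]

training_set, test_set = split_train_test(records)
model = build_model(training_set)
for features, actual in test_set:
    print(features, "actual:", actual, "predicted:", predict(model, features))
```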

Classification models

Classification Model Name	Details / used for
Descriptive modeling	Summarizing the data (explaining what distinguishes objects of different classes).
Predictive modeling	Predicting the class label of unknown records.

Evaluation of Classifiers
A classifier is an algorithm that maps input data to a specific category. A classification model is used to predict the class labels of new data.

"Methods (5)" for Evaluating Classifiers / Classification Model Evaluation

Method Name	Details
Holdout Method	1. Randomly partition the given data into 2 independent sets. 2. The training set (typically 2/3 of the data) is used for model construction. 3. The test set (the remaining 1/3) is used for accuracy estimation.
Random Sub-sampling	A variant of the holdout method: repeat the holdout method k times; the overall accuracy is the average of the k accuracies obtained.
Cross / k-fold validation	Partition the data into k mutually exclusive subsets (folds) D1, ..., Dk of approximately equal size. In the ith iteration, use Di as the test set and the other folds together as the training set, so each fold is used for testing exactly once.
Leave-one-out Method	A special case of k-fold cross-validation in which k equals the number of tuples, so each test set contains exactly one tuple. Used for small data sets.
Bootstrap	Training tuples are sampled uniformly with replacement: each time a tuple is selected, it is put back into the original set and is equally likely to be selected again.
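
The five methods differ only in how they choose training and test records, so they can be sketched as index-splitting functions. A minimal Python sketch under stated assumptions: all function names are hypothetical, the `evaluate` callback (which would train a model on the training indices and return its accuracy on the test indices) is a placeholder, and the fixed seeds only make runs repeatable.

```python
import random

def holdout(n, train_fraction=2/3, rng=None):
    """Holdout: randomly split indices 0..n-1 into training (~2/3) and test (~1/3) sets."""
    if rng is None:
        rng = random.Random(0)
    idx = list(range(n))
    rng.shuffle(idx)
    cut = round(n * train_fraction)
    return idx[:cut], idx[cut:]

def random_subsampling(n, k, evaluate, seed=0):
    """Random sub-sampling: repeat holdout k times and average the k accuracies.
    `evaluate(train_idx, test_idx)` is a hypothetical callback returning an accuracy."""
    rng = random.Random(seed)
    accuracies = [evaluate(*holdout(n, rng=rng)) for _ in range(k)]
    return sum(accuracies) / k

def k_fold(n, k, seed=0):
    """k-fold cross-validation: k mutually exclusive folds of roughly equal size.
    Yields (train_idx, test_idx); in iteration i, fold D_i is the test set."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    folds = [idx[i::k] for i in range(k)]
    for i in range(k):
        train = [j for f, fold in enumerate(folds) if f != i for j in fold]
        yield train, folds[i]

def leave_one_out(n):
    """Leave-one-out: k-fold cross-validation with k = n (one tuple per test set)."""
    return k_fold(n, n)

def bootstrap(n, seed=0):
    """Bootstrap: draw n indices uniformly with replacement for training.
    Tuples never drawn (about 36.8% on average for large n) can form the test set."""
    rng = random.Random(seed)
    train = [rng.randrange(n) for _ in range(n)]
    chosen = set(train)
    test = [i for i in range(n) if i not in chosen]
    return train, test
```

For example, `list(k_fold(10, 3))` yields three (training, test) pairs whose test sets together cover all 10 indices exactly once.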

"Metrics (5)" of Classification Evaluation
The metrics below assume a binary classifier: there are 2 classes, positive and negative, and each prediction is one of the two.

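Confusion Matrix / Contingency Table
A confusion matrix for a binary classifier counts the test records by actual class and predicted class:
	Predicted Positive	Predicted Negative
Actual Positive	TP (true positive)	FN (false negative)
Actual Negative	FP (false positive)	TN (true negative)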

Metrics are measurements used to evaluate and compare different classification models.
1. Accuracy = proportion of correct predictions among the total number of cases examined
Accuracy = (TP+TN)/(TP+FP+FN+TN)
2. Precision = proportion of predicted positives that are truly positive = (TP) / (TP+FP)
3. Recall = proportion of actual positives that are correctly classified = (TP) / (TP+FN)
4. F1 Score = harmonic mean of precision and recall = 2 * Precision * Recall / (Precision + Recall)
5. Sensitivity = TPR (True Positive Rate) = Recall = TP/(TP+FN)
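
A small Python helper makes the formulas concrete. The counts passed in are made-up numbers for illustration, and returning 0.0 when a denominator is zero is an assumed convention, not part of the definitions above.

```python
def classification_metrics(tp, fp, fn, tn):
    """Compute the five metrics above from the cells of a binary confusion matrix."""
    accuracy = (tp + tn) / (tp + fp + fn + tn)
    precision = tp / (tp + fp) if tp + fp else 0.0   # assumed: 0.0 if nothing predicted positive
    recall = tp / (tp + fn) if tp + fn else 0.0      # recall = sensitivity = TPR
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f1": f1, "sensitivity": recall}

# Hypothetical counts for illustration only.
m = classification_metrics(tp=40, fp=10, fn=20, tn=30)
print({name: round(value, 3) for name, value in m.items()})
```

For TP=40, FP=10, FN=20, TN=30 this gives accuracy 0.7, precision 0.8, recall (sensitivity) of about 0.667 and F1 of about 0.727.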

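A common way to evaluate a classifier, such as an SVM, is to train it on one part of the data and test it on the remaining part:

Step	Data	Purpose	Share of data
Step 1	Training data	Train (build) the classifier	67%
Step 2	Test data	Test the trained classifier	33%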

Classification Techniques

Unsupervised Learning	A data mining task
Also called	Self-organized Hebbian learning
Definition	Used to find previously unknown patterns in a data set without pre-existing labels.
Allows modeling	Probability densities of given inputs.

Supervised Learning	A data mining task
Definition	Analyzes labeled training data and produces an inferred function that is used to map new examples; in other words, it learns a function from labeled training data.

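Support Vector Machine (SVM)
An SVM separates (classifies) a data set into two classes using a single straight line (in more than two dimensions, a hyperplane). All points that fall on one side are labelled with the first class and the remaining points with the second class. Among all lines that separate the classes, the SVM chooses the one with the largest margin to the closest points of each class (the support vectors). SVMs are used in web and text mining and can also be applied to regression problems.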


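Types of SVM
There are 2 types:
1. Linear SVM: used for linearly separable data, i.e. the data set can be classified using a straight line.
2. Non-linear SVM: used for non-linearly separable data, i.e. the data set cannot be classified using a straight line. It typically uses a kernel function so that a linear separator can be found in a transformed, higher-dimensional feature space.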
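
A linear SVM can be trained by minimizing the regularized hinge loss, which favours a separating line with a wide margin. The Python sketch below uses plain stochastic subgradient descent; the learning rate `eta`, regularization strength `lam`, number of epochs and the toy points are assumptions chosen for illustration, and real work would normally use an established SVM library. It covers only the linear case; a non-linear SVM would additionally need a kernel.

```python
import random

def train_linear_svm(points, labels, lam=0.01, eta=0.01, epochs=200, seed=0):
    """Soft-margin linear SVM trained by stochastic subgradient descent on
    lam/2 * ||w||^2 + hinge loss max(0, 1 - y * (w.x + b)).
    Labels must be +1 or -1. Returns the weight vector w and bias b."""
    rng = random.Random(seed)
    w = [0.0] * len(points[0])
    b = 0.0
    data = list(zip(points, labels))
    for _ in range(epochs):
        rng.shuffle(data)
        for x, y in data:
            margin = y * (sum(wi * xi for wi, xi in zip(w, x)) + b)
            # Gradient step for the regularization term: shrink w toward zero.
            w = [(1 - eta * lam) * wi for wi in w]
            if margin < 1:
                # Point is misclassified or inside the margin: hinge loss is active.
                w = [wi + eta * y * xi for wi, xi in zip(w, x)]
                b += eta * y
    return w, b

def predict(w, b, x):
    """Class +1 if x lies on the positive side of the line w.x + b = 0, else -1."""
    return 1 if sum(wi * xi for wi, xi in zip(w, x)) + b >= 0 else -1

# Hypothetical, linearly separable 2-D data.
points = [(2, 2), (3, 3), (2, 3), (3, 2), (-2, -2), (-3, -3), (-2, -3), (-3, -2)]
labels = [1, 1, 1, 1, -1, -1, -1, -1]
w, b = train_linear_svm(points, labels)
print([predict(w, b, p) for p in points])
```

The learned line w.x + b = 0 plays the role of the "single straight line" described above: points on one side get the first class label and the rest get the second.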

Decision Trees
A decision tree is a tree-shaped model used in data mining to reach a decision (such as a class label) through a sequence of tests. It contains a root node, branches, and leaf nodes. Decision trees are used in web and text mining and can also be used for regression.

Part of Tree	Denotes
Internal node	A test on an attribute
Branch	An outcome of the test
Leaf node	A class label
Root node	The topmost node
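
Classifying a record with a decision tree means starting at the root and following, at each internal node, the branch that matches the test outcome until a leaf supplies the class label. In the Python sketch below, the tree itself (the attributes `outlook`, `humidity`, `windy` and their values) is a made-up example, and the dict-based node layout is an assumption, not a standard format.

```python
# Hypothetical hand-built tree. Internal nodes are dicts that test one attribute;
# each key in "branches" is one outcome of that test; leaves are class-label strings.
tree = {
    "attribute": "outlook",                 # root node: the topmost internal node
    "branches": {
        "sunny": {
            "attribute": "humidity",        # internal node: test on an attribute
            "branches": {"high": "no", "normal": "yes"},  # leaves hold class labels
        },
        "overcast": "yes",
        "rainy": {
            "attribute": "windy",
            "branches": {"true": "no", "false": "yes"},
        },
    },
}

def classify(node, record):
    """Follow the branch matching each test outcome until a leaf (class label) is reached."""
    while isinstance(node, dict):
        outcome = record[node["attribute"]]
        node = node["branches"][outcome]
    return node

print(classify(tree, {"outlook": "sunny", "humidity": "normal", "windy": "false"}))  # yes
```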

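Pruning
Pruning is a technique used to reduce the size of a decision tree by removing sections of the tree that contribute little to classifying records, such as branches that reflect noise or outliers in the training data. This helps reduce overfitting.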

Methods for Expressing Attribute Test Conditions
The test condition at an internal node depends on the type of the attribute being tested.

Types of Attributes in Data Mining

Attribute type	Details	Examples
Nominal	Values are just different names that distinguish one object from another (=, ≠)	Employee ID numbers, zip codes
Ordinal	Values provide enough information to order objects (<, >)	Street numbers, hardness of minerals
Interval	Differences between values are meaningful (+, -)	Temperature in Celsius or Fahrenheit, calendar dates
Ratio	Both differences and ratios are meaningful (*, /)	Counts, age, temperature in Kelvin
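
How the test condition changes with the attribute type can be sketched in Python. The attribute names, the ordinal scale, the split point and the threshold of 80 are all hypothetical; in a real tree-induction algorithm they would be chosen by a split-selection measure rather than fixed by hand. Continuous attributes could alternatively be discretized into ranges.

```python
# Hypothetical ordinal scale for a "size" attribute, listed in order.
SIZE_ORDER = ["small", "medium", "large", "extra large"]

def nominal_test(value):
    """Nominal attribute: multiway split with one branch per distinct value,
    because only equality (=, !=) is meaningful."""
    return value

def ordinal_test(value, split_after="medium"):
    """Ordinal attribute: values may be grouped, but the grouping must keep their order,
    e.g. {small, medium} vs {large, extra large}."""
    if SIZE_ORDER.index(value) <= SIZE_ORDER.index(split_after):
        return "<= " + split_after
    return "> " + split_after

def continuous_test(value, threshold=80.0):
    """Interval or ratio (continuous) attribute: binary comparison against a threshold."""
    return "< %g" % threshold if value < threshold else ">= %g" % threshold

# Hypothetical record with one attribute of each kind.
record = {"marital_status": "married", "size": "large", "income": 95.0}
print(nominal_test(record["marital_status"]),
      ordinal_test(record["size"]),
      continuous_test(record["income"]))
```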

Measures for Selecting the Best Split
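
Commonly used measures for choosing a split are impurity measures such as the Gini index and entropy; a split is preferred when the weighted impurity of its child nodes is much lower than the impurity of the parent (with entropy, this reduction is the information gain). The Python sketch below computes both measures and compares two hypothetical candidate splits of a made-up 10-record node; it illustrates the idea and is not the split procedure of any particular algorithm.

```python
from collections import Counter
from math import log2

def gini(labels):
    """Gini index of a node: 1 - sum_i p_i^2."""
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())

def entropy(labels):
    """Entropy of a node: -sum_i p_i * log2(p_i)."""
    n = len(labels)
    return -sum((c / n) * log2(c / n) for c in Counter(labels).values())

def weighted_impurity(children, measure):
    """Impurity after a split: each child's impurity weighted by its share of records."""
    total = sum(len(child) for child in children)
    return sum(len(child) / total * measure(child) for child in children)

# Hypothetical parent node with 10 records, and two candidate splits of it.
parent = ["yes"] * 5 + ["no"] * 5
split_a = [["yes"] * 4 + ["no"], ["yes"] + ["no"] * 4]            # fairly pure children
split_b = [["yes"] * 3 + ["no"] * 2, ["yes"] * 2 + ["no"] * 3]    # mixed children

for name, split in [("A", split_a), ("B", split_b)]:
    reduction = gini(parent) - weighted_impurity(split, gini)
    print(name, "Gini reduction:", round(reduction, 3))
```

Split A lowers the Gini index from 0.5 to 0.32, while split B only lowers it to 0.48, so split A would be chosen.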
