DWDM \ Classification

Classification is the action or process of classifying something. In data mining, it means assigning each record to one of a set of predefined classes.


General Approach to Solving a Classification Problem
Steps:
1. Divide the input data into 2 sets:
   a. Training set
   b. Test set
2. The training set is used to build a classification model.
3. The resulting classification model is applied to the test set to predict the class label of each test record.
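
The three steps above can be sketched in Python as follows. This is only a minimal illustration: the 1/3 test fraction, the toy records, and the nearest-neighbour "model" are assumptions chosen to keep the example short, and the helper names (holdout_split, train_nearest_neighbor) are hypothetical.

```python
import random

def holdout_split(records, test_fraction=1/3, seed=0):
    """Step 1: randomly divide the records into (training set, test set)."""
    shuffled = records[:]
    random.Random(seed).shuffle(shuffled)
    n_test = int(round(len(shuffled) * test_fraction))
    return shuffled[n_test:], shuffled[:n_test]

def train_nearest_neighbor(training_set):
    """Step 2: build a model from the training set.

    Assumed toy model: predict the label of the closest training record.
    """
    def predict(x):
        nearest = min(training_set, key=lambda rec: abs(rec[0] - x))
        return nearest[1]
    return predict

# Hypothetical records: (feature value, class label).
records = [(x, "low" if x < 50 else "high") for x in range(100)]

training_set, test_set = holdout_split(records)
model = train_nearest_neighbor(training_set)

# Step 3: predict the class label of each test record and compare with the truth.
correct = sum(1 for x, label in test_set if model(x) == label)
accuracy = correct / len(test_set)
print(f"test accuracy: {accuracy:.2f}")
```

The test records are never shown to the model during training, so the measured accuracy estimates how the model behaves on unseen records.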


Classification Models
Classification Model Name | Details / Used For
Descriptive modeling | Summarizing the data
Predictive modeling | Predicting the class label of unknown records


Evaluation of Classifiers
A classifier algorithm maps input data to a specific category. A classification model is used to predict the class label of new data.

"Methods (5)" of Evaluation of Classifiers / Classification Model Evaluation
Method Name | Details
Holdout Method | 1. Randomly partition the given data into 2 independent sets. 2. The training set (2/3 of the data) is used for model construction. 3. The test set (1/3 of the data) is used for accuracy estimation.
Random Sub-sampling | A variant of the holdout method. Repeat the holdout method k times; the overall accuracy is the average of the accuracies obtained.
Cross-validation (k-fold) | Partition the data into k mutually exclusive subsets D1, ..., Dk of approximately equal size. At the i-th iteration, use Di as the test set and the remaining subsets as the training set.
Leave-one-out Method | A special case of k-fold cross-validation in which k equals the number of tuples, so each test set holds a single tuple. Suited to small data sets.
Bootstrap | Training records are sampled with replacement: each record chosen for training is put back into the original list of records, so it has an equal chance of being drawn again.
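
The partitioning schemes in the table can be sketched as below. This is a rough illustration, not a reference implementation: the function names are hypothetical, the folds are formed by simple striding rather than a random shuffle, and using the never-drawn records as the bootstrap test set is an assumption the table does not state.

```python
import random

def k_fold_splits(records, k):
    """Yield (training set, test set) pairs; fold i is the test set at iteration i."""
    folds = [records[i::k] for i in range(k)]  # k disjoint, near-equal subsets
    for i in range(k):
        test_set = folds[i]
        training_set = [r for j, fold in enumerate(folds) if j != i for r in fold]
        yield training_set, test_set

def leave_one_out_splits(records):
    """Leave-one-out: k-fold with k equal to the number of records."""
    return k_fold_splits(records, k=len(records))

def bootstrap_sample(records, rng):
    """Draw len(records) training records with replacement.

    Assumption: records that were never drawn are used as the test set.
    """
    picked = [rng.randrange(len(records)) for _ in records]
    picked_set = set(picked)
    training_set = [records[i] for i in picked]
    test_set = [r for i, r in enumerate(records) if i not in picked_set]
    return training_set, test_set

data = list(range(10))  # hypothetical records
folds = list(k_fold_splits(data, k=5))
loo = list(leave_one_out_splits(data))
boot_train, boot_test = bootstrap_sample(data, random.Random(0))

print(len(folds), "folds;", len(loo), "leave-one-out rounds")
print("bootstrap training sample:", boot_train)
```

For random sub-sampling and k-fold validation alike, the final figure is the average of the accuracies measured on each (training set, test set) pair.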


"Metrics (5)" of Classification Evaluation
The metrics below assume a binary classifier, which has 2 classes and outputs one of two possible values: true (positive) or false (negative).

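Confusion Matrix / Contingency Table
A confusion matrix tabulates actual classes against predicted classes, giving four counts: true positives (TP), false positives (FP), false negatives (FN), and true negatives (TN).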

Metrics are the measurements used to evaluate and compare classification models.
1. Accuracy = proportion of true results (true positives and true negatives) among the total number of cases examined
   Accuracy = (TP+TN)/(TP+FP+FN+TN)
2. Precision = proportion of predicted positives that are truly positive = TP/(TP+FP)
3. Recall = proportion of actual positives that are correctly classified = TP/(TP+FN)
4. F1 Score = harmonic mean of precision and recall = 2 * Precision * Recall / (Precision + Recall)
5. Sensitivity = True Positive Rate (TPR) = Recall = TP/(TP+FN)
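
As a worked example, the formulas above can be computed directly from the four confusion-matrix counts. The counts used here are made up for illustration, and the sketch assumes every denominator is non-zero.

```python
def classification_metrics(tp, fp, fn, tn):
    """Compute the metrics listed above from confusion-matrix counts.

    Assumes no denominator is zero (at least one predicted positive
    and at least one actual positive).
    """
    accuracy = (tp + tn) / (tp + fp + fn + tn)
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)  # also called sensitivity or TPR
    f1 = 2 * precision * recall / (precision + recall)
    return {
        "accuracy": accuracy,
        "precision": precision,
        "recall": recall,
        "f1": f1,
        "sensitivity": recall,
    }

# Hypothetical counts: 40 TP, 10 FP, 20 FN, 30 TN (100 cases in total).
metrics = classification_metrics(tp=40, fp=10, fn=20, tn=30)
for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
```

With these counts, accuracy is 70/100 = 0.7, precision is 40/50 = 0.8, recall is 40/60 (about 0.667), and F1 is 80/110 (about 0.727).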

A simple way to evaluate a classifier such as an SVM is to train it on one part of the data and test it on the rest:
Step 1: Train the classifier on the training data (67%).
Step 2: Test the classifier on the test data (33%).


Classification Techniques
Unsupervised Learning | A data mining task
Also called | Self-organized Hebbian learning
Definition | Used to find previously unknown patterns in a data set without pre-existing labels.
Allows | Modeling the probability densities of given inputs.


Supervised Learning | A data mining task
Definition | Analyzes labeled training data to produce an inferred function, which is then used to map new examples to outputs.


Support Vector Machine (SVM)
An SVM separates (classifies) a data set into two classes using a single straight line (a hyperplane in higher dimensions). All points that fall on one side are labelled as the first class, and the remaining points are labelled as the second class. It is used for web and text mining, and for regression problems.




Types of SVM
There are 2 types:
1. Linear SVM: used for linearly separable data, i.e. the data set can be classified using a straight line.
2. Non-linear SVM: used for non-linearly separable data, i.e. the data set cannot be classified using a straight line.
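
The difference between the two cases can be illustrated with a small sketch. It does not train an SVM: the weights are picked by hand, whereas a real SVM would learn them from the data. The points, the weights, and the squared-feature mapping are all assumptions made for the example.

```python
def linear_classify(w, b, x):
    """Label a point by the side of the line (hyperplane) w.x + b = 0 it falls on."""
    score = sum(wi * xi for wi, xi in zip(w, x)) + b
    return 1 if score >= 0 else -1

# Linearly separable 2-D points: the line x1 + x2 - 3 = 0 separates the classes.
separable = [((0, 0), -1), ((1, 1), -1), ((2, 2), 1), ((3, 3), 1)]
linear_predictions = [linear_classify((1, 1), -3, x) for x, _ in separable]

# Non-linearly separable 1-D points: class +1 lies on BOTH sides of class -1,
# so no single threshold on x can separate them.
not_separable = [(-2.0, 1), (-0.5, -1), (0.5, -1), (2.0, 1)]

def feature_map(x):
    """Assumed mapping into 2-D: (x, x squared)."""
    return (x, x * x)

# In the mapped space the straight line x^2 - 1 = 0 separates the classes.
mapped_predictions = [
    linear_classify((0, 1), -1, feature_map(x)) for x, _ in not_separable
]

print(linear_predictions)
print(mapped_predictions)
```

The second half shows the usual idea behind a non-linear SVM: data that no straight line can split in its original space may become linearly separable after it is mapped into a different space.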

Decision Trees
A decision tree is a tree-shaped representation used to solve decision-making problems in data mining. It contains a root node, branches, and leaf nodes. It is used for web and text mining, and for regression problems.

Type of Node | Denotes
Internal Node | A test on an attribute
Branch | The outcome of a test
Leaf Node | Holds a class label
Root Node | The topmost node of the tree
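
The node roles in the table can be sketched as a small data structure. The Node class, the attribute names, and the example tree are hypothetical and hand-built; no tree-learning algorithm is shown.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional

@dataclass
class Node:
    attribute: Optional[str] = None  # internal node: the attribute tested here
    branches: Dict[str, "Node"] = field(default_factory=dict)  # test outcome -> child
    label: Optional[str] = None  # leaf node: the class label it holds

def classify(node, record):
    """Start at the root and follow the branch matching each test outcome."""
    while node.label is None:  # keep descending until a leaf is reached
        node = node.branches[record[node.attribute]]
    return node.label

# Hand-built example tree; the root node is the topmost test ("outlook").
root = Node(attribute="outlook", branches={
    "sunny": Node(attribute="windy", branches={
        "yes": Node(label="stay in"),
        "no": Node(label="play"),
    }),
    "rainy": Node(label="stay in"),
})

prediction = classify(root, {"outlook": "sunny", "windy": "no"})
print(prediction)
```

Each internal node tests one attribute, each dictionary key is a branch (one outcome of that test), and classification ends when a leaf's class label is reached.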




Pruning
Pruning is a technique that reduces the size of a decision tree by removing irrelevant sections of the tree.

Methods for Expressing Attribute Test Conditions
The method depends on the type of attribute being tested.

Types of attributes in data mining

Attribute Type | Details | Examples
Nominal | Values are just different names that distinguish one object from another (=, ≠) | Employee ID numbers, zip codes
Ordinal | Values can be used to order objects (<, >) | Street numbers, hardness of minerals
Interval | Differences between values are meaningful (+, -) | Temperature in Celsius or Fahrenheit, calendar dates
Ratio | Both differences and ratios are meaningful (*, /) | Counts, age, temperature in Kelvin


[Figure: measures for selecting the best split]



