Introduction
XLMiner supports all facets of the data mining process, including data partition, classification, prediction, and association. The second of these, classification, categorizes a set of observations into pre-defined classes based on a set of variables. XLMiner offers six classification methods: discriminant analysis, logistic regression, k-nearest neighbors, classification tree, naïve Bayes, and neural network. Each method has its own features, and the choice among them is typically determined by the nature of the variables involved.
How to Access Classification Methods in Excel
- Launch Excel.
- In the toolbar, click XLMINER PLATFORM.
- In the ribbon's Data Mining section, click Classify.
- In the drop-down menu, select a classification method.
Classification Methods
Discriminant Analysis Method
This method constructs a set of linear functions of the predictor variables and uses these functions to predict the class of a new observation whose class is unknown. Common uses include classifying loan, credit card, or insurance applicants into low- or high-risk categories, classifying student applications for college entrance, and classifying cancer patients into clinical studies.
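As an illustration only (this is not XLMiner's implementation), the scoring step can be sketched in Python. The sketch assumes one linear function per class has already been fitted; the coefficients, predictor names, and class labels below are hypothetical, chosen only to show how the highest-scoring function decides the class.

```python
def classify(observation, functions):
    """Score the observation with each class's linear function
    and return the class with the highest score, plus all scores."""
    scores = {}
    for label, (intercept, coefs) in functions.items():
        scores[label] = intercept + sum(c * x for c, x in zip(coefs, observation))
    return max(scores, key=scores.get), scores


# Hypothetical fitted functions: (intercept, [coef for income, coef for years at job])
functions = {
    "low risk": (-4.0, [0.08, 0.5]),
    "high risk": (-1.0, [0.02, 0.1]),
}

label_a, scores_a = classify([60, 5], functions)  # income 60 (thousands), 5 years
label_b, scores_b = classify([20, 1], functions)  # income 20 (thousands), 1 year
print(label_a, label_b)
```

Fitting the functions themselves (from class means and a covariance estimate) is left out of this sketch.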
Logistic Regression Method
This method is a variant of ordinary regression used to predict the response (output) variable when that variable is dichotomous, that is, when it takes only two values such as yes/no, success/failure, or survive/die.
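A minimal Python sketch of how a fitted logistic regression turns predictors into a two-valued prediction is shown here. The intercept, coefficient, cutoff of 0.5, and the "yes"/"no" labels are assumptions made for illustration; they are not taken from XLMiner.

```python
import math


def predict_probability(x, intercept, coefs):
    """Pass the linear combination of predictors through the logistic function."""
    z = intercept + sum(c * v for c, v in zip(coefs, x))
    return 1.0 / (1.0 + math.exp(-z))


def classify(x, intercept, coefs, cutoff=0.5):
    """Map the predicted probability to one of the two response values."""
    return "yes" if predict_probability(x, intercept, coefs) >= cutoff else "no"


# Hypothetical fitted model with a single predictor.
intercept, coefs = -3.0, [0.5]

p_high = predict_probability([10], intercept, coefs)  # z = 2
p_low = predict_probability([2], intercept, coefs)    # z = -2
print(classify([10], intercept, coefs), classify([2], intercept, coefs))
```

The output is a probability between 0 and 1, which is why the method suits a response with only two values.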
k-Nearest Neighbors Method
This classification method uses a Euclidean distance measure to determine similarity between “neighbors”: each observation is compared with the k observations in the training dataset that lie closest to it. The classes of these k neighbors are then used to assign a category to each member of the validation set.
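The nearest-neighbor idea can be sketched in a few lines of Python. This is an illustrative sketch, not XLMiner's code: the function name `knn_classify`, the default k of 3, the majority-vote rule, and the toy records are all assumptions.

```python
import math
from collections import Counter


def knn_classify(training, new_point, k=3):
    """Assign the majority class among the k training records
    closest to new_point by Euclidean distance."""
    by_distance = sorted(training, key=lambda rec: math.dist(rec[0], new_point))
    votes = Counter(label for _, label in by_distance[:k])
    return votes.most_common(1)[0][0]


# Toy training set: (predictor values, known class)
training = [
    ((1, 1), "A"), ((1, 2), "A"), ((2, 1), "A"),
    ((6, 6), "B"), ((7, 6), "B"), ((6, 7), "B"),
]

near_a = knn_classify(training, (2, 2))
near_b = knn_classify(training, (6, 5))
print(near_a, near_b)
```

Because Euclidean distance is sensitive to scale, predictors measured in very different units are usually normalized before distances are computed.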
Classification Tree Method
Also known as decision trees, this classification method is a good choice when the goal is to generate easily understood and explained “rules” that can be translated into SQL or another query language.
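To show how a tree becomes a set of rules, the following Python sketch walks a small, hand-written tree and emits one SQL-style condition per leaf. The tree structure, column names (`income`, `debt`), thresholds, and class labels are hypothetical; a real tree would come from the fitting step.

```python
def tree_to_rules(node, conditions=()):
    """Return a list of (where_clause, class_label) pairs, one per leaf."""
    if "label" in node:  # leaf node: emit the accumulated path as a rule
        return [(" AND ".join(conditions) or "1=1", node["label"])]
    below = f"{node['column']} <= {node['threshold']}"
    above = f"{node['column']} > {node['threshold']}"
    return (tree_to_rules(node["left"], conditions + (below,))
            + tree_to_rules(node["right"], conditions + (above,)))


# Hypothetical two-level tree.
tree = {
    "column": "income", "threshold": 50,
    "left": {"label": "high risk"},
    "right": {
        "column": "debt", "threshold": 20,
        "left": {"label": "low risk"},
        "right": {"label": "high risk"},
    },
}

rules = tree_to_rules(tree)
for where, label in rules:
    print(f"WHERE {where}  ->  {label}")
```

Each path from the root to a leaf becomes one rule, which is what makes the result easy to read and to paste into a query.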
Naïve Bayes Method
This classification method first scans the training dataset and finds all records whose predictor values are equal. The most prevalent class within each such group is then determined and assigned to the entire group. If a new observation’s predictor values equal those of a group, the new observation is assigned to that group’s class. Because the method is so simple, a large number of records is required to obtain accurate results.
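The lookup described above can be sketched in Python as follows. The sketch follows the paragraph literally (group identical records, take the most prevalent class) and is not XLMiner's implementation; a full naïve Bayes classifier additionally treats the predictors as independent and combines per-predictor probabilities, which this sketch does not do. The function names and toy data are assumptions.

```python
from collections import Counter, defaultdict


def build_lookup(training):
    """Group records with identical predictor values and
    keep the most prevalent class of each group."""
    groups = defaultdict(Counter)
    for predictors, label in training:
        groups[tuple(predictors)][label] += 1
    return {key: counts.most_common(1)[0][0] for key, counts in groups.items()}


def classify(lookup, predictors):
    """Return the group's class, or None if no training record matches exactly."""
    return lookup.get(tuple(predictors))


training = [
    (("sunny", "hot"), "no"),
    (("sunny", "hot"), "no"),
    (("sunny", "hot"), "yes"),
    (("rainy", "mild"), "yes"),
]

lookup = build_lookup(training)
print(classify(lookup, ("sunny", "hot")))
print(classify(lookup, ("rainy", "hot")))
```

The second call finds no matching group, which illustrates why exact matching needs many records: every combination of predictor values that may appear has to be present in the training data.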
Neural Network Method
Artificial neural networks are based on the operation and structure of the human brain. These networks process one record at a time and “learn” by comparing their classification of the record (which at the beginning is largely arbitrary) with the known actual classification of the record. Errors from the initial classification of the first records are fed back into the network and used to modify the network's algorithm the second time around. This process is repeated for many iterations.
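The record-by-record feedback loop can be sketched with a single artificial neuron (a perceptron) in Python. This is a deliberately reduced illustration, not XLMiner's network: a real network has several layers, and the weights here start at zero for reproducibility rather than at arbitrary values. The learning rate, epoch count, and toy dataset (the logical AND of two inputs) are assumptions.

```python
def predict(weights, bias, inputs):
    """Fire (1) when the weighted sum of inputs exceeds zero, else 0."""
    z = bias + sum(w * x for w, x in zip(weights, inputs))
    return 1 if z > 0 else 0


def train(records, epochs=20, rate=1):
    """Process one record at a time and feed each error back into the weights."""
    weights, bias = [0, 0], 0
    errors_per_epoch = []
    for _ in range(epochs):
        errors = 0
        for inputs, actual in records:
            error = actual - predict(weights, bias, inputs)  # 0 when correct
            if error:
                errors += 1
                weights = [w + rate * error * x for w, x in zip(weights, inputs)]
                bias += rate * error
        errors_per_epoch.append(errors)
    return weights, bias, errors_per_epoch


# Toy records: class is 1 only when both inputs are 1.
records = [((0, 0), 0), ((0, 1), 0), ((1, 0), 0), ((1, 1), 1)]

weights, bias, errors_per_epoch = train(records)
predictions = [predict(weights, bias, inputs) for inputs, _ in records]
print(predictions, errors_per_epoch)
```

The per-pass error counts fall to zero after a few passes over the data, which is the "learning" the paragraph describes.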
Classification Methods Summary
- Used to categorize a set of observations into pre-defined classes based on a set of variables.
- XLMiner supports six different classification methods.
Resources
- Data Mining: Introduction to data mining and its use in XLMiner. Major functionality discussed in this topic's sub-pages includes classification, prediction, and ensemble methods.
- Time Series Analysis: Introduction to time series analysis techniques used in XLMiner, including ARIMA models and smoothing techniques.