Evaluate model performance for (binary) classification

To download the table as a csv-files click the download button on the top-right of your screen. To download plots as png files click the download icon on the middle-right of your screen.

Response variable

Select the outcome, or response, variable of interest. This should be a binary variable, either a factor or an integer with two value (i.e., 0 and 1).

Choose level

The level in the response variable that is considered a success. For example, purchase or buyer is equal to “yes”.

Predictor

Select one or more variables that can be used to predict the chosen level in the response variable. This could be a variable, an RFM index, or predicted values from a model (e.g., from a logistic regression estimated using Model > Logistic regression (GLM) or a Neural Network estimated using Model > Neural Network (ANN)).

# quantiles

The number of bins to create.

Margin & Cost

To use the Profit and ROME (Return on Marketing Expenditures) charts, enter the Margin for each sale and the estimated Cost per contact (e.g., mailing costs or opportunity cost of email or text). For example, if the margin on a sale is $10 (excluding the contact cost) and the contact cost is $1 enter 10 and 1 in the Margin and Cost input windows.

Show results for

If a filter is active (e.g., set in the Data > View tab) generate results for All data, Training data, Validation data, or Both training and validation data. If no filter is active calculations are applied to all data.

Plots

Generate Lift, Gains, Profit, and/or ROME charts. The profit chart displays a profit index useful to compare performance across predictors/models

Example

The Gains and Profit charts below show little evidence of overfitting and suggest that targeting approximately 65% of customers would maximize profits.

This insight is confirmed by looking at the confusion matrix. The True Positive Rate in the training and validation sample are 94.0% and 93.4% respectively.

The prediction used in the screen shots above was derived from a logistic regression on the dvd data. The data is available through the Data > Manage tab (i.e., choose Examples from the Load data of type drop-down and press Load examples). The model was estimated using Model > Logistic regression (GLM). The predictions shown below were generated in the Predict tab.

© Vincent Nijs (2016) Creative Commons License