STATISTICA Scorecard


Feature Selection

The Feature Selection module is used to exclude unimportant or redundant variables from the initial set of characteristics. You can create a variable ranking using two measures of the overall predictive power of variables: IV (Information Value) and Cramér's V. Based on these measures, you can identify which characteristics have an important impact on credit risk and select them for the next stage of model development. Moreover, the Selecting representatives option enables you to identify redundancy among numerical variables without analyzing the correlation matrix of all variables. This module creates bundles of commonly correlated characteristics using factor analysis with rotation of scores. In each bundle, variables are highly correlated with the same factor (and often with each other), so you can easily select only a small number of bundle representatives.
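As a rough illustration of the two ranking measures, the sketch below computes IV and Cramér's V for one binned characteristic from per-bin counts of good and bad cases. The function names and the sample counts are hypothetical, every bin is assumed to contain at least one good and one bad case, and the product's own implementation may differ in details.

```python
import math

def information_value(goods, bads):
    """IV of one characteristic from per-bin good/bad counts (all counts > 0)."""
    total_good, total_bad = sum(goods), sum(bads)
    iv = 0.0
    for g, b in zip(goods, bads):
        share_good, share_bad = g / total_good, b / total_bad
        iv += (share_good - share_bad) * math.log(share_good / share_bad)
    return iv

def cramers_v(goods, bads):
    """Cramér's V of the (bins x good/bad) contingency table; needs >= 2 bins."""
    total_good, total_bad = sum(goods), sum(bads)
    n = total_good + total_bad
    chi2 = 0.0
    for g, b in zip(goods, bads):
        row_total = g + b
        expected_good = row_total * total_good / n
        expected_bad = row_total * total_bad / n
        chi2 += (g - expected_good) ** 2 / expected_good
        chi2 += (b - expected_bad) ** 2 / expected_bad
    k = min(len(goods), 2) - 1  # min(rows, columns) - 1
    return math.sqrt(chi2 / (n * k))

# Hypothetical counts for a characteristic split into two bins.
goods_per_bin = [80, 20]
bads_per_bin = [20, 80]
iv = information_value(goods_per_bin, bads_per_bin)
v = cramers_v(goods_per_bin, bads_per_bin)
```

Characteristics can then be sorted by either value to obtain a ranking.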

STATISTICA Credit Scorecard Builder

Attributes Building

In the Attributes Building module, you can prepare risk profiles for every variable. Using an automatic algorithm (based on the CHAID method) or a manual mode, you can divide variables (otherwise known as characteristics) into classes (attributes or "bins") with homogeneous risk. Initial attributes can be adjusted manually to meet business and statistical criteria such as profile smoothness or ease of interpretation. To build proper risk profiles, statistical measures of the predictive power of each attribute (WoE - Weight of Evidence, and IV - Information Value) are generated. The quality of the WoE can be assessed for each attribute using a graph of the WoE trend. The whole process can be saved as an XML script and used later in the Credit Scorecard Builder module.
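The WoE of an attribute and a simple check of the WoE trend can be sketched as follows. This assumes the common convention WoE = ln(share of goods / share of bads); some tools use the opposite sign. The function names and bin counts are hypothetical, and the bins are assumed to be ordered and non-empty for both classes.

```python
import math

def weight_of_evidence(goods, bads):
    """WoE per bin, using ln(share of goods / share of bads)."""
    total_good, total_bad = sum(goods), sum(bads)
    return [
        math.log((g / total_good) / (b / total_bad))
        for g, b in zip(goods, bads)
    ]

def is_monotonic(values):
    """True if the sequence never changes direction (a smooth risk profile)."""
    pairs = list(zip(values, values[1:]))
    rising = all(a <= b for a, b in pairs)
    falling = all(a >= b for a, b in pairs)
    return rising or falling

# Hypothetical counts for three ordered bins of one characteristic.
woe = weight_of_evidence(goods=[50, 150, 300], bads=[40, 40, 20])
smooth_profile = is_monotonic(woe)
```

A non-monotonic trend is one signal that neighboring bins might be merged during manual adjustment.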

Credit Scorecard Builder

The Credit Scorecard Builder module is used to create a scorecard based on the attributes prepared in the Attributes Building module and a logistic regression model. The process from data to scorecard can be simplified by accepting the default parameters. Advanced users may recode initial variables into attributes (WoE or sigma-restricted dummy variables) and choose one of the model building methods:

  • Forward entry,
  • Backward elimination,
  • Forward step-wise,
  • Backward step-wise,
  • Best subset,
  • Bootstrap for all effects.

Once a model is built, a set of statistics (such as AIC, BIC, LR tests) and reports (such as the eliminated unimportant variables) can be generated. The final stage of this process is scorecard preparation, using a logistic regression algorithm to estimate model parameters and specified scale values to transform the model into a scorecard format, after which it can be saved as Excel, XML, or SVB script.
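The transformation of a logistic regression model into scorecard points can be illustrated with the widely used "points to double the odds" scaling. This is a sketch under assumptions: the base score (600), base odds (50:1), and PDO (20) are hypothetical scale values, the model is assumed to predict the log-odds of a good outcome from WoE-coded characteristics, and the source does not state which scaling the module applies.

```python
import math

def scaling_constants(base_score, base_odds, pdo):
    """Factor and offset so that score = offset + factor * ln(odds)."""
    factor = pdo / math.log(2)
    offset = base_score - factor * math.log(base_odds)
    return factor, offset

def score_from_odds(odds, factor, offset):
    """Total score for given good:bad odds."""
    return offset + factor * math.log(odds)

def attribute_points(woe, beta, intercept, n_characteristics, factor, offset):
    """Points for one attribute; intercept and offset are spread evenly
    over all characteristics so that the points sum to the total score."""
    return (beta * woe + intercept / n_characteristics) * factor \
        + offset / n_characteristics

# Hypothetical scale: 600 points at odds 50:1, 20 points double the odds.
factor, offset = scaling_constants(base_score=600, base_odds=50, pdo=20)

# Hypothetical two-characteristic model and one applicant's WoE values.
intercept, betas, woes = 2.0, [0.8, 1.1], [0.5, -0.3]
total_points = sum(
    attribute_points(w, b, intercept, len(betas), factor, offset)
    for w, b in zip(woes, betas)
)
```

With this scaling, doubling the odds always adds the same number of points, which keeps the scorecard easy to interpret.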

Survival Models

The Survival Models module is used to build scoring models using the Cox proportional hazards model. You can estimate a scoring model using additional information about the time of default (when the debtor stopped paying). With this module, you can calculate the probability of default (scoring) for a given time horizon (for example, 6 months or 9 months).
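Under a Cox model, the probability of default by time t follows from the baseline survival function S0(t) and the applicant's linear predictor: PD(t) = 1 - S0(t)^exp(x'beta). The sketch below applies that relationship; the baseline survival values and the linear predictor are hypothetical and would normally come from a fitted model.

```python
import math

def default_probability(baseline_survival, linear_predictor):
    """PD by a given time under a Cox proportional hazards model."""
    return 1.0 - baseline_survival ** math.exp(linear_predictor)

# Hypothetical baseline survival at 6 and 9 months.
baseline = {6: 0.97, 9: 0.95}
# Hypothetical linear predictor (x'beta) for one applicant.
applicant_lp = 0.4

pd_by_horizon = {
    months: default_probability(s0, applicant_lp)
    for months, s0 in baseline.items()
}
```

A linear predictor of zero reproduces the baseline risk, and larger values raise the probability of default at every horizon.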

Reject Inference

In some circumstances, rejected credit applications also need to be taken into consideration. Because there is no information about the outcome class (good or bad credit) of rejected cases, it is inferred using an algorithm; the k-nearest neighbors method and the parceling method are available. After the analysis, a new data set with complete information is produced.
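One simple reading of the parceling method is sketched below: rejected cases are grouped into score bands, and within each band a share of them equal to the bad rate observed among accepted cases is labeled bad at random. The band names, bad rates, and function name are hypothetical, and practical variants often inflate the bad rate for rejects; the module's actual procedure is not described in the text.

```python
import random

def parcel_rejects(reject_bands, band_bad_rates, seed=0):
    """Return inferred labels (1 = bad, 0 = good) for rejected cases.

    reject_bands: score band of each rejected case.
    band_bad_rates: bad rate per band, observed on accepted cases.
    """
    rng = random.Random(seed)
    labels = [0] * len(reject_bands)

    # Collect the positions of the rejected cases band by band.
    by_band = {}
    for idx, band in enumerate(reject_bands):
        by_band.setdefault(band, []).append(idx)

    # Label a random subset of each band as bad, sized by the band's bad rate.
    for band, indices in by_band.items():
        n_bad = round(len(indices) * band_bad_rates[band])
        for idx in rng.sample(indices, n_bad):
            labels[idx] = 1
    return labels

# Hypothetical rejected cases in two score bands.
bands = ["low"] * 10 + ["high"] * 10
inferred = parcel_rejects(bands, {"low": 0.5, "high": 0.1})
```

The inferred labels can then be appended to the accepted cases to form the complete data set.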

Model Evaluation

The Model Evaluation module is used to evaluate and compare different scorecard models. To assess models, you can select the following statistical measures (each with a full detailed report):

  • Information Value (IV),
  • Kolmogorov-Smirnov statistic (with respective graph),
  • Gini index,
  • Divergence,
  • Hosmer-Lemeshow statistic,
  • ROC curve analysis,
  • Lift and Gain chart.

Additional reports include:

  • Final score report,
  • Characteristic report,
  • Odds chart,
  • Bad rate chart.

You can assess the goodness-of-fit of the generated models and choose the one that fulfills your expectations prior to creating the scorecard model.
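Two of the listed measures, the Kolmogorov-Smirnov statistic and the Gini index, can be computed directly from scores and outcomes. The sketch below assumes that higher scores mean lower risk, that labels are coded 1 for bad and 0 for good, and that Gini is taken as 2 * AUC - 1; the function names and data are hypothetical, and the pairwise AUC loop is meant for small examples only.

```python
def split_by_class(scores, labels):
    """Separate the scores of good (0) and bad (1) cases."""
    goods = [s for s, y in zip(scores, labels) if y == 0]
    bads = [s for s, y in zip(scores, labels) if y == 1]
    return goods, bads

def ks_statistic(scores, labels):
    """Largest gap between the cumulative score distributions of bads and goods."""
    goods, bads = split_by_class(scores, labels)
    ks = 0.0
    for threshold in sorted(set(scores)):
        cdf_good = sum(s <= threshold for s in goods) / len(goods)
        cdf_bad = sum(s <= threshold for s in bads) / len(bads)
        ks = max(ks, abs(cdf_bad - cdf_good))
    return ks

def gini_index(scores, labels):
    """Gini = 2 * AUC - 1, with AUC from all good/bad pairs (ties count half)."""
    goods, bads = split_by_class(scores, labels)
    wins = sum((g > b) + 0.5 * (g == b) for g in goods for b in bads)
    auc = wins / (len(goods) * len(bads))
    return 2 * auc - 1

# Hypothetical validation sample.
sample_scores = [500, 540, 560, 600, 620]
sample_labels = [1, 0, 1, 0, 0]
ks = ks_statistic(sample_scores, sample_labels)
gini = gini_index(sample_scores, sample_labels)
```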
 

Cutoff Point Selection

Cutoff Point Selection is used to define the optimal score value for separating accepted and rejected applicants. You can extend the decision procedure by adding one or two additional cutoff points (for example, applicants with scores below 520 will be declined, applicants with scores above 580 will be accepted, and applicants with scores between these values will be asked for additional qualifying information). Cutoff points can be defined manually, based on an ROC analysis for custom misclassification costs and bad credit fraction (ROC - Receiver Operating Characteristic - provides a measure of the predictive power of a model). Additionally, you can set optimal cutoff points by simulating the profit associated with each cutoff level. The quality of the selected cutoff point can be assessed based on various reports.
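The two ideas in this section, a decision rule with two cutoff points and a profit simulation over candidate cutoffs, can be sketched as follows. The thresholds 520 and 580 come from the example in the text; the treatment of scores exactly at a threshold, the profit and loss amounts, the sample data, and the function names are assumptions made for illustration.

```python
def decide(score, lower=520, upper=580):
    """Three-way decision using two cutoff points."""
    if score < lower:
        return "decline"
    if score > upper:
        return "accept"
    return "request more information"

def best_cutoff(scores, labels, profit_per_good, loss_per_bad):
    """Cutoff (accept if score >= cutoff) with the highest simulated profit.

    labels: 1 = bad, 0 = good. Returns (cutoff, profit).
    """
    best = None
    for cutoff in sorted(set(scores)):
        profit = 0
        for s, y in zip(scores, labels):
            if s >= cutoff:  # applicant would be accepted
                profit += profit_per_good if y == 0 else -loss_per_bad
        if best is None or profit > best[1]:
            best = (cutoff, profit)
    return best

# Hypothetical historical sample with known outcomes.
history_scores = [500, 540, 560, 600, 620]
history_labels = [1, 1, 0, 0, 0]
cutoff, profit = best_cutoff(history_scores, history_labels,
                             profit_per_good=100, loss_per_bad=500)
```

Because one bad loan here costs as much as five good loans earn, the simulation favors a cutoff that excludes both bad cases even though it accepts fewer applicants.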

Score Cases

The Score Cases module is used to score new cases using the selected model saved as an XML script. You can calculate overall scoring, partial scorings for each variable, and probability of default (from logistic regression model), adjusted by an a priori probability of default for the whole population (supplied by the user).
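One standard way to adjust a model probability for a user-supplied population prior is to rescale the odds by the ratio of the population prior to the bad rate of the modeling sample. The sketch below shows that correction; it is an assumption that the module works this way, and the function name and rates are hypothetical.

```python
def adjust_for_prior(p_model, sample_bad_rate, population_bad_rate):
    """Rescale a model PD from the sample bad rate to the population prior."""
    bad_part = p_model * population_bad_rate / sample_bad_rate
    good_part = (1 - p_model) * (1 - population_bad_rate) / (1 - sample_bad_rate)
    return bad_part / (bad_part + good_part)

# Hypothetical case: model built on a balanced sample (50% bad),
# while the user states that 10% of the whole population defaults.
adjusted_pd = adjust_for_prior(p_model=0.5,
                               sample_bad_rate=0.5,
                               population_bad_rate=0.1)
```

When the sample bad rate already equals the population prior, the probability is returned unchanged.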
 

Population Stability

The Population Stability module provides analytical tools for comparing two data sets (for example, current and historical data sets) in order to detect any significant changes in the structure of characteristics or in the applicant population. A significant shift in the current data set may signal the need to re-estimate the parameters of the model. This module produces reports of population and characteristic stability with respective graphs.
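A measure commonly used for this kind of comparison is the population stability index (PSI), computed over score bands or over the attributes of a characteristic. The text does not say which statistics the module reports, so the sketch below is only an illustration; the function name and the band shares are hypothetical, and all shares are assumed to be positive.

```python
import math

def population_stability_index(expected_shares, actual_shares):
    """PSI between two distributions over the same bands (all shares > 0)."""
    return sum(
        (actual - expected) * math.log(actual / expected)
        for expected, actual in zip(expected_shares, actual_shares)
    )

# Hypothetical share of applicants per score band.
historical = [0.20, 0.30, 0.30, 0.20]
current = [0.15, 0.25, 0.35, 0.25]
psi = population_stability_index(historical, current)
```

A value of zero means the two distributions are identical, and larger values indicate a larger shift between the historical and current data sets.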