One of the most crucial decisions in building a scorecard is choosing the definition of the dependent variable. This definition is the basis of all subsequent analysis and modeling, and it is also the basis on which the model's performance in real business use will be judged. So what goes into selecting the dependent variable definition?
There are two things to keep in mind before selecting the definition of the dependent variable –
• Problem statement
• Identification of contrast
But before we delve into that, let us understand a few basic concepts of a scorecard.
What is a Scorecard?
A scorecard is essentially a way of scoring a customer on the probability of doing ‘something’ based on his or her historical data. All scorecards are therefore static in nature: each one is a function that maps the customer's state at a single point in time to the likelihood of some future action.
In analytical terms, doing ‘something’ is called the performance of the customer, and the historical data is called the observation. A customer can be observed at different times in the past and will have different attributes at each point in time, but since only one state of the customer can be used to build the scorecard, the attributes for any scorecard development exercise are created from a single point in time. This point in time is called the observation snapshot. Similarly, once performance is defined by a certain criterion, the length of time allowed for a customer to meet that criterion is called the performance window. Since historical data is used to predict customer performance, the observation snapshot must always precede the performance window, and the two must not overlap.
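To make the timing rule concrete, here is a minimal Python sketch. It derives a performance window from an observation snapshot and rejects any design where the window does not start strictly after the snapshot. The month-level granularity, the `Design` class and the `make_design` helper are illustrative assumptions, not part of any standard tooling.

```python
from dataclasses import dataclass
from datetime import date


def add_months(d: date, months: int) -> date:
    """Shift a date by whole months; the result is pinned to the 1st (month granularity)."""
    total = d.year * 12 + (d.month - 1) + months
    return date(total // 12, total % 12 + 1, 1)


@dataclass(frozen=True)
class Design:
    observation_snapshot: date  # the single point in time the attributes are built from
    performance_start: date     # first day of the performance window (inclusive)
    performance_end: date       # first day after the performance window (exclusive)

    def validate(self) -> None:
        # Historical data must come strictly before the outcome period, with no overlap.
        if self.observation_snapshot >= self.performance_start:
            raise ValueError("observation snapshot must precede the performance window")
        if self.performance_start >= self.performance_end:
            raise ValueError("performance window must have a positive length")


def make_design(snapshot: date, gap_months: int, window_months: int) -> Design:
    """Hypothetical helper: the window opens gap_months after the snapshot month."""
    start = add_months(snapshot, gap_months)
    design = Design(snapshot, start, add_months(start, window_months))
    design.validate()
    return design


if __name__ == "__main__":
    print(make_design(date(2021, 1, 1), gap_months=2, window_months=6))
```

Take a snapshot on 1 January 2021 with a two-month gap and a six-month window. The resulting window runs from 1 March 2021 up to, but not including, 1 September 2021. A zero-month gap is rejected because the window would then start on the snapshot date itself.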
In some cases, multiple observation snapshots may be needed to create the complete analytical dataset. In such a scenario, the conditions that apply to one snapshot must apply to all the snapshots. For example, suppose one snapshot uses a six-month performance window starting two months after the observation point. Then every other snapshot must also use a six-month performance window starting two months after its own observation point.
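When several snapshots are stacked, the same rule can be applied to each of them and then checked for consistency. The sketch below assumes month-level dates. Its defaults are a six-month window starting two months after each snapshot. The function names are hypothetical.

```python
from datetime import date


def add_months(d: date, months: int) -> date:
    """Shift a date by whole months; the result is pinned to the 1st (month granularity)."""
    total = d.year * 12 + (d.month - 1) + months
    return date(total // 12, total % 12 + 1, 1)


def month_diff(a: date, b: date) -> int:
    """Whole months from a to b, ignoring the day of the month."""
    return (b.year - a.year) * 12 + (b.month - a.month)


def windows_for_snapshots(snapshots, gap_months=2, window_months=6):
    """Apply one rule to every snapshot: (snapshot, window start, window end exclusive)."""
    return [
        (s, add_months(s, gap_months), add_months(s, gap_months + window_months))
        for s in snapshots
    ]


def check_consistent(windows) -> bool:
    """True if every (snapshot, start, end) triple uses the same gap and window length."""
    rules = {(month_diff(s, start), month_diff(start, end)) for s, start, end in windows}
    return len(rules) <= 1


stacked = windows_for_snapshots([date(2021, 1, 1), date(2021, 4, 1), date(2021, 7, 1)])
for snapshot, start, end in stacked:
    print(snapshot, start, end)
print(check_consistent(stacked))
```

Generating every window from a single rule, rather than editing dates by hand, makes it hard for one snapshot to drift from the others. `check_consistent` can still catch a hand-assembled dataset that mixes rules.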
How to Design the Dependent Definition?
Now that we understand the basic terminology, let us look at how to design the dependent definition. The first step is to define the performance of the customer. The performance definition must be a strict criterion with no scope for ambiguity, as any ambiguity would lead to misidentifying performance across customers. By standard practice, once performance is defined, customers meeting the definition are marked as 1 and customers not meeting it are marked as 0. After defining performance, we need to define the performance window within which each customer's performance will be measured, and then the point in time when the observation snapshot will be taken. Below is an illustration of how defining these parameters can lead to scorecards with different applications:
Notice that in the illustration above, the bad criteria, the performance windows and the observation points are strict definitions with no ambiguity. This ensures that, when designing the dependent definition of a scorecard, all stakeholders are completely aligned on the objective of the scorecard and on what the score will eventually convey. This is essential because the definition often drives how a scorecard is consumed and the scenarios for which it is best suited. Some businesses use a scorecard for purposes other than the one it was built for, but doing so introduces many complications and makes the results unreliable.
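The 1/0 coding described above can be sketched as follows. The sketch assumes monthly delinquency records stored as `(month, days_past_due)` pairs and a 30+ days past due criterion; both the data layout and the threshold are illustrative assumptions. Records outside the performance window are ignored, so delinquency before the window does not affect the label.

```python
from datetime import date


def flag_performance(history, start, end, dpd_threshold=30):
    """Return 1 if any monthly record in [start, end) reaches dpd_threshold, else 0.

    history is a list of (month_date, days_past_due) pairs (an assumed layout).
    Customers with no records inside the window are labeled 0 here; in practice
    they may need separate treatment.
    """
    in_window = [dpd for month, dpd in history if start <= month < end]
    return int(any(dpd >= dpd_threshold for dpd in in_window))


# Hypothetical customers; the performance window covers March to August 2021.
window_start, window_end = date(2021, 3, 1), date(2021, 9, 1)
histories = {
    "A": [(date(2021, 4, 1), 0), (date(2021, 5, 1), 35)],  # 30+ inside the window
    "B": [(date(2021, 1, 1), 60), (date(2021, 4, 1), 0)],  # 60 DPD, but before the window
    "C": [(date(2021, 3, 1), 0), (date(2021, 8, 1), 10)],  # never reaches 30
}
labels = {cid: flag_performance(h, window_start, window_end) for cid, h in histories.items()}
print(labels)  # {'A': 1, 'B': 0, 'C': 0}
```

Customer B shows why the window must be fixed in advance. The same customer would be labeled 1 if the window were allowed to start in January.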
Also note the difference between a behavior scorecard and an application scorecard. An application scorecard is used to decide whether or not to approve a customer for a loan or credit product. For that reason, the observation point is one month before the trade is opened, and the performance window is three months from the opening of the account. In other words, once the scorecard is built, the score will denote the probability that an application approved within one month goes 30+ days past due within three months of the account being opened. Modifying this definition slightly results in a completely different scorecard, such as a behavior scorecard.
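One way to keep the definition explicit, and to see how a small change yields a different scorecard, is to treat it as a set of parameters. In this sketch, the application values come from the text above: observation one month before opening, a three-month window and 30+ days past due. The behavior values come from the behavior example discussed below: a 12-month window and 90+ days past due. The behavior observation offset of 0 and the `DependentDefinition` class itself are assumptions for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DependentDefinition:
    name: str
    observation_offset_months: int  # snapshot position relative to the anchor event (documentary)
    window_months: int              # length of the performance window
    dpd_threshold: int              # days past due that makes a customer "bad"

    def label(self, monthly_dpd):
        """monthly_dpd[0] is the first month of this definition's performance window."""
        window = monthly_dpd[: self.window_months]
        return int(any(d >= self.dpd_threshold for d in window))


# Anchor event: account opening for the application scorecard (values from the text).
APPLICATION = DependentDefinition("application", observation_offset_months=-1,
                                  window_months=3, dpd_threshold=30)
# Anchor event: the snapshot itself for the behavior scorecard (offset is assumed).
BEHAVIOR = DependentDefinition("behavior", observation_offset_months=0,
                               window_months=12, dpd_threshold=90)

# Hypothetical delinquency pattern: current early on, deteriorating from month 5.
history = [0, 0, 0, 0, 30, 60, 90, 120, 150, 180, 180, 180]
for definition in (APPLICATION, BEHAVIOR):
    print(definition.name, definition.label(history))
# application 0
# behavior 1
```

The same delinquency pattern is good under one definition and bad under the other. This is why the parameters have to be agreed before any modeling starts.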
At this point, however, some readers may have doubts about the bad definition for an application scorecard. What if a customer is 30 days past due in the second month but pays all dues by the third month? Should that customer still be considered a bad customer? This is where the second consideration comes into play: the contrast between the good and the bad must be maximized.
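The question can be made concrete by writing down two candidate rules and applying them to that customer. The rule names below ("ever" and "at end of window") are illustrative labels rather than terms from the text, and the three-month history is hypothetical.

```python
def ever_bad(monthly_dpd, threshold=30):
    """Bad if the customer reaches the threshold in any month of the window."""
    return int(any(d >= threshold for d in monthly_dpd))


def bad_at_end(monthly_dpd, threshold=30):
    """Bad only if the customer is still at or above the threshold in the final month."""
    return int(monthly_dpd[-1] >= threshold)


# 30 days past due in month 2, all dues paid by month 3.
cured = [0, 30, 0]
print(ever_bad(cured), bad_at_end(cured))  # 1 0
```

The two rules disagree on the same customer, which is exactly the ambiguity the question raises.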
Consider the behavior scorecard. Some customers will become 90+ days past due within the first four months of the 12-month performance window due to some hardship, but then reorganize their finances to meet their credit obligations. These customers clear their dues and become good customers again. The question is: do we include these customers in scorecard development or not? Such customers are called rollovers, or indeterminates, because their performance may keep fluctuating. For a model to adequately distinguish between good and bad customers, the observations of good customers must contrast sharply with the observations of bad customers. Adding indeterminates to the development sample would decrease this contrast, so they should be excluded from development. However, these customers will still be scored as part of the total population.
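A minimal sketch of this treatment follows. It assumes each account has 12 monthly days-past-due values. An account that reached 90+ days past due but is current (0 DPD) in the final month is treated as indeterminate. An account that reached 90+ and is still delinquent at the end is treated as bad. This simplified rule ignores the "first four months" condition in the text, and the function names and data are hypothetical.

```python
from collections import Counter


def classify(monthly_dpd, bad_threshold=90):
    """Assumed rule: 'good' never reaches the threshold; 'indeterminate' reaches it
    but is current (0 DPD) in the final month; 'bad' reaches it and is still
    delinquent at the end of the window."""
    if max(monthly_dpd) < bad_threshold:
        return "good"
    return "indeterminate" if monthly_dpd[-1] == 0 else "bad"


def development_sample(portfolio):
    """Map goods to 0 and bads to 1; indeterminates are left out of model fitting."""
    target = {"good": 0, "bad": 1}
    sample = {}
    for cid, history in portfolio.items():
        status = classify(history)
        if status in target:
            sample[cid] = target[status]
    return sample


portfolio = {
    "A": [0] * 12,                                                 # always current
    "B": [0, 30, 60, 90, 120, 150, 180, 180, 180, 180, 180, 180],  # stays delinquent
    "C": [30, 60, 90, 90, 0, 0, 0, 0, 0, 0, 0, 0],                 # 90+ early, then cures
}
print(Counter(classify(h) for h in portfolio.values()))
print(development_sample(portfolio))  # {'A': 0, 'B': 1}
# All three accounts, including "C", would still be scored once the model is deployed.
```

Excluding "C" only affects which rows the model learns from. The scoring population is still the whole portfolio.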
To sum up, while creating the bad definition, or dependent definition, for any scorecard development, it is necessary to be absolutely clear about two things:
• The problem statement for the business application, and
• Establishing the maximum possible contrast between the good and the bad.
These two factors can guide you in ensuring that the definition is adequate, and both statistically and logically sound.