One of the most common ways to package an analytical solution is as a scorecard. Eminent authors have written a plethora of books on building scorecards and on tailoring them to particular applications. But why package a solution as a scorecard at all, rather than simply giving the client the probability of an event and being done with it?

The answer lies in the way businesses build, understand and consume scorecards to drive the business metrics they focus on. Building a scorecard is as much an art as an analytical process flow. It runs all the way from data treatment and massaging, variable derivation, imputation and outlier treatment, through model building, to translating raw probabilities into a score. That score can be read as the likelihood, for a given customer or record, of the event happening, provided the customer or record meets certain criteria. This by itself, however, does not explain why the numbers have to be presented as a score instead of a pure probability.

During the build, if certain segments of the population perform significantly differently, a modeler might segment the population on certain variables to improve the predictive power of the models. Say three segments come out of this process, each of which then goes through the same modeling flow. But what about the final result? The three segments would yield three sets of probabilities, each independent of the others and each meaning something different to the business. How would the client act on three separate scores? And if this is an application scorecard, would the client have to tell a customer that their application was rejected because they fall in the first segment and have certain features? This causes problems both in consuming the model's results and in communicating them to the client's customers. A single scorecard covering all the segments is therefore essential.

How is this done? The probabilities from the three segment models need to be odds aligned and scaled, so that a given score means the same risk (probability of the event) regardless of which segment the customer or record belongs to. For example, in a banking/risk setting, suppose the first segment is a thin-file population with little history and few trades, and the second is a thick-file population with extensive history and many trades; a score of 500 must mean the same chance of risk in both segments. From a population perspective, if the scores are odds aligned and scaled linearly by a factor of 1000, then x% of the thin-file population might sit at an r% risk rate while y% of the thick-file population sits at the same r% risk rate, where x% may be greater than, less than or equal to y% depending on the initial composition of the two segments. The client then does not need to tell a customer that they belong to a particular segment and therefore have this score; it only needs to say that this is their score and these are the reasons it is low.
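One common way to carry out the odds alignment and scaling described above, sketched here under stated assumptions rather than as the author's exact method, is to (1) recalibrate each segment model's log-odds against that segment's observed outcomes with a one-variable logistic regression, and (2) push every aligned probability through a single log-odds scale of the form score = offset + factor * ln(good:bad odds). The scale anchor (600 at 50:1 odds, 20 points to double the odds) and all function names are hypothetical.

```python
import math

# Assumed parameters of the common score scale (illustrative, not from the text):
# a score of 600 corresponds to good:bad odds of 50:1, and every 20 points doubles the odds.
BASE_SCORE, BASE_ODDS, PDO = 600.0, 50.0, 20.0
FACTOR = PDO / math.log(2)
OFFSET = BASE_SCORE - FACTOR * math.log(BASE_ODDS)


def logit(p: float) -> float:
    """Log-odds of the bad event."""
    return math.log(p / (1.0 - p))


def prob_to_score(p_bad: float) -> float:
    """Map a calibrated P(bad) to the common scale: OFFSET + FACTOR * ln(good:bad odds)."""
    # ln(good:bad odds) = ln((1 - p) / p) = -logit(p)
    return OFFSET - FACTOR * logit(p_bad)


def fit_alignment(raw_probs, outcomes, iters=50):
    """Fit (a, b) so that logit(P(bad)) ~= a + b * logit(raw_prob) on ONE segment.

    A one-variable logistic regression solved with Newton's method.
    outcomes: 1 if the bad event occurred, else 0. Needs both goods and bads
    and at least two distinct raw probabilities.
    """
    xs = [logit(p) for p in raw_probs]
    a, b = 0.0, 1.0  # start from "the segment model is already calibrated"
    for _ in range(iters):
        g_a = g_b = h_aa = h_ab = h_bb = 0.0
        for x, y in zip(xs, outcomes):
            p = 1.0 / (1.0 + math.exp(-(a + b * x)))
            w = p * (1.0 - p)
            g_a += y - p          # gradient of the log-likelihood
            g_b += (y - p) * x
            h_aa += w             # Hessian of the negative log-likelihood
            h_ab += w * x
            h_bb += w * x * x
        det = h_aa * h_bb - h_ab * h_ab
        # Newton step: (a, b) += H^-1 * gradient, using the 2x2 inverse
        a += (h_bb * g_a - h_ab * g_b) / det
        b += (h_aa * g_b - h_ab * g_a) / det
    return a, b


def aligned_score(raw_prob: float, a: float, b: float) -> float:
    """Odds-align one segment's raw probability, then place it on the common scale."""
    aligned_p_bad = 1.0 / (1.0 + math.exp(-(a + b * logit(raw_prob))))
    return prob_to_score(aligned_p_bad)
```

For example, if one segment's model outputs 0.4 for records whose observed bad rate is 0.2, `fit_alignment` on that segment learns to move those records to the score that `prob_to_score(0.2)` gives, which is the same score a correctly calibrated segment would assign to a 0.2 bad rate. Fitting (a, b) separately per segment is what makes a given score mean the same risk everywhere.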

Another reason scorecards make more sense than pure probabilities is how clients consume them, along with industry standards. Say an industry has a generic scorecard that gives clients a risk score in the range 300-850, where 300 is the highest risk and 850 the lowest. A client engages vendor A to build another score on their proprietary data, so that they can design specific strategies for approving applications for a particular product. If the vendor builds that scorecard on a range of 1-1000, the client cannot easily look at both scores together: the scales differ, so the risk ranking at a given point is difficult or impossible to compare. This is another reason a scorecard needs to be calibrated to the range, or scale, most prevalent in the industry, so that the client can see what risk the custom score captures versus what the generic score captures. The client can then understand the risk ratios, compare the effectiveness of both scores at multiple risk levels, and set up and implement effective business strategies with ease.
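If both scores are built on log-odds scales, putting a custom score onto the industry's range amounts to converting score to odds on one scale and back to a score on the other. The sketch below assumes both scales' parameters are known; the vendor scale (500 at 20:1 odds, 50 points to double the odds, range 1-1000) and the industry scale (700 at 20:1 odds, 40 points to double, range 300-850) are invented for illustration, since the parameters of real generic scores are generally not published. `ScoreScale`, `scale_from_anchor` and `rescale` are hypothetical names.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class ScoreScale:
    """A log-odds score scale: score = offset + factor * ln(good:bad odds), clipped to [lo, hi]."""
    offset: float
    factor: float
    lo: float
    hi: float

    def to_log_odds(self, score: float) -> float:
        return (score - self.offset) / self.factor

    def from_log_odds(self, log_odds: float) -> float:
        score = self.offset + self.factor * log_odds
        return min(max(score, self.lo), self.hi)  # keep within the published range


def scale_from_anchor(base_score, base_odds, pdo, lo, hi) -> ScoreScale:
    """Build a scale from an anchor point and points-to-double-odds (PDO)."""
    factor = pdo / math.log(2)
    return ScoreScale(base_score - factor * math.log(base_odds), factor, lo, hi)


# Hypothetical scales, for illustration only.
VENDOR_SCALE = scale_from_anchor(500, 20, 50, 1, 1000)
INDUSTRY_SCALE = scale_from_anchor(700, 20, 40, 300, 850)


def rescale(score: float, src: ScoreScale, dst: ScoreScale) -> float:
    """Re-express a score from one scale on another by matching good:bad odds."""
    return dst.from_log_odds(src.to_log_odds(score))
```

Under these assumptions a vendor score of 500 maps to 700 on the industry scale, and 550 (twice the odds) maps to 740. This only lines up risk if the custom score really delivers its stated odds on the client's population; in practice the odds at each score would be re-estimated on the client's data before mapping. Clipping also means that anything beyond the industry range collapses onto 300 or 850.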

Therefore, whatever process is used, it is always good, and sometimes necessary, practice to package an analytical solution as a scorecard rather than a bare probability. A modeler also needs to understand the intricacies of transforming multiple sets of probabilities into a single scorecard, and should take care to test the scorecard's performance before and after applying calibration, odds equalization and other such techniques.
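Two checks that fit the before-and-after testing mentioned above, again as a hypothetical sketch rather than a prescribed procedure: within each segment, alignment and scaling are normally monotone increasing, so the segment's AUC (and Gini = 2 * AUC - 1) should not change; across segments, the observed bad rate in each score band should be roughly the same for every segment if the odds alignment worked. The function names and the 20-point band width are assumptions.

```python
from collections import defaultdict


def auc(scores, is_bad):
    """P(a random good scores higher than a random bad); ties count half.

    Higher score = lower risk. O(n_good * n_bad), fine for a quick check.
    """
    goods = [s for s, bad in zip(scores, is_bad) if not bad]
    bads = [s for s, bad in zip(scores, is_bad) if bad]
    wins = 0.0
    for g in goods:
        for b in bads:
            if g > b:
                wins += 1.0
            elif g == b:
                wins += 0.5
    return wins / (len(goods) * len(bads))


def bad_rate_by_band(records, band_width=20):
    """records: iterable of (segment, score, is_bad) with is_bad in {0, 1}.

    Returns {(segment, band_start): (count, bad_rate)} so that bad rates in the
    same score band can be compared across segments.
    """
    counts = defaultdict(lambda: [0, 0])
    for segment, score, bad in records:
        band = int(score // band_width) * band_width
        counts[(segment, band)][0] += 1
        counts[(segment, band)][1] += bad
    return {key: (n, bads / n) for key, (n, bads) in counts.items()}
```

Comparing `auc` on a segment's raw probabilities (negated, so that higher means safer) with `auc` on its final scores should give the same value; a drop points to a non-monotone step somewhere in the transformation. Large gaps between segments within the same band in `bad_rate_by_band`, given enough records per band, suggest the odds alignment needs revisiting.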