Blog

Fitting distribution to Operational Risk Data

 

Pristine (www.edupristine.com) and Bionic Turtle have entered into a partnership to promote practical applications of the concepts related to risk analytics and modeling. Practical and hands on understanding of building excel based models related to operational and credit risk is necessary for any job related to risk management. For this purpose, we would be illustrating step by step model building techniques for risk management. Registrations for Operational Risk are OPEN. Please Sign-up now.

Maximum Likelihood Estimation (MLE) in Excel

Hi,

This is Dinesh from Pristine! I will discuss my favourite area – risk modelling with you.

Operational Risk Modeling Techniques

Operational risk modelling uses Loss Data Analysis (internal or external), Scenario Analysis and data on Business Environment and Internal Controls. Loss data analysis and Scenario Analysis require fitting probability distribution to loss data or scenario data i.e. identifying which distribution best describes the empirical or expert judgement data.

There are various methods of fitting distributions to data such as

· Moments Matching (estimating distribution parameters such that moments such as mean, variance of the data are matched)

· Quantile/Percentile Matching (estimating distribution parameters such that quantiles like 50th Quantile i.e. Median, 99% percentile etc. are matched)

· Probability Weighted Moments

· Maximum Likelihood Estimation (MLE)

Why Maximum Likelihood Estimators (MLE)?

MLE method estimates the distribution parameter such that the joint likelihood of observing all empirical data points together gets maximized. Quite unlike moment matching and quantile matching, MLE makes use of all data points instead of only specific moments/quantiles. MLE also allows fitting distribution to truncated and censored data (common feature of operational risk data).

Estimating MLE parameter for any distribution with given PDF and CDF functions

· Start with seed parameter values for the distribution to be fitted to the data

· Find the probability of observing each data point (using appropriate PDF function)

· Assuming all data points are independent, joint probability of observing all data points together is the product of their probability density functions

If two events A and B are independent Joint probability of A and B happening together is P(A) x P(B)

· Overall probability of observing all loss data amounts together would be all PDF multiplied together.

· We would like to maximize the joint likelihood to observe all data points. This is achieved either

Maximizing P(A)*P(B) or

Maximizing log(P(A)) + log(P(B)) or

Minimizing – [log(P(A)) + log(P(B))]

· Use an optimization algorithm (like Excel Solver) to maximize joint log-likelihood by changing parameter values

Excel Modeling for Estimating Parameters for Gamma Distribution using MLE

Let’s take a small operational loss severity data set comprising of only 20 loss data points. Illustration shows estimation of parameters of Gamma distribution using MLE:

clip_image003

Step-1: Seed values of parameters

Gamma distribution has two parameters

· Shape/Alpha which controls the shape of the distribution and impacts skewness and kurtosis of the distribution and

· Scale/Beta which controls the dispersion (variance) of the distribution

For Gamma distribution, shape and scale parameters have to be positive, so we start with positive seed values.

Step-2: PDF of each loss data point

Usually, CDF and PDF functions in Excel have ‘Dist’ suffixed to the distribution name. For instance, NORMDIST function gives CDF of normal distribution if ‘cumulative’ argument is ‘TRUE’ and gives PDF if ‘cumulative’ argument is entered as ‘FALSE’.

Similarly, GAMMADIST gives CDF (cumulative = ‘TRUE’) and PDF (cumulative = ‘FALSE) of Gamma distributed random variable.

clip_image005

Step-3: Calculation of Joint Probability

Joint probability of observing all data points together is calculated as product of PDF of individual data point. We can either

· Maximise product of all PDF or

· Maximize the sum of logs of individual PDF (maximising log likelihood) or

· Minimise the sum of logs of individual PDF (minimizing log likelihood).

We decide to minimize the negative logarithm. Column C we calculate the logs of PDF and in C7 we take negative sum of all log PDF. We then use Excel solver to minimize cell C7 by changing parameter values.

clip_image007

Step-4: Invoke Excel Solver to minimize negative log likelihood function

Excel solver can be used to minimize the log likelihood function on (Cell C7), by changing parameter values (B3 and B4), subject to the constraints that parameters are positive (B3, B4 > = small positive value)

clip_image009

Practical considerations

In practice and as allowed in regulatory guidelines, Banks may not collect data on small operational losses below a threshold (called ‘truncation’ of loss data). MLE parameter estimates need to be corrected for this to avoid bias in estimates.

Say if the loss collection threshold is USD 10000

· Find conditional PDF, conditional on the fact that losses above 10000 are only being collected

· To find conditional PDF, we divide PDF by the cumulative probability that losses are above the threshold (as we are capturing only data points above the threshold)

· Equivalently we divide by (1 – cumulative probability that losses are less than the threshold).

0px; padding-left: 0px; padding-right: 0px; display: inline; border-top-width: 0px; border-bottom-width: 0px; border-left-width: 0px; padding-top: 0px" title="clip_image011" border="0" alt="clip_image011" src="http://www.edupristine.com/wp-content/uploads/2011/11/clip-image011-thumb.jpg" />

Notice that we are now minimizing cell E7, sum of conditional PDF. Conditional probability of each data point is calculated as

PDF(Loss Amount)/(1-CDF(Loss threshold))

GAMMADIST(Loss amount, Param-1, Param-2, Cumulative = False)/ (1- GAMMADIST(Loss Threshold, Param-1, Param-2, Cumulative = True)).

clip_image013

Another noteworthy point is that after considering loss threshold, shape parameter has changed. Therefore, not adjusting for loss threshold would lead to biased parameter estimates.

Templates to download

I have created a template for you, where the subheadings are given and you have to link the model to get the cash numbers! You can download the same from here. You can go through the case and fill in the yellow boxes. I also recommend that you try to create this structure on your own (so that you get a hang of what information is to be recorded).

Also you can download this filled template and check, if the information you recorded, matches mine or not!

Next Steps

Excel functions can be used to model operational loss data. Similar techniques can be used for other common distributions fitted to operational loss data, use:

· ‘Weibull’ function for Weibull distribution

· ‘Expondist’ function for exponential distribution

· ‘Normdist’ function for lognormal distribution after taking Natural Log of all data points

Also in the next few tutorials we would find the methodology of fitting distributions to data and the practical problems that we face in the same.

By the way, we have launched the course on Operational Risk Modeling, which covers all these practical concepts through video lectures. If you want to join the same, you can join by clicking here.

To reach the author, you can send an email to dinesh.chaudhary@asymmetrix.co.in.


The case has been drafted for discussion purpose. It has been written by Pristine (www.edupristine.com) and would be discussed by experts from Pristine & Bionic Turtle. There would be a step by step analysis and financial model building to come to a conclusion on the decision.

Related posts:

  • Introduction to Operational Risk The rogue trader that cost UBS $2 billion adds to a long list of case studies that only begins with Barings, Sumitomo, Allied Irish Bank,...
  • Normal Distribution Normal Distribution which is also called Gaussian curve (after the name of Carl Friedrich Gauss) is the most commonly used distribution among the world of...
  • VaR Methodology: Shortcomings What is the most I can lose on this investment? This is a question that almost every investor who has invested or is considering investing...
  • FRM Part II webinar on Credit Risk The webinar conducted by Pristine for FRM Part II candidates on Credit Risk in the month of August 2011, for launching it's 2011 batch. Pristine...
  • FRM FREE Preparation for Important Topics – Value at Risk Pristine organized a free one hour Webinar (live instruction training) on Value at Risk (VaR). The objective of the Webinar was to explain the concept...

avatar An MBA from IIM Indore and a B.Tech. and M.Tech from IIT Delhi in Mathematics and Computing, Paramdeep is a director with Pristine. Prior to founding Pristine, he was working with Standard Chartered Investment Banking as a Senior Manager.

Tags: , , , , , ,