Analytical CRM Insurance Sector

Published February 2017




Analytical CRM can be deployed to understand the processing of claims in the insurance sector. Deregulation of the insurance industry around the globe has increased the number of players in the market and hence the competition.

In India, too, the industry has undergone a major change. Before 2000, the two state insurers, LIC and GIC, were the only players in the market. These companies were created after the nationalization of the life and non-life sectors in 1956 and 1972 respectively.

Eventually the government decided to dismantle the monopoly. One of the reasons may be that competition would promote better products, value and service for customers, which in turn would increase the overall size of the sector.

There are about twenty new entrants, the majority of them (about 12) in the life sector and about 6 in the non-life sector. Initially the life sector has attracted more participants than the non-life sector. The prediction is that there will eventually be about 16 to 26 companies in the life sector and about 8 to 12 in the non-life sector.

In general, performance in the non-life sector is more challenging than in the life sector. With deregulation of the insurance sector, financial companies and banks are entering the non-life insurance sector to sell to their existing customer base. This requires non-life insurers to add value in the value chain.

Analytical CRM can be used in the insurance industry for the following applications:

• Acquiring new customers
• Identifying cross selling / upselling opportunities
• Establishing the premium rates
• Assisting the regulators in understanding rates and models

The two cases given below address the issue of establishing the rates and identification of cross selling opportunities.

Establishing the premium rates is an important aspect of insurance business. The goal is to set rates that reflect the risk level of the policy holder. The lower the risk, the lower the premium rate.

Identification of cross selling/ upselling opportunities involves identification of those customers in the existing database whose likelihood of responding to a product which they do not hold presently is the highest.

As an example, consider a case where we have a customer database of about 100,000. Out of the 100,000 customers, say about 10,000 currently hold a specific product and the remaining 90,000 do not.

We are interested in identifying about 20,000 customers out of the 90,000, the criterion being that their probability of responding to the promotion / marketing campaign is the highest, so that we do not waste time and energy on those whose likelihood of buying is not high. This will also help to develop a focused marketing campaign / strategy.

The above concept can be illustrated diagrammatically:

[Figure: existing and potential customers, showing the profile of those most likely to purchase and the profile of those most likely to remain loyal.]
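The selection step described above (pick the customers with the highest probability of responding) can be sketched in a few lines of Python. This is only an illustration: the function name `select_targets` and the toy scores are hypothetical, and the scores are assumed to come from some response model that the text does not specify.

```python
import heapq

def select_targets(scores, n_targets):
    """Return the IDs of the n_targets customers with the highest scores.

    scores: dict mapping customer ID -> estimated probability of response
            (hypothetical model output, not part of the original text).
    """
    return heapq.nlargest(n_targets, scores, key=scores.get)

# Toy example: five customers who do not hold the product; pick the top two.
toy_scores = {"C1": 0.05, "C2": 0.40, "C3": 0.12, "C4": 0.33, "C5": 0.08}
targets = select_targets(toy_scores, 2)
print(targets)
```

In the example from the text, `scores` would hold the 90,000 customers who do not hold the product and `n_targets` would be 20,000.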

The case study given below illustrates how Analytical CRM concepts can be applied to understanding the risk factors that are related to the size of a claim in the non-life insurance sector.

We have a small insurance database named CLAIMS.xls, which contains claim data for about 1000 policies on residences, covering various types of claim.

The information available is:

• Policy Number
• Cost of the Asset
• Age in months
• Number of bedrooms
• Number of bathrooms
• Number of Floors
• Percentage of wood used in construction
• Percentage of concrete used in construction
• Percentage of other materials
• Whether there is a smoke alarm
• Claim type (Theft; Fire; Riots)
• Claim amount

The problem to be analyzed is how the type of claim is related to the claim amount, the age of the asset and the materials used in its construction. This understanding helps to price the insurance product in line with the risk involved.

So we use claim type as the dependent (target) variable. For this study, all other variables are included as independent variables. The presence or absence of a smoke alarm is treated as a categorical variable, while all others are treated as numeric variables. The policy number is excluded from the analysis because it would not help us derive generalized knowledge about the question we are trying to answer.

Using FORESIGHT, a single decision tree is built with the option that each branch of the tree must contain at least 100 records. In addition, 20% of the data is held out for testing.
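To make the "minimum records per branch" option concrete, here is a minimal sketch of how a tree might choose a split on one numeric variable. The text does not say which splitting criterion FORESIGHT uses; Gini impurity is assumed here purely for illustration, and the function names and toy data are hypothetical.

```python
from collections import Counter

def gini(labels):
    """Gini impurity of a list of class labels (0.0 means a pure node)."""
    n = len(labels)
    if n == 0:
        return 0.0
    counts = Counter(labels)
    return 1.0 - sum((c / n) ** 2 for c in counts.values())

def best_split(values, labels, min_leaf):
    """Find the threshold t that minimizes the weighted Gini impurity of
    the two branches (value <= t) and (value > t), keeping at least
    min_leaf records in each branch. Returns (threshold, impurity) or None.
    """
    pairs = sorted(zip(values, labels))
    n = len(pairs)
    smallest = max(1, min_leaf)
    best = None
    for i in range(smallest, n - smallest + 1):
        # i is the size of the left branch; skip cuts between equal values.
        if pairs[i - 1][0] == pairs[i][0]:
            continue
        left = [lab for _, lab in pairs[:i]]
        right = [lab for _, lab in pairs[i:]]
        score = (len(left) * gini(left) + len(right) * gini(right)) / n
        if best is None or score < best[1]:
            best = (pairs[i - 1][0], score)
    return best

# Toy data (hypothetical): claim amounts and claim types.
amounts = [1000, 2000, 3000, 6000, 7000, 8000]
types = ["Theft", "Riots", "Theft", "Fire", "Fire", "Fire"]
threshold, impurity = best_split(amounts, types, min_leaf=2)
print(threshold, round(impurity, 4))
```

With 1000 policies and `min_leaf=100`, the same search would simply ignore any threshold that leaves fewer than 100 records on either side.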

At the root node of the tree, out of 800 samples used for training, there are about 280 cases of vandalism (the Riots claim type), 283 of theft and 236 of fire, so the cases are approximately equally distributed. When the model is used to classify the entire training data set, a total of 274 samples are misclassified. The confusion matrix shows that the majority of fire cases have been correctly classified. The confusion matrix is as follows:

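Rows are the predicted class and columns the actual class (the column totals of 236, 283 and 280 match the root-node counts).

             Fire   Theft   Riots
Fire          201       4       0
Theft          25     175     131
Riots          10     104     149

Classification accuracy = (201 + 175 + 149) / (201 + 25 + 10 + 4 + 175 + 104 + 0 + 131 + 149) = 525 / 799 ≈ 65.7%

Error = 274 / 799 ≈ 34%, or about one third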
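The arithmetic behind these figures can be checked with a short script. The matrix below takes the Theft-row entry for actual Fire cases as 25, the value used in the denominator of the accuracy formula; the orientation (rows predicted, columns actual) is an assumption based on the column totals matching the root-node counts.

```python
# Confusion matrix: rows = predicted class, columns = actual class (assumed).
classes = ["Fire", "Theft", "Riots"]
matrix = [
    [201, 4, 0],     # predicted Fire
    [25, 175, 131],  # predicted Theft
    [10, 104, 149],  # predicted Riots
]

total = sum(sum(row) for row in matrix)
correct = sum(matrix[i][i] for i in range(len(classes)))  # diagonal entries
misclassified = total - correct
accuracy = correct / total
error = 1 - accuracy

print(f"total={total} correct={correct} misclassified={misclassified}")
print(f"accuracy={accuracy:.1%} error={error:.1%}")
```

The misclassified count of 274 agrees with the figure quoted in the text, and the error works out to roughly one third.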

The most important variable is claim amount, which is strongly related to the target variable, claim type. From the tree one can see that claims with a claim amount greater than $5300 belong to the fire class. The classification error in this segment is around 2%, which means the classification accuracy is about 98%. The majority of claims relating to fire have a claim amount greater than $5300.

Drill down into the segment with claim amount <= $5300. Again, claim amount is the important variable. The segment is broken down into two segments, one with claim amount > $2700 and another with claim amount <= $2700. The majority of theft cases have a claim amount greater than $2700 and <= $5300. The error is about 41%, which means that accuracy is about 59%.

We can drill down each segment and understand the relation between claim type and other variables.

Let us see whether we can improve the accuracy of the model by using some of the advanced features available in FORESIGHT.

Let us use the cross validation approach. Use 10 as the number of trees used in cross validation. This means the system will build ten different trees by dividing the data into ten blocks. (This has been explained earlier.) It uses 9 blocks to develop the model and one block to validate it. This is repeated, leaving out a different block each time, so that ten trees are generated.
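The block scheme just described can be sketched as follows. This is a generic illustration of ten-block cross validation, not FORESIGHT's own implementation; the function name `k_fold_indices` is hypothetical, and in practice the records would normally be shuffled before being divided into blocks.

```python
def k_fold_indices(n_records, k):
    """Yield (train_idx, valid_idx) for each of k blocks over range(n_records).

    Each block is used once for validation while the other k - 1 blocks
    are used to build the model.
    """
    indices = list(range(n_records))
    # Spread any remainder over the first blocks so sizes differ by at most 1.
    sizes = [n_records // k + (1 if i < n_records % k else 0) for i in range(k)]
    start = 0
    for size in sizes:
        valid = indices[start:start + size]
        train = indices[:start] + indices[start + size:]
        yield train, valid
        start += size

# About 1000 policies, 10 blocks: ten (train, validate) pairs, one per tree.
folds = list(k_fold_indices(1000, 10))
print(len(folds), len(folds[0][0]), len(folds[0][1]))
```

Each record appears in exactly one validation block, so every policy is used for validation once and for model building nine times.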

Look at the display of the cross validation tree. By default the best tree is displayed. In this tree also, claim amount is the most important variable, but the split is at a claim amount of $4000 instead of the earlier $5300. The accuracy of the tree is nearly 90.7% (error rate: 9.3%), a significant improvement over the earlier model, where the error was about one third.

Comparing the two segments of the tree, where the claim amount is <= $4000 and > $4000, the right branch has 91% fire claims, 9% theft and 0% vandalism, in contrast with the left branch, which has 5% fire, 45% theft and 50% vandalism. The figures clearly indicate that the two segments are very distinct: the left segment concentrates on vandalism and theft cases, whereas the right segment concentrates on fire claims.

Drill down on the right segment. Now you can see that all claims with a claim amount > $7400 belong to the fire class. Out of a total of 255 cases of claim type fire, 202 have been classified correctly in the total data set, which translates to an accuracy of 90%.

You can explore the tree further.

Click on the Rule generation tab. FORESIGHT automatically generates the rules by growing the tree to full depth, pruning it, and converting the pruned tree to a set of rules.

Some of the rules relating to claim type Fire are:

• If claim amount > $7400, then claim type = Fire (99.3%)
• If price <= $217,000 and claim amount > $4000, then claim type = Fire (99.2%)
• If % concrete > 40 and claim amount > $4000, then claim type = Fire (98.7%)

The rule relating to Theft is:

• If price > $217,000 and number of bathrooms > 2 and claim amount > $4000 and claim amount <= $7400, then claim type = Theft

Coming to vandalism, there is only one rule:

• If claim amount < $3100, then claim type = Vandalism (51.6%)

This indicates that the vandalism class is not well classified by the rule. One reason may be that the definition of, or separation between, vandalism and theft is not very clear. Another may be that the input variables used are insufficient to separate this class from the others. Adding a few more explanatory variables may help to separate it out.
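The generated rules can be read as a small rule-based classifier. The sketch below encodes them as plain conditions; the function name `classify_claim`, the "first matching rule wins" ordering and the use of `None` where the text gives no confidence figure are all assumptions made for illustration, and the conditions follow one reading of the rules as printed.

```python
def classify_claim(claim_amount, price, pct_concrete, bathrooms):
    """Apply the rules from the text in order and return
    (claim_type, confidence) for the first rule that matches, or None.

    Confidence is the percentage quoted with the rule; None where the
    text does not give one. Rule order is an assumption.
    """
    if claim_amount > 7400:
        return ("Fire", 0.993)
    if price <= 217000 and claim_amount > 4000:
        return ("Fire", 0.992)
    if pct_concrete > 40 and claim_amount > 4000:
        return ("Fire", 0.987)
    if price > 217000 and bathrooms > 2 and 4000 < claim_amount <= 7400:
        return ("Theft", None)
    if claim_amount < 3100:
        return ("Vandalism", 0.516)
    return None  # no rule covers this case

# Hypothetical claims, not taken from CLAIMS.xls.
print(classify_claim(8000, 300000, 10, 1))
print(classify_claim(5000, 300000, 20, 3))
print(classify_claim(2000, 150000, 20, 2))
print(classify_claim(3500, 300000, 20, 1))
```

The last call returns `None`: claims between $3100 and $4000 are not covered by any of the listed rules, which is consistent with the weak separation between theft and vandalism noted above.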

(1) Using the AutoFilter feature of Excel, filter only cases of Theft and Vandalism.
(2) Use the cross validation feature and generate the best tree.
(3) Did you gain any additional insight into the problem by this method?
(4) Change the default pruning factor if possible.
(5) Did the analysis give you any additional information with which to update your mental model?


Identification of cross selling opportunities is another important application of Business Intelligence. It involves identifying those customers who do not own a specific product but whose likelihood of responding to a campaign targeting that product is the highest. Identifying such customers results in considerable savings in customer acquisition cost.

The data for this case consists of socio-economic / demographic and product holding information for about 2300 customers. The original data set consisted of about 40 socio-economic / demographic attributes and about 42 product holding attributes. The data set for this study has been reduced and consists of 6 socio-economic / demographic attributes and about 8 product holding attributes.

The information available is:

Socio / demographic variables:

• Age
• Customer type
• Average income
• No. of houses
• Rented house
• Purchasing power class

Product holding attributes:

• Contribution of car policies
• Contribution of life policies
• Contribution of boat policies
• Number of car policies
• Number of life policies
• Number of fire policies
• Mobile home policy

Almost all the variables are grouped variables. For example, age has been grouped into 6 groups; purchasing power, contribution to different policies, etc. into deciles (10 groups, coded 0 to 9); and customer type into ten groups.
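As an illustration of decile coding, the sketch below assigns each value a code from 0 to 9 according to its rank. How the original data set was actually grouped is not stated, so the rank-based rule and the function name `decile_codes` are assumptions.

```python
def decile_codes(values):
    """Assign each value a decile code 0..9 by rank.

    The lowest tenth of the values gets code 0 and the highest tenth
    gets code 9. Ties are broken by position in the input.
    """
    n = len(values)
    order = sorted(range(n), key=lambda i: values[i])
    codes = [0] * n
    for rank, i in enumerate(order):
        codes[i] = rank * 10 // n
    return codes

# Hypothetical contribution amounts for ten customers.
contributions = [50, 10, 40, 20, 30, 100, 60, 90, 70, 80]
codes = decile_codes(contributions)
print(codes)
```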

Since this is a sample of the original data set, you may not have all levels. Out of a total of 2340 customers, about 350 hold the mobile home policy and the rest do not.

The objective is to identify a subset of customers who do not hold the mobile home policy and whose likelihood of purchasing it is the highest. The target variable is therefore Mobile home policy.

Include all other variables as inputs. Change the status of all input variables to Group.

This example illustrates a case where the distribution of the target is highly skewed. In the sample data only about 10% own the policy, whereas the remaining 90% do not.

Use cross validation option and generate the tree.

From the tree it is clear that contribution to car policies has the highest impact on ownership of the mobile home policy. There are two segments from the root of the tree, based on contribution to car policies: the left segment consists of groups 0, 5 and 7, and the right segment of group 6. In the left segment only 6% own the policy and 94% do not. In the right segment 26% own the policy and 74% do not.

If one had to stop at this stage, one could say that the 690 customers in the right-hand segment of the tree who do not own the mobile home policy are those whose likelihood of purchase is the highest. Drill down two levels below this segment. The segment where contribution to fire policies is 3 or 4 and customer main type is GROWERS has 56% owning the policy (53 customers) and 44% not owning it (about 42 customers). The non-owners in this segment have the highest likelihood of buying. Thus, from a total population of about 1754 who do not own the policy, this level of the tree has identified about 42 customers to whom the product can be targeted.
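One way to compare these segments is to divide each segment's ownership rate by the overall ownership rate, a ratio often called lift. The sketch below uses the percentages quoted in the text (about 10% overall, 6% and 26% in the two branches, 56% in the drilled-down segment); the segment labels and the use of lift itself are additions for illustration.

```python
# Overall share of customers who own the policy ("about 10%" in the text).
base_rate = 0.10

# Ownership rate in each segment of the tree, as quoted in the text.
segments = {
    "left branch": 0.06,
    "right branch": 0.26,
    "drilled-down segment": 0.56,
}

# Lift > 1 means the segment owns the policy more often than average.
lift = {name: rate / base_rate for name, rate in segments.items()}
ranked = sorted(lift, key=lift.get, reverse=True)

for name in ranked:
    print(f"{name}: ownership {segments[name]:.0%}, lift {lift[name]:.1f}")
```

Non-owners in the segments at the top of this ranking are the natural first targets for the campaign.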

(1) Do not group any of the independent variables. Run the tree. What do you infer about the results?
(2) Change the number of cross validation trees to 15. Do you see any difference in the results?
(3) Do a preliminary analysis of the data using Excel. Identify a few meaningful segments based on age, customer type and average income. Build model(s) for the identified segments. Compare the results to those obtained in (1).
