Seminars in Mathematical Statistics

KTH Mathematics

Mathematical Statistics

Time: 17 April 2019, 13.00-14.00.

Seminar room 3418, Department of Mathematics, KTH, Lindstedtsvägen 25, floor 4.

Speaker: Michel Postigo Smura

Title: Cluster analysis on sparse customer data on purchase of insurance products

Abstract: This thesis performs a cluster analysis on customer data for insurance products. Three clustering algorithms are investigated: K-means (center-based clustering), Two-Level clustering (a self-organizing map, SOM, combined with hierarchical clustering) and HDBSCAN (density-based clustering). The input to the algorithms is a high-dimensional, sparse data set. It contains information about the customers' previous purchases: how many of each product they have bought and how much they have paid. The data set is partitioned into four subsets based on domain knowledge, and each subset is preprocessed in two ways, by normalizing and by scaling, before the three clustering algorithms are run on it. A parameter search is performed for each clustering algorithm, and the best clustering, defined as the one with the highest average silhouette index, is compared with the other results. The results indicate that all three algorithms perform approximately equally well, with a few exceptions. However, the algorithm showing the best overall results is K-means on the scaled data sets. The different preprocessings and partitions of the data affect the results in different ways, which shows that it is important to preprocess the input data in several ways when performing a cluster analysis.
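The abstract contains no code, so the following is only a minimal sketch, under stated assumptions, of the K-means branch of such a pipeline: two preprocessing variants, a parameter search over the number of clusters k, and selection by the highest average silhouette index, where for point i, a(i) is the mean distance to the other points in its own cluster, b(i) is the smallest mean distance to the points of any other cluster, and s(i) = (b(i) - a(i)) / max(a(i), b(i)). It uses only NumPy. The function names, the reading of "normalizing" as scaling each customer row to unit L2 norm and "scaling" as min-max scaling of each feature, and the synthetic purchase-count matrix are all hypothetical and are not the thesis's implementation or data. Two-Level clustering and HDBSCAN are not shown.

```python
import numpy as np


def kmeans(X, k, n_init=10, n_iter=100, seed=0):
    """Lloyd's K-means with random initial centers, restarted n_init times.

    Returns the labels of the run with the lowest within-cluster sum of squares.
    """
    rng = np.random.default_rng(seed)
    best_labels, best_inertia = None, np.inf
    for _run in range(n_init):
        centers = X[rng.choice(len(X), size=k, replace=False)]
        for _step in range(n_iter):
            # Squared Euclidean distance from every point to every center.
            d = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
            labels = d.argmin(axis=1)
            # Move each center to the mean of its points (keep it if its cluster is empty).
            new_centers = np.array([
                X[labels == j].mean(axis=0) if np.any(labels == j) else centers[j]
                for j in range(k)
            ])
            if np.allclose(new_centers, centers):
                break
            centers = new_centers
        inertia = d[np.arange(len(X)), labels].sum()
        if inertia < best_inertia:
            best_labels, best_inertia = labels, inertia
    return best_labels


def average_silhouette(X, labels):
    """Mean of s(i) = (b(i) - a(i)) / max(a(i), b(i)) over all points."""
    D = np.sqrt(((X[:, None, :] - X[None, :, :]) ** 2).sum(axis=2))
    clusters = np.unique(labels)
    if len(clusters) < 2:
        return -1.0  # convention in this sketch: treat a single cluster as the worst score
    s = np.zeros(len(X))
    for i in range(len(X)):
        same = labels == labels[i]
        if same.sum() == 1:
            continue  # s(i) = 0 for a point alone in its cluster
        a = D[i, same].sum() / (same.sum() - 1)  # D[i, i] = 0 is excluded by the -1
        b = min(D[i, labels == c].mean() for c in clusters if c != labels[i])
        denom = max(a, b)
        s[i] = (b - a) / denom if denom > 0 else 0.0
    return s.mean()


def normalize_rows(X):
    # Assumption: "normalizing" = each customer row scaled to unit L2 norm.
    norms = np.linalg.norm(X, axis=1, keepdims=True)
    return X / np.where(norms == 0, 1.0, norms)


def minmax_scale(X):
    # Assumption: "scaling" = each feature (column) mapped to [0, 1].
    lo, hi = X.min(axis=0), X.max(axis=0)
    return (X - lo) / np.where(hi > lo, hi - lo, 1.0)


def best_k(X, ks):
    """Parameter search: pick the k with the highest average silhouette index."""
    scores = {k: average_silhouette(X, kmeans(X, k)) for k in ks}
    return max(scores, key=scores.get), scores


# Hypothetical sparse purchase counts: three customer groups, each buying
# only from its own block of 4 of the 12 products.
rng = np.random.default_rng(1)
n_per_group, n_products = 30, 12
X = np.zeros((3 * n_per_group, n_products))
for g in range(3):
    rows = slice(g * n_per_group, (g + 1) * n_per_group)
    X[rows, g * 4:(g + 1) * 4] = rng.poisson(2.0, size=(n_per_group, 4))

for name, prep in [("normalized", normalize_rows), ("scaled", minmax_scale)]:
    k, scores = best_k(prep(X), range(2, 6))
    print(name, "best k =", k, "average silhouette =", round(scores[k], 3))
```

Comparing the printed scores for the two preprocessing variants mirrors, in miniature, the comparison across preprocessings described in the abstract. The O(n²) distance matrix in `average_silhouette` is fine for a toy example but would need batching or sampling on a real customer data set.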

The full report (pdf)


Page maintained by: Jimmy Olsson
Updated: 4 December 2019