Ryan Ramanujam (KTH & KI): Use of topological data analysis and the mapper algorithm with biological data
Abstract:
Several techniques that use topological methods to analyze multi-dimensional data have emerged in the last few years. Barcoding and persistence diagrams, which build on the principle of persistent homology, and the mapper algorithm all provide information about the geometric characteristics of point clouds of data. In particular, the mapper algorithm provides a way to visualize high-dimensional data, as well as to identify and recover higher-order relationships within the data.
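As a rough illustration of the general mapper construction (not the adapted version presented in this talk), the following minimal sketch covers the range of a filter function with overlapping intervals, clusters the points falling in each interval, and links clusters that share points. The function names, the single-linkage clustering step, and the parameters `n_intervals`, `overlap` and `eps` are assumptions chosen for the example.

```python
from itertools import combinations
from math import cos, dist, pi, sin


def interval_cover(lo, hi, n_intervals, overlap):
    """Cover [lo, hi] with equal-length intervals; neighbours overlap by the given fraction."""
    length = (hi - lo) / (n_intervals - (n_intervals - 1) * overlap)
    step = length * (1 - overlap)
    return [(lo + i * step, lo + i * step + length) for i in range(n_intervals)]


def single_linkage(indices, points, eps):
    """Group point indices into connected components of the 'closer than eps' graph."""
    clusters = []
    unvisited = set(indices)
    while unvisited:
        seed = unvisited.pop()
        cluster, frontier = {seed}, [seed]
        while frontier:
            p = frontier.pop()
            near = {q for q in unvisited if dist(points[p], points[q]) <= eps}
            unvisited -= near
            cluster |= near
            frontier.extend(near)
        clusters.append(frozenset(cluster))
    return clusters


def mapper(points, filter_values, n_intervals=4, overlap=0.5, eps=0.3):
    """Return (nodes, edges): nodes are clusters of point indices, edges join overlapping clusters."""
    lo, hi = min(filter_values), max(filter_values)
    tol = 1e-9  # guards against floating-point error at the ends of the cover
    nodes = []
    for a, b in interval_cover(lo, hi, n_intervals, overlap):
        members = [i for i, f in enumerate(filter_values) if a - tol <= f <= b + tol]
        nodes.extend(single_linkage(members, points, eps))
    edges = {(u, v) for u, v in combinations(range(len(nodes)), 2) if nodes[u] & nodes[v]}
    return nodes, edges


# Toy data (hypothetical): 40 points on a unit circle, filtered by their x coordinate.
points = [(cos(2 * pi * k / 40), sin(2 * pi * k / 40)) for k in range(40)]
filter_values = [x for x, _ in points]
nodes, edges = mapper(points, filter_values)
print(len(nodes), "nodes,", len(edges), "edges")
```

On this toy circle the resulting graph is a single closed loop, which is the kind of shape information the method is meant to expose. For real data, the choice of filter function, cover and clustering method strongly affects the graph.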
We have adapted mapper to analyze biological data of various types, including data with multiple data points per sample or individual. Additionally, statistical output from the resulting graph can be used to perform internal cross-validation for binary classification problems, giving an unbiased estimate of model accuracy. The subsets of nodes most relevant for accurate prediction can then yield feature importance, as well as information about variable interactions in the relevant data subsets.
Two practical problems are explored using this methodology. First, a data set was constructed from spatial scans of trees of two species, birch (n=30) and pine (n=33), in which each tree is represented by a number of rows describing various characteristics of its individual branches. Mapper was then used to classify the trees, and it correctly identified the species of all 63 individual trees. Furthermore, the features relevant for classification were clearly identifiable. Second, RNA expression data from individuals with the autoimmune disorder multiple sclerosis were analyzed using similar methods. After a pre-processing algorithm selected a subset of thresholded genes, these were mapped using each individual's disease severity at sampling as the filter function. When the presence of the genetic marker HLA-DR15 was used as a binary partition, several genes associated with earlier disease onset and less severe disease were discovered.