KTH Matematik  


Mathematical Statistics

Time: 8 June 2018, 13:50-14:25.

Seminar room F11, Department of Mathematics, KTH, Lindstedtsvägen 22.

Speaker: Erik Alpsten

Title: Modeling news data flows using multivariate Hawkes processes

Abstract: This thesis presents a multivariate Hawkes process approach to modeling flows of news data. The data is divided into classes based on the content and sentiment level of the news, such that each class contains a homogeneous type of observation. The arrival times of news in each class are associated with a unique component of the multivariate Hawkes process. Within this framework, the massive and complex flow of information is given a more compact representation that describes the excitation connections between news classes, which in turn can be used to better predict the future flow of news data. Such a model has potential applications in areas such as finance and security. The thesis focuses especially on the different bucket sizes used in the discretization of the time scale and on the differences in results that these imply. The study uses aggregated news data provided by RavenPack, and the software implementations are written in Python using the TensorFlow package.
For the cases with larger bucket sizes and datasets containing a larger number of observations, the results suggest that the Hawkes models give a better fit to the training data than the Poisson model alternatives. However, the Poisson models tend to perform better when models trained on historical data are tested on subsequent data flows. Moreover, the connections between news classes are found to vary significantly depending on the underlying dataset. The results indicate that a lack of observations in certain news classes leads to over-fitting in the training of the Hawkes models, and that the model ought to be extended to take into account the deterministic and periodic behaviors of the news data flows.
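
For readers unfamiliar with the model class, the following is a minimal sketch of how a multivariate Hawkes intensity might be evaluated on a bucketed time scale. It is not the thesis implementation (which uses TensorFlow). The exponential decay kernel, the function name hawkes_bucket_intensities and the parameter names mu, alpha, beta and bucket_size are all assumptions made for illustration.

```python
import math


def hawkes_bucket_intensities(counts, mu, alpha, beta, bucket_size=1.0):
    """Evaluate a discretized multivariate Hawkes intensity.

    counts[t][j] is the number of news items of class j in bucket t.
    mu[i] is the baseline rate of class i (assumed constant).
    alpha[i][j] is the excitation of class i caused by class j.
    beta is the decay rate of an assumed exponential kernel.
    Returns intensities[t][i], computed from buckets earlier than t only.
    """
    n_classes = len(mu)
    decay = math.exp(-beta * bucket_size)
    # Exponentially decayed sum of past counts, one entry per class.
    excitation = [0.0] * n_classes
    intensities = []
    for bucket in counts:
        lam = [
            mu[i] + sum(alpha[i][j] * excitation[j] for j in range(n_classes))
            for i in range(n_classes)
        ]
        intensities.append(lam)
        # Fold the current bucket into the state and decay it by one bucket.
        excitation = [decay * (excitation[j] + bucket[j]) for j in range(n_classes)]
    return intensities
```

Setting every entry of alpha to zero reduces the sketch to a homogeneous Poisson model with rates mu, which is the kind of alternative the abstract compares against.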

The full report (pdf)

To the list of seminars

Page maintainer: Jimmy Olsson
Updated: 30 May 2018