Royal Holloway Probability and Statistics Colloquium

Moore Annex 34, RHUL, Egham, TW20 0EX

31st March 2016

Registration (free of charge)

For those requiring overnight accommodation: The Hub (RHUL campus) and off-campus B&B

Final Programme

 

9:15am Welcoming tea and coffee

Morning Session: Applications

9:30-10:30am  Misleading Metrics: On Evaluating Machine Learning for Malware with Confidence, Lorenzo Cavallaro, Information Security Group, RHUL

Joint work with Roberto Jordaney, Zhi Wang, Davide Papini, and Ilia Nouretdinov.
Abstract:  

Malware poses a serious and challenging threat across the Internet, and the need for automated learning-based approaches has rapidly become clear. Machine learning has long been acknowledged as a promising technique to identify and classify malware threats; unfortunately, such a powerful technique is often seen as a black-box panacea, where little is understood and the results, especially those with high accuracy, are taken without questioning their quality. For such reasons, results are often biased by the choice of empirical thresholds or dataset-specific artefacts, hindering the ability to set easy-to-understand error metrics and thus compare different approaches. This setting calls for new metrics that look beyond quantitative measurements (e.g., precision and recall) and help in scientifically assessing the soundness of the underlying machine learning tasks. To this end, we propose conformal evaluator, a framework designed to evaluate the quality of a result in terms of statistical metrics such as credibility and confidence. Credibility tells you how much a sample is credited with one given prediction (e.g., a label), whereas confidence focuses on pointing out how much a given sample is distinguished from other predictions. Such evaluation metrics give useful insights, providing a quantifiable per-choice level of assurance and reliability. The core of conformal evaluator is a non-conformity measure, which, in essence, allows for measuring the difference between a sample and a set of samples. For this reason, our framework is general enough to be immediately applied to a large class of algorithms that rely on distances to identify and classify malware, making it possible to better understand and compare machine learning results. To further support our claim, we present case studies where the outcomes of three different algorithms are evaluated under conformal evaluator settings. We show how traditional metrics mislead about the performance of different algorithms. Instead, conformal evaluator's metrics enable us to understand the reasons behind the performance of a given algorithm, and reveal shortcomings of apparently highly accurate methods.
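
The credibility and confidence scores described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the one-dimensional features, the average-distance non-conformity measure, and all function names are assumptions made for illustration. Credibility is taken here as the conformal p-value of the predicted label, and confidence as one minus the largest p-value among the remaining labels.

```python
from statistics import mean

def nonconformity(x, group):
    """Average absolute distance from x to the members of group (assumed measure)."""
    return mean(abs(x - g) for g in group)

def p_value(x, label, training):
    """Conformal p-value of x for `label`: share of the class (plus x itself)
    whose non-conformity score is at least as large as that of x."""
    members = training[label]
    score_x = nonconformity(x, members)
    scores = []
    for i, m in enumerate(members):
        # Score each training member against the rest of its class plus x.
        rest = members[:i] + members[i + 1:] + [x]
        scores.append(nonconformity(m, rest))
    at_least = sum(1 for s in scores if s >= score_x) + 1  # +1 counts x itself
    return at_least / (len(members) + 1)

def evaluate(x, predicted, training):
    """Return (credibility, confidence) for a prediction made by any classifier."""
    p = {label: p_value(x, label, training) for label in training}
    credibility = p[predicted]
    confidence = 1.0 - max(v for k, v in p.items() if k != predicted)
    return credibility, confidence

# Hypothetical toy data: one numeric feature per sample, two labels.
training = {
    "benign": [0.0, 0.1, 0.2, 0.3],
    "malware": [5.0, 5.1, 5.2, 5.3],
}
credibility, confidence = evaluate(0.15, "benign", training)
print(credibility, confidence)
```

A prediction that sits well inside its own class and far from the other class scores high on both measures, while a wrong label for the same sample scores low on both.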

10:30-10:45am discussion

10:45-11:45am On modeling computed tomography (CT) and magnetic resonance imaging (MRI) data, Kristi Kuljus, Institute of Mathematical Statistics, University of Tartu, Estonia

Abstract:

Possibilities for using Markov models in modeling CT and MRI data are studied. We are interested in estimating CT-equivalent information from MRI sequences. Such substitute CT images are needed, for example, in dose planning for radiotherapy. One way to generate substitute CT images is to use regression: the joint distribution of CT and MRI sequences is modeled by a mixture of multivariate normal distributions, and the regression function for calculating substitute CT images given the MRI sequences is obtained by weighting together conditional expectations of multivariate normal distributions. The obtained regression function depends on how spatial information is included in the modeling procedure. Gaussian mixture models assume independence between voxels, while Markov random field (MRF) models allow for spatial dependence through a Markov random field prior on the mixture components. Hidden Markov (chain) models (HMMs) are in this context somewhere in the middle: to be able to apply HMMs to 3-dimensional data we have to "sequence" the data and therefore lose some information on the neighbourhood structure. We discuss differences between HMM and MRF models and how these affect the modeling results using head data.
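
For reference, the regression function of a Gaussian mixture has a standard closed form, sketched below for the independent-voxel case. The notation is assumed here and not taken from the talk: x is the vector of MRI values at a voxel, Y the CT value, and component k has weight \pi_k, mean (\mu_{X,k}, \mu_{Y,k}) and covariance blocks \Sigma_{XX,k}, \Sigma_{YX,k}.

```latex
\hat{y}(x) = \mathbb{E}[Y \mid X = x]
           = \sum_{k=1}^{K} w_k(x)\,
             \Bigl[ \mu_{Y,k} + \Sigma_{YX,k}\,\Sigma_{XX,k}^{-1}\,(x - \mu_{X,k}) \Bigr],
\qquad
w_k(x) = \frac{\pi_k\,\phi(x;\,\mu_{X,k},\,\Sigma_{XX,k})}
              {\sum_{j=1}^{K} \pi_j\,\phi(x;\,\mu_{X,j},\,\Sigma_{XX,j})}
```

Here \phi denotes the multivariate normal density. Each bracketed term is the conditional expectation within one component, and the weights w_k(x) are the posterior component probabilities given x.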

11:45am-12:00pm discussion

12:00-1:00pm  On Bayesian segmentation, Jüri Lember, Institute of Mathematical Statistics, University of Tartu, Estonia

Abstract:

We consider the segmentation or decoding problem with hidden Markov models in a fully Bayesian setup. The main focus is MAP or Viterbi segmentation, where the goal is to find the path with maximum posterior probability. In the Bayesian setup the Viterbi path can no longer be found by a dynamic programming (Viterbi) algorithm. We compare several iterative methods for finding the MAP path, including simulated annealing, the frequentist parameter estimation and segmentation approach, and many more.

1-2pm lunch at the Senior Common Room (SCR), Founder's Building (paid for speakers only)

Afternoon Session: Random Processes

2-2:15pm coffee/tea

2:15-3:15pm Moments of passage times, explosion, and implosion for continuous time Markov chains, Mikhail Menshikov, Durham University. Joint work with Dimitri Petritis (University of Rennes, France).

Abstract:  

We establish general theorems quantifying the notion of recurrence, through an estimation of the moments of passage times, for irreducible continuous-time Markov chains on countably infinite state spaces. Sharp conditions for the occurrence of explosion are also obtained. A new phenomenon of implosion is introduced and sharp conditions for its occurrence are proven.

3:15-3:30pm discussion

3:30-4:30pm  Non-homogeneous random walks: Anomalous recurrence and angular asymptotics, Andrew Wade, Durham University

Abstract:  

Spatially homogeneous random walks (i.e., partial sums of i.i.d. random vectors) are well understood. The most delicate regime is when the walk has zero drift, where (under mild conditions) the walk is recurrent in dimensions 1 and 2 but transient in dimensions 3 and higher. If spatial homogeneity is relaxed, very different behaviour can be observed: zero-drift random walks that are recurrent in 3 dimensions, or transient in 2 dimensions, for example. To probe precisely the recurrence-transience phase transition it is natural to study the asymptotically zero drift regime (in analogy to classical one-dimensional work of Lamperti). I will survey some results on recurrence behaviour and angular asymptotics for this class of spatially non-homogeneous random walks, including joint work with Nicholas Georgiou, Iain MacPhee, Mikhail Menshikov, and Aleksandar Mijatovic.


4:30-5pm closing discussion over tea/coffee

6pm dinner (paid for speakers only), location TBA

For queries, please contact the organisers:

Alexey A. Koloydenko (Alexey.Koloydenko@rhul.ac.uk)

Teo Sharia (T.Sharia@rhul.ac.uk)

Vadim Shcherbakov (Vadim.Shcherbakov@rhul.ac.uk)