When and where
14 September 2016, 11:00
Meeting room

Title
Hypothesis testing and feature selection in semi-supervised data

Abstract
In this talk we present a set of novel methodologies that enable valid statistical hypothesis testing when only positive and unlabelled (PU) examples are available. This type of problem, a special case of semi-supervised data, is common in text mining, bioinformatics, and computer vision. Focusing on a generalised likelihood ratio test, we make three key contributions: (1) a proof that assuming all unlabelled examples are negative is sufficient for independence testing, but not for power analysis; (2) a new methodology that compensates for this and enables power analysis, allowing sample size determination for observing an effect with a desired power; and finally, (3) a new capability, supervision determination, which can determine a priori the number of labelled examples the user must collect before being able to observe a desired statistical effect. Finally, we will show how this conditional test of independence can be used for Markov blanket discovery around partially labelled targets.
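
As an illustration of contribution (1), the following is a minimal sketch, not the speaker's implementation, of a generalised likelihood ratio (G) test of independence between a binary feature and a surrogate label in which every unlabelled example is treated as negative. The function names `g_statistic` and `surrogate_table` are hypothetical, and the threshold 3.841 is the standard chi-squared critical value for one degree of freedom at the 5% level.

```python
import math


def g_statistic(table):
    """G statistic, 2 * sum(O * ln(O / E)), for a table of observed counts."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    g = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            if observed > 0:  # empty cells contribute nothing to the sum
                expected = row_totals[i] * col_totals[j] / n
                g += observed * math.log(observed / expected)
    return 2.0 * g


def surrogate_table(feature, labelled_positive):
    """2x2 counts of a binary feature against the surrogate label.

    Assumption: surrogate label 1 means "labelled positive" and 0 means
    "unlabelled", which is treated here as if it were negative.
    """
    table = [[0, 0], [0, 0]]
    for x, s in zip(feature, labelled_positive):
        table[x][s] += 1
    return table


# Reject independence at the 5% level when G exceeds the chi-squared
# critical value for (2 - 1) * (2 - 1) = 1 degree of freedom.
CRITICAL_VALUE_DF1 = 3.841
counts = [[40, 10], [10, 40]]  # made-up counts, for illustration only
reject = g_statistic(counts) > CRITICAL_VALUE_DF1
```

The sketch covers only the independence test itself; the power analysis and supervision determination described in contributions (2) and (3) need the correction presented in the talk and are not reproduced here.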

About the speaker
Kostas Sechidis is a post-doc in the Machine Learning and Optimization Group, holding the AstraZeneca/Manchester Data Science Fellowship. His project focuses on statistical and machine learning methods for subgroup analysis in clinical trials. Prior to that, he completed his PhD in the School of Computer Science under the supervision of Dr. Gavin Brown. His research interests are in information-theoretic feature selection in different learning environments, with a particular focus on medical and health informatics applications.