When and where
14 Sep 2016, 11:00
Meeting room
Title
Hypothesis testing and feature selection in semi-supervised data
Abstract
This talk presents a set of novel methodologies that enable
valid statistical hypothesis testing when only positive and
unlabelled (PU) examples are available. This
type of problem, a special case of semi-supervised data, is
common in text mining, bioinformatics, and computer vision.
Focusing on a generalised likelihood ratio test, we make three key
contributions: (1) a proof that assuming all unlabelled examples
are negative cases is sufficient for independence testing, but
not for power analysis; (2) a new methodology that
compensates for this and enables power analysis, allowing sample
size determination for observing an effect with a desired power;
and finally, (3) a new capability, supervision determination,
which can determine a priori the number of labelled examples the
user must collect before being able to observe a desired
statistical effect. Finally, we will show how this conditional
independence test can be used for Markov blanket discovery
around partially labelled targets.
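
As a rough illustration of contribution (1), the sketch below runs a
G-test (a generalised likelihood ratio test) of independence between a
binary feature and a surrogate label in which every unlabelled example
is treated as a negative. It is not the speaker's code: the function
names g_test_2x2 and pu_contingency, the restriction to binary
variables, and the toy data are all assumptions made for this example.
Per the abstract, this surrogate is adequate for testing independence
but not for power analysis, which the sketch does not attempt.

```python
import math


def pu_contingency(x, s):
    """Build a 2x2 table of counts from binary features x and surrogate labels s.

    s[i] = 1 for a labelled positive, 0 for an unlabelled example
    (treated here as a negative; this is the assumption under discussion).
    """
    table = [[0, 0], [0, 0]]
    for xi, si in zip(x, s):
        table[xi][si] += 1
    return table


def g_test_2x2(table):
    """Return (G statistic, p-value) for independence in a 2x2 table.

    The p-value uses the chi-square approximation with 1 degree of freedom,
    whose survival function is erfc(sqrt(g / 2)).
    """
    n = sum(sum(row) for row in table)
    row_tot = [sum(row) for row in table]
    col_tot = [table[0][j] + table[1][j] for j in range(2)]
    g = 0.0
    for i in range(2):
        for j in range(2):
            obs = table[i][j]
            if obs > 0:  # empty cells contribute nothing to G
                expected = row_tot[i] * col_tot[j] / n
                g += 2.0 * obs * math.log(obs / expected)
    p_value = math.erfc(math.sqrt(max(g, 0.0) / 2.0))
    return g, p_value


# Toy data (made up): the feature is more often 1 among labelled positives.
x = [1] * 30 + [0] * 10 + [1] * 10 + [0] * 30
s = [1] * 40 + [0] * 40
g, p = g_test_2x2(pu_contingency(x, s))
print(f"G = {g:.2f}, p = {p:.2g}")
```

A small p-value here would lead to rejecting independence between the
feature and the surrogate label; choosing a sample size for a target
power would need the correction described in contribution (2).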
About the speaker
Kostas Sechidis is a post-doc in the Machine Learning and
Optimization Group holding the AstraZeneca/Manchester Data
Science Fellowship. His project focuses on statistical and
machine learning methods for subgroup analysis in clinical
trials. Before that, he completed his PhD in the School of Computer
Science under the supervision of Dr. Gavin Brown. His research
interests lie in information-theoretic feature selection in
different learning environments, with a particular focus on
medical and health informatics applications.