When and where
14 September 2016, 11:00
Meeting room
Title
Hypothesis testing and feature selection in semi-supervised data
Abstract
In this talk we present a set of novel methodologies that
enable valid statistical hypothesis testing when we have only positive
and unlabelled (PU) examples. This type of problem, a special case of
semi-supervised data, is common in text mining, bioinformatics, and
computer vision. Focusing on a generalised likelihood ratio test, we
make three key contributions: (1) a proof that assuming all unlabelled
examples are negative cases is sufficient for independence testing, but
not for power analysis; (2) a new methodology that
compensates for this and enables power analysis, allowing sample size
determination for observing an effect with a desired power; and finally,
(3) a new capability, supervision determination, which can determine
a priori the number of labelled examples the user must collect before
being able to observe a desired statistical effect. At the end, we will
show how this test can be used as a conditional independence test for
Markov blanket discovery around partially labelled targets.
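As a rough illustration of the independence-testing setting in contribution (1), the sketch below runs a G-test (a generalised likelihood ratio test) between a binary feature and a surrogate label in which every unlabelled example is treated as negative. It is not the speaker's implementation: the function names g_statistic and g_test_pu and the restriction to a binary feature are assumptions made for this example, and the power-analysis correction of contribution (2) is not shown.

```python
import math


def g_statistic(table):
    """G-statistic of a contingency table given as a list of rows of counts."""
    n = sum(sum(row) for row in table)
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    g = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            if observed > 0:  # empty cells contribute nothing to the sum
                expected = row_totals[i] * col_totals[j] / n
                g += 2.0 * observed * math.log(observed / expected)
    return max(g, 0.0)  # guard against tiny negative values from rounding


def g_test_pu(x, s):
    """G-test between a binary feature x and a surrogate label s.

    s[i] is 1 for a labelled positive and 0 for an unlabelled example;
    unlabelled examples are simply treated as negatives (an assumption
    of this sketch, mirroring the setting described in the abstract).
    """
    table = [[0, 0], [0, 0]]
    for xi, si in zip(x, s):
        table[xi][si] += 1
    g = g_statistic(table)
    # For a 2x2 table the reference distribution is chi-square with one
    # degree of freedom, whose survival function is erfc(sqrt(g / 2)).
    p_value = math.erfc(math.sqrt(g / 2.0))
    return g, p_value


if __name__ == "__main__":
    # Toy data: the feature is more often 1 among the labelled positives.
    x = [0] * 30 + [1] * 10 + [0] * 10 + [1] * 30
    s = [0] * 40 + [1] * 40
    g, p = g_test_pu(x, s)
    print(f"G = {g:.3f}, p-value = {p:.3g}")
```

A small p-value here only indicates dependence between the feature and the surrogate label; according to the abstract, choosing a sample size for a desired power needs the additional correction that the talk introduces.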
About the speaker
Kostas Sechidis is a post-doc in the Machine Learning and Optimization
Group holding the AstraZeneca/Manchester Data Science Fellowship. His
project focuses on statistical and machine learning methods for subgroup
analysis in clinical trials. Prior to that, he did his PhD in the School
of Computer Science under the supervision of Dr. Gavin Brown. His
research interests are in information-theoretic feature
selection in different learning environments, with a particular focus
on medical and health informatics applications.