We will have three talks on Wednesday, June 7th, starting at about 14h30.
:: On the Stability of Feature Selection Algorithms
:: Speaker: Gavin Brown (University of Manchester)
Feature Selection is central to modern data science, from exploratory
data analysis to predictive model-building. The stability of a feature
selection algorithm refers to the robustness of its feature preferences
with respect to small changes in the training data. An algorithm is
‘unstable’ if a small change in data leads to large changes in the
chosen feature subset. We present a rigorous statistical and axiomatic
treatment for this concept, applicable generically from bioinformatics
to business analytics. In particular we address how best to measure
stability – in the literature we find numerous proposals, each with
different motivations. In this work we consolidate the literature and
suggest a new approach to the problem. The result is (1) a deeper
understanding of existing work based on a small set of axioms, and (2) a
clearly justified statistical framework with several novel benefits.
This approach serves to identify a stability measure obeying all
desirable axioms, and (for the first time in the literature) allowing
confidence intervals on its estimates, enabling a more rigorous
comparison of feature selection algorithms.
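To illustrate the idea, here is a minimal sketch of a similarity-based stability measure of the kind discussed in the talk: it scores a binary selection matrix (one row per run of the algorithm) by comparing the per-feature selection variance to its expected value under random selection. The function name and exact normalisation are illustrative assumptions, not necessarily the measure identified in the talk.

```python
import numpy as np

def stability(Z):
    """Stability of feature selection from a binary selection matrix.

    Z : (M, d) array; Z[i, f] = 1 if run i selected feature f.
    Returns a value <= 1, where 1 means identical subsets in every run.
    """
    Z = np.asarray(Z, dtype=float)
    M, d = Z.shape
    p = Z.mean(axis=0)                    # selection frequency per feature
    s2 = M / (M - 1) * p * (1 - p)        # unbiased variance per feature
    kbar = Z.sum(axis=1).mean()           # average selected-subset size
    denom = (kbar / d) * (1 - kbar / d)   # variance under random selection
    return 1 - s2.mean() / denom

# Two runs agreeing on the same 2-of-4 features -> perfectly stable
Z_stable = np.array([[1, 1, 0, 0],
                     [1, 1, 0, 0]])
print(stability(Z_stable))  # 1.0
```

Because the measure is an estimate from finitely many runs, its sampling variability is exactly what motivates the confidence intervals mentioned above.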
:: Distinguishing prognostic and predictive biomarkers: An information
theoretic approach
:: Speaker: Konstantinos Sechidis (University of Manchester)
We present a novel method for data-driven ranking of predictive
biomarkers, using information theoretic methods. A strength of the
approach is in explicitly distinguishing predictive vs prognostic
markers, allowing us to quantify when markers are solely predictive,
solely prognostic, or some mixture of the two. Our information
theoretic formalization of the problem enables us to derive biomarker
rankings that capture predictive strength by estimating several
high-dimensional conditional mutual information terms. To estimate
these terms, we suggest efficient low-dimensional approximations, and we
derive an empirical Bayes ranking procedure, which is suitable for
"small n, large p" scenarios. Our approach turns out to be an asset in
small sample scenarios, when noise factors may dominate and markers get
mistakenly identified as predictive, when in fact they are just strongly
prognostic. We propose that the information theoretic view is a natural
and flexible mathematical framework for data-driven biomarker discovery,
providing a natural algebra to discuss and quantify the `predictiveness'
and `prognosticness' of candidate biomarkers.
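The distinction can be sketched with simple plug-in estimators: a marker's prognostic strength relates to its mutual information with the outcome, I(X;Y), while its predictive strength relates to how that dependence changes once we condition on treatment, e.g. I(X;Y|T) - I(X;Y). The sketch below uses naive plug-in estimates on a toy example; the variable names and scores are illustrative assumptions, not the empirical Bayes procedure described in the talk.

```python
import numpy as np
from collections import Counter

def mi(x, y):
    """Plug-in estimate of the mutual information I(X;Y) in bits."""
    n = len(x)
    pxy, px, py = Counter(zip(x, y)), Counter(x), Counter(y)
    return sum(c / n * np.log2(c * n / (px[a] * py[b]))
               for (a, b), c in pxy.items())

def cmi(x, y, z):
    """Plug-in estimate of the conditional mutual information I(X;Y|Z)."""
    n = len(x)
    total = 0.0
    for zv in set(z):
        idx = [i for i in range(n) if z[i] == zv]
        total += len(idx) / n * mi([x[i] for i in idx], [y[i] for i in idx])
    return total

# Toy data (hypothetical): the outcome y depends on the marker x only
# through its interaction with treatment t, so x is purely predictive:
# I(X;Y) = 0 while I(X;Y|T) = 1 bit.
x = [0, 0, 1, 1]
t = [0, 1, 0, 1]
y = [0, 1, 1, 0]
prognostic_score = mi(x, y)                   # ~0: no marginal effect
predictive_score = cmi(x, y, t) - mi(x, y)    # ~1: strong interaction
```

In practice such plug-in estimates are exactly where the "small n, large p" difficulties arise, which is what the low-dimensional approximations and empirical Bayes ranking in the abstract are designed to address.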
:: Hierarchical Multinomial-Dirichlet model for the estimation of
conditional probabilities and mutual information
:: Speaker: Laura Azzimonti (IDSIA)
We present a novel approach for estimating conditional probability
tables based on a hierarchical Multinomial-Dirichlet model, which
relaxes the traditional local independence assumption. We derive exact
analytical expressions for the estimators and we analyse their
properties both analytically and via simulation. We then apply this
method to the estimation of parameters in a Bayesian network. Given the
structure of the network, the proposed approach better estimates the
joint distribution and significantly improves the classification
performance with respect to traditional parameter estimation approaches.
Finally, we apply the hierarchical Multinomial-Dirichlet model to the
estimation of mutual information. The proposed mutual information
estimator is then compared to other traditional estimators via simulations.
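To give a flavour of the idea, the sketch below estimates a conditional probability table by centring each row's Dirichlet prior on the pooled child marginal, so that sparsely observed parent configurations borrow strength from the others. This is a deliberately simplified stand-in for the hierarchical Multinomial-Dirichlet model of the talk; the function name and the scalar concentration `s` are illustrative assumptions.

```python
import numpy as np

def cpt_estimate(counts, s=1.0):
    """Estimate P(child | parent config) from a co-occurrence matrix.

    counts : (n_parent_configs, n_child_states) array of counts.
    Each row's Dirichlet prior is centred on the pooled child marginal,
    a crude form of the information sharing across parent configurations
    that the hierarchical Multinomial-Dirichlet model makes principled.
    """
    counts = np.asarray(counts, dtype=float)
    marginal = counts.sum(axis=0) / counts.sum()   # pooled child distribution
    return (counts + s * marginal) / (counts.sum(axis=1, keepdims=True) + s)

counts = np.array([[8, 2],    # parent configuration with plenty of data
                   [0, 0]])   # unseen parent configuration
print(cpt_estimate(counts))   # unseen row falls back to the [0.8, 0.2] marginal
```

A purely local estimator would return a uniform (or undefined) distribution for the unseen parent configuration; shrinking toward the shared marginal is the kind of behaviour that improves joint-distribution estimates in sparse-data regimes.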