Research
“Mide el ángulo formado
por ti y por mí
es la solución a algo muy común aquí”
Una décima de segundo, Antonio Vega, 1997.
[Measuring the angle formed by you and me is the solution to something very common here] I couldn’t find a better foreword to introduce kernel methods…
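The idea behind the epigraph can be made concrete. A kernel k(x, y) is an inner product ⟨φ(x), φ(y)⟩ in some feature space. The cosine of the angle between two mapped points can therefore be computed from kernel evaluations alone, as k(x, y) / sqrt(k(x, x) k(y, y)). The Python sketch below is only an illustration. The helper names and the choice of a linear kernel and a Gaussian (RBF) kernel with gamma = 1 are assumptions made for the example.

```python
import numpy as np

def linear_kernel(x, y):
    """Linear kernel: the plain inner product in input space."""
    return float(np.dot(x, y))

def rbf_kernel(x, y, gamma=1.0):
    """Gaussian (RBF) kernel k(x, y) = exp(-gamma * ||x - y||^2)."""
    diff = np.asarray(x, dtype=float) - np.asarray(y, dtype=float)
    return float(np.exp(-gamma * np.dot(diff, diff)))

def kernel_cosine(k, x, y):
    """Cosine of the angle between phi(x) and phi(y) in feature space,
    computed only through kernel evaluations (phi is never built)."""
    return k(x, y) / np.sqrt(k(x, x) * k(y, y))

x = np.array([1.0, 0.0])
y = np.array([1.0, 1.0])
print(kernel_cosine(linear_kernel, x, y))  # ≈ 0.7071, i.e. cos(45°)
print(kernel_cosine(rbf_kernel, x, y))     # ≈ 0.3679, i.e. exp(-1), since k(x, x) = 1
```

The same two points form a different angle under each kernel. Choosing the kernel amounts to choosing how similarity, and hence the angle, is measured.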
My main research topic is Machine Learning, mostly Kernel Methods (Support Vector Machines, Spectral Clustering…) and their application in health and finance. These two fields share a few interesting characteristics that motivate my research:
- they are strongly regulated, so the explainability of models is at a premium
- datasets are usually small, and realistic synthetic data are hard to obtain
- people without a background in machine learning have been working successfully in these fields for years, and leveraging their expert domain knowledge to design machine learning models sounds like fun.
Core machine learning research
- Statistical Learning Theory: PAC-Bayes bounds to understand the generalization capabilities of machine learning models. This framework aims to assess the out-of-sample accuracy of a model from its in-sample accuracy. One could therefore use all the available training instances to learn the model, without holding out a separate validation set to estimate its generalization capability.
- Feature Engineering: transforming the patterns underlying the available data into a form that is easier for the model to digest.
- Incorporation of domain expert knowledge into the design of the models.
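The following minimal Python sketch shows how such a bound turns in-sample error into a guarantee on out-of-sample error. It uses one standard form, the PAC-Bayes-kl bound (Seeger; Maurer), for losses in [0, 1] and n ≥ 8. With probability at least 1 − δ over an i.i.d. sample of size n, every posterior Q satisfies kl(L̂(Q) ‖ L(Q)) ≤ (KL(Q‖P) + ln(2√n/δ)) / n. Here L̂(Q) and L(Q) are the empirical and true risks of the Gibbs classifier, and P is a prior fixed before seeing the data. The function names and the input values (empirical risk 0.1, KL(Q‖P) = 5, n = 1000) are hypothetical. Computing KL(Q‖P) for a real model depends on the chosen prior and posterior, and the bounds used in practice may take other forms.

```python
import math

def binary_kl(q, p):
    """kl(q || p) between Bernoulli(q) and Bernoulli(p)."""
    eps = 1e-12
    p = min(max(p, eps), 1 - eps)  # avoid log(0)
    total = 0.0
    if q > 0:
        total += q * math.log(q / p)
    if q < 1:
        total += (1 - q) * math.log((1 - q) / (1 - p))
    return total

def kl_inverse_upper(q, c, tol=1e-9):
    """Largest p >= q with kl(q || p) <= c, found by bisection
    (kl(q || p) is increasing in p for p >= q)."""
    lo, hi = q, 1.0
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if binary_kl(q, mid) > c:
            hi = mid
        else:
            lo = mid
    return hi  # slightly conservative: within tol above the exact inverse

def pac_bayes_bounds(emp_risk, kl_qp, n, delta=0.05):
    """Upper bounds on the true risk of the Gibbs classifier Q, valid with
    probability >= 1 - delta: the kl-inverse form and its Pinsker relaxation."""
    c = (kl_qp + math.log(2 * math.sqrt(n) / delta)) / n
    kl_bound = kl_inverse_upper(emp_risk, c)
    pinsker_bound = emp_risk + math.sqrt(c / 2)  # uses kl(q || p) >= 2 (p - q)^2
    return kl_bound, pinsker_bound

# Hypothetical values, for illustration only
kl_b, pin_b = pac_bayes_bounds(emp_risk=0.1, kl_qp=5.0, n=1000, delta=0.05)
print(f"{kl_b:.3f} {pin_b:.3f}")  # 0.153 0.178
```

Only training quantities enter the bound: the empirical risk, the divergence from the prior, n and δ. This is what allows the whole sample to be used for learning. Inverting the binary kl numerically never gives a looser value than the closed-form Pinsker relaxation, so the sketch returns both for comparison.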
Applications of machine learning
- Finance: portfolios that track financial indices, technical indicators, pricing
- Health: analysis of neuroimaging data with machine learning
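Index tracking has a common textbook formulation: choose long-only, fully invested weights w that minimize the mean squared tracking error between the portfolio returns Rw and the index returns r. The sketch below solves this by projected gradient descent, using a sort-based Euclidean projection onto the simplex. The formulation, the function names and the synthetic returns are assumptions for illustration. They are not the models used in my work; practical tracking portfolios often add further constraints such as cardinality limits or transaction costs.

```python
import numpy as np

def project_to_simplex(v):
    """Euclidean projection of v onto {w : w >= 0, sum(w) = 1} (sort-based)."""
    u = np.sort(v)[::-1]
    css = np.cumsum(u)
    k = np.arange(1, len(v) + 1)
    rho = np.nonzero(u * k > css - 1)[0][-1]  # last index kept in the support
    theta = (css[rho] - 1) / (rho + 1)
    return np.maximum(v - theta, 0.0)

def track_index(asset_returns, index_returns, n_iter=5000):
    """Long-only, fully invested weights minimizing ||R w - r||^2 / T
    by projected gradient descent."""
    R = np.asarray(asset_returns, dtype=float)
    r = np.asarray(index_returns, dtype=float)
    T, n = R.shape
    w = np.full(n, 1.0 / n)  # start from the equally weighted portfolio
    # Step size 1/L, where L is the Lipschitz constant of the gradient
    L = 2 * np.linalg.eigvalsh(R.T @ R / T).max()
    for _ in range(n_iter):
        grad = 2 * R.T @ (R @ w - r) / T
        w = project_to_simplex(w - grad / L)
    return w

# Synthetic daily returns: the "index" is an exact mix of the 5 assets
rng = np.random.default_rng(0)
R = rng.normal(0.0005, 0.01, size=(250, 5))
true_w = np.array([0.4, 0.3, 0.2, 0.1, 0.0])
r = R @ true_w
w = track_index(R, r)
print(np.round(w, 3))  # close to true_w
```

Projected gradient keeps every iterate feasible (non-negative and summing to one), which makes the sketch easy to extend with further constraints that have cheap projections.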
Ongoing research projects
HADA MADRINA: Human Aware DAta Science for MAchine Learning DRIveN Applications (2021-2024)
This proposal arises as a natural response to a set of common gaps that we have identified after several years working as machine learning (ML) experts in multidisciplinary research and innovation teams on health and financial applications. Developing ML models in these two fields is particularly difficult because data scarcity must be compensated for with a large amount of prior domain knowledge, which is sometimes hard to embed in the design of the ML models. On top of that, both fields are highly regulated, which imposes severe limitations on the black-box nature of most ML approaches. However, whenever these multidisciplinary collaborations ended in the deployment of a successful model, and the domain experts saw its potential to improve their decision-making processes, these non-ML-native experts would refer to the ML component as a Fairy Godmother that brought in the magic ingredient the project needed to excel.
The project aims to develop a Human-Aware DAta science framework for MAchine learning DRIveN Applications (HADA MADRINA, Fairy Godmother in Spanish). HADA MADRINA is a Bayesian ML framework that helps us design the tailored ML Fairy Godmother each project needs in a faster, more robust and human-aware manner. The project's starting point is a recent key research result of the team, SSHIBA, a framework that generalizes, from a Bayesian perspective, many of the tailored ML solutions we had developed in the past to deal with specific health or finance problems.