From cryptography through statistics to medicine: the need for multidisciplinary collaboration

Over many centuries, medical science has accumulated a vast and profound knowledge of the human body and of how treatments can heal diseases. However, some biological processes are too complex to fully understand, to describe in a textbook or to study manually. Moreover, each patient is different, e.g., because they have different genes, live in a different environment or have developed different habits. In addition, new diseases regularly emerge, such as those caused by new viruses. To further improve medicine, it is important to rely not only on knowledge and rules but also on a data-driven approach.

Machine learning for more accurate diagnoses in healthcare

Nowadays, huge amounts of patient data are collected in electronic patient records. Examining these records with machine learning allows us to discover new patterns and to make more accurate diagnoses, as well as more accurate predictions about how a disease will evolve or how a treatment will work. Machine learning is the field that studies how computers can efficiently analyze large amounts of data for statistical patterns, and thereby learn and become better at making predictions or choosing actions. It needs both computer science, to devise fast algorithms, and statistics, for sound mathematical modeling. In general, the more data is used, the better the models become and the more they can help medical science. The data collected in a single hospital is often sufficient to learn models that are simple but already quite useful. More complex questions require data from a larger number of patients, so one wants to combine data from many hospitals to further boost performance.

The TRUMPET project: cryptographic techniques for protecting patient data through multidisciplinary collaboration

However, naively sharing data between hospitals also means that sensitive data from thousands of patients may end up concentrated in one place, where it may be leaked in case of a failure or an attack. Ethical considerations and legislation require researchers to protect personal data. To avoid such leaks, the TRUMPET project develops federated machine learning based on cryptographic techniques, so that personal patient data never leaves the hospital where the patient is treated, and statistical models are learned by exchanging only encrypted messages. Improving medical care in this way requires researchers from many disciplines, including cryptography, statistics, algorithms, law and medicine, to collaborate.
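
To give a feel for how a joint statistic can be computed without any hospital revealing its own contribution, here is a minimal Python sketch of one simple technique, additive masking. This is an illustration under stated assumptions, not necessarily the protocol used in TRUMPET: the names `pairwise_masks`, `masked_update` and `aggregate`, the modulus and the fixed-point scale are all hypothetical choices. Each pair of hospitals shares a random number that one adds and the other subtracts, so the masks cancel in the sum while each individual message looks random.

```python
import secrets

MODULUS = 2**32  # assumption: all arithmetic is done modulo this value
SCALE = 1000     # assumption: fixed-point encoding with 3 decimal places


def encode(x):
    """Turn a real-valued update into an integer modulo MODULUS."""
    return round(x * SCALE) % MODULUS


def decode(v):
    """Map an integer modulo MODULUS back to a signed real value."""
    if v >= MODULUS // 2:
        v -= MODULUS
    return v / SCALE


def pairwise_masks(n_hospitals, rng):
    """Return one mask per hospital; together the masks sum to 0 mod MODULUS.

    Simplification: the masks are generated in one place here. In a real
    deployment each pair of hospitals would agree on its shared random
    number privately.
    """
    masks = [0] * n_hospitals
    for i in range(n_hospitals):
        for j in range(i + 1, n_hospitals):
            r = rng.randrange(MODULUS)
            masks[i] = (masks[i] + r) % MODULUS  # hospital i adds r
            masks[j] = (masks[j] - r) % MODULUS  # hospital j subtracts r
    return masks


def masked_update(local_value, mask):
    """The message a hospital sends: its encoded update plus its mask."""
    return (encode(local_value) + mask) % MODULUS


def aggregate(messages):
    """Sum the masked messages; the masks cancel, leaving only the total."""
    return decode(sum(messages) % MODULUS)


# Hypothetical local model updates from three hospitals.
local_updates = [0.25, -0.5, 1.125]
masks = pairwise_masks(len(local_updates), secrets.SystemRandom())
messages = [masked_update(u, m) for u, m in zip(local_updates, masks)]
print(aggregate(messages))  # 0.875
```

The party that adds up the messages learns only the total, 0.875, and none of the three individual values. A production system would need more than this sketch, for example handling hospitals that drop out and protecting against parties that collude.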


Jan Ramon – INRIA