• Pierre Bros, Inato (https://inato.com/) [Talk in French]
“Building unique hospital profiles using clustering and classification: an evolution of our approach as Inato grew”
At Inato we match clinical trials with qualified sites [site ~ hospital] worldwide and optimize site performance [performance ~ number of patients recruited] throughout the trial. We help biopharma companies identify high-performing sites, increase the pool of available patients, and deliver efficient, on-time studies.
In order to get the most accurate and complete information possible, we scrape multiple data sources that come in different languages and formats. Creating a unique profile for each site means being able to recognise all the different names a site can go by.
We will show how we have approached this challenge over the last two years, and how our solution evolved as Inato grew.
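To illustrate the kind of problem described above, here is a minimal sketch of clustering variant hospital names into a single profile, using normalisation plus greedy similarity-based clustering. This is purely illustrative: the function names, stopword list, and threshold are assumptions, not Inato's actual approach.

```python
import re
from difflib import SequenceMatcher

STOPWORDS = {"de", "du", "la", "le", "of", "the"}  # assumed, for illustration

def normalize(name: str) -> str:
    # Lowercase and keep only alphanumeric tokens, so that
    # "CHU de Lille" and "C.H.U. Lille" end up close to each other.
    tokens = re.findall(r"[a-z0-9]+", name.lower())
    return " ".join(t for t in tokens if t not in STOPWORDS)

def similarity(a: str, b: str) -> float:
    # String similarity on the normalised forms (0.0 to 1.0).
    return SequenceMatcher(None, normalize(a), normalize(b)).ratio()

def cluster_names(names, threshold=0.8):
    # Greedy clustering: attach each name to the first cluster whose
    # representative is similar enough, otherwise start a new cluster.
    clusters = []
    for name in names:
        for cluster in clusters:
            if similarity(name, cluster[0]) >= threshold:
                cluster.append(name)
                break
        else:
            clusters.append([name])
    return clusters
```

A real pipeline would of course combine this kind of similarity with classification models and multilingual handling, which is what the talk covers.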
• Edouard d’Archimbaud & Pierre Marcenac, Kili Technology (https://kili-technology.com/) [Talk in French]
“How to scale up training data?”
“It is better to have a standard algorithm on a lot of quality data than a state-of-the-art algorithm on a lot of data.”
Data labelling has thus become an essential, if often painful, step in the modelling process.
However, annotating at scale requires a combination of intuitive interfaces and machine learning (for example, to pre-annotate). Moreover, labelling at scale without compromising data quality requires transparency throughout the labelling process, to facilitate quality monitoring and collaboration both internally and with external annotators.
We will show how we at Kili have structured our annotation pipeline to scale up and, better yet, to help get models into production by facilitating human supervision and continuous learning.
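The pre-annotation idea mentioned above can be sketched as a simple routing loop: a model proposes a label for each item, high-confidence predictions are accepted automatically, and only the uncertain ones are sent to human annotators (with the model's suggestion pre-filled). This is a hypothetical sketch, not Kili's actual API; the function and threshold are assumptions.

```python
def pre_annotate(model, items, confidence_threshold=0.9):
    """Route items between auto-labelling and human review.

    `model` is any callable returning a (label, confidence) pair.
    """
    auto_labelled, needs_review = [], []
    for item in items:
        label, confidence = model(item)
        if confidence >= confidence_threshold:
            auto_labelled.append((item, label))
        else:
            # The model's guess is kept as a pre-filled suggestion,
            # so the annotator corrects rather than starts from scratch.
            needs_review.append((item, label))
    return auto_labelled, needs_review
```

As the corrected items are fed back into training, the model's confidence improves and the human share of the work shrinks, which is the continuous-learning loop the talk describes.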
• Pauline Chavallard, Doctrine (https://www.doctrine.fr/) [Talk in French]
“Structuring legal documents with deep learning”
Court decisions are traditionally long and complex documents. To make things worse, it is not uncommon for a lawyer to be interested only in the operative part of the judgement (i.e. the outcome of the trial).
In fact, in general, it is pretty standard to be looking for a specific legal aspect, which can quickly feel like looking for a needle in a haystack. As such, our goal was to detect the underlying structure of decisions on Doctrine (i.e. the table of contents) to help users navigate them more easily.
Decisions can be seen as small stories. Humans can understand them because they are naturally context-aware and come with expectations; but how should an algorithm operate?
In order to address this challenging issue, we trained a neural network (bi-LSTM with attention) using PyTorch to help us predict a suitable table of contents given a free text decision.
This talk goes into more detail about our methodology and results.
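To make the architecture concrete, here is a toy version of a bi-LSTM with attention that assigns a section label to each line of a decision, from which a table of contents could be assembled. All dimensions, names, and the label set are illustrative assumptions, not Doctrine's actual model.

```python
import torch
import torch.nn as nn

class SectionTagger(nn.Module):
    """Toy line classifier: bi-LSTM over the words of each line,
    word-level attention pooling, then a section-type prediction."""

    def __init__(self, vocab_size, embed_dim=32, hidden_dim=64, n_sections=4):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, embed_dim)
        self.lstm = nn.LSTM(embed_dim, hidden_dim,
                            batch_first=True, bidirectional=True)
        self.attn = nn.Linear(2 * hidden_dim, 1)  # one attention score per word
        self.out = nn.Linear(2 * hidden_dim, n_sections)

    def forward(self, token_ids):
        # token_ids: (n_lines, max_words), one row of word ids per line
        h, _ = self.lstm(self.embed(token_ids))       # (n_lines, words, 2*hidden)
        weights = torch.softmax(self.attn(h), dim=1)  # attention over words
        line_repr = (weights * h).sum(dim=1)          # weighted sum per line
        return self.out(line_repr)                    # section logits per line
```

Given `logits = SectionTagger(vocab_size=100)(batch)` for a batch of 5 lines, the output has one row of section scores per line; the predicted labels across lines then yield the document's structure.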