We would first like to thank Scaleway for hosting Meetup #6, and our speakers for their very interesting presentations.
You can find our speakers' slides below:
• Olga Petrova, Machine Learning DevOps Engineer at Scaleway
Subject: Understanding text with BERT
Reading comprehension is a fundamental human skill, yet it presents a highly non-trivial problem for a machine learning system. One way to begin tackling it is to cast it as question answering based on a given text. In this talk we shall look at how we can approach this task using the latest advance in deep learning for NLP: the Transformer architecture, which has come to replace RNN-based models for many NLP tasks. In particular, we will go through an example of training a model based on BERT, a pre-trained Transformer encoder network, on SQuAD (the Stanford Question Answering Dataset).
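To make the question answering setup more concrete: in SQuAD-style extractive QA, a model typically produces one "start" score and one "end" score per token of the passage, and the answer is the span whose combined score is highest. The sketch below shows only that final span selection step. The scores are made-up numbers rather than the output of a real BERT model, and `best_answer_span` and `max_answer_len` are illustrative names, not taken from the talk.

```python
from typing import Sequence, Tuple


def best_answer_span(
    start_logits: Sequence[float],
    end_logits: Sequence[float],
    max_answer_len: int = 30,
) -> Tuple[int, int, float]:
    """Return (start, end, score) for the highest-scoring valid span.

    A span is valid when start <= end and it covers at most
    `max_answer_len` tokens. Its score is start_logits[start] + end_logits[end].
    """
    if len(start_logits) != len(end_logits) or not start_logits:
        raise ValueError("logit sequences must be non-empty and of equal length")

    best = None
    for s, s_score in enumerate(start_logits):
        last_end = min(s + max_answer_len, len(end_logits))
        for e in range(s, last_end):
            score = s_score + end_logits[e]
            if best is None or score > best[2]:
                best = (s, e, score)
    return best


# Hypothetical passage and scores for the question "Who released BERT?".
tokens = ["BERT", "was", "released", "by", "Google", "in", "2018", "."]
start_logits = [0.1, -1.0, -0.5, 0.0, 3.2, -0.3, 1.1, -2.0]
end_logits = [-0.5, -1.2, -0.8, -0.4, 2.9, -0.1, 1.5, -1.5]

start, end, score = best_answer_span(start_logits, end_logits)
answer = " ".join(tokens[start : end + 1])
print(answer)
```

With these toy scores the selected span is the single token "Google". In a real system the two score vectors would come from a fine-tuned model, and the token indices would be mapped back to character offsets in the original text.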
• Axel de Romblay, Machine Learning Engineer at Dailymotion
Subject: How to build a multi-lingual text classifier?
In this talk, we will introduce one of the biggest challenges we face at Dailymotion: how do we accurately categorize our video catalog at scale using the descriptions?
The aim is to present the whole pipeline running at Dailymotion, which relies on a complex mix of methods: machine learning for language detection, named entity linking (NEL) to the Wikidata knowledge graph, deep learning using sparse representations, and NLP with multi-lingual embeddings and robust transfer learning.
Reference: https://medium.com/dailymotion/topic-annotation-automatic-algorithms-data-377079d27936
• Arthur Darcet & Mehdi Hamoumi, Glose
Subject: Measuring text readability with strong and weak supervision
Text complexity is mainly described by three factors:
* Readability: the text content, such as vocabulary, syntax, and discourse.
* Legibility: the text form, such as character size, font, and formatting (e.g. emphasis).
* Reader-dependent features: reading ability and reading context, such as environment (noisy, calm, classroom, subway) or intent (educational, recreational).
At Glose, we built a product where readers can discover, read, and annotate thousands of e-books while being able to share with their friends. It is currently used by thousands of readers worldwide, especially in the academic field where collaborative reading is a great feature for professors/teachers and their students.
To improve the reading experience, we are currently working on automatic text readability evaluation to enhance book recommendation, which should ease a reader's learning curve.
We tackle this NLP task with both supervised and unsupervised machine learning approaches.
During this talk, we will present our supervised pipeline, which encodes a book’s content into a set of features and uses them to fit a model that predicts a readability score.
Then, we will introduce an unsupervised approach to this task, based on the following hypothesis: the simpler a text is, the better it should be understood by a machine. It consists of correlating the ability of multiple language models (LMs) to fill in Cloze tests with readability level labels.
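As a rough illustration of the Cloze idea, the sketch below masks every n-th word of a text, asks a predictor to fill each gap, and then computes a Spearman rank correlation between the resulting accuracies and difficulty labels. Everything in it is an assumption made for illustration: the tiny bigram "language model", the three example texts, the difficulty labels, and all function names. It is not the pipeline presented by Glose, which relies on real language models.

```python
from collections import Counter
from typing import Callable, Dict, List, Optional, Sequence, Tuple

MASK = "[MASK]"


def make_cloze_items(tokens: Sequence[str], every: int = 5) -> List[Tuple[List[str], int, str]]:
    """Mask every `every`-th token; return (masked_tokens, position, answer)."""
    items = []
    for i in range(every - 1, len(tokens), every):
        masked = list(tokens)
        masked[i] = MASK
        items.append((masked, i, tokens[i]))
    return items


def cloze_accuracy(tokens: Sequence[str],
                   predict: Callable[[List[str], int], Optional[str]],
                   every: int = 5) -> float:
    """Fraction of masked tokens that `predict` recovers exactly."""
    items = make_cloze_items(tokens, every)
    if not items:
        return 0.0
    hits = sum(predict(masked, pos) == answer for masked, pos, answer in items)
    return hits / len(items)


def ranks(values: Sequence[float]) -> List[float]:
    """1-based ranks; tied values share the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            result[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return result


def spearman(x: Sequence[float], y: Sequence[float]) -> float:
    """Spearman correlation: Pearson correlation of the ranks.

    Assumes neither input is constant (otherwise the denominator is zero).
    """
    rx, ry = ranks(x), ranks(y)
    mx, my = sum(rx) / len(rx), sum(ry) / len(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    var_x = sum((a - mx) ** 2 for a in rx)
    var_y = sum((b - my) ** 2 for b in ry)
    return cov / (var_x * var_y) ** 0.5


def train_bigram(corpus: Sequence[str]) -> Dict[str, str]:
    """Toy stand-in for an LM: most frequent successor of each word."""
    counts: Dict[str, Counter] = {}
    for prev, nxt in zip(corpus, corpus[1:]):
        counts.setdefault(prev, Counter())[nxt] += 1
    return {prev: c.most_common(1)[0][0] for prev, c in counts.items()}


table = train_bigram("the cat sat on the mat . the dog sat on the rug .".split())


def predict(masked: List[str], pos: int) -> Optional[str]:
    """Guess the masked word from the word just before it."""
    return table.get(masked[pos - 1]) if pos > 0 else None


# Hypothetical texts with hypothetical difficulty labels (higher = harder).
texts = [
    "the cat sat on the mat .",
    "a dog sat on a rug .",
    "felines repose upon woven textiles habitually .",
]
difficulty = [1, 2, 3]

accuracies = [cloze_accuracy(t.split(), predict, every=2) for t in texts]
rho = spearman(accuracies, difficulty)
print(accuracies, rho)
```

On this toy data the Cloze accuracy drops as the difficulty label rises, so the correlation is strongly negative, which is the pattern the hypothesis predicts. Three hand-picked sentences prove nothing, of course; the point is only to show the shape of the computation.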
[PDF] Meetup Paris NLP 24_07_2019_Glose