Paris NLP Season 3 Meetup #6 at Scaleway

We would first like to thank Scaleway for hosting Meetup #6, as well as our speakers for their very interesting presentations.


You can find the slides of our speakers below:


• Olga Petrova, Machine Learning DevOps Engineer at Scaleway

Subject: Understanding text with BERT


Reading comprehension is one of the fundamental human skills that nevertheless presents a highly non-trivial problem for a machine learning system. One way to begin tackling it is to cast it as question answering based on a given text. In this talk we will look at how to approach this task using the latest advance in deep learning for NLP: the Transformer architecture, which has come to replace RNN-based models for many NLP tasks. In particular, we will walk through an example of training a model based on BERT, a pre-trained Transformer encoder network, on SQuAD (the Stanford Question Answering Dataset).
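To make the SQuAD-style setup concrete: a BERT QA head outputs a start logit and an end logit for every token, and the predicted answer is the span maximizing their sum. The sketch below shows only that span-selection step with made-up logits (it is an illustration, not Scaleway's or Hugging Face's actual code):

```python
# Illustrative sketch: SQuAD-style answer-span selection from per-token
# start/end logits, as produced by a BERT question-answering head.
# Tokens and logit values here are invented for the example.

def best_span(start_logits, end_logits, max_len=15):
    """Pick the (start, end) token pair maximizing start + end score,
    subject to end >= start and a bounded span length."""
    best = (0, 0)
    best_score = float("-inf")
    for s, s_logit in enumerate(start_logits):
        for e in range(s, min(s + max_len, len(end_logits))):
            score = s_logit + end_logits[e]
            if score > best_score:
                best_score = score
                best = (s, e)
    return best

tokens = ["The", "Eiffel", "Tower", "is", "in", "Paris"]
start = [0.1, 0.2, 0.1, 0.0, 0.3, 2.5]
end   = [0.0, 0.1, 0.2, 0.1, 0.2, 2.8]
s, e = best_span(start, end)
print(" ".join(tokens[s:e + 1]))  # -> Paris
```

In practice libraries also mask spans crossing the question tokens and keep the top-k candidates, but the maximization above is the core of the decoding step.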

[PDF] Meetup_Paris_NLP_24_07_2019_Scaleway


•  Axel de Romblay, Machine Learning Engineer at Dailymotion

Subject: How to build a multi-lingual text classifier?

In this talk, we will introduce one of the biggest challenges we face at Dailymotion: how do we accurately categorize our video catalog at scale using video descriptions?
The purpose is to present the whole pipeline running at Dailymotion, which relies on a complex combination of different methods: machine learning for language detection, named entity linking (NEL) to the Wikidata knowledge graph, deep learning using sparse representations, and NLP with multi-lingual embeddings and robust transfer learning.
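As a toy illustration of the first stage, language detection is often done by comparing character n-gram profiles. The sketch below uses a naive trigram-overlap classifier; it is purely illustrative, since Dailymotion's actual pipeline is not public:

```python
# Hypothetical sketch of a language-detection step: a naive character
# trigram profile classifier. Profiles and probe text are toy examples.
from collections import Counter

def trigram_profile(text):
    text = text.lower()
    return Counter(text[i:i + 3] for i in range(len(text) - 2))

def detect(text, profiles):
    probe = trigram_profile(text)
    # Score each language by trigram overlap with the probe text.
    scores = {lang: sum(min(probe[g], prof[g]) for g in probe)
              for lang, prof in profiles.items()}
    return max(scores, key=scores.get)

profiles = {
    "en": trigram_profile("the quick brown fox jumps over the lazy dog"),
    "fr": trigram_profile("le renard brun saute par-dessus le chien paresseux"),
}
print(detect("the dog jumps", profiles))  # -> en
```

Production systems train such profiles on large corpora (or use dedicated libraries), but the underlying idea, matching character-level statistics per language, is the same.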

Reference:

[PDF] Meetup_Paris_NLP_24_07_2019_Dailymotion


• Arthur Darcet & Mehdi Hamoumi, Glose

Subject: Measuring text readability with strong and weak supervision


Text complexity is mainly described by three factors:
* Readability: text content features such as vocabulary, syntax, and discourse.
* Legibility: text form, such as character size, font, and formatting (e.g. emphasis).
* Reader-dependent features, such as reading ability and reading context: environment (noisy, calm, classroom, subway) or intent (educational, recreational).

At Glose, we built a product where readers can discover, read, and annotate thousands of e-books and share them with their friends. It is currently used by thousands of readers worldwide, especially in the academic field, where collaborative reading is a great feature for professors/teachers and their students.
To improve the reading experience, we are currently working on automatic text readability evaluation to enhance book recommendation, which should ease a reader's learning curve.
We tackle this NLP task with both supervised and unsupervised machine learning approaches.

During this talk, we will present our supervised pipeline [1], which encodes a book’s content into a set of features and fits model parameters on them to predict a readability score.
Then, we will introduce an unsupervised approach to this task [2] based on the following hypothesis: the simpler a text is, the better it should be understood by a machine. It consists in correlating the ability of multiple language models (LMs) to fill in Cloze tests with readability-level labels.
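The hypothesis can be illustrated with a deliberately tiny model: blank out each word, let an LM guess it from context, and compare the resulting Cloze accuracy across texts. The "LM" below is a toy bigram model, whereas the actual experiments use real neural LMs:

```python
# Toy illustration of the Cloze hypothesis: a language model fills blanks
# more accurately on simpler text. The bigram "LM" and sentences are
# invented for the example.
from collections import Counter, defaultdict

# Train a tiny bigram LM on simple sentences.
corpus = "the cat sat on the mat . the dog sat on the rug .".split()
bigrams = defaultdict(Counter)
for prev, word in zip(corpus, corpus[1:]):
    bigrams[prev][word] += 1

def cloze_accuracy(text):
    """Blank each word and let the bigram LM guess it from the previous word."""
    words = text.lower().split()
    hits = 0
    for prev, word in zip(words, words[1:]):
        if bigrams[prev] and bigrams[prev].most_common(1)[0][0] == word:
            hits += 1
    return hits / (len(words) - 1)

easy = "the cat sat on the mat"
hard = "the feline reposed upon the tapestry"
print(cloze_accuracy(easy) > cloze_accuracy(hard))  # -> True
```

In the full setting, Cloze accuracies from several LMs are correlated with human readability labels (e.g. via Spearman's rank correlation) to validate the hypothesis without per-text supervision.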


[PDF] Meetup Paris NLP 24_07_2019_Glose


Meetup’s video:
