Paris NLP Season 3 Meetup #5 at LinkFluence

Full room for Paris NLP @ Linkfluence

We would first like to thank Linkfluence for hosting Meetup #5, as well as our speakers for their very interesting presentations, and the many attendees who joined this session!

You can find the slides of our speakers below:

• Alexis Dutot, Linkfluence

At Linkfluence, we analyze millions of social media posts per day in more than 60 languages. This represents thousands of noisy user-generated documents per second passing through our internal enrichment pipeline. This volume, combined with the real-time constraint, prevents us from using cross-lingual BERT-like models.

In this talk we will focus on multilingual sentiment analysis and emotion detection tasks based on social media data. Only a few annotated corpora tackle these tasks, and the vast majority of them are dedicated to the English language. We will see how we fully exploit the potential of emojis as a universal expression of sentiment and emotion in order to build accurate “real-time” deep learning systems for sentiment analysis and emotion detection in several languages, using solely English annotated corpora.
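The emoji-as-distant-label idea can be sketched in a few lines. This is a hypothetical toy, not Linkfluence's actual pipeline: the emoji-to-sentiment map, the function name and the example posts are all illustrative.

```python
# A tiny emoji -> sentiment lexicon (illustrative subset; a real system
# would use a much larger, curated mapping).
EMOJI_SENTIMENT = {
    "😀": "positive", "😍": "positive", "👍": "positive",
    "😢": "negative", "😡": "negative", "👎": "negative",
}

def distant_label(post):
    """Derive a noisy sentiment label from the emojis in a post.

    Returns (text_without_emojis, label), or None when the post has no
    emoji or carries contradictory emojis, so it is skipped.
    """
    labels = {EMOJI_SENTIMENT[ch] for ch in post if ch in EMOJI_SENTIMENT}
    if len(labels) != 1:          # no signal, or mixed signals -> discard
        return None
    text = "".join(ch for ch in post if ch not in EMOJI_SENTIMENT).strip()
    return text, labels.pop()

examples = [
    "Ce film est incroyable 😍",   # works for any language, no annotation needed
    "worst service ever 😡",
    "meh 😀😢",                    # contradictory -> discarded
]
dataset = [pair for post in examples if (pair := distant_label(post)) is not None]
```

Because the emoji, not the surrounding words, provides the label, the same procedure yields training pairs in every language at once, which is the appeal for a multilingual pipeline.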

[PDF] Alexis Dutot, Linkfluence

Benoît Lebreton, Sacha Samama and Tom Stringer, Quantmetry

Melusine is an open source library developed by Quantmetry and MAIF. The talk focuses on technical issues raised by Melusine’s open source implementation, as well as underlying neural models and algorithms that are being leveraged.

[PDF] Benoît Lebreton, Sacha Samama and Tom Stringer, Quantmetry

 

Videos are coming soon!

 


Paris NLP Season 3 Meetup #4 at MeilleursAgents

We would first like to thank MeilleursAgents for hosting this meetup, our 3 speakers for their very interesting presentations, and the many participants who once again attended this session.

You can find the slides of our three speakers below:

• Syrielle Montariol, LIMSI, CNRS

Word usage, meaning and connotation change over time, echoing the various aspects of the evolution of society (cultural, technological…). For example, the word “Katrina”, originally associated with female names, moved closer to the disaster vocabulary after Hurricane Katrina struck in August 2005.
Diachronic word embeddings are used to capture such change in an unsupervised way: they are useful to linguistic research for understanding the evolution of languages, but also to standard NLP tasks operating on long time-range corpora.
In this talk, I will introduce a selection of methods to train time-varying word embeddings and to evaluate them, placing particular emphasis on probabilistic word embedding models.
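As a toy illustration of detecting such usage change, one can compare a word's context distribution across time slices. The two corpora below are invented, and simple count vectors stand in for the probabilistic embeddings discussed in the talk.

```python
from collections import Counter
import math

def context_vector(corpus, target, window=2):
    """Bag-of-context-words counts for `target` over a tokenised corpus."""
    vec = Counter()
    for sent in corpus:
        toks = sent.lower().split()
        for i, tok in enumerate(toks):
            if tok == target:
                for j in range(max(0, i - window), min(len(toks), i + window + 1)):
                    if j != i:
                        vec[toks[j]] += 1
    return vec

def cosine(u, v):
    dot = sum(u[w] * v[w] for w in u)
    nu = math.sqrt(sum(c * c for c in u.values()))
    nv = math.sqrt(sum(c * c for c in v.values()))
    return dot / (nu * nv) if nu and nv else 0.0

# Two invented time slices: before and after August 2005.
slice_2004 = ["katrina is a popular female name",
              "my friend katrina is a name fan"]
slice_2006 = ["hurricane katrina caused disaster and flooding",
              "the disaster after hurricane katrina"]

v_before = context_vector(slice_2004, "katrina")
v_after = context_vector(slice_2006, "katrina")
drift = 1.0 - cosine(v_before, v_after)   # high drift = usage has changed
```

Real diachronic methods replace these raw counts with embeddings trained (or smoothed) per time slice and aligned across slices, but the measurement idea, comparing the same word's representation at different dates, is the same.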

Slides Syrielle Montariol, LIMSI, CNRS

Pierre Pakey and Dimitri Lozeve, Destygo

If data beats models, why not build models that produce data? Vast quantities of realistic labeled data will always make the difference in machine learning optimization problems. At Destygo, we automatically leverage the interactions between users and our conversational AI agents to produce vast quantities of labeled data and train our natural language understanding algorithms in a reinforcement learning framework. We will present the outline of our self-learning pipeline, its relation to the state-of-the-art literature, and the specificities of the NLP setting. Finally, we will focus on the network responsible for choosing whether or not to try something new, which is one of the important pieces of the process.

Slides Pierre Pakey & Dimitri Lozeve, Destygo

Julien Perez, Machine Learning and Optimization group, Naver Labs Europe

Over the last 5 years, differentiable programming and deep learning have become the de facto standard for a vast set of decision problems in data science. Three factors have enabled this rapid evolution. First, the availability and systematic collection of data have made it possible to gather and leverage large quantities of traces of intelligent behavior. Second, the development of standardized frameworks has dramatically accelerated the development of differentiable programming and its application to the major modalities of the digital world: image, text, and sound. Third, the availability of powerful and affordable computational infrastructure has enabled this new step toward machine intelligence. Beyond these encouraging results, new limits have arisen and need to be addressed. Automatic common-sense acquisition and reasoning capabilities are two of the frontiers that the major machine learning research labs are now working on. In this context, human language has once again become a medium of choice for such research. In this talk, we will take a natural language understanding task, machine reading, as a medium to illustrate the problem and describe the research progress made throughout the machine reading project. First, we will describe several limitations of current decision models. Second, we will discuss adversarial learning and how such approaches make learning more robust. Third, we will explore several differentiable transformations that aim at moving toward these goals. Finally, we will discuss ReviewQA, a machine reading corpus over human-generated hotel reviews, which aims at encouraging research around these questions.

Slides Julien Perez, Machine Learning and Optimization group, Naver Labs Europe

 

 

Paris NLP Season 3 Meetup #3 at Doctrine

We would first like to thank Doctrine for hosting this meetup, our 3 speakers for their presentations, and the many participants who attended this session.

You can find the slides of our three speakers below:

Hugo Vasselin & Benoit Dumeunier, Artefact

How do you redefine a brand's image with a simple word counter? This talk celebrates the meeting of data science and creative work. It tells how basic NLP techniques, combined with a creative approach, made it possible to redefine a brand. We first built a tool giving an idea of how the different brands of a major hotel group are perceived around the world, relative to their competitors. These data brought out a number of values dear to guests, which served as pillars for creative and innovative brand experiences…

Slides Hugo Vasselin & Benoît Dumeunier (Artefact)

Romain Vial, Hyperlex

Hyperlex is a contract analytics and management solution powered by artificial intelligence. Hyperlex helps companies manage and make the most of their contract portfolio by identifying relevant information and data to manage key contractual commitments during the whole life of the contract. Our technology rests on a combination of specifically trained Natural Language Processing (NLP) algorithms and advanced machine learning techniques.

In this talk, I will present some of the challenges we are currently solving at Hyperlex through a focus on two important NLP tasks: (i) learning representations for texts and words using recent language modelling techniques; and (ii) building knowledge from predictions by mining relations in legal documents.

Slides Romain Vial (Hyperlex)

Grégory Châtel, Lead R&D @ Disaitek and member of the Intel AI Software Innovator program

In this talk, I will present two recent research articles from OpenAI and Google AI Language about transfer learning in NLP, and their implementation.

Historically, transfer learning for NLP neural networks has been limited to reusing pre-computed word embeddings. Recently, a new trend has appeared, much closer to what transfer learning looks like in computer vision, consisting in reusing a much larger part of a pre-trained network. This approach makes it possible to reach state-of-the-art results on many NLP tasks with minimal code modification and training time. In this presentation, I will present the underlying architectures of these models, the generic pre-training tasks, and an example of using such a network to complete an NLP task.
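The contrast between the two regimes can be sketched with a deliberately tiny stand-in: a frozen "pretrained" encoder (here, just averaged word vectors) on top of which only a small task head is trained. Every name and number below is invented for illustration and comes from neither paper.

```python
# "Pretrained" 2-d word vectors (a stand-in for word2vec/GloVe vectors).
PRETRAINED = {"good": (1.0, 0.2), "great": (0.9, 0.1),
              "bad": (-1.0, 0.1), "awful": (-0.8, 0.3)}

def frozen_encoder(text):
    """Stand-in for a frozen pretrained network: average the word vectors."""
    vecs = [PRETRAINED[t] for t in text.split() if t in PRETRAINED]
    if not vecs:
        return (0.0, 0.0)
    n = len(vecs)
    return (sum(v[0] for v in vecs) / n, sum(v[1] for v in vecs) / n)

def train_head(data, lr=0.5, epochs=50):
    """Train only a linear head on the frozen features (perceptron updates)."""
    w, b = [0.0, 0.0], 0.0
    for _ in range(epochs):
        for text, label in data:          # label in {-1, +1}
            x = frozen_encoder(text)
            score = w[0] * x[0] + w[1] * x[1] + b
            if score * label <= 0:        # misclassified -> update head only
                w = [w[0] + lr * label * x[0], w[1] + lr * label * x[1]]
                b += lr * label
    return w, b

train = [("good great", 1), ("bad awful", -1)]
w, b = train_head(train)

def predict(text):
    x = frozen_encoder(text)
    return 1 if w[0] * x[0] + w[1] * x[1] + b > 0 else -1
```

The newer trend the talk describes goes further: instead of freezing everything below the head, most of the pretrained network's weights are also fine-tuned on the target task.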

Slides Grégory Châtel (Disaitek)

Paris NLP Season 3 Meetup #2 at Méritis

Thanks to our host Meritis

• François Yvon, LIMSI/CNRS

Using monolingual data in Neural Machine Translation

Modern machine translation rests on the availability of appropriate parallel corpora, which are scarce and costly to accumulate. Monolingual corpora are much easier to obtain and can be easily integrated into Statistical Machine Translation systems, where they have been shown to be of great help. The issue is slightly different in Neural Machine Translation (NMT), and how to take advantage of these resources is still under discussion. In this talk, I will try to summarize a series of recent papers on this topic and comment on the current state of the debate. This will also give me the opportunity to discuss research in NMT in more general terms. This work was conducted jointly with Franck Burlot.

presentation_francois_yvon

• Kezhan Shi, Data Science Manager at Allianz France,

will show interesting results obtained with NLP techniques in an insurance project, through an in-depth case study involving:

– string distance and phonetic distance (used in geocoding for fuzzy string matching)
– document classification (to recognize construction firms’ activities)
– word2vec (to understand construction firms’ activities)
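As a rough illustration of the first technique (not Allianz's actual code), Python's standard library can compute an edit-based similarity ratio, and a simplified Soundex key illustrates phonetic matching; the example names are invented.

```python
import difflib

def soundex(name):
    """Simplified Soundex key: first letter + up to three digit codes.

    This omits the classic h/w special case, which is enough for a sketch.
    """
    codes = {**dict.fromkeys("bfpv", "1"), **dict.fromkeys("cgjkqsxz", "2"),
             **dict.fromkeys("dt", "3"), "l": "4",
             **dict.fromkeys("mn", "5"), "r": "6"}
    name = name.lower()
    digits = [codes.get(ch, "") for ch in name]
    out, prev = [], digits[0]
    for d in digits[1:]:
        if d and d != prev:     # skip vowels, collapse adjacent duplicates
            out.append(d)
        prev = d
    return (name[0].upper() + "".join(out) + "000")[:4]

# Edit-based similarity (string distance) via the standard library.
ratio = difflib.SequenceMatcher(None, "robert", "rupert").ratio()

# Phonetic similarity: different spellings, same Soundex key.
same_sound = soundex("Robert") == soundex("Rupert")
```

Combining both signals is a common recipe for fuzzy matching of addresses and company names: the edit ratio catches typos, while the phonetic key catches spelling variants that sound alike.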

Paris NLP Season 3 Meetup #1 @Xebia

Thanks to our host: Xebia

Guillaume Lample, FAIR [Talk in English]
Unsupervised machine translation

Machine translation (MT) has achieved impressive results recently, thanks to advances in deep learning and the availability of large-scale parallel corpora. Yet, the effectiveness of these systems strongly relies on the availability of large amounts of parallel sentences, which hinders their applicability to the majority of language pairs.

Previous studies have shown that monolingual data — widely available in most languages — can be used to improve the performance of MT systems. However, these were used to augment, rather than replace, parallel corpora.

In this talk, I will present our recent research on Unsupervised Machine Translation, where we show that it is possible to train MT systems in a fully unsupervised setting, without the need of any cross-lingual dictionary or parallel resources whatsoever, but with access only to large monolingual corpora in each language. Beyond translating languages for which there is no parallel data, our method could potentially be used to decipher unknown languages.

Talk_Meetup_NLP_Guillaume_Lample

Thomas Wolf, Hugging Face [Talk in English]
Neural networks based dialog agents: going beyond the seq2seq model

I will present a summary of the technical tools behind our submission to the Conversational Intelligence Challenge 2 which is part of NIPS 2018 (convai.io).

This challenge tests how a dialog agent can incorporate personality as well as common sense reasoning in a free-form setting.

Our submission leads the leaderboard, topping all tested metrics with a significant margin over the second-best model.

These strong improvements are obtained through an innovative use of transfer learning, data augmentation techniques and multi-task learning in a non-seq2seq architecture.

Hugging Face Slides

Paris NLP Meetup #6 Season 2 @ LinkValue

You can find the video of the meetup here: https://www.youtube.com/watch?v=sIX8AxMe_bU

[Talk in English] Guillaume Barrois – Liegey Muller Pons
LMP is a technology company that develops tools to understand public opinion at a very local scale. This talk will present examples of analyses that we apply to original textual data sources in order to extract the dynamics and features of opinion in a given territory.

meetup_nlp_liegey_muller_pons

[Talk in French] Ismael Belghiti – Hiresweet

HireSweet helps companies recruit the best engineers by developing a recommendation engine that ranks profiles against a job offer. This talk will present how various NLP techniques can be applied to compute a matching score between a profile and an offer, comparing their performance on a dedicated ranking metric.

meetup_nlp_hiresweet

• [Talk in English] Gil Katz earned his PhD in Information Theory from CentraleSupélec in 2017. Today he is a senior data scientist at SAP Conversational AI (previously Recast.AI), based in Paris.

Unsupervised Learning and Word Embeddings

The field of Machine Learning can be divided into two main branches – supervised and unsupervised learning. While examples for applications of supervised learning are easy to come by, the power of unsupervised learning is less intuitive. In this talk, we will use the problem of representing words as a case study. The limitations of simple one-hot encoding will be discussed before describing the modern method of embedding words in a vector space of real numbers. After comparing several approaches, current advances and future challenges will be discussed.
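A minimal sketch of the limitation discussed above: under one-hot encoding, every pair of distinct words is equally unrelated, whereas dense embeddings can place related words close together. The toy vectors below are hand-picked for illustration, not trained.

```python
import math

vocab = ["king", "queen", "apple"]

def one_hot(word):
    """One-hot vector over the vocabulary."""
    return [1.0 if w == word else 0.0 for w in vocab]

# Toy dense embeddings (illustrative values, not learned from data).
EMB = {"king": [0.9, 0.8], "queen": [0.85, 0.82], "apple": [-0.7, 0.1]}

def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) *
                  math.sqrt(sum(b * b for b in v)))

# One-hot: every distinct pair of words is equally (un)related.
assert cosine(one_hot("king"), one_hot("queen")) == 0.0
assert cosine(one_hot("king"), one_hot("apple")) == 0.0

# Dense embeddings: related words end up close, unrelated ones far apart.
sim_related = cosine(EMB["king"], EMB["queen"])
sim_unrelated = cosine(EMB["king"], EMB["apple"])
```

Methods such as word2vec or GloVe learn vectors with this property automatically from raw text, which is exactly the kind of unsupervised learning the talk takes as its case study.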

meetup_nlp_recast

Paris NLP Meetup #5 Season 2 @ Snips

  • Adrien Ball, Snips

An Introduction to Snips NLU, the Open Source Library behind Snips Voice Platform

Integrating a voice or chatbot interface into a product used to require a cloud-based Natural Language Understanding service. Snips NLU is a Private-by-Design NLU engine. It can run on the edge or on a server with a minimal footprint, while performing as well as or better than cloud solutions.

2018_05_NLP_meetup_snips

  • Jérôme Dockes, INRIA

Mapping neuroimaging text reports to spatial distributions over the brain.

We learn the statistical link between anatomical terms and spatial coordinates extracted from the neuroscience literature. This allows us to associate brain images with fragments of text which describe neuroimaging observations. Accessing the unstructured spatial information contained in such reports offers new possibilities for meta-analysis.

2018_05_NLP_meetup_inria

  • Charles Borderie, Victor de la Salmonière and Marian Szczesniak, Lettria

LETTRIA develops Natural Language Processing tools exclusively dedicated to understanding French. The emphasis is on ease of use, performance, and capturing the true meaning of words.

2018_05_NLP_meetup_Lettria

Paris NLP Meetup #4 Season 2 @ Criteo

Arnaud Delaunay and Daoud Chami, LinkValue
NLP as a pricing tool?
Deep Learning on text data for regression problems

2018_03_NLP_meetup_linkvalue

Pierre-Emmanuel Mazaré, Facebook (FAIR)
In this talk, we’ll present DrQA, our architecture for question answering. We test it in various settings and show its value both on closed-domain tasks such as SQuAD and in an open-domain setup with access to all Wikipedia text.

2018_03_NLP_meetup_facebook

Sacha Vakili, Doctrine
Natural Language Processing for Legal Applications

2018_03_NLP_meetup_doctrine

 

Fifth Paris NLP Meetup @ Dataiku

The fifth edition of the meetup was hosted on May 24 by Dataiku, whom we warmly thank.

On the evening's program, 2 talks this time:

  • Karl Neuberger (Partner @ Quantmetry) and Antoine Simoulin (Data Scientist @ Quantmetry) presented the Senometry project: analysis of textual medical records to extract structured data. Quantmetry carried out this project in collaboration with the Senology unit of the Strasbourg University Hospital (the unit treating breast diseases) to set up a methodology for the automated extraction and structuring of anonymized textual data from the records of patients treated for breast cancer.
  • Damien Nouvel (researcher @ Inalco) introduced the problem of lexical disambiguation, both in general and across several languages, then described the methods generally used for this task, and finally illustrated these methods with the recognition and resolution of named entities in French.

We thank our speakers once again, as well as our host Dataiku, and we look forward to seeing you at the next edition, which will take place on July 26.

Fourth Paris NLP Meetup at École 42

École 42 hosted us on March 22 for this fourth edition of the Paris NLP meetup, where 80 people attended the evening's 3 talks.

  • Paul Renvoisé (Co-founder @ Recast.ai): Recast.ai is a collaborative chatbot building and training platform created in September 2015. Paul walked us through the problem of building datasets for supervised learning (classification and named entity recognition). (slides)
  • Christophe Bourguignat (Co-founder and CEO @ Zelros) – Mathieu Bizen (Data Scientist @ Zelros): Zelros's mission is to bring conversational interfaces into enterprise processes. The speakers presented applications of neural-network-based models to Natural Language Understanding tasks. (slides)
  • François-Régis Chaumartin (CEO @ Proxem): Proxem Software is a software suite integrating web, text and data mining technologies. François presented how integrating deep learning makes it simpler to build a custom, multilingual semantic analyzer adapted to the target corpus, while maximizing recall and precision.

The video of the meetup is also available here.

Once again, we thank our 3 speakers and our host École 42!