Papers
arXiv:1804.07461

GLUE: A Multi-Task Benchmark and Analysis Platform for Natural Language Understanding

Published on Apr 20, 2018
Authors: Alex Wang, Amanpreet Singh, Julian Michael, Felix Hill, Omer Levy, Samuel R. Bowman

Abstract

For natural language understanding (NLU) technology to be maximally useful, both practically and as a scientific object of study, it must be general: it must be able to process language in a way that is not exclusively tailored to any one specific task or dataset. In pursuit of this objective, we introduce the General Language Understanding Evaluation benchmark (GLUE), a tool for evaluating and analyzing the performance of models across a diverse range of existing NLU tasks. GLUE is model-agnostic, but it incentivizes sharing knowledge across tasks because certain tasks have very limited training data. We further provide a hand-crafted diagnostic test suite that enables detailed linguistic analysis of NLU models. We evaluate baselines based on current methods for multi-task and transfer learning and find that they do not immediately give substantial improvements over the aggregate performance of training a separate model per task, indicating room for improvement in developing general and robust NLU systems.
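As a minimal sketch of how the benchmark described above is typically used in practice (not part of the paper itself): the Hugging Face `datasets` and `evaluate` libraries expose each GLUE task as a named configuration of the "glue" dataset and pair it with the corresponding metric. The choice of task (SST-2) and the placeholder predictions below are illustrative assumptions, standing in for the outputs of whatever model is being evaluated.

```python
# Minimal sketch: load one GLUE task (SST-2) and score predictions with the
# matching GLUE metric. Task choice and predictions are illustrative only.
from datasets import load_dataset
import evaluate

# Each GLUE task is a named configuration of the "glue" dataset.
sst2 = load_dataset("glue", "sst2")
print(sst2["train"][0])  # e.g. {'sentence': ..., 'label': 0, 'idx': 0}

# The companion metric for the same task name (accuracy for SST-2).
metric = evaluate.load("glue", "sst2")
refs = sst2["validation"]["label"]
preds = [0] * len(refs)  # placeholder predictions from a hypothetical model
print(metric.compute(predictions=preds, references=refs))  # {'accuracy': ...}
```

The same pattern applies to the other GLUE tasks (e.g. MNLI, QNLI, RTE); only the configuration name and the metric returned change.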

Models citing this paper 21

Datasets citing this paper 2

Spaces citing this paper 89

Collections including this paper 9