As one of the premier institutes for technology, the Massachusetts Institute of Technology (MIT) has produced prominent research that has resulted in many ground-breaking technological advancements.
In this article, we take a look at the top five recent research papers on Computational Linguistics from the institute.
1. Learning an Executable Neural Semantic Parser
Authors: Jianpeng Cheng, Siva Reddy, Vijay Saraswat, and Mirella Lapata
Abstract: This article describes a neural semantic parser that maps natural language utterances onto logical forms that can be executed against a task-specific environment, such as a knowledge base or a database, to produce a response. The parser generates tree-structured logical forms with a transition-based approach, combining a generic tree-generation algorithm with a domain-general grammar defined by the logical language.
Research methodology: To tackle mismatches between natural language and logical form tokens, various attention mechanisms were explored. The researchers also considered different training settings for the neural semantic parser, including fully supervised training where annotated logical forms were given, weakly supervised training where only denotations were provided, and distant supervision where only unlabeled sentences and a knowledge base were available.
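To make the weakly supervised setting more concrete, here is a minimal sketch of how a denotation alone can act as a training signal: candidate logical forms are executed against a knowledge base, and only those that return the expected answer are kept. This is not the paper's parser. The knowledge base TOY_KB, the two-element logical form format, and the function names are all hypothetical and chosen only for illustration.

```python
from typing import Dict, FrozenSet, Iterable, List, Set, Tuple

# Hypothetical logical form: (relation, entity), read as
# "all x such that relation(x, entity) holds in the knowledge base".
LogicalForm = Tuple[str, str]

# Toy knowledge base (an assumption, not data from the paper).
TOY_KB: Dict[str, Set[Tuple[str, str]]] = {
    "capital_of": {("paris", "france"), ("rome", "italy")},
    "located_in": {("paris", "france"), ("lyon", "france"), ("rome", "italy")},
}


def execute(form: LogicalForm) -> FrozenSet[str]:
    """Execute a logical form against TOY_KB and return its denotation."""
    relation, entity = form
    return frozenset(subj for subj, obj in TOY_KB[relation] if obj == entity)


def consistent_forms(
    candidates: Iterable[LogicalForm], denotation: Iterable[str]
) -> List[LogicalForm]:
    """Keep the candidates whose execution result equals the given denotation."""
    target = frozenset(denotation)
    return [form for form in candidates if execute(form) == target]


# Utterance (not parsed here): "What is the capital of France?"
candidates = [
    ("capital_of", "france"),
    ("located_in", "france"),
    ("capital_of", "italy"),
]
kept = consistent_forms(candidates, {"paris"})
print(kept)
```

In this toy setup only the capital_of candidate survives, because located_in also returns lyon. A real system would have to generate the candidates itself and deal with spurious forms that reach the right answer for the wrong reason.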
Access the research: Learning an Executable Neural Semantic Parser
2. Unsupervised Compositionality Prediction of Nominal Compounds
Authors: Silvio Cordeiro, Aline Villavicencio, Marco Idiart and Carlos Ramisch
Abstract: Nominal compounds such as red wine and nut case display a continuum of compositionality, with varying contributions from the components of the compound to its semantics. This article proposes a framework for compound compositionality prediction using distributional semantic models, evaluating to what extent they capture idiomaticity compared to human judgments.
Research methodology: For evaluation, the researchers introduced data sets containing human judgments in three languages: English, French, and Portuguese. The results reveal a high agreement between the models' predictions and human judgments, suggesting that the models were able to incorporate information about idiomaticity.
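A common way to turn a distributional semantic model into a compositionality score is to compare the vector of the whole compound with a vector composed from its parts. The sketch below assumes additive composition of length-normalized component vectors and cosine similarity as the comparison; the three-dimensional vectors in TOY_VECTORS are invented for illustration and do not come from the paper or from any trained model.

```python
import math
from typing import Dict, List

Vector = List[float]


def norm(v: Vector) -> float:
    """Euclidean length of a vector."""
    return math.sqrt(sum(x * x for x in v))


def normalize(v: Vector) -> Vector:
    """Scale a vector to unit length (zero vectors are returned unchanged)."""
    length = norm(v)
    return [x / length for x in v] if length else list(v)


def cosine(u: Vector, v: Vector) -> float:
    """Cosine similarity; defined as 0.0 if either vector is all zeros."""
    denom = norm(u) * norm(v)
    return sum(a * b for a, b in zip(u, v)) / denom if denom else 0.0


def compositionality_score(compound: Vector, head: Vector, modifier: Vector) -> float:
    """Similarity between the compound vector and the sum of its normalized parts."""
    composed = [h + m for h, m in zip(normalize(head), normalize(modifier))]
    return cosine(compound, composed)


# Invented toy vectors (assumption): "red_wine" lies close to its parts,
# "nut_case" does not.
TOY_VECTORS: Dict[str, Vector] = {
    "red_wine": [0.9, 0.8, 0.1],
    "red": [1.0, 0.2, 0.0],
    "wine": [0.3, 1.0, 0.1],
    "nut_case": [0.0, 0.1, 1.0],
    "nut": [1.0, 0.1, 0.0],
    "case": [0.2, 1.0, 0.1],
}

red_wine_score = compositionality_score(
    TOY_VECTORS["red_wine"], TOY_VECTORS["wine"], TOY_VECTORS["red"]
)
nut_case_score = compositionality_score(
    TOY_VECTORS["nut_case"], TOY_VECTORS["case"], TOY_VECTORS["nut"]
)
print(red_wine_score > nut_case_score)
```

With these toy vectors the compositional compound scores much higher than the idiomatic one. Scores of this kind can then be compared with human judgments, for example through a rank correlation.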
Access the research: Unsupervised Compositionality Prediction of Nominal Compounds
3. Automatic Inference of Sound Correspondence Patterns across Multiple Languages
Authors: Johann-Mattis List
Abstract: The researcher presented an automatic method for the inference of sound correspondence patterns across multiple languages based on a network approach. The core idea was to represent all columns in aligned cognate sets as nodes in a network with edges representing the degree of compatibility between the nodes.
Research methodology: The task of inferring all compatible correspondence sets can then be handled as the well-known minimum clique cover problem in graph theory, which essentially seeks to split the graph into the smallest number of cliques such that each node belongs to exactly one clique. The resulting partitions represent all correspondence patterns that can be inferred for a given data set. By excluding those patterns that occur in only a few cognate sets, the core of regularly recurring sound correspondences can be inferred. Based on this idea, the article presents a method for automatic correspondence pattern recognition, implemented as part of a Python library that supplements the article.
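The clique cover idea can be sketched in a few lines. Minimum clique cover is NP-hard, so the example below uses a simple greedy heuristic that yields a valid cover but not necessarily the smallest one. It is not the code of the library that accompanies the article. The compatibility rule (two alignment columns are compatible if they never show different sounds in the same language, with missing data ignored), the function names, and the data in TOY_SITES are assumptions made for illustration.

```python
from typing import Dict, List, Optional

# One alignment column: language -> sound, or None where data is missing.
Site = Dict[str, Optional[str]]


def compatible(a: Site, b: Site) -> bool:
    """Assumed rule: no language attests two different sounds for the two columns."""
    for lang in a.keys() & b.keys():
        x, y = a[lang], b[lang]
        if x is not None and y is not None and x != y:
            return False
    return True


def greedy_clique_cover(sites: Dict[str, Site]) -> List[List[str]]:
    """Greedily assign every column to the first clique it is fully compatible with."""
    cliques: List[List[str]] = []
    # Process the best-attested columns first; break ties by name for determinism.
    order = sorted(
        sites,
        key=lambda name: (-sum(v is not None for v in sites[name].values()), name),
    )
    for name in order:
        for clique in cliques:
            if all(compatible(sites[name], sites[other]) for other in clique):
                clique.append(name)
                break
        else:
            cliques.append([name])  # no existing clique fits: open a new one
    return cliques


# Invented toy columns (assumption), not data from the article.
TOY_SITES: Dict[str, Site] = {
    "site_1": {"German": "t", "English": "d", "Dutch": "d"},
    "site_2": {"German": "t", "English": "d", "Dutch": None},
    "site_3": {"German": "d", "English": "θ", "Dutch": "d"},
    "site_4": {"German": None, "English": "θ", "Dutch": "d"},
}

patterns = greedy_clique_cover(TOY_SITES)
print(patterns)
```

Each resulting clique stands for one candidate correspondence pattern. Dropping cliques that contain only a few columns mirrors the filtering step described above.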
Access the research: Automatic Inference of Sound Correspondence Patterns across Multiple Languages
4. A Sequential Matching Framework for Multi-Turn Response Selection in Retrieval-Based Chatbots
Authors: Yu Wu, Wei Wu, Chen Xing, Can Xu, Zhoujun Li, and Ming Zhou
Abstract: The researchers studied the problem of response selection for multi-turn conversation in retrieval-based chatbots. The task involves matching a response candidate with a conversation context; the challenges include recognizing the important parts of the context and modeling the relationships among utterances in the context.
Research methodology: Using a new matching framework called the sequential matching framework (SMF), the researchers proposed a sequential convolutional network and a sequential attention network and conducted experiments on two public data sets to test their performance. The experimental results show that both models can significantly outperform state-of-the-art matching methods. The researchers also show that the models are interpretable, with visualisations that provide insights into how they capture and leverage important information in contexts for matching.
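The general shape of sequential matching can be illustrated without any neural network: score the response candidate against each utterance in turn, then accumulate those scores in conversation order so that later turns weigh more. In the sketch below, word-overlap (Jaccard) similarity and an exponentially decayed running average are stand-ins chosen for illustration; they are assumptions and not the convolutional or attention models proposed in the paper. The conversation and candidates are invented.

```python
from typing import List, Set


def tokens(text: str) -> Set[str]:
    """Lowercased whitespace tokens of a sentence."""
    return set(text.lower().split())


def match_score(utterance: str, response: str) -> float:
    """Jaccard word overlap between one utterance and the response candidate."""
    u, r = tokens(utterance), tokens(response)
    union = u | r
    return len(u & r) / len(union) if union else 0.0


def sequential_match(context: List[str], response: str, decay: float = 0.5) -> float:
    """Accumulate per-utterance scores in order; recent turns count more."""
    state = 0.0
    for utterance in context:
        state = decay * state + (1.0 - decay) * match_score(utterance, response)
    return state


def select_response(context: List[str], candidates: List[str]) -> str:
    """Return the candidate with the highest accumulated matching score."""
    return max(candidates, key=lambda cand: sequential_match(context, cand))


# Invented toy conversation and candidates (assumption).
TOY_CONTEXT = [
    "my laptop will not turn on",
    "did you try charging it",
    "yes the charger light is off too",
]
TOY_CANDIDATES = [
    "the charger may be broken try another charger",
    "i love pizza",
]

best = select_response(TOY_CONTEXT, TOY_CANDIDATES)
print(best)
```

Here the charger-related candidate is selected because it overlaps with the two most recent turns. Learned matching and accumulation functions would take the place of both stand-ins in a real model.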
Access the research: A Sequential Matching Framework for Multi-Turn Response Selection in Retrieval-Based Chatbots
5. Parsing Chinese Sentences with Grammatical Relations
Authors: Weiwei Sun, Yufei Chen, Xiaojun Wan and Meichun Liu
Abstract: The research represents grammatical information using general directed dependency graphs. Both only-local and rich long-distance dependencies are explicitly represented.
Research methodology: To create high-quality annotations, the researchers took advantage of an existing TreeBank, namely, Chinese TreeBank (CTB), which is grounded on the Government and Binding theory. Two key problems addressed by the researchers are (a) how to decompose a complex graph into simple subgraphs, and (b) how to combine subgraphs into a coherent complex graph. For transition-based parsing, the researchers introduced a neural parser based on a list-based transition system. They also discussed several other key problems, including dynamic oracle and beam search for neural transition-based parsing. The evaluation gauged how successful grammatical relation (GR) parsing for Chinese can be when data-driven models are applied. The empirical analysis suggests several directions for future study.
Access the research: Parsing Chinese Sentences with Grammatical Relations