NLP Highlights

Synopsis

Discussing recent and interesting work related to natural language processing. Matt Gardner and Waleed Ammar, research scientists at the Allen Institute for Artificial Intelligence, give short discussions of papers, mostly in interviews with authors about their work.

Episodes

  • 44 - Truly Low Resource NLP, with Anders Søgaard

    07/12/2017 Duration: 48min

    Anders talks with us about his line of work on doing NLP in languages where you have no linguistic resources other than a Bible translation or other religious works. He and his students have developed methods for annotation projection for both part-of-speech tagging and dependency parsing, aggregating information from many languages to predict annotations for languages where you have no training data. We talk about low-resource NLP generally, then dive into the specifics of the annotation projection method that Anders used, also touching on a related paper on learning cross-lingual word embeddings. https://www.semanticscholar.org/paper/If-all-you-have-is-a-bit-of-the-Bible-Learning-POS-Agic-Hovy/812965ddce635174b33621aaaa551e5f6199b6c0 https://www.semanticscholar.org/paper/Multilingual-Projection-for-Parsing-Truly-Low-Reso-Agic-Johannsen/1414e3041f4cc3366b6ab49d1dbe9216632b9c78 https://www.semanticscholar.org/paper/Cross-Lingual-Dependency-Parsing-with-Late-Decodin-Schlichtkrull-S%C3%B8gaard/eda636e3abae82
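
    To make the aggregation concrete, here is a toy sketch of projection-by-voting (hypothetical data structures; Anders and colleagues weight and filter projected labels much more carefully than a raw majority vote):

        from collections import Counter

        def project_pos_tags(source_tags, alignments):
            """Toy annotation projection: POS tags from several source
            languages vote, through word alignments, on the tag of each
            target-language token."""
            votes = {}  # target position -> Counter of proposed tags
            for tags, alignment in zip(source_tags, alignments):
                for src_pos, tgt_pos in alignment:
                    votes.setdefault(tgt_pos, Counter())[tags[src_pos]] += 1
            # Majority vote per position; positions with no votes stay untagged.
            return {pos: c.most_common(1)[0][0] for pos, c in votes.items()}

        # Two "source languages" voting on a three-token target sentence:
        tags = [["DET", "NOUN", "VERB"], ["NOUN", "VERB"]]
        aligns = [[(0, 0), (1, 1), (2, 2)], [(0, 1), (1, 2)]]
        print(project_pos_tags(tags, aligns))  # {0: 'DET', 1: 'NOUN', 2: 'VERB'}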

  • 43 - Reinforced Video Captioning with Entailment Rewards, with Ramakanth and Mohit

    04/12/2017 Duration: 47min

    EMNLP 2017 paper by Ramakanth Pasunuru and Mohit Bansal. Ram and Mohit join us to talk about their work, which uses reinforcement learning to improve performance on a video captioning task. They directly optimize CIDEr, a popular image/video captioning metric, using policy gradient methods, then use a modified version of CIDEr that penalizes the model when it fails to produce a caption that is _entailed_ by the correct caption. In our discussion, we hit on what video captioning is, what typical models look like for this task, and how the entailment-based reward function is similar to other attempts to be smart about handling paraphrases when evaluating or training language generation models. Unfortunately, due to some technical issues, the audio recording is a little worse than usual for this episode. Our apologies. https://www.semanticscholar.org/paper/Reinforced-Video-Captioning-with-Entailment-Reward-Pasunuru-Bansal/0d11977afa1a6ce90dc3b1f26694492c2ab04773
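
    For listeners who want the policy-gradient part spelled out, here is a minimal REINFORCE sketch (the reward here is a placeholder scalar; the paper combines CIDEr with an entailment score in a more careful way than this):

        import torch

        def reinforce_loss(sampled_log_probs, reward, baseline):
            """Policy gradient for one sampled caption: push up the
            log-probability of the sampled tokens in proportion to how much
            the caption's reward beat a baseline."""
            advantage = reward - baseline
            return -advantage * sampled_log_probs.sum()

        # Hypothetical numbers: per-token log-probs of a sampled caption and
        # a caption-level reward (e.g., CIDEr modulated by entailment).
        log_probs = torch.tensor([-1.2, -0.7, -2.1], requires_grad=True)
        loss = reinforce_loss(log_probs, reward=0.72, baseline=0.5)
        loss.backward()  # gradients flow back into the captioning model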

  • 42 - Generating Sentences by Editing Prototypes, with Kelvin Guu

    30/11/2017 Duration: 38min

    The paper is by Kelvin Guu, Tatsunori B. Hashimoto, Yonatan Oren, and Percy Liang. In this episode, Kelvin tells us how to build a language model that starts from a prototype sentence instead of starting from scratch, enabling much more grammatical and diverse language modeling results. In the process, Kelvin gives us a really good intuitive explanation for how variational autoencoders work, we talk about some of the details of the model they used, and some of the implications of the work - can you use this for better summarization, or machine translation, or dialogue responses? https://www.semanticscholar.org/paper/Generating-Sentences-by-Editing-Prototypes-Guu-Hashimoto/d94d2a9c615b5359ec7d63b1379f9896c48a713f
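
    Since the VAE explanation is a highlight of this episode, here is the one trick that makes VAEs trainable, in a minimal PyTorch sketch (this is the generic reparameterization trick, not the paper's specific edit-vector model):

        import torch

        def reparameterize(mu, log_var):
            """Sample z ~ N(mu, sigma^2) as a deterministic function of
            (mu, sigma) plus independent noise, so the sample stays
            differentiable with respect to the encoder's outputs."""
            std = torch.exp(0.5 * log_var)
            eps = torch.randn_like(std)
            return mu + eps * std

        mu, log_var = torch.zeros(8), torch.zeros(8)  # toy 8-dim latent
        z = reparameterize(mu, log_var)  # one draw of a latent edit vector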

  • 41 - Cross-Sentence N-ary Relation Extraction with Graph LSTMs, with Nanyun (Violet) Peng

    10/11/2017 Duration: 34min

    TACL 2017 paper, by Nanyun Peng, Hoifung Poon, Chris Quirk, Kristina Toutanova, and Wen-tau Yih. Most relation extraction work focuses on binary relations, like (Seattle, located in, Washington), because extracting n-ary relations is difficult. Nanyun (Violet) and her colleagues came up with a model to extract n-ary relations, focusing on drug-mutation-gene interactions, using graph LSTMs (a construct pretty similar to graph CNNs, which were developed around the same time). Nanyun comes on the podcast to tell us about her work. https://www.semanticscholar.org/paper/Cross-Sentence-N-ary-Relation-Extraction-with-Grap-Peng-Poon/03a2f871cc841e8047ab3291806dc301c5144bec

  • 40 - On the State of the Art of Evaluation in Neural Language Models, with Gábor Melis

    07/11/2017 Duration: 29min

    Recent arXiv paper by Gábor Melis, Chris Dyer, and Phil Blunsom. Gábor comes on the podcast to tell us about his work. He performs a thorough comparison between vanilla LSTMs and recurrent highway networks on the language modeling task, showing that when both methods are given equal amounts of hyperparameter tuning, LSTMs perform better, in contrast to prior work claiming that recurrent highway networks perform better. We talk about parameter tuning, training variance, language model evaluation, and other related issues. https://www.semanticscholar.org/paper/On-the-State-of-the-Art-of-Evaluation-in-Neural-La-Melis-Dyer/2397ce306e5d7f3d0492276e357fb1833536b5d8

  • 39 - Organizing the SemEval task on scientific information extraction, with Isabelle Augenstein

    01/11/2017 Duration: 31min

    Isabelle Augenstein was the lead organizer of SemEval 2017 task 10, on extracting keyphrases and relations from scientific publications. In this episode we talk about her experience organizing the task, how it was set up, and what its results were. We also talk about some related work Isabelle did on multi-task learning for keyphrase boundary detection. https://www.semanticscholar.org/paper/SemEval-2017-Task-10-ScienceIE-Extracting-Keyphras-Augenstein-Das/71007219617d0f5e2419c5c1ab1a0d6d0bc40b7e https://www.semanticscholar.org/paper/Multi-Task-Learning-of-Keyphrase-Boundary-Classifi-Augenstein-S%C3%B8gaard/4a0db09d0c19dfeb78900164d46d4b06cd3fc9f3

  • 38 - A Corpus of Natural Language for Visual Reasoning, with Alane Suhr

    30/10/2017 Duration: 23min

    ACL 2017 best resource paper, by Alane Suhr, Mike Lewis, James Yeh, and Yoav Artzi. Alane joins us on the podcast to tell us about the dataset, which contains images paired with natural language descriptions of the images, where the task is to decide whether the description is true or false. Alane tells us about the motivation for creating the new dataset, how it was constructed, the way they elicited complex language from crowd workers, and why the dataset is an interesting target for future research. https://www.semanticscholar.org/paper/A-Corpus-of-Natural-Language-for-Visual-Reasoning-Suhr-Lewis/633453fb633c3c8695f3cd0e6b5350e971058bed

  • 37 - On Statistical Significance, Training Variance, and Why Reporting Score Distributions Matters

    24/10/2017 Duration: 12min

    In this episode we talk about a couple of recent papers that get at the issue of training variance, and why we should not just take the max from a training distribution when reporting results. Sadly, our current focus on performance in leaderboards only exacerbates these issues, and (in my opinion) encourages bad science. Papers: https://www.semanticscholar.org/paper/Reporting-Score-Distributions-Makes-a-Difference-P-Reimers-Gurevych/0eae432f7edacb262f3434ecdb2af707b5b06481 https://www.semanticscholar.org/paper/Deep-Reinforcement-Learning-that-Matters-Henderson-Islam/90dad036ab47d683080c6be63b00415492b48506
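
    The fix the papers argue for is easy to implement: report the whole score distribution over seeds. A minimal sketch (train_and_eval is a stand-in for your real training run):

        import random, statistics

        def summarize_runs(train_and_eval, num_seeds=10):
            """Report mean/stdev/min/max over random seeds instead of
            cherry-picking the single best run."""
            scores = [train_and_eval(seed) for seed in range(num_seeds)]
            return {"mean": statistics.mean(scores),
                    "stdev": statistics.stdev(scores),
                    "min": min(scores), "max": max(scores)}

        def fake_run(seed):  # stand-in: accuracy that varies with the seed
            random.seed(seed)
            return 0.80 + random.gauss(0, 0.01)

        print(summarize_runs(fake_run))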

  • 36 - Attention Is All You Need, with Ashish Vaswani and Jakob Uszkoreit

    23/10/2017 Duration: 41min

    NIPS 2017 paper. We dig into the details of the Transformer, from the "attention is all you need" paper. Ashish and Jakob give us some motivation for replacing RNNs and CNNs with a more parallelizable self-attention mechanism, they describe how this mechanism works, and then we spend the bulk of the episode trying to get their intuitions for _why_ it works. We discuss the positional encoding mechanism, multi-headed attention, trying to use these ideas to replace encoders in other models, and what the self-attention actually learns. Turns out that the lower layers learn something like n-grams (similar to CNNs), and the higher layers learn more semantic-y things, like coreference. https://www.semanticscholar.org/paper/Attention-Is-All-You-Need-Vaswani-Shazeer/0737da0767d77606169cbf4187b83e1ab62f6077 Minor correction: Talking about complexity equations without the paper in front of you can be tricky, and Ashish and Jakob may have gotten some of the details slightly wrong when we were discussing computational complexity.
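
    The core mechanism we spend the episode on fits in a few lines. A single-head sketch (no learned projections, masking, or positional encodings; multi-head attention runs several of these in parallel over different projections):

        import numpy as np

        def scaled_dot_product_attention(Q, K, V):
            """Each query attends to all keys; the output is an
            attention-weighted average of the values. Scaling by sqrt(d_k)
            keeps the logits in a range where softmax has useful gradients."""
            d_k = Q.shape[-1]
            logits = Q @ K.T / np.sqrt(d_k)
            weights = np.exp(logits - logits.max(axis=-1, keepdims=True))
            weights /= weights.sum(axis=-1, keepdims=True)
            return weights @ V

        rng = np.random.default_rng(0)
        x = rng.normal(size=(5, 16))  # 5 tokens, 16-dim representations
        out = scaled_dot_product_attention(x, x, x)  # self-attention: Q = K = V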

  • 35 - Replicability Analysis for Natural Language Processing, with Roi Reichart

    19/10/2017 Duration: 31min

    TACL 2017 paper by Rotem Dror, Gili Baumer, Marina Bogomolov, and Roi Reichart. Roi comes on to talk to us about how to make better statistical comparisons between two methods when there are multiple datasets in the comparison. This paper shows that there are more powerful methods available than the occasionally-used Bonferroni correction, and using the better methods can let you make stronger, statistically-valid conclusions. We talk a bit also about how the assumptions you make about your data can affect the statistical tests that you perform, and briefly mention other issues in replicability / reproducibility, like training variance. https://www.semanticscholar.org/paper/Replicability-Analysis-for-Natural-Language-Proces-Dror-Baumer/fa5129ab6fd85f8ff590f9cc8a39139e9dfa8aa2
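
    As a taste of why Bonferroni is not the last word, compare it with Holm's step-down procedure, which controls the same family-wise error rate but is uniformly more powerful (the paper goes further still, with replicability analysis across datasets; this sketch only contrasts the two classical corrections):

        def bonferroni(p_values, alpha=0.05):
            """Reject hypothesis i iff p_i <= alpha / m."""
            m = len(p_values)
            return [p <= alpha / m for p in p_values]

        def holm(p_values, alpha=0.05):
            """Step down through the sorted p-values, loosening the
            threshold at each step; stop at the first failure."""
            m = len(p_values)
            order = sorted(range(m), key=lambda i: p_values[i])
            reject = [False] * m
            for rank, i in enumerate(order):
                if p_values[i] > alpha / (m - rank):
                    break
                reject[i] = True
            return reject

        ps = [0.005, 0.011, 0.02, 0.04]
        print(bonferroni(ps))  # [True, True, False, False]
        print(holm(ps))        # [True, True, True, True] -- strictly more rejections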

  • 34 - Translating Neuralese, with Jacob Andreas

    17/10/2017 Duration: 32min

    ACL 2017 paper by Jacob Andreas, Anca D. Dragan, and Dan Klein. Jacob comes on to tell us about the paper. The paper focuses on multi-agent dialogue tasks, where two learning systems need to figure out a way to communicate with each other to solve some problem. These agents might be figuring out communication protocols that are very different from what humans would come up with in the same situation, and Jacob introduces some clever ways to figure out what the learned communication protocol looks like - you find human messages that induce the same beliefs in the listener as the robot messages. Jacob tells us about this work, and we conclude with a brief discussion of the more general issue of interpreting neural models. https://www.semanticscholar.org/paper/Translating-Neuralese-Andreas-Dragan/49612dc348ce953027bb4aba95adad0c703d76d1
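
    The translation criterion has a compact form. A toy sketch with made-up beliefs (the paper derives listener beliefs from the task itself and is more careful about the matching criterion):

        import numpy as np

        def kl(p, q, eps=1e-12):
            p, q = np.asarray(p) + eps, np.asarray(q) + eps
            return float(np.sum(p * np.log(p / q)))

        def translate(robot_belief, human_messages):
            """Pick the human message whose induced listener belief is
            closest (in KL divergence) to the robot message's belief."""
            return min(human_messages,
                       key=lambda m: kl(robot_belief, human_messages[m]))

        # Hypothetical listener beliefs over three world states:
        human = {"it's on the left": [0.8, 0.1, 0.1],
                 "it's on the right": [0.1, 0.1, 0.8]}
        print(translate([0.7, 0.2, 0.1], human))  # "it's on the left"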

  • 33 - Entity Linking via Joint Encoding of Types, Descriptions, and Context, with Nitish Gupta

    16/10/2017 Duration: 24min

    EMNLP 2017 paper by Nitish Gupta, Sameer Singh, and Dan Roth. Nitish comes on to talk to us about his paper, which presents a new entity linking model that both unifies prior sources of information into a single neural model, and trains that model in a domain-agnostic way, so it can be transferred to new domains without much performance degradation. https://www.semanticscholar.org/paper/Entity-Linking-via-Joint-Encoding-of-Types-Descrip-Gupta-Singh/a66b6a3ac0aa9af6c178c1d1a4a97fd14a882353

  • 32 - The Effect of Different Writing Tasks on Linguistic Style, with Roy Schwartz

    10/10/2017 Duration: 24min

    CoNLL 2017 paper, by Roy Schwartz, Maarten Sap, Ioannis Konstas, Leila Zilles, Yejin Choi, and Noah A. Smith. Roy comes on to talk to us about the paper. They analyzed the ROCStories corpus, which was created with three separate tasks on Mechanical Turk. They found that there were enough stylistic differences between the text generated from each task that they could get very good performance on the ROCStories cloze task just by looking at the style, ignoring the information you're supposed to use to solve the task. Roy talks to us about this finding, and about how hard it is to generate datasets that don't have some kind of flaw (hint: they all have problems). https://www.semanticscholar.org/paper/The-Effect-of-Different-Writing-Tasks-on-Linguisti-Schwartz-Sap/1a697d7cf187e51d5ccc23eb3ee5d2950ece5522

  • 31 - Tying Word Vectors and Word Classifiers: A Loss Framework for Language Modeling

    06/10/2017 Duration: 11min

    ICLR 2017 paper by Hakan Inan, Khashayar Khosravi, and Richard Socher, presented by Waleed. The paper presents some tricks for training better language models. It introduces a modified loss function for language modeling, where producing a word that is similar to the target word is not penalized as much as producing a word that is very different from the target (I've seen this in other places, e.g., image classification, but not in language modeling). They also give theoretical and empirical justification for tying input and output embeddings. https://www.semanticscholar.org/paper/Tying-Word-Vectors-and-Word-Classifiers-A-Loss-Fra-Inan-Khosravi/424aef7340ee618132cc3314669400e23ad910ba
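
    The embedding-tying half of the paper is a one-line trick in most frameworks. A minimal PyTorch sketch (the paper's modified loss, which penalizes near-miss words less, is not shown here):

        import torch.nn as nn

        class TiedLM(nn.Module):
            """Language model whose output classifier reuses the input
            embedding matrix, sharing input and output word vectors."""
            def __init__(self, vocab_size, dim):
                super().__init__()
                self.embed = nn.Embedding(vocab_size, dim)
                self.rnn = nn.LSTM(dim, dim, batch_first=True)
                self.decoder = nn.Linear(dim, vocab_size, bias=False)
                self.decoder.weight = self.embed.weight  # the tying trick

            def forward(self, tokens):
                hidden, _ = self.rnn(self.embed(tokens))
                return self.decoder(hidden)  # logits over the vocabulary

        model = TiedLM(vocab_size=10_000, dim=256)
        assert model.decoder.weight is model.embed.weight  # shared storage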

  • 30 - Probabilistic Typology: Deep Generative Models of Vowel Inventories

    05/10/2017 Duration: 31min

    Paper by Ryan Cotterell and Jason Eisner, presented by Matt. This paper won the best paper award at ACL 2017. It's also quite outside the typical focus areas that you see at NLP conferences, trying to build generative models of vowel inventories across languages. That means we give quite a bit of setup, to try to help someone not familiar with this area understand what's going on. That makes this episode quite a bit longer than a typical non-interview episode. https://www.semanticscholar.org/paper/Probabilistic-Typology-Deep-Generative-Models-of-V-Cotterell-Eisner/6fad97c4fe0cfb92478d8a17a4e6aaa8637d8222

  • 29 - Neural machine translation via binary code prediction, with Graham Neubig

    14/07/2017 Duration: 38min

    ACL 2017 paper, by Yusuke Oda and others (including Graham Neubig) at Nara Institute of Science and Technology (Graham is now at Carnegie Mellon University). Graham comes on to talk to us about neural machine translation generally, and about this ACL paper in particular. We spend the first half of the episode talking about major milestones in neural machine translation and why it is so much more effective than previous methods (spoiler: stronger language models help a lot). We then talk about the specifics of binary code prediction, how it's related to a hierarchical or class-factored softmax, and how to make it robust to off-by-one-bit errors. Paper link: https://www.semanticscholar.org/paper/Neural-Machine-Translation-via-Binary-Code-Predict-Oda-Arthur/bbedfd0380eb2e62f1c3b61aaf484d5867e6358d An example of the Language Log posts that we discussed: http://languagelog.ldc.upenn.edu/nll/?p=33613 (there are many more).
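
    The basic code trick is easy to see in isolation: a |V|-way softmax becomes ceil(log2 |V|) independent bit predictions. A toy sketch of just the mapping (the paper adds redundancy and a hybrid softmax precisely because raw codes are fragile, where one flipped bit yields a completely different word):

        import math

        def word_id_to_bits(word_id, vocab_size):
            """Binary code for a word ID, low-order bit first."""
            num_bits = math.ceil(math.log2(vocab_size))
            return [(word_id >> b) & 1 for b in range(num_bits)]

        def bits_to_word_id(bits):
            return sum(bit << b for b, bit in enumerate(bits))

        # A 50k-word vocabulary needs 16 output bits instead of 50k logits:
        bits = word_id_to_bits(42, 50_000)
        print(len(bits), bits_to_word_id(bits))  # 16 42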

  • 28 - Data Programming: Creating Large Training Sets, Quickly

    11/07/2017 Duration: 25min

    NIPS 2016 paper by Alexander Ratner and coauthors in Chris Ré's group at Stanford, presented by Waleed. The paper presents a method for generating labels for an unlabeled dataset by combining a number of weak labelers. This changes the annotation effort from looking at individual examples to constructing a large number of noisy labeling heuristics, a task the authors call "data programming". Then you learn a model that intelligently aggregates information from the weak labelers to create a weighted "supervised" training set. We talk about this method, how it works, how it's related to ideas like co-training, and when you might want to use it. https://www.semanticscholar.org/paper/Data-Programming-Creating-Large-Training-Sets-Quic-Ratner-Sa/37acbbbcfe9d8eb89e5b01da28dac6d44c3903ee
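
    A toy sketch of the workflow (real data programming learns an accuracy for each labeling function and aggregates accordingly; unweighted majority voting is shown here only to make the shape of the idea clear):

        def weak_labelers():
            """Labeling functions vote +1/-1 or abstain with 0."""
            has_spouse = lambda x: 1 if "spouse" in x else 0
            has_wife = lambda x: 1 if "wife" in x or "husband" in x else 0
            negation = lambda x: -1 if "not married" in x else 0
            return [has_spouse, has_wife, negation]

        def vote(labelers, example):
            total = sum(lf(example) for lf in labelers)
            return 0 if total == 0 else (1 if total > 0 else -1)

        lfs = weak_labelers()
        print(vote(lfs, "his wife, and spouse of 20 years"))  # 1
        print(vote(lfs, "they are not married"))              # -1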

  • 27 - What do Neural Machine Translation Models Learn about Morphology?, with Yonatan Belinkov

    05/07/2017 Duration: 29min

    ACL 2017 paper by Yonatan Belinkov and others at MIT and QCRI. Yonatan comes on to tell us about their work. They trained a neural MT system, then learned models on top of the NMT representation layers to do morphology tasks, trying to probe how much morphological information is encoded by the MT system. We talk about the specifics of their model and experiments, insights they got from doing these experiments, and how this work relates to other work on representation learning in NLP. https://www.semanticscholar.org/paper/What-do-Neural-Machine-Translation-Models-Learn-ab-Belinkov-Durrani/37ac87ccea1cc9c78a0921693dd3321246e5ef07
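
    The probing recipe itself is simple, and worth seeing: freeze the NMT encoder, then train a small classifier on its states. A sketch with random stand-in data (on real data, high probe accuracy suggests the states encode the property):

        import numpy as np
        from sklearn.linear_model import LogisticRegression

        rng = np.random.default_rng(0)
        states = rng.normal(size=(1000, 512))    # frozen encoder states
        labels = rng.integers(0, 12, size=1000)  # 12 morphology classes

        probe = LogisticRegression(max_iter=1000).fit(states[:800], labels[:800])
        print("probe accuracy:", probe.score(states[800:], labels[800:]))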

  • 26 - Structured Attention Networks, with Yoon Kim

    30/06/2017 Duration: 25min

    ICLR 2017 paper, by Yoon Kim, Carl Denton, Luong Hoang, and Sasha Rush. Yoon comes on to talk with us about his paper. The paper shows how standard attention can be seen as an expected feature count computation, and can be generalized to other kinds of expected feature counts, as long as we have efficient, differentiable algorithms for computing those marginals, like the forward-backward and inside-outside algorithms. We talk with Yoon about how this works, the experiments they ran to test this idea, and interesting implications of their work. https://www.semanticscholar.org/paper/Structured-Attention-Networks-Kim-Denton/0aec1745d0e054e8d86d21b20d0ee5fc0d932a49 Yoon also brought up a more recent paper by Yang Liu and Mirella Lapata that computes a very similar kind of structured attention, but does so much more efficiently. That paper is here: https://www.semanticscholar.org/paper/Learning-Structured-Text-Representations-Liu-Lapata/4435c3586364e8f8a2c8c9ee671c39d7df7e196c.
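
    The "attention as expectation" observation can be stated in a few lines: softmax attention weights are posterior probabilities of a categorical latent variable, and the attention output is an expected feature count under it. A sketch of just that observation (the structured versions replace the softmax with forward-backward or inside-outside marginals):

        import numpy as np

        def expected_feature_counts(scores, features):
            """softmax(scores) defines p(z = i) over positions; the output
            is E_p[features(z)]. With one-hot features, this expectation is
            exactly the vector of attention weights."""
            p = np.exp(scores - scores.max())
            p /= p.sum()
            return p @ features

        scores = np.array([2.0, 0.5, -1.0])
        print(expected_feature_counts(scores, np.eye(3)))  # the attention weights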

  • 25 - Neural Semantic Parsing over Multiple Knowledge-bases

    28/06/2017 Duration: 10min

    ACL 2017 short paper, by Jonathan Herzig and Jonathan Berant. This is a nice, obvious-in-hindsight paper that applies a frustratingly-easy-domain-adaptation-like approach to semantic parsing, similar to the multi-task semantic dependency parsing approach we talked to Noah Smith about recently. Because there is limited training data available for complex logical constructs (like argmax, or comparatives), but the mapping from language onto these constructions is typically constant across domains, domain adaptation can give a nice, though somewhat small, boost in performance. NB: I felt like I struggled a bit with describing this clearly. Not my best episode. Hopefully it's still useful. https://www.semanticscholar.org/paper/Neural-Semantic-Parsing-over-Multiple-Knowledge-ba-Herzig-Berant/6611cf821f589111adfc0a6fbb426fa726f4a9af
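
    For reference, the classic feature-augmentation trick the episode alludes to (Daumé III's "frustratingly easy" version, not the paper's neural variant) looks like this:

        import numpy as np

        def easy_adapt(x, domain, num_domains):
            """Concatenate a shared copy of the features with one block per
            domain, so weights can be domain-specific where data allows and
            shared elsewhere."""
            dim = x.shape[0]
            out = np.zeros((1 + num_domains) * dim)
            out[:dim] = x                    # shared block
            start = (1 + domain) * dim
            out[start:start + dim] = x       # this domain's block
            return out

        x = np.array([1.0, 2.0])
        print(easy_adapt(x, domain=0, num_domains=2))  # [1. 2. 1. 2. 0. 0.]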
