EMNLP 2021 Quick Notes
Wanyu Du
Nov 21, 2021
Here are some quick notes on EMNLP 2021 papers, focusing on the evaluation of text generation tasks and on few-shot learning for generation and classification tasks.
Evaluation
RESOURCES AND EVALUATION
- Visually Grounded Reasoning across Languages and Cultures (best long paper)
- Compression, Transduction, and Creation: A Unified Framework for Evaluating Natural Language Generation
- [video] [paper] [code]
- Categorize NLG tasks based on the information change from input \(X\) to output \(Y\): (1) compression (\(X > Y\)), (2) transduction (\(X = Y\)), (3) creation (\(X < Y\)).
- Evaluate by measuring information alignment (between \(x\), \(y\), reference \(r\), and context \(c\)) for different NLG tasks.
- Evaluating the Evaluation Metrics for Style Transfer: A Case Study in Multilingual Formality Transfer
- The Perils of Using Mechanical Turk to Evaluate Open-Ended Text Generation
NLG (Generation in Low-data Settings)
EFFICIENT METHODS FOR NLG
- When Attention Meets Fast Recurrence: Training Language Models with Reduced Compute (outstanding paper)
- [video] [paper] [code]
- Attention is NOT all we need: combine fast recurrence with attention. Attention helps recurrence avoid information and gradient propagation issues, while recurrence lets attention drop multi-head attention and relative position embeddings.
- Parallelize the per-timestep computation (linear projection + attention) of the inputs to the state vector \(c_t\), forget gate \(f_t\), and reset gate \(r_t\), then compute the state \(c_t\) and hidden state \(h_t\) with a cheap elementwise recurrence.
- Much faster training, comparable test perplexity, and a similar parameter count to Transformer-XL.
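The parallel-projection-plus-cheap-recurrence split can be sketched in plain Python. This is a hypothetical scalar version for illustration only: the real model uses matrix projections over vectors, and SRU++ replaces part of the projection with attention.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def sru_layer(xs, wx, wf, wr, bf, br):
    """Scalar sketch of an SRU-style layer (hypothetical weights/biases)."""
    # Parallel part: per-timestep projections have no dependence on the
    # previous state, so they can all be computed at once (in SRU++,
    # attention replaces part of this projection).
    cands = [wx * x for x in xs]
    fgates = [sigmoid(wf * x + bf) for x in xs]
    rgates = [sigmoid(wr * x + br) for x in xs]

    # Sequential part: a cheap elementwise recurrence over the state c_t.
    c, hs = 0.0, []
    for x, cand, f, r in zip(xs, cands, fgates, rgates):
        c = f * c + (1.0 - f) * cand   # state vector c_t
        h = r * c + (1.0 - r) * x      # hidden state h_t (highway skip)
        hs.append(h)
    return hs
```

Only the short loop at the end is sequential, which is why training is much faster than a conventional RNN of the same size.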
- Few-Shot Text Generation with Natural Language Instructions
- Smelting Gold and Silver for Improved Multilingual AMR-to-Text Generation
DIALOGUE AND INTERACTIVE SYSTEMS
- MindCraft: Theory of Mind Modeling for Situated Dialogue in Collaborative Tasks (outstanding paper)
- GOLD: Improving Out-of-Scope Detection in Dialogues using Data Augmentation
- [video] [paper] [code]
- Use data augmentation to improve the out-of-scope detection in dialogues.
- Mine out-of-scope utterances from related external datasets, replace the original utterance with an external one, and keep the candidates that receive a majority vote from an ensemble of out-of-scope detectors.
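The majority-vote filtering step might look like the following sketch, where `detectors` is a hypothetical list of callables that return `True` when an utterance looks out-of-scope:

```python
def select_pseudo_ood(candidates, detectors):
    """Keep candidate utterances flagged by a strict majority of the
    out-of-scope detectors (hypothetical detector interface)."""
    keep = []
    for utt in candidates:
        votes = sum(1 for detect in detectors if detect(utt))
        if votes > len(detectors) / 2:  # strict majority
            keep.append(utt)
    return keep
```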
- ConvFiT: Conversational Fine-Tuning of Pretrained Language Models
- Self-training Improves Pre-training for Few-shot Learning in Task-oriented Dialog Systems
QUESTION ANSWERING
- SituatedQA: Incorporating Extra-Linguistic Contexts into QA (outstanding paper)
- Surface Form Competition: Why the Highest Probability Answer Isn’t Always Right
- [video] [paper] [code]
- For zero-shot QA, GPT-3 may generate multiple valid answers, but those answers may not match any of the multiple-choice options, so ranking options by string probability can be problematic.
- Introduce domain conditional pointwise mutual information (PMI_DC), which reweighs each option by a term proportional to its a priori likelihood within the context of the zero-shot task, to score the multiple-choice options.
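A minimal sketch of the scoring rule, where `logprob(y, x)` is a hypothetical stand-in for a language-model call returning \(\log P(y \mid x)\):

```python
def pmi_dc_scores(options, question, domain_premise, logprob):
    """Score multiple-choice options by domain conditional PMI (a sketch;
    `logprob` is a hypothetical LM interface)."""
    # log PMI_DC = log P(y | question) - log P(y | domain premise):
    # dividing out the a priori likelihood stops frequent surface forms
    # from dominating rarer but valid answers.
    return {y: logprob(y, question) - logprob(y, domain_premise)
            for y in options}
```

An option with a high raw probability but an equally high a priori likelihood under the domain premise gets no boost, which is exactly the surface-form competition the paper targets.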
- Synthetic Data Augmentation for Zero-Shot Cross-Lingual Question Answering
- Back-Training excels Self-Training at Unsupervised Domain Adaptation of Question Generation and Passage Retrieval
NLU (Sample Selection and Data Augmentation)
EFFICIENT METHODS FOR NLU
- Dynamic Knowledge Distillation for Pre-trained Language Models
- [video] [paper] [code]
- Dynamic Teacher Adoption: compute the prediction uncertainty (entropy) of the student model over all training data, divide the training data into a high-uncertainty group and a low-uncertainty group, then assign the large teacher model to the high-uncertainty group and the small teacher model to the low-uncertainty group.
- Dynamic Data Selection: select informative instances in each training batch according to the prediction uncertainty (entropy) of the student model.
- Dynamic Objective Adjustment: adjust the weight of the alignment objective (matching student & teacher hidden states) according to the prediction uncertainty (entropy) of the student model.
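The teacher-routing step can be sketched as below. The fixed entropy threshold is an assumption for illustration; the paper splits the data into uncertainty groups rather than using a single hand-set cutoff.

```python
import math

def entropy(probs):
    """Shannon entropy of a predictive distribution (in nats)."""
    return -sum(p * math.log(p) for p in probs if p > 0)

def assign_teachers(student_probs, threshold):
    """Route each example to the large or small teacher by the student's
    predictive entropy (sketch of dynamic teacher adoption)."""
    return ["large" if entropy(p) > threshold else "small"
            for p in student_probs]
```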
- HypMix: Hyperbolic Interpolative Data Augmentation
- Unsupervised Data Augmentation with Naive Augmentation and without Unlabeled Data
MACHINE LEARNING FOR NLU
- Active Learning by Acquiring Contrastive Examples
- [video] [paper] [code]
- Uncertainty: the predictive uncertainty, e.g. least confident data.
- Diversity: the heterogeneity in feature space, e.g. clustering.
- Contrastive examples: datapoints that are close in the model's feature space (\(K\)-nearest neighbours) but for which the model produces different predictive likelihoods.
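The acquisition idea can be sketched with a tiny scoring function. This is a simplification under stated assumptions: it compares a point against its nearest neighbours in the same pool, whereas the paper scores unlabeled points against labeled neighbours.

```python
import math

def kl_div(p, q):
    """KL divergence between two discrete distributions."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)

def contrastive_score(idx, feats, probs, k=1):
    """Score datapoint `idx` by how much its predicted distribution
    diverges from its k nearest neighbours in feature space (sketch)."""
    # Squared Euclidean distance to every other point.
    dists = sorted(
        (sum((a - b) ** 2 for a, b in zip(feats[idx], feats[j])), j)
        for j in range(len(feats)) if j != idx)
    neighbours = [j for _, j in dists[:k]]
    # High score = close in feature space, different predictions.
    return sum(kl_div(probs[j], probs[idx]) for j in neighbours) / k
```

Points with high scores sit near the model's decision boundary, which is what makes them informative to label.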
- Certified Robustness to Programmable Transformations in LSTMs
- Efficient Contrastive Learning via Novel Data Augmentation and Curriculum Learning
- STraTA: Self-Training with Task Augmentation for Better Few-shot Learning
- [video] [paper]
- Task augmentation: train an NLI data generator to produce synthetic in-domain NLI training examples.
- Self-training: initialize the teacher and student models with a strong auxiliary-task base model, then fine-tune the base model on the labeled target-task data. At each iteration, use the teacher model to generate pseudo-labels for unlabeled in-domain examples and augment the original labeled target-task data with them.
- Experiments show that using a strong base model and training on a broad distribution of pseudo-labeled data are the key factors for making self-training work.
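The teacher-student loop above can be sketched generically. The `fit`/`predict` interfaces, confidence threshold, and round count are all illustrative assumptions, not the paper's exact training setup:

```python
def self_train(fit, predict, labeled, unlabeled, rounds=3, threshold=0.9):
    """Minimal self-training loop (sketch). `fit(data)` returns a trained
    model; `predict(model, x)` returns a (label, confidence) pair --
    both are hypothetical interfaces."""
    model = fit(list(labeled))
    for _ in range(rounds):
        # Teacher pseudo-labels the unlabeled pool, keeping only
        # confident predictions...
        pseudo = [(x, y) for x in unlabeled
                  for y, conf in [predict(model, x)] if conf >= threshold]
        # ...and the student retrains on labeled + pseudo-labeled data.
        model = fit(list(labeled) + pseudo)
    return model
```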