LG – Machine Learning, CV – Computer Vision, CL – Computation and Language


1、[CL] Navigating the Grey Area: Expressions of Overconfidence and Uncertainty in Language Models
2、[LG] EvoPrompting: Language Models for Code-Level Neural Architecture Search
3、[CL] In-Context Instruction Learning
4、[CL] Goal Driven Discovery of Distributional Differences via Language Descriptions
5、[CV] Layer Grafted Pre-training: Bridging Contrastive Learning And Masked Image Modeling For Label-Efficient Representations
[CL] ProofNet: Autoformalizing and Formally Proving Undergraduate-Level Mathematics
[LG] GAM Coach: Towards Interactive and User-centered Algorithmic Recourse
[CV] Monocular Depth Estimation using Diffusion Models
[CV] Generic-to-Specific Distillation of Masked Autoencoders

Summary: expressions of overconfidence and uncertainty in language models; language models for code-level neural architecture search; in-context instruction learning; goal-driven discovery of distributional differences via language descriptions; label-efficient representations bridging contrastive learning and masked image modeling; autoformalizing and formally proving undergraduate-level mathematics; interactive and user-centered algorithmic recourse; monocular depth estimation using diffusion models; generic-to-specific distillation of masked autoencoders

1、[CL] Navigating the Grey Area: Expressions of Overconfidence and Uncertainty in Language Models

K Zhou, D Jurafsky, T Hashimoto
[Stanford University]

Key points:

  1. Language models (LMs) lack the ability to interpret and generate expressions of uncertainty, which are essential to human decision-making and communication;
  2. The accuracy of LM generations varies widely (by up to 80%) depending on the uncertainty expression used in the prompt;
  3. When LMs are taught to emit expressions of certainty rather than uncertainty, model calibration suffers, and naturalistic expressions of certainty lead to drops in accuracy;
  4. The paper provides a framework for analyzing the interplay between uncertainty expressions and LMs, introduces a typology for evaluating how linguistic features affect LM generations, and offers recommendations for natural language generation and naturalistic expressions of uncertainty.

One-sentence summary:
Language models struggle to interpret and produce expressions of uncertainty, posing challenges for building models that generate trustworthy language.

Despite increasingly fluent, relevant, and coherent language generation, major gaps remain between how humans and machines use language. We argue that a key dimension that is missing from our understanding of language models (LMs) is the model’s ability to interpret and generate expressions of uncertainty. Whether it be the weatherperson announcing a chance of rain or a doctor giving a diagnosis, information is often not black-and-white and expressions of uncertainty provide nuance to support human-decision making. The increasing deployment of LMs in the wild motivates us to investigate whether LMs are capable of interpreting expressions of uncertainty and how LMs’ behaviors change when learning to emit their own expressions of uncertainty. When injecting expressions of uncertainty into prompts (e.g., “I think the answer is…”), we discover that GPT3’s generations vary upwards of 80% in accuracy based on the expression used. We analyze the linguistic characteristics of these expressions and find a drop in accuracy when naturalistic expressions of certainty are present. We find similar effects when teaching models to emit their own expressions of uncertainty, where model calibration suffers when teaching models to emit certainty rather than uncertainty. Together, these results highlight the challenges of building LMs that interpret and generate trustworthy expressions of uncertainty.
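The prompt manipulation described in the abstract — prepending expressions like "I think the answer is…" before the answer slot — can be sketched as follows. This is a minimal illustration, not the paper's code; the expression lists and QA template are assumptions, not the paper's exact stimuli:

```python
# Illustrative (un)certainty expressions; the paper tests a larger,
# linguistically motivated set.
UNCERTAIN = ["I think the answer is", "I'm not sure, but the answer might be"]
CERTAIN = ["I am certain the answer is", "Obviously, the answer is"]

def build_prompt(question: str, expression: str) -> str:
    """Prefix the model's answer slot with a certainty/uncertainty expression."""
    return f"Q: {question}\nA: {expression}"

# One prompt per expression; generation accuracy would then be compared
# across expressions to measure the up-to-80% variation the paper reports.
prompts = [build_prompt("What is the capital of France?", e)
           for e in UNCERTAIN + CERTAIN]
```

Each prompt is sent to the model unchanged except for the injected expression, isolating the expression as the only experimental variable.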

https://arxiv.org/abs/2302.13439

2、[LG] EvoPrompting: Language Models for Code-Level Neural Architecture Search

A Chen, D M. Dohan, D R. So
[Google Brain & New York University]

Key points:

  1. EvoPrompting improves the few-shot/in-context abilities of language models and discovers novel, competitive neural architectures;
  2. On the MNIST-1D and CLRS Algorithmic Reasoning Benchmark tasks, EvoPrompting outperforms both human-designed architectures and naive few-shot prompting;
  3. EvoPrompting is general enough to be easily adapted to searching for solutions to reasoning tasks beyond NAS;
  4. Future work could scale up EvoPrompting to compare against more competitive large-scale architectures such as the Transformer.

One-sentence summary:
EvoPrompting combines evolutionary search with soft prompt-tuning to create accurate and efficient neural network architectures across a variety of machine learning tasks.

Given the recent impressive accomplishments of language models (LMs) for code generation, we explore the use of LMs as adaptive mutation and crossover operators for an evolutionary neural architecture search (NAS) algorithm. While NAS still proves too difficult a task for LMs to succeed at solely through prompting, we find that the combination of evolutionary prompt engineering with soft prompt-tuning, a method we term EvoPrompting, consistently finds diverse and high performing models. We first demonstrate that EvoPrompting is effective on the computationally efficient MNIST-1D dataset, where EvoPrompting produces convolutional architecture variants that outperform both those designed by human experts and naive few-shot prompting in terms of accuracy and model size. We then apply our method to searching for graph neural networks on the CLRS Algorithmic Reasoning Benchmark, where EvoPrompting is able to design novel architectures that outperform current state-of-the-art models on 21 out of 30 algorithmic reasoning tasks while maintaining similar model size. EvoPrompting is successful at designing accurate and efficient neural network architectures across a variety of machine learning tasks, while also being general enough for easy adaptation to other tasks beyond neural network design.
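The evolutionary loop with the LM as mutation/crossover operator can be sketched as below. This is a toy sketch under stated assumptions: the LM is replaced by a random line-recombination stub, and architecture fitness by a trivial token count, purely to show the select-propose-keep structure:

```python
import random

random.seed(0)

def lm_crossover_mutate(parents):
    """Stub standing in for the LM that, prompted with parent
    architectures as code, proposes a child program; here we merely
    recombine parent lines at random."""
    lines = [ln for p in parents for ln in p.splitlines()]
    random.shuffle(lines)
    return "\n".join(lines[: len(parents[0].splitlines())])

def evoprompt_search(population, fitness, generations=5, k=2):
    """Evolutionary loop: select the top-k parents by fitness, ask the
    'LM' for a child, and keep only the best individuals."""
    for _ in range(generations):
        parents = sorted(population, key=fitness, reverse=True)[:k]
        population.append(lm_crossover_mutate(parents))
        population = sorted(population, key=fitness, reverse=True)[:4]
    return max(population, key=fitness)

arch_fitness = lambda code: code.count("conv")  # toy proxy metric
best = evoprompt_search(["conv\nrelu", "dense\nrelu", "conv\nconv"], arch_fitness)
```

In the real method, each evaluated child and its score are also fed back to soft prompt-tuning, which this sketch omits.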

https://arxiv.org/abs/2302.14838

3、[CL] In-Context Instruction Learning

S Ye, H Hwang, S Yang, H Yun, Y Kim, M Seo
[KAIST & LG AI Research]

Key points:

  1. In-Context Instruction Learning (ICIL) significantly improves zero-shot task generalization for both pretrained and instruction-fine-tuned large language models;
  2. ICIL evaluates all tasks with a single fixed prompt, a concatenation of cross-task demonstrations;
  3. ICIL is complementary to instruction-based fine-tuning; even the most powerful instruction-fine-tuned baseline (text-davinci-003) benefits from ICIL by 9.3%;
  4. The effect of ICIL comes from learning the correspondence between the answer choices in the instruction and the demonstration labels, leading large language models to attend better to the instruction.

One-sentence summary:
In-Context Instruction Learning (ICIL) significantly improves the zero-shot task generalization performance of both pretrained and instruction-fine-tuned large language models.

Instruction learning of Large Language Models (LLMs) has enabled zero-shot task generalization. However, instruction learning has been predominantly approached as a fine-tuning problem, including instruction tuning and reinforcement learning from human feedback, where LLMs are multi-task fine-tuned on various tasks with instructions. In this paper, we present a surprising finding that applying in-context learning to instruction learning, referred to as In-Context Instruction Learning (ICIL), significantly improves the zero-shot task generalization performance for both pretrained and instruction-fine-tuned models. One of the core advantages of ICIL is that it uses a single fixed prompt to evaluate all tasks, which is a concatenation of cross-task demonstrations. In particular, we demonstrate that the most powerful instruction-fine-tuned baseline (text-davinci-003) also benefits from ICIL by 9.3%, indicating that the effect of ICIL is complementary to instruction-based fine-tuning.
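The single fixed prompt that ICIL reuses across all tasks can be sketched as a concatenation of cross-task demonstrations. The field labels ("Instruction:", "Input:", "Output:") below are illustrative assumptions, not necessarily the paper's exact template:

```python
def build_icil_prompt(demonstrations, task_instruction, task_input):
    """ICIL-style prompt: one fixed concatenation of cross-task
    (instruction, input, output) demonstrations, reused verbatim for
    every target task, followed by the target instruction and input."""
    blocks = [f"Instruction: {ins}\nInput: {inp}\nOutput: {out}"
              for ins, inp, out in demonstrations]
    blocks.append(f"Instruction: {task_instruction}\nInput: {task_input}\nOutput:")
    return "\n\n".join(blocks)

# The demonstrations come from tasks unrelated to the target task.
demos = [("Classify the sentiment.", "Great movie!", "positive"),
         ("Translate to French.", "Hello", "Bonjour")]
prompt = build_icil_prompt(demos, "Answer the question.", "What is 2+2?")
```

Because the demonstration block is fixed, evaluating a new task only requires swapping the final instruction and input.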

https://arxiv.org/abs/2302.14691

4、[CL] Goal Driven Discovery of Distributional Differences via Language Descriptions

R Zhong, P Zhang, S Li, J Ahn, D Klein, J Steinhardt
[UC Berkeley]

Key points:

  1. Introduces a new task, D5, for automatically discovering differences between two large corpora in a goal-driven way;
  2. Contributes a meta-dataset, OpenD5, of 675 open-ended problems across domains, with a set of unified evaluation metrics for D5 systems;
  3. Language models can use the D5 goals and unified metrics to propose more relevant, novel, and significant discoveries;
  4. D5 systems can surface previously unknown insights across a wide range of applications, including discussion topics, political stances, and error patterns in NLP models.

One-sentence summary:
Proposes a new task, D5, that automatically discovers distributional differences between two large corpora in a goal-driven way, and introduces a meta-dataset, OpenD5, with unified evaluation metrics for D5 systems.

Mining large corpora can generate useful discoveries but is time-consuming for humans. We formulate a new task, D5, that automatically discovers differences between two large corpora in a goal-driven way. The task input is a problem comprising a research goal “comparing the side effects of drug A and drug B” and a corpus pair (two large collections of patients’ self-reported reactions after taking each drug). The output is a language description (discovery) of how these corpora differ (patients taking drug A “mention feelings of paranoia” more often). We build a D5 system, and to quantitatively measure its performance, we 1) contribute a meta-dataset, OpenD5, aggregating 675 open-ended problems ranging across business, social sciences, humanities, machine learning, and health, and 2) propose a set of unified evaluation metrics: validity, relevance, novelty, and significance. With the dataset and the unified metrics, we confirm that language models can use the goals to propose more relevant, novel, and significant candidate discoveries. Finally, our system produces discoveries previously unknown to the authors on a wide range of applications in OpenD5, including temporal and demographic differences in discussion topics, political stances and stereotypes in speech, insights in commercial reviews, and error patterns in NLP models.
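A crude stand-in for the hypothesis-proposal step can be sketched with word frequencies: surface the tokens whose relative frequency differs most between the two corpora. The real D5 system instead conditions a language model on the research goal to propose natural-language discoveries; this sketch only illustrates what "a distributional difference" means on the paper's drug-reaction example:

```python
from collections import Counter

def top_differences(corpus_a, corpus_b, top=3):
    """Rank words by how much more frequent (relatively) they are in
    corpus A than in corpus B — a toy proxy for candidate discoveries."""
    ca, cb = Counter(), Counter()
    for doc in corpus_a:
        ca.update(doc.lower().split())
    for doc in corpus_b:
        cb.update(doc.lower().split())
    na, nb = max(sum(ca.values()), 1), max(sum(cb.values()), 1)
    score = {w: ca[w] / na - cb.get(w, 0) / nb for w in ca}
    return sorted(score, key=score.get, reverse=True)[:top]

# Toy self-reported reactions, mirroring the abstract's example.
drug_a = ["felt paranoia and restlessness", "paranoia again today"]
drug_b = ["slept well", "felt calm and slept well"]
findings = top_differences(drug_a, drug_b)
```

A full D5 system would then validate each candidate against the unified metrics (validity, relevance, novelty, significance).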

https://arxiv.org/abs/2302.14233

5、[CV] Layer Grafted Pre-training: Bridging Contrastive Learning And Masked Image Modeling For Label-Efficient Representations

Z Jiang, Y Chen, M Liu…
[Microsoft & Texas A&M University & University of Texas at Austin]

Key points:

  1. Layer Grafted Pre-training bridges contrastive learning (CL) and masked image modeling (MIM) for better representation learning;
  2. Following their respective preferences, the MIM and CL losses are grafted onto the lower and higher layers, respectively;
  3. The sequential cascade yields better representation quality and excellent label efficiency in downstream applications;
  4. Layer Grafted Pre-training achieves clear improvements in few-shot performance and linear evaluation over both MIM and CL baselines.

One-sentence summary:
Layer Grafted Pre-training combines contrastive learning and masked image modeling in a sequential cascade for better label efficiency and few-shot performance.

Recently, both Contrastive Learning (CL) and Mask Image Modeling (MIM) demonstrate that self-supervision is powerful to learn good representations. However, naively combining them is far from success. In this paper, we start by making the empirical observation that a naive joint optimization of CL and MIM losses leads to conflicting gradient directions – more severe as the layers go deeper. This motivates us to shift the paradigm from combining loss at the end, to choosing the proper learning method per network layer. Inspired by experimental observations, we find that MIM and CL are suitable to lower and higher layers, respectively. We hence propose to combine them in a surprisingly simple, “sequential cascade” fashion: early layers are first trained under one MIM loss, on top of which latter layers continue to be trained under another CL loss. The proposed Layer Grafted Pre-training learns good visual representations that demonstrate superior label efficiency in downstream applications, in particular yielding strong few-shot performance besides linear evaluation. For instance, on ImageNet-1k, Layer Grafted Pre-training yields 65.5% Top-1 accuracy in terms of 1% few-shot learning with ViT-B/16, which improves MIM and CL baselines by 14.4% and 2.1% with no bells and whistles. The code is available at this https URL.
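The "sequential cascade" — MIM on the lower layers, then CL on the upper layers on top of the frozen lower ones — can be sketched schematically. The layers and update steps below are toy placeholders, not real training code:

```python
def layer_grafted_pretrain(layers, mim_step, cl_step, graft_at):
    """Sequential cascade sketch: early layers (index < graft_at) are
    updated only by the MIM objective; they are then kept fixed, and
    the later layers are trained on top with the CL objective."""
    for i in range(graft_at):               # stage 1: MIM on lower layers
        layers[i] = mim_step(layers[i])
    for i in range(graft_at, len(layers)):  # stage 2: CL on upper layers
        layers[i] = cl_step(layers[i])
    return layers

layers = layer_grafted_pretrain(
    ["l0", "l1", "l2", "l3"],
    mim_step=lambda w: w + "+mim",
    cl_step=lambda w: w + "+cl",
    graft_at=2,
)
```

The key design choice this mirrors is per-layer loss assignment instead of summing both losses at the output, which the paper shows produces conflicting gradients.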

https://arxiv.org/abs/2302.14138


A few more papers worth noting:

[CL] ProofNet: Autoformalizing and Formally Proving Undergraduate-Level Mathematics

Z Azerbayev, B Piotrowski, H Schoelkopf, E W. Ayers, D Radev, J Avigad
[Yale College & University of Warsaw & EleutherAI & CMU]

Key points:

  1. ProofNet is a benchmark for autoformalization and formal proving of undergraduate-level mathematics, consisting of 371 examples covering topics such as real and complex analysis, linear algebra, abstract algebra, and topology;
  2. The lack of parallel data between informal and formal mathematics has left autoformalization without a standard benchmark to guide progress in the field;
  3. Pretrained large language models achieve non-trivial but far-from-consistent performance on ProofNet statement autoformalization via in-context learning;
  4. Two novel statement autoformalization methods are proposed, prompt retrieval and distilled backtranslation, which improve autoformalization performance over the baseline.

One-sentence summary:
ProofNet is a benchmark for autoformalization and formal proving of undergraduate-level mathematics, consisting of 371 examples covering topics such as real and complex analysis, linear algebra, abstract algebra, and topology.

We introduce ProofNet, a benchmark for autoformalization and formal proving of undergraduate-level mathematics. The ProofNet benchmarks consists of 371 examples, each consisting of a formal theorem statement in Lean 3, a natural language theorem statement, and a natural language proof. The problems are primarily drawn from popular undergraduate pure mathematics textbooks and cover topics such as real and complex analysis, linear algebra, abstract algebra, and topology. We intend for ProofNet to be a challenging benchmark that will drive progress in autoformalization and automatic theorem proving. We report baseline results on statement autoformalization via in-context learning. Moreover, we introduce two novel statement autoformalization methods: prompt retrieval and distilled backtranslation.
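The prompt-retrieval idea — choosing few-shot (informal, formal) pairs relevant to the statement being formalized — can be sketched with a toy retriever. The word-overlap score and the placeholder bank entries below are illustrative assumptions; the paper's retriever is more sophisticated:

```python
def retrieve_examples(query, bank, k=2):
    """Pick the k (informal statement, Lean formalization) pairs whose
    informal text shares the most words with the query, to use as
    few-shot examples in the autoformalization prompt."""
    def overlap(a, b):
        return len(set(a.lower().split()) & set(b.lower().split()))
    return sorted(bank, key=lambda pair: overlap(query, pair[0]),
                  reverse=True)[:k]

# Placeholder bank; real entries pair informal text with Lean 3 code.
bank = [
    ("The sum of two even integers is even.", "<lean statement>"),
    ("Every subgroup of an abelian group is normal.", "<lean statement>"),
    ("The product of two odd integers is odd.", "<lean statement>"),
]
examples = retrieve_examples("The sum of two odd integers is even.", bank, k=1)
```

The retrieved pairs are then concatenated before the query statement so the model sees structurally similar formalizations in context.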

https://arxiv.org/abs/2302.12433

[LG] GAM Coach: Towards Interactive and User-centered Algorithmic Recourse

Z J. Wang, J W Vaughan, R Caruana, D H Chau
[Georgia Tech & Microsoft Research]

Key points:

  1. GAM Coach is an interactive algorithmic recourse tool that lets end users specify preferences and iteratively fine-tune actionable recourse plans;
  2. The tool adapts integer linear programming to generate customizable counterfactual explanations for Generalized Additive Models (GAMs);
  3. Interactive visualizations enable end users to discover satisfactory recourse plans that meet their needs;
  4. The tool is open-source, web-based, and accessible, and users prefer personalized recourse plans over generic ones.

One-sentence summary:
GAM Coach is an interactive algorithmic recourse tool that empowers end users to iteratively fine-tune actionable recourse plans, increasing the transparency of machine learning models.

Machine learning (ML) recourse techniques are increasingly used in high-stakes domains, providing end users with actions to alter ML predictions, but they assume ML developers understand what input variables can be changed. However, a recourse plan’s actionability is subjective and unlikely to match developers’ expectations completely. We present GAM Coach, a novel open-source system that adapts integer linear programming to generate customizable counterfactual explanations for Generalized Additive Models (GAMs), and leverages interactive visualizations to enable end users to iteratively generate recourse plans meeting their needs. A quantitative user study with 41 participants shows our tool is usable and useful, and users prefer personalized recourse plans over generic plans. Through a log analysis, we explore how users discover satisfactory recourse plans, and provide empirical evidence that transparency can lead to more opportunities for everyday users to discover counterintuitive patterns in ML models. GAM Coach is available at: this https URL.
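The core search — find the cheapest allowed feature changes that flip an additive model's decision — can be sketched by brute force. GAM Coach actually solves this with integer linear programming and lets users constrain which features may change; the toy GAM score, thresholds, and candidate values below are assumptions for illustration:

```python
from itertools import product

def cheapest_recourse(x, score, candidates, threshold=0.0):
    """Among the allowed per-feature values, return the lowest-cost
    plan whose model score crosses the decision threshold."""
    best, best_cost = None, float("inf")
    for values in product(*candidates.values()):
        plan = {**x, **dict(zip(candidates, values))}
        cost = sum(abs(plan[f] - x[f]) for f in candidates)
        if score(plan) > threshold and cost < best_cost:
            best, best_cost = plan, cost
    return best

# Toy additive (GAM-like) score: each feature contributes independently.
gam_score = lambda p: 0.1 * p["income"] + 0.01 * p["credit"] - 11.0
applicant = {"income": 30, "credit": 600}
plan = cheapest_recourse(applicant, gam_score,
                         {"income": [30, 40, 50], "credit": [600, 650, 700]})
```

Because GAM shape functions are piecewise constant, the real problem discretizes cleanly into integer variables, which is what makes the ILP formulation tractable.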

https://arxiv.org/abs/2302.14165

[CV] Monocular Depth Estimation using Diffusion Models

S Saxena, A Kar, M Norouzi, D J. Fleet
[Google Research]

Key points:

  1. DepthGen is a diffusion model for monocular depth estimation that, with self-supervised pre-training and supervised fine-tuning, achieves a SOTA relative error of 0.074 on the NYU benchmark;
  2. An L1 loss, depth infilling, and step-unrolled denoising diffusion enable training diffusion models on noisy, incomplete depth data;
  3. Demonstrates multimodal depth inference and missing-depth imputation, enabling text-to-3D generation and novel view synthesis.

One-sentence summary:
A new approach to monocular depth estimation with denoising diffusion models that achieves state-of-the-art results on challenging depth estimation benchmarks and enables multimodal depth inference and imputation.

We formulate monocular depth estimation using denoising diffusion models, inspired by their recent successes in high fidelity image generation. To that end, we introduce innovations to address problems arising due to noisy, incomplete depth maps in training data, including step-unrolled denoising diffusion, an L1 loss, and depth infilling during training. To cope with the limited availability of data for supervised training, we leverage pre-training on self-supervised image-to-image translation tasks. Despite the simplicity of the approach, with a generic loss and architecture, our DepthGen model achieves SOTA performance on the indoor NYU dataset, and near SOTA results on the outdoor KITTI dataset. Further, with a multimodal posterior, DepthGen naturally represents depth ambiguity (e.g., from transparent surfaces), and its zero-shot performance combined with depth imputation, enable a simple but effective text-to-3D pipeline. Project page: this https URL
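Of the paper's training innovations, depth infilling is the easiest to illustrate: holes in the ground-truth depth map are filled with nearby valid depths so the diffusion model never trains on missing values. The 1-D nearest-valid-value sketch below is only illustrative; the paper operates on 2-D depth maps:

```python
def infill_depth(depth_row, missing=None):
    """Replace missing (None) entries of a 1-D depth row with the
    nearest preceding valid depth, falling back to the nearest
    following valid depth for leading holes."""
    out = list(depth_row)
    last = missing
    for i, d in enumerate(out):             # forward fill
        if d is missing:
            out[i] = last
        else:
            last = d
    nxt = missing
    for i in range(len(out) - 1, -1, -1):   # backward fill leading holes
        if out[i] is missing:
            out[i] = nxt
        else:
            nxt = out[i]
    return out

filled = infill_depth([None, 2.0, None, None, 3.0])
```

At training time the loss can still be computed only on the originally valid pixels, so the infilled values serve the model input rather than the supervision signal.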

https://arxiv.org/abs/2302.14816

[CV] Generic-to-Specific Distillation of Masked Autoencoders

W Huang, Z Peng, L Dong, F Wei, J Jiao, Q Ye
[Microsoft Research & University of Chinese Academy of Sciences]

Key points:

  1. G2SD is a two-stage distillation method that transfers both task-agnostic and task-specific knowledge from large pre-trained models to lightweight ViTs;
  2. A simple yet effective generic distillation strategy has the student predict features aligned with the hidden representations of a pre-trained masked autoencoder, for both visible and masked patches;
  3. With G2SD, lightweight student models achieve competitive results across vision tasks, lifting the performance of lightweight ViT models to a new level;
  4. Experiments show that a vanilla ViT-Small model reaches 98.7%, 98.1%, and 99.3% of its teacher's (ViT-Base) performance on image classification, object detection, and semantic segmentation, respectively.

One-sentence summary:
G2SD is a two-stage distillation method that transfers task-agnostic and task-specific knowledge from large pre-trained models to lightweight ViTs, setting a solid baseline for two-stage vision model distillation.

Large vision Transformers (ViTs) driven by self-supervised pre-training mechanisms achieved unprecedented progress. Lightweight ViT models limited by the model capacity, however, benefit little from those pre-training mechanisms. Knowledge distillation defines a paradigm to transfer representations from large (teacher) models to small (student) ones. However, the conventional single-stage distillation easily gets stuck on task-specific transfer, failing to retain the task-agnostic knowledge crucial for model generalization. In this study, we propose generic-to-specific distillation (G2SD), to tap the potential of small ViT models under the supervision of large models pre-trained by masked autoencoders. In generic distillation, decoder of the small model is encouraged to align feature predictions with hidden representations of the large model, so that task-agnostic knowledge can be transferred. In specific distillation, predictions of the small model are constrained to be consistent with those of the large model, to transfer task-specific features which guarantee task performance. With G2SD, the vanilla ViT-Small model respectively achieves 98.7%, 98.1% and 99.3% the performance of its teacher (ViT-Base) for image classification, object detection, and semantic segmentation, setting a solid baseline for two-stage vision distillation. Code will be available at this https URL.
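The two-stage ordering — generic feature alignment first, task-specific prediction matching second — can be mimicked on plain lists. This is only a schematic sketch: real G2SD trains ViTs by gradient descent on image data, whereas here both stages are simple interpolation updates:

```python
def g2sd_distill(student_feats, student_preds,
                 teacher_feats, teacher_preds, lr=0.5, steps=20):
    """Stage 1 (generic): pull the student's features toward the MAE
    teacher's hidden representations (task-agnostic transfer).
    Stage 2 (specific): pull the student's task predictions toward
    the teacher's outputs (task-specific transfer)."""
    for _ in range(steps):   # generic distillation: feature alignment
        student_feats = [s + lr * (t - s)
                         for s, t in zip(student_feats, teacher_feats)]
    for _ in range(steps):   # specific distillation: prediction matching
        student_preds = [s + lr * (t - s)
                         for s, t in zip(student_preds, teacher_preds)]
    return student_feats, student_preds

feats, preds = g2sd_distill([0.0, 0.0], [0.0], [1.0, -1.0], [0.5])
```

The point of the ordering is that the generic stage preserves the task-agnostic knowledge that single-stage, task-specific distillation tends to discard.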

https://arxiv.org/abs/2302.14771
