Let’s take a look at some of the top trending ML papers of 2022:
1) A ConvNet for the 2020s (Liu et al) — Vision Transformers took off this year but this work proposes ConvNeXt to reexamine the design spaces and test the limits of a pure ConvNet on several vision tasks. The ConvNets vs. Transformers debate continues.
2) Language Models as Zero-Shot Planners (Huang et al) — studies the possibility of grounding high-level tasks to actionable steps for embodied agents. Pre-trained LLMs are used to extract knowledge to perform common-sense grounding by planning actions.
3) OFA: Unifying Architectures, Tasks, and Modalities Through a Simple Sequence-to-Sequence Learning Framework (Wang et al) — introduces a unified paradigm for effective multimodal pre-training that supports all kinds of uni-modal and cross-modal tasks.
4) Tensor Programs V: Tuning Large Neural Networks via Zero-Shot Hyperparameter Transfer (Yang et al) — proposes a new paradigm for tuning large neural networks more efficiently via zero-shot hyperparameter transfer.
5) OPT: Open Pre-trained Transformer Language Models (Zhang et al) — a suite of open pre-trained transformer-based language models that follows other open-source LLM efforts such as GPT-Neo; model sizes range from 125M to 175B parameters.
6) Gato: A Generalist Agent (DeepMind) — an agent built to work as a multi-modal, multi-task, multi-embodiment generalist policy; it performs all sorts of general tasks ranging from playing Atari to chatting to stacking blocks with a real robot arm.
7) Solving Quantitative Reasoning Problems with Language Models (Lewkowycz et al) — proposes Minerva, a large language model pretrained on general natural language data and further trained on technical content; evaluated on several tasks requiring quantitative reasoning.
8) No Language Left Behind (Meta AI) — introduces a massive translation model (NLLB-200), capable of translating between 200 languages.
9) Stable Diffusion (Rombach et al) — a text-to-image model that generates detailed images conditioned on text descriptions; it can also be applied to other tasks such as inpainting, outpainting, and image-to-image translation guided by a text prompt.
10) Whisper (OpenAI) — an open-source model that approaches human-level robustness and accuracy in English speech recognition.
11) Make-A-Video (Singer et al) — introduces a state-of-the-art text-to-video model that can generate videos from a text prompt.
12) Galactica (Taylor et al) — a large language model for the science domain trained on a massive scientific corpus.
—
Cross-posted from Twitter. For a curation of more exciting ML papers in 2023 find me on Twitter.