Let’s take a look at some of the top trending ML papers of 2022:
1) A ConvNet for the 2020s (Liu et al) — Vision Transformers took off this year, but this work proposes ConvNeXt, reexamining the design space and testing the limits of what a pure ConvNet can achieve on several vision tasks. The ConvNets vs. Transformers debate continues.
2) Language Models as Zero-Shot Planners (Huang et al) — studies the possibility of grounding high-level tasks to actionable steps for embodied agents. Pre-trained LLMs are used to extract actionable knowledge for common-sense grounding and action planning.
3) OFA: Unifying Architectures, Tasks, and Modalities Through a Simple Sequence-to-Sequence Learning Framework (Wang et al) — introduces a unified paradigm for effective multimodal pre-training that supports a wide range of uni-modal and cross-modal tasks.
4) Tensor Programs V: Tuning Large Neural Networks via Zero-Shot Hyperparameter Transfer (Yang et al) — proposes a new paradigm for tuning large neural networks more efficiently via zero-shot hyperparameter transfer.
5) OPT: Open Pre-trained Transformer Language Models (Zhang et al) — a suite of open pre-trained transformer language models, following other open-source LLM efforts such as GPT-Neo; model sizes range from 125M to 175B parameters (a minimal loading sketch appears after the list).
6) Gato: A Generalist Agent (DeepMind) — an agent built to work as a multi-modal, multi-task, multi-embodiment generalist policy; it performs all sorts of general tasks ranging from playing Atari to chatting to stacking blocks with a real robot arm.
7) Solving Quantitative Reasoning Problems with Language Models (Lewkowycz et al) — proposes Minerva, a large language model pretrained on general natural language data and further trained on technical content; evaluated on several tasks requiring quantitative reasoning.
8) No Language Left Behind (Meta AI) — introduces a massive translation model (NLLB-200), capable of translating between 200 languages.
9) Stable Diffusion (Rombach et al) — a text-to-image model that generates detailed images conditioned on text descriptions; it can also be applied to tasks such as inpainting, outpainting, and text-guided image-to-image translation (see the sketch after the list).
10) Whisper (OpenAI) — an open-source speech recognition model that approaches human-level robustness and accuracy on English transcription (see the sketch after the list).
11) Make-A-Video (Singer et al) — introduces a state-of-the-art text-to-video model that can generate videos from a text prompt.
12) Galactica (Taylor et al) — a large language model for the science domain trained on a massive scientific corpus.
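A few of these models are openly available, so here are some quick sketches of how you might try them. First, OPT: the checkpoints live on the Hugging Face Hub, and a minimal generation sketch (assuming the `transformers` library and the `facebook/opt-125m` model id, the smallest released size) could look like this:

```python
# Minimal sketch (not the official reference code): generate text with the
# smallest OPT checkpoint via Hugging Face transformers.
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("facebook/opt-125m")
model = AutoModelForCausalLM.from_pretrained("facebook/opt-125m")

inputs = tokenizer("The top ML papers of 2022 include", return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=30)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```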
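Stable Diffusion can be driven through the `diffusers` library. Here is a minimal text-to-image sketch, assuming the `runwayml/stable-diffusion-v1-5` checkpoint and a CUDA GPU (the prompt and output filename are just placeholders):

```python
# Minimal sketch: text-to-image generation with diffusers; assumes a CUDA GPU.
import torch
from diffusers import StableDiffusionPipeline

pipe = StableDiffusionPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5", torch_dtype=torch.float16
)
pipe = pipe.to("cuda")

image = pipe("an astronaut riding a horse, detailed photograph").images[0]
image.save("astronaut.png")
```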
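And Whisper ships as an open-source Python package. A minimal transcription sketch, assuming `pip install openai-whisper` and a local file named `audio.mp3` (a placeholder):

```python
# Minimal sketch: transcribe an audio file with the open-source whisper package.
import whisper

model = whisper.load_model("base")      # other sizes: "small", "medium", "large"
result = model.transcribe("audio.mp3")  # returns a dict with the transcript and segments
print(result["text"])
```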
—
Cross-posted from Twitter. For a curation of more exciting ML papers in 2023, find me on Twitter.