Scale is All You Need?

Progress in artificial intelligence has come in the form of a bag of tricks, some of which have proven to be unreasonably effective: word embeddings as a way of representing semantic similarity; the attention mechanism for contextualization; self-supervised learning on curated datasets; instruction tuning of models; reinforcement learning based on human feedback. The key to making these tricks “unreasonably effective” has turned out to be scale, such as taking the number of parameters in models up by several orders of magnitude.

Despite the amazing recent progress, many applications will require further step-function improvements in accuracy, reliability and creativity. Is even more scale all we need? This is one way to read Rich Sutton’s famous “The Bitter Lesson” essay. Will more scale give us the levels of accuracy and reliability, including absence of hallucinations, required for legal work or medical advice? Will more scale provide the creativity to create new scientific knowledge, such as novel proofs?

Or might we need some more tricks? And would those be entirely new tricks, or ones we already know but haven’t yet fully integrated into our current bag? Another way to read the essay is simply as an exhortation to look for methods that can scale arbitrarily with computation (Sutton calls out search and learning). Of course, it could also turn out that we need some tricks that don’t rely on scaling in quite the same way, or at all.

Here is a list of known candidates that seem promising:

1. Learning in simulated environments

Humans learn a lot from living in the world. Robots exploring real or simulated worlds are learning more about causality. Simulation is already being used extensively in training models for autonomous vehicles. There is also work on more general simulation, such as ThreeDWorld out of MIT (original paper from 2020). Simulation can scale arbitrarily with computation and thus fits with Sutton’s conclusion. A recent fun and compelling result by Jim Fan is a robot dog walking on top of an exercise ball simply based on simulation.

2. Multi-agent learning

Much of human behavior and learning arises from interactions with other humans. Multi-agent environments are showing promise for similar effects among models, as recently demonstrated in a sandbox inspired by The Sims (paper and code). Multi-agent learning, too, can keep scaling with computation and can also leverage simulation.

3. Reliable use of tools

One of the earliest departures for humans from (most) other species comes in our use of tools. Today very few people would attempt multiplying large numbers in their head. We would use a calculator instead. A promising direction is training models to reliably know when and how to use such tools (see for example the Toolformer paper). This feels less like a trick that can scale with computation and more like a way to reliably leverage all of our existing investment in computation.
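To make the calculator idea concrete, here is a minimal sketch of what happens after a model has decided to call a tool. It assumes the model writes a marker of the form [Calculator(...)] into its draft text; that marker format and the function names calculate and fill_tool_calls are hypothetical choices for illustration, not the Toolformer implementation. The hard part, training the model to know when to emit such a marker, is not shown.

```python
import ast
import operator
import re

# Arithmetic operators the hypothetical calculator tool supports.
_OPS = {
    ast.Add: operator.add,
    ast.Sub: operator.sub,
    ast.Mult: operator.mul,
    ast.Div: operator.truediv,
}


def calculate(expression: str):
    """Evaluate a plain arithmetic expression without calling eval()."""

    def walk(node):
        # Plain numbers (booleans are rejected on purpose).
        if (
            isinstance(node, ast.Constant)
            and isinstance(node.value, (int, float))
            and not isinstance(node.value, bool)
        ):
            return node.value
        # Binary operations such as a * b, limited to the table above.
        if isinstance(node, ast.BinOp) and type(node.op) in _OPS:
            return _OPS[type(node.op)](walk(node.left), walk(node.right))
        # Unary minus, e.g. -5.
        if isinstance(node, ast.UnaryOp) and isinstance(node.op, ast.USub):
            return -walk(node.operand)
        raise ValueError(f"unsupported expression: {expression!r}")

    return walk(ast.parse(expression, mode="eval").body)


# Hypothetical marker format the model is assumed to emit: [Calculator(<expression>)]
_CALL = re.compile(r"\[Calculator\(([^\]]*)\)\]")


def fill_tool_calls(model_output: str) -> str:
    """Replace each calculator marker in the text with the computed result."""
    return _CALL.sub(lambda m: str(calculate(m.group(1))), model_output)


if __name__ == "__main__":
    draft = "The product is [Calculator(123456789 * 987654321)]."
    print(fill_tool_calls(draft))
```

The calculator only accepts numbers and the four basic operators, so a malformed or unexpected marker raises an error rather than running arbitrary code. The model supplies the decision to use the tool, and the existing software supplies the exact answer.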

4. Explicit structured knowledge

Prior to the breakthroughs with LLMs, a lot of work had gone into explicit knowledge representations in the form of ontologies, knowledge graphs, rule-based systems and the like. Accessing those kinds of systems could be seen as a specialized case of tool use but also seems to have additional potential. UnifiedSKG is one way to bring a lot of so-called structured knowledge grounding techniques into models (paper and code). Sutton explicitly called out attempts to build knowledge into agents as having been outperformed by statistical methods. But human-machine collaboration might dramatically grow the extent of explicit knowledge.

An “unreasonably effective” AI generated video**

These are just four examples of tricks that we already know and that are being actively researched. There is a ton more to explore, including the effects of adding different types of memory, bringing in techniques from search, or wrapping calls to models in workflows.

All of this research is exciting because of the significant gains achieved by these “known tricks” even in early and still relatively crude versions. This progress raises the tantalizing possibility that we can get much further with current and next generation open source models (*). With the right “core” there could be an ecosystem akin to what we have with open source operating systems (that’s one way to interpret Andrej Karpathy’s tweet about an LLM OS).

Now it is also still possible that we will encounter some fundamental limitation with autoregressive models as the core building block. That would mean we need new tricks, some of which might be quite fundamental. We may, for example, have to find more complex objective functions for pre-training. That could work in the opposite direction where training might become more expensive.

Why does all of this matter? From a market structure and innovation perspective, it would be fantastic if the cost of training runs didn’t continue to explode. Exploding costs would heavily favor a highly concentrated market dominated by a few closed source systems (they would be closed source to recoup the cost of the training runs).

We are excited about our existing AI portfolio including Hume, Abridge, Clarifai, Casetext (already exited), and several other recent investments we have yet to announce. Based on the above, we also believe that a lot remains to be figured out. We are partial to composable open systems where these “tricks” don’t need to all be figured out by one company but can be contributed to by many. These systems might well be specific to a problem set, such as medical diagnosis or materials discovery.

* There have been a lot of important arguments from a safety and x-risk perspective against pushing open source systems further. This post can’t possibly address these but it is worth noting that many of the tricks mentioned above might allow the core system to do less, which would increase safety and reduce risk.

**The above video was made with almost no human input. We fed this blog post into Claude and used a single prompt to develop a corresponding trailer with a 30-second script, style guidelines, formatted b-roll prompts, and a music track prompt. The results were copied and pasted into RunwayML, Suno, and Eleven Labs and then stitched together without editing. We took it upon ourselves to add a title at the end.

