Hello to all 1,151 of you and welcome to the 166 who signed up in the past two days! If you haven’t yet, subscribe to follow my efforts to tell the story of modern AI:
Tomorrow is the anniversary of OpenAI’s “Aligning Language Models to Follow Instructions”, a seminal paper that I co-authored (just kidding – I had no reason to be listed as an author but Ryan is as kind as he is intelligent).
The technique pioneered by Ryan, Long, and team was used to create ChatGPT, and understanding how it works helps in appreciating the opportunity for startups building differentiated product experiences with AI.
What was done
First, they basically did GPT Cosplay. Humans pretended to be the AI and wrote “good” replies to different ways people may want to interact with the model. For example, a human would be shown a news article and asked to summarize it. That person would read the article and write a summary. All of these examples were then used to fine-tune a model.
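A minimal sketch of what that fine-tuning data looks like, assuming a simple prompt/completion format (the field names and examples here are illustrative, not OpenAI's actual schema):

```python
# Each human demonstration becomes a (prompt, completion) training pair.
# The fine-tuning step trains the base model to reproduce the
# human-written completion given the prompt.

demonstrations = [
    {
        "prompt": "Summarize the following article:\n<article text>\n\nSummary:",
        "completion": " <a summary a human actually wrote>",
    },
    {
        "prompt": "Translate to French: Hello, how are you?",
        "completion": " Bonjour, comment allez-vous ?",
    },
]

def to_training_example(demo):
    # Concatenate prompt and completion; in practice the training loss
    # is typically computed only on the completion tokens.
    return demo["prompt"] + demo["completion"]

examples = [to_training_example(d) for d in demonstrations]
```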
They then had people look at multiple AI-generated replies for a given query and rate them from best to worst. “Here’s one way AI could respond, and another, and another, … which do you like best?” These ratings were used to train a second model, which predicts how “good” the AI-generated content is.
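The reward model is trained so that responses humans ranked higher get higher scores. The paper does this with a pairwise loss: for each pair where one response was preferred over another, the model is penalized unless the preferred response scores higher. A toy sketch in plain Python, with made-up scores standing in for the model's outputs:

```python
import math

def pairwise_loss(score_preferred, score_rejected):
    # -log(sigmoid(r_preferred - r_rejected)): near zero when the
    # preferred response scores much higher, large when the model's
    # scores contradict the human ranking.
    diff = score_preferred - score_rejected
    return -math.log(1.0 / (1.0 + math.exp(-diff)))

# When the reward model agrees with the human ranking, the loss is small...
low = pairwise_loss(2.0, -1.0)
# ...and when it disagrees, the loss is large.
high = pairwise_loss(-1.0, 2.0)
```

Minimizing this loss over many human-ranked pairs pushes the scores into agreement with human preferences.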
One model generates content and, almost like a coach, the other model evaluates how well it performed (“you did pretty good on this one!”). This feedback is then used to improve the model that generates content.
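The feedback loop can be sketched with a toy policy-gradient update. The paper uses PPO, a more sophisticated reinforcement learning algorithm; this simplified REINFORCE stand-in, with a one-parameter policy and hard-coded reward scores, just shows the core idea: sample a response, score it with the reward model, and nudge the generator toward higher-scoring behavior.

```python
import math
import random

random.seed(0)

# Toy policy: chooses between two canned responses, "A" and "B",
# with probability controlled by a single logit.
logit = 0.0

# Stand-in for the reward model: it scores response A higher.
rewards = {"A": 1.0, "B": -1.0}

def prob_a(logit):
    return 1.0 / (1.0 + math.exp(-logit))

lr = 0.5
for _ in range(200):
    p = prob_a(logit)
    # Sample a response from the current policy.
    choice = "A" if random.random() < p else "B"
    # Score it with the (stand-in) reward model.
    reward = rewards[choice]
    # REINFORCE: gradient of the log-probability of the sampled
    # choice, scaled by its reward.
    grad = (1.0 - p) if choice == "A" else -p
    logit += lr * reward * grad

# After training, the policy strongly prefers the response
# the reward model scores higher.
```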
Results
The proof is in the pudding with ChatGPT. But the results from the original paper are profound: people preferred the outputs of a model trained this way over those of a model 100X its size.
Why this matters
The amount of data used to achieve the results in the paper was relatively small. They had people write ~10,000 “good” responses and make ~30,000 ratings. And since the data was spread across a range of use-cases – from copywriting to Q&A, summarization to classification and others – there was an even smaller amount of data for any given use-case. This technique is attainable for startups.
A product is the result of many opinionated decisions that need to be considered together to create the optimal user experience. This is one area where the difference between research and product is felt. What does a “good” response mean, absent the product experience it’s enabling? Certainly a good summary for an app targeted at medical professionals would differ from what’s best for an app targeted at readers of the news. And the best summary for either would account for the product’s UI. Startups that care deeply about the user experience will care deeply about the data used to fine-tune their model.
Scale is already offering the infrastructure and human contractors to do this. Inevitably there will be robust open source tools for these pipelines. A small amount of data, distributed across a range of use-cases, let a model outperform one 100X its size. As startups embrace this technique, in a focused and opinionated way, we’re going to see transformative products emerge.
Follow me on Twitter: @fraser
If you’re building a startup with AI, email me!