#gpt-2 — 1sec.ai

Fine-tuning GPT-2 from human preferences

Researchers at OpenAI fine-tuned the 774M-parameter GPT-2 model with human feedback on several tasks. The model learned to match the labelers' preferences, but those preferences did not always match what the researchers wanted. In summarization, for example, labelers preferred summaries that copied text verbatim from the input, so the model learned to copy. The experiment shows that fine-tuning on human feedback can produce unexpected behavior.
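
To make "learning from human preferences" concrete, here is a minimal sketch of one common formulation, not necessarily the exact setup used in this work: a reward model scores several candidate outputs, a labeler picks one, and the training loss is the negative log-probability that the softmax over scores assigns to the labeler's pick. The function name `preference_loss` and the example scores are hypothetical.

```python
import math

def preference_loss(rewards, chosen):
    """Softmax cross-entropy over candidate rewards.

    rewards: list of scalar scores, one per candidate output (hypothetical
             values that a reward model would produce).
    chosen:  index of the candidate the human labeler preferred.
    Returns the negative log-probability of the chosen candidate.
    """
    # Subtract the max before exponentiating for numerical stability.
    m = max(rewards)
    log_z = m + math.log(sum(math.exp(r - m) for r in rewards))
    return log_z - rewards[chosen]

# With equal scores, the loss is that of a uniform guess over four options.
uniform = preference_loss([0.0, 0.0, 0.0, 0.0], chosen=2)
# Scoring the labeler's pick higher lowers the loss.
confident = preference_loss([0.0, 0.0, 3.0, 0.0], chosen=2)
print(uniform, confident)
```

Minimizing this loss pushes the reward model toward whatever the labelers actually pick, which is why a labeler bias such as a taste for copied text can carry straight through to the fine-tuned model.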

Key takeaways

Fine-tuning GPT-2 with human feedback led to unexpected behaviors.
Human labelers preferred verbatim copying in summarization tasks.
60k human labels were required for summarization tasks.
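
One rough way to check a summarizer for the copying behavior described above is to measure how many summary sentences appear word for word in the input. The sketch below is illustrative only: `copy_rate` and `split_sentences` are hypothetical helpers, and the sentence splitter is a naive punctuation-based one.

```python
import re

def split_sentences(text):
    """Naively split text into sentences after '.', '!' or '?'."""
    parts = re.split(r"(?<=[.!?])\s+", text.strip())
    return [p.strip() for p in parts if p.strip()]

def copy_rate(source, summary):
    """Fraction of summary sentences found verbatim in the source."""
    # Collapse runs of whitespace so line breaks do not hide a match.
    source_norm = " ".join(source.split())
    sentences = split_sentences(summary)
    if not sentences:
        return 0.0
    copied = sum(1 for s in sentences if " ".join(s.split()) in source_norm)
    return copied / len(sentences)

article = "The cat sat on the mat. It was warm. The dog barked."
summary = "The cat sat on the mat. A dog made noise."
print(copy_rate(article, summary))  # one of two sentences is copied
```

A copy rate near 1.0 suggests a model that extracts sentences rather than writing new ones.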

OpenAI #fine-tuning #human-preferences #gpt-2