Weak-to-strong generalization

OpenAI · Thu, 14 Dec 2023 00:00:00 GMT

We present a new research direction for superalignment, together with promising initial results: can we leverage the generalization properties of deep learning to control strong models with weak super...