Much prior work has studied generative modeling of video data using a variety of methods, including recurrent networks (1, 2, 3), generative adversarial networks (4, 5, 6, 7), autoregressive transformers (8, 9), and diffusion models (10, 11, 12). These works often focus on a narrow category of visual data, on shorter videos, or on videos of a fixed size. Sora is a generalist model of visual data: it can generate videos and images spanning diverse durations, aspect ratios, and resolutions, up to a full minute of high-definition video.