Multi-view video with 3D camera control - Stability AI

Today we are releasing Stable Virtual Camera, currently in research preview. This multi-view diffusion model transforms 2D images into immersive 3D videos with realistic depth and perspective, without complex reconstruction or scene-specific optimization. We invite the research community to explore its capabilities and contribute to its development.

A virtual camera is a digital tool used in filmmaking and 3D animation to capture and navigate digital scenes in real time. Stable Virtual Camera builds on this concept, combining the familiar control of traditional virtual cameras with the power of generative AI to offer precise, intuitive control over 3D video outputs.

Unlike conventional 3D video models, which depend on large sets of input images or complex preprocessing, Stable Virtual Camera generates novel views of a scene from one or more input images at user-specified camera angles. The model produces consistent, smooth 3D video outputs and delivers seamless trajectory videos across dynamic camera paths.

The model is available for research use under a Non-Commercial License. You can read the paper here, download the weights on Hugging Face, and get the code on GitHub.

Capabilities

Stable Virtual Camera offers advanced capabilities for generating 3D videos, including:

  • Dynamic camera control: Supports user-defined camera trajectories as well as multiple dynamic camera paths, including: 360°, lemniscate (∞-shaped path), spiral, dolly zoom in, dolly zoom out, zoom in, zoom out, move forward, move backward, pan up, pan down, pan left, pan right, and roll.

  • Flexible inputs: Generates 3D videos from as few as one input image or as many as 32.

  • Multiple aspect ratios: Produces videos in square (1:1), portrait (9:16), landscape (16:9), and other custom aspect ratios without additional training.

  • Long video generation: Maintains 3D consistency in videos of up to 1,000 frames, enabling seamless loops and smooth transitions, even when the same viewpoints are revisited.
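
To make the named camera paths more concrete, here is a minimal sketch of how a few of them could be parameterized as per-frame camera positions. It is an illustration only, not Stable Virtual Camera's API: the function names, the y-up coordinate convention, and the assumption that the camera looks at a subject at the origin are all hypothetical. The lemniscate uses the standard Bernoulli parameterization, and the dolly zoom uses the usual rule that the subject's size stays constant when focal length scales with distance.

```python
import math

def orbit_path(num_frames, radius=2.0, height=0.0):
    """360° path: positions evenly spaced on a circle around the origin (y is up)."""
    path = []
    for i in range(num_frames):
        angle = 2 * math.pi * i / num_frames
        path.append((radius * math.cos(angle), height, radius * math.sin(angle)))
    return path

def lemniscate_path(num_frames, scale=1.0, distance=2.0):
    """∞-shaped path: lemniscate of Bernoulli traced in the plane z = distance."""
    path = []
    for i in range(num_frames):
        t = 2 * math.pi * i / num_frames
        denom = 1 + math.sin(t) ** 2
        x = scale * math.cos(t) / denom
        y = scale * math.sin(t) * math.cos(t) / denom
        path.append((x, y, distance))
    return path

def spiral_path(num_frames, start_radius=2.0, end_radius=1.0, turns=2.0):
    """Spiral path: orbit whose radius changes linearly from start to end."""
    path = []
    for i in range(num_frames):
        frac = i / max(num_frames - 1, 1)
        radius = start_radius + (end_radius - start_radius) * frac
        angle = 2 * math.pi * turns * frac
        path.append((radius * math.cos(angle), 0.0, radius * math.sin(angle)))
    return path

def dolly_zoom_focal_lengths(distances, initial_focal_length):
    """Dolly zoom: keep focal_length / distance constant so the subject's
    apparent size does not change while the camera moves."""
    d0 = distances[0]
    return [initial_focal_length * d / d0 for d in distances]
```

Because the orbit and lemniscate are periodic in the frame index, the frame after the last one coincides with the first, which is the property a seamless loop relies on.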

Research and model architecture

Stable Virtual Camera achieves state-of-the-art results on novel view synthesis (NVS) benchmarks, outperforming models such as ViewCrafter and CAT3D. It excels in both large-viewpoint NVS, which emphasizes generation capacity, and small-viewpoint NVS, which prioritizes temporal smoothness.