They all look like video games. I guess Unreal Engine is used to create synthetic data for training.
The first video, with the guy walking up the mountain in the snow, has consistency issues with the cave entrance. Which is "expected" at this model size?!
Most videos seem to have issues like that, e.g. the book on the table in the library video takes on a different shape every now and then.
The 'Refiner' seems to do the opposite of its name, if the examples are representative: in all cases the first-stage images look better than the 'refined' ones. Less clutter, more realistic, less 'cowbell' for those who know the phrase.
So, where is the download? I can't find it on GitHub, and on your web page the download button is disabled.
Also, will this run on RTX 4090 with 24GB memory?
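Without official requirements published yet, a rough back-of-envelope check is all you can do. The sketch below is a generic estimate, not anything specific to this model: weights at 2 bytes per parameter (fp16/bf16) plus a fudge factor for activations and caches. The parameter counts in the usage example are hypothetical.

```python
def fits_in_vram(n_params_billion, bytes_per_param=2, overhead_factor=1.5, vram_gb=24):
    """Rough estimate of whether inference fits in a given VRAM budget.

    Weights alone take n_params * bytes_per_param; activations, KV/feature
    caches, and framework overhead are approximated by overhead_factor.
    """
    weights_gb = n_params_billion * bytes_per_param  # 1e9 params * bytes -> GB
    return weights_gb * overhead_factor <= vram_gb

# A hypothetical 5B-parameter model in bf16: ~10 GB of weights, ~15 GB with overhead
print(fits_in_vram(5))   # True
# A hypothetical 14B-parameter model: ~28 GB of weights alone, won't fit in 24 GB
print(fits_in_vram(14))  # False
```

Quantization (8-bit or 4-bit weights) changes the picture a lot, which is why `bytes_per_param` is a parameter here rather than a constant.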
Thank you!
Scroll down and there are more videos; it seems like the models will be there "soon".
The trouble is the lack of training data available to these models compared to ones like Seedance and Kling, which seem to be tapping into unlimited video inventories. Many models like LTX are technically good, but when it comes to slightly different camera movements or the subject interacting with objects, they struggle. For a recent example, we had to use sample videos generated by closed-source models and then use those for the final video.
I tend to think of these NV Labs models as architectural demos and 'free razor blades': they're intended more to inform internal R&D, give customers something that lets them do what they want quickly, and advance the state of the art.
In this case, what looks interesting is the one-minute coherence and the massive speedup: they claim 36x over open models with similar capabilities. You can tell they aren't aiming for state-of-the-art visuals; the output quality looks very SD 1.5.
The most exciting part is that it’s open-source — innovation is going to compound fast.
Given that that's where everything is going, why not just get there faster by open-sourcing Seedance 2.0, Happyhorse, Veo 3, and all the others?