Seedance 2.0 Text to Video Generator
Generate short videos from text with Seedance 2.0. Describe the action, mood, camera movement, and audio, then create an AI video with synchronized sound in 480p, 720p, or 1080p.
About this model
Seedance 2.0 creates short videos from text, images, video clips, and audio in a single pass. It generates sound in sync with the picture, produces more natural motion, and accepts multimodal references to keep subjects and style consistent.
When is this model useful?
Use Seedance 2.0 when you need a short, expressive clip with believable motion and audio, or when you want to combine reference images, videos, and audio in one generation.
Best fit tasks
- Short ad videos, product demos, teaser scenes, social posts, and campaign concept tests.
- Text-to-video generation when you want to quickly explore story, atmosphere, camera movement, or scene direction.
- Image-to-video animation from a product shot, portrait, illustration, or finished key visual.
- Multimodal reference generations combining images, video clips, and audio files for outfit changes, product showcases, or music-synced content.
- Character, dialogue, ambience, or sound-effect scenes where synchronized audio matters.
Main advantages
- Audio and video are generated together, so speech, effects, ambience, and motion are better aligned.
- The model handles complex action well, including dancing, sports, object interaction, and camera movement.
- You can combine up to 9 reference images, 3 reference videos, and 3 reference audio files in one generation. Reference them in the prompt as [Image1], [Video1], [Audio1], etc.
- It supports multiple frame shapes, including wide, square, vertical, and cinematic-wide formats, up to 1080p.
Limitations to know
- Start-frame and last-frame images cannot be combined with reference images in the same generation.
- Reference audio files can only be used together with at least one reference image or reference video.
- For predictable credit calculation, use a specific duration instead of automatic duration selection.
- Using reference videos raises the credit rate (video_in pricing); text and image-only inputs use a lower rate.
How to use this model
Start with a clear scene prompt, then add reference files only when you need to lock the subject, style, motion, or audio.
Simple workflow
- Describe the subject, action, location, mood, lighting, camera movement, and any audio you want.
- For image-to-video, upload a start frame. If the ending matters, add a last-frame image.
- Use reference images (up to 9) to preserve a character, outfit, product, or style. Reference them in the prompt as [Image1], [Image2], etc.
- Add reference videos (up to 3) for motion transfer, video editing, or style reference. Reference them as [Video1], [Video2], etc.
- Add reference audio files (up to 3) for audio-driven generation or lip-sync. Reference them as [Audio1], etc. Audio references require at least one reference image or video.
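The tag convention in the workflow above can be sketched in a few lines of Python. This is only an illustration: `reference_tags` is a hypothetical helper, not part of any AISVIT or Seedance API, and it assumes tags are numbered from 1 in upload order, as the [Image1], [Video1], [Audio1] examples suggest.

```python
def reference_tags(images: int = 0, videos: int = 0, audios: int = 0) -> list[str]:
    """Return the prompt tags for the given number of uploaded reference files.

    Hypothetical helper. The per-type limits (9 images, 3 videos, 3 audio files)
    come from this page; numbering from 1 in upload order is an assumption.
    """
    limits = {"Image": (images, 9), "Video": (videos, 3), "Audio": (audios, 3)}
    tags = []
    for kind, (count, maximum) in limits.items():
        if not 0 <= count <= maximum:
            raise ValueError(f"{kind} references must be between 0 and {maximum}, got {count}")
        tags.extend(f"[{kind}{i}]" for i in range(1, count + 1))
    return tags


# Two reference images and one audio file.
tags = reference_tags(images=2, audios=1)

# An example prompt that mentions every uploaded reference by its tag.
prompt = "The woman from [Image1] wears the jacket from [Image2] and sings along to [Audio1]."

# Any tag that was uploaded but never mentioned in the prompt.
missing = [tag for tag in tags if tag not in prompt]
```

Checking `missing` before submitting is a cheap way to catch a reference file that was uploaded but never mentioned in the prompt.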
Supported inputs
- Required: a text prompt.
- Optional: one start image for image-to-video generation.
- Optional: one last-frame image to guide the ending (requires a start image; cannot be used with reference images).
- Optional: up to 9 reference images for subject, product, or style consistency.
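- Optional: up to 3 reference video clips (total max 15 s) for motion transfer or video editing.
- Optional: up to 3 reference audio files for audio-driven generation or lip-sync (requires at least one reference image or video).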
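The input rules stated on this page can be collected into a small pre-flight check. The sketch below is a minimal illustration, not the service's own validation: the `GenerationInputs` class, its field names, and the `validate` function are all hypothetical, and only the limits themselves are taken from this page.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class GenerationInputs:
    """Hypothetical container for one generation request (field names are assumptions)."""
    prompt: str
    start_image: Optional[str] = None
    last_image: Optional[str] = None
    reference_images: List[str] = field(default_factory=list)
    reference_video_seconds: List[float] = field(default_factory=list)  # one duration per clip
    reference_audios: List[str] = field(default_factory=list)


def validate(inputs: GenerationInputs) -> List[str]:
    """Return a list of rule violations; an empty list means the combination looks valid."""
    problems = []
    if not inputs.prompt.strip():
        problems.append("A text prompt is required.")
    if inputs.last_image and not inputs.start_image:
        problems.append("A last-frame image requires a start image.")
    if (inputs.start_image or inputs.last_image) and inputs.reference_images:
        problems.append("Start or last-frame images cannot be combined with reference images.")
    if len(inputs.reference_images) > 9:
        problems.append("At most 9 reference images are allowed.")
    if len(inputs.reference_video_seconds) > 3:
        problems.append("At most 3 reference videos are allowed.")
    if sum(inputs.reference_video_seconds) > 15:
        problems.append("Reference videos may total at most 15 seconds.")
    if len(inputs.reference_audios) > 3:
        problems.append("At most 3 reference audio files are allowed.")
    if inputs.reference_audios and not (inputs.reference_images or inputs.reference_video_seconds):
        problems.append("Reference audio requires at least one reference image or video.")
    return problems


# Example: audio supplied without any reference image or video.
issues = validate(GenerationInputs(prompt="A singer on stage", reference_audios=["song.mp3"]))
```

Returning a list of messages instead of raising on the first problem lets a form show every conflict at once.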
What you get
- A generated MP4 video file.
- 480p, 720p, or 1080p output depending on selected resolution.
- Video with synchronized audio when Generate audio is enabled, or silent output when it is turned off.
- A short clip suitable for ads, demos, presentations, and social content.
AISVIT pricing details
- 480p without reference videos: 8 credits per second
- 720p without reference videos: 18 credits per second
- 1080p without reference videos: 45 credits per second
- 480p with reference videos: 10 credits per second
- 720p with reference videos: 22 credits per second
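To estimate the cost of a clip before generating it, multiply the per-second rate by the chosen duration. The sketch below is a rough illustration: `estimate_credits` is a hypothetical helper, and it assumes credits scale linearly with duration with no rounding or minimum charge. It covers only the rates listed above, so a combination with no listed rate raises an error instead of guessing a price.

```python
# Credits per second, keyed by (resolution, uses_reference_video).
# Only the rates listed on this page are included.
CREDITS_PER_SECOND = {
    ("480p", False): 8,
    ("720p", False): 18,
    ("1080p", False): 45,
    ("480p", True): 10,
    ("720p", True): 22,
}


def estimate_credits(resolution: str, seconds: int, uses_reference_video: bool = False) -> int:
    """Estimate the credits for one clip (assumes simple per-second billing)."""
    if seconds <= 0:
        raise ValueError("Duration must be a positive number of seconds.")
    key = (resolution, uses_reference_video)
    if key not in CREDITS_PER_SECOND:
        raise ValueError(f"No listed rate for {resolution} with reference videos={uses_reference_video}.")
    return CREDITS_PER_SECOND[key] * seconds


# A 5-second 720p clip, without and with reference videos.
without_refs = estimate_credits("720p", 5)                          # 18 * 5 = 90
with_refs = estimate_credits("720p", 5, uses_reference_video=True)  # 22 * 5 = 110
```

Passing a fixed number of seconds mirrors the advice above to choose a specific duration when the credit cost needs to be predictable.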