Seedance 2.0 Text to Video Generator

Generate short videos from text with Seedance 2.0. Describe the action, mood, camera movement, and audio, then generate a video with synchronized audio in 480p, 720p, or 1080p.

About this model

Seedance 2.0 creates short videos from text, images, video clips, and audio in a single pass, with synchronized sound, natural motion, and multimodal reference guidance that keeps subjects and style consistent.

When is this model useful?

Use Seedance 2.0 when you need a short, expressive clip with believable motion and audio, or when you want to combine reference images, videos, and audio in one generation.

Best fit tasks

  • Short ad videos, product demos, teaser scenes, social posts, and campaign concept tests.
  • Text-to-video generation when you want to quickly explore story, atmosphere, camera movement, or scene direction.
  • Image-to-video animation from a product shot, portrait, illustration, or finished key visual.
  • Multimodal reference generations combining images, video clips, and audio files for outfit changes, product showcases, or music-synced content.
  • Character, dialogue, ambience, or sound-effect scenes where synchronized audio matters.

Main advantages

  • Audio and video are generated together, so speech, effects, ambience, and motion are better aligned.
  • The model handles complex action well, including dancing, sports, object interaction, and camera movement.
  • You can combine up to 9 reference images, 3 reference videos, and 3 reference audio files in one generation. Reference them in the prompt as [Image1], [Video1], [Audio1], etc.
  • It supports multiple aspect ratios, including wide, square, vertical, and cinematic-wide, at up to 1080p.

Limitations to know

  • Start-frame and last-frame images cannot be combined with reference images in the same generation.
  • Reference audio files require at least one reference image or video in the same generation.
  • For predictable credit calculation, use a specific duration instead of automatic duration selection.
  • Using reference videos raises the credit rate (video_in pricing); text and image-only inputs use a lower rate.

How to use this model

Start with a clear scene prompt, then add reference files only when you need to lock the subject, style, motion, or audio.

Simple workflow

  1. Describe the subject, action, location, mood, lighting, camera movement, and any audio you want.
  2. For image-to-video, upload a start frame. If the ending matters, add a last-frame image.
  3. Use reference images (up to 9) to preserve a character, outfit, product, or style. Reference them in the prompt as [Image1], [Image2], etc.
  4. Add reference videos (up to 3) for motion transfer, video editing, or style reference. Reference them as [Video1], [Video2], etc.
  5. Add reference audio files (up to 3) for audio-driven generation or lip-sync. Reference them as [Audio1], etc. Audio references require at least one reference image or video.
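
The [Image1], [Video1], [Audio1] tags in steps 3 to 5 only make sense if a matching file is attached. As a minimal sketch, the check below scans a prompt for those tags and reports any that point at a reference that was not supplied. The function name and the counts dictionary are hypothetical helpers for illustration, not part of any AISVIT or Seedance API, and tag numbering is assumed to start at 1.

```python
import re

# Matches the tag style used in the workflow: [Image1], [Video2], [Audio1], ...
TAG_PATTERN = re.compile(r"\[(Image|Video|Audio)(\d+)\]")


def find_bad_tags(prompt, counts):
    """Return prompt tags that refer to a reference file that is not attached.

    `counts` maps "Image" / "Video" / "Audio" to the number of attached files
    of that kind (hypothetical structure, assumed for this sketch).
    """
    bad = []
    for kind, number in TAG_PATTERN.findall(prompt):
        index = int(number)
        # Assumption: tags are numbered from 1 up to the number of attached files.
        if index < 1 or index > counts.get(kind, 0):
            bad.append(f"[{kind}{number}]")
    return bad


prompt = (
    "The woman from [Image1] wears the jacket from [Image2] and dances "
    "with the motion of [Video1], in time with [Audio1]."
)
# Two images and one video are attached, but no audio file.
print(find_bad_tags(prompt, {"Image": 2, "Video": 1, "Audio": 0}))  # ['[Audio1]']
```

Running a check like this before submitting can catch a prompt that mentions a reference you forgot to upload.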

Supported inputs

  • Required: a text prompt.
  • Optional: one start image for image-to-video generation.
  • Optional: one last-frame image to guide the ending (requires a start image; cannot be used with reference images).
  • Optional: up to 9 reference images for subject, product, or style consistency.
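  • Optional: up to 3 reference video clips (total max 15 s) for motion transfer or video editing.
  • Optional: up to 3 reference audio files for audio-driven generation or lip-sync (requires at least one reference image or video).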
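
The input rules on this page can be collected into one pre-flight check. The sketch below is illustrative only: the GenerationInputs class and its field names are hypothetical, not an AISVIT request format. It also assumes that a start image does not satisfy the "reference image or video" requirement for audio, which the page does not state either way.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class GenerationInputs:
    # Hypothetical container; field names are assumptions for this sketch.
    prompt: str
    start_image: Optional[str] = None
    last_image: Optional[str] = None
    reference_images: List[str] = field(default_factory=list)
    reference_video_seconds: List[float] = field(default_factory=list)  # one duration per clip
    reference_audio: List[str] = field(default_factory=list)


def validate(inputs):
    """Return a list of rule violations; an empty list means the inputs look valid."""
    problems = []
    if not inputs.prompt.strip():
        problems.append("a text prompt is required")
    if inputs.last_image and not inputs.start_image:
        problems.append("a last-frame image requires a start image")
    if (inputs.start_image or inputs.last_image) and inputs.reference_images:
        problems.append("start/last-frame images cannot be combined with reference images")
    if len(inputs.reference_images) > 9:
        problems.append("at most 9 reference images")
    if len(inputs.reference_video_seconds) > 3:
        problems.append("at most 3 reference videos")
    if sum(inputs.reference_video_seconds) > 15:
        problems.append("reference videos may total at most 15 s")
    if len(inputs.reference_audio) > 3:
        problems.append("at most 3 reference audio files")
    if inputs.reference_audio and not (
        inputs.reference_images or inputs.reference_video_seconds
    ):
        problems.append("reference audio requires at least one reference image or video")
    return problems


request = GenerationInputs(
    prompt="A dancer spins under neon lights.",
    start_image="start.png",
    reference_images=["outfit.png"],
)
print(validate(request))
# ['start/last-frame images cannot be combined with reference images']
```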

What you get

  • A generated MP4 video file.
  • 480p, 720p, or 1080p output depending on selected resolution.
  • Video with synchronized audio when Generate audio is enabled, or silent output when it is turned off.
  • A short clip suitable for ads, demos, presentations, and social content.

AISVIT pricing details

  • 480p without reference videos: 8 credits per second
  • 720p without reference videos: 18 credits per second
  • 1080p without reference videos: 45 credits per second
  • 480p with reference videos: 10 credits per second
  • 720p with reference videos: 22 credits per second