
Veo 3.1 — Image to Video

Image to Video with Veo 3.1 in AISVIT. Turn still images into dynamic AI videos, adding camera motion, subject movement, and cinematic transitions from a single source image.

About this model

Premium Google model for 4-8 second text-to-video and image-to-video clips with native synchronized audio, 720p/1080p output, and reference image support.

When is this model useful?

Veo 3.1 is strongest when you need a short clip that feels polished, realistic, and tightly aligned to the prompt, with sound generated together with the visuals.

Best fit tasks

Text-to-video clips for ads, brand storytelling, product demos, mood films, and short cinematic scenes.
Image-to-video animation when you want to bring a photo, illustration, concept frame, or product visual to life without manual motion design.
Scenes where speech, ambience, or sound effects matter, because the model can generate synchronized audio together with the video.
Content that needs stronger visual consistency for a character or object by using 1-3 reference images.

Main advantages

Veo 3.1 follows detailed prompts well, including style, camera movement, lighting, mood, and audio cues.
This integration supports 720p and 1080p output, plus 16:9 and 9:16 formats for websites, Shorts, Reels, or TikTok.
You can animate from a start image and optionally guide the ending with a last frame for smoother visual transitions.
Reference images help the model hold onto subject appearance, style, and key details more reliably.

Limitations to know

The model is built for short generations only: you can choose 4, 6, or 8 seconds per render.
According to Google's documentation, English prompts are the safest choice for the most predictable results.
Reference image mode works only with a 16:9 aspect ratio and an 8-second duration; when reference images are used, the last-frame image is ignored.
This is a premium model, so pricing is higher than faster or lighter alternatives, and enabling audio doubles the per-second rate.
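You can encode the limits above in a small pre-flight check to catch invalid combinations before spending credits. This is a minimal sketch that assumes you assemble render settings in your own Python code. `check_settings` and its parameters are hypothetical names, not part of AISVIT or any Google API.

```python
from __future__ import annotations

# Hypothetical helper: names are illustrative, not an AISVIT or Google API.
ALLOWED_DURATIONS = {4, 6, 8}  # seconds per render
ALLOWED_ASPECT_RATIOS = {"16:9", "9:16"}
ALLOWED_RESOLUTIONS = {"720p", "1080p"}


def check_settings(duration: int, aspect_ratio: str, resolution: str,
                   uses_reference_images: bool = False,
                   has_last_frame: bool = False) -> tuple[list[str], list[str]]:
    """Return (errors, warnings) for a proposed Veo 3.1 render setup."""
    errors: list[str] = []
    warnings: list[str] = []
    if duration not in ALLOWED_DURATIONS:
        errors.append(f"duration must be 4, 6, or 8 seconds, got {duration}")
    if aspect_ratio not in ALLOWED_ASPECT_RATIOS:
        errors.append(f"aspect ratio must be 16:9 or 9:16, got {aspect_ratio}")
    if resolution not in ALLOWED_RESOLUTIONS:
        errors.append(f"resolution must be 720p or 1080p, got {resolution}")
    if uses_reference_images:
        # Reference image mode is limited to 16:9 at 8 seconds.
        if aspect_ratio != "16:9" or duration != 8:
            errors.append("reference images require 16:9 and 8 seconds")
        # The last frame is ignored (not refused) when reference images are used.
        if has_last_frame:
            warnings.append("last frame will be ignored in reference image mode")
    return errors, warnings
```

For example, `check_settings(6, "9:16", "1080p", uses_reference_images=True, has_last_frame=True)` reports one error for the unsupported reference-image setup and one warning about the last frame. The sketch treats the last frame as a warning rather than an error because the page describes it as ignored, not refused.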

How to use this model

The best workflow for Veo 3.1 is to start with a clear scene description, then add only the controls that actually improve the result.

Simple workflow

Write the prompt in plain language. For the most stable results, use English and describe the subject, action, setting, style, lighting, camera movement, and any sounds you want in the scene.
Choose a duration of 4, 6, or 8 seconds. For ad concepts or teasers, 4-6 seconds is often enough, while 8 seconds gives the action more room to develop.
Select 16:9 for wide output or 9:16 for vertical content. If you use reference images, keep the setup at 16:9 and 8 seconds.
Pick 720p for faster iterations or 1080p when you want a more presentation-ready final clip.
Turn on Generate audio if you want the model to create sound together with the video, such as speech, ambience, environmental noise, or simple effects.
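You can keep the prompt ingredients from step 1 as separate fields and join them into one English prompt. This is a minimal sketch that assumes you script prompts in Python. `ScenePrompt` and the labeled `Style:` / `Camera:` layout are illustrative choices, not a syntax the model requires.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class ScenePrompt:
    """Illustrative container for the prompt ingredients (not an AISVIT API)."""
    subject: str
    action: str
    setting: str
    style: str = ""
    lighting: str = ""
    camera: str = ""
    sounds: str = ""

    def to_prompt(self) -> str:
        # Core sentence: who, doing what, where.
        parts = [f"{self.subject} {self.action} {self.setting}."]
        # Optional controls are appended only when provided, so empty fields add nothing.
        if self.style:
            parts.append(f"Style: {self.style}.")
        if self.lighting:
            parts.append(f"Lighting: {self.lighting}.")
        if self.camera:
            parts.append(f"Camera: {self.camera}.")
        if self.sounds:
            parts.append(f"Audio: {self.sounds}.")
        return " ".join(parts)


prompt = ScenePrompt(
    subject="A ceramic coffee mug",
    action="slowly rotates",
    setting="on a wooden cafe table",
    lighting="warm morning sunlight",
    camera="slow dolly-in",
    sounds="soft cafe ambience and a spoon clinking",
).to_prompt()
```

Sound descriptions only matter when Generate audio is turned on. With it off, the clip is silent.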

Supported inputs

Required: a text prompt.
Optional: one start image for image-to-video generation.
Optional: one last-frame image to guide the ending transition.
Optional: 1 to 3 reference images for subject-consistent generation.
In the AISVIT upload flow, standard image formats such as JPG, PNG, and WEBP are the safest choice.
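You can check the input rules above locally before uploading. This sketch assumes the images are file paths on disk and treats `.jpeg` as equivalent to `.jpg`. `check_inputs` is a hypothetical helper, and its format list reflects the page's "safest choice" guidance rather than a confirmed hard limit.

```python
from __future__ import annotations

from pathlib import Path

# Formats described above as the safest choice for AISVIT uploads (assumed extensions).
SAFE_EXTENSIONS = {".jpg", ".jpeg", ".png", ".webp"}


def check_inputs(prompt: str, start_image: str | None = None,
                 last_frame: str | None = None,
                 reference_images: list[str] | None = None) -> list[str]:
    """Hypothetical helper: return problems with the planned inputs (empty means OK)."""
    problems: list[str] = []
    if not prompt.strip():
        problems.append("a text prompt is required")
    refs = reference_images or []
    if len(refs) > 3:
        problems.append(f"at most 3 reference images allowed, got {len(refs)}")
    # Check every supplied image path against the safe formats (case-insensitive).
    for path in [start_image, last_frame, *refs]:
        if path is not None and Path(path).suffix.lower() not in SAFE_EXTENSIONS:
            problems.append(f"{path}: prefer JPG, PNG, or WEBP")
    return problems
```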

What you get

A generated MP4 video file.
24 frames per second output.
A video 4, 6, or 8 seconds long.
720p or 1080p resolution.
Video with audio when Generate audio is enabled, or silent output when it is turned off.
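The frame count follows directly from the 24 fps rate: an 8-second clip has 8 × 24 = 192 frames. The sketch below also maps each resolution and aspect ratio to pixel dimensions. Those dimensions are the conventional meaning of 720p and 1080p and are an assumption here, as is the idea that every listed resolution can be paired with either aspect ratio. `output_spec` is a hypothetical name.

```python
from __future__ import annotations

FPS = 24  # output frame rate stated on this page

# Conventional pixel sizes for 720p/1080p (assumption, not stated on this page).
DIMENSIONS = {
    ("720p", "16:9"): (1280, 720),
    ("720p", "9:16"): (720, 1280),
    ("1080p", "16:9"): (1920, 1080),
    ("1080p", "9:16"): (1080, 1920),
}


def output_spec(duration: int, resolution: str, aspect_ratio: str) -> dict[str, int]:
    """Hypothetical helper: expected frame count and pixel size of a render."""
    width, height = DIMENSIONS[(resolution, aspect_ratio)]
    return {"frames": duration * FPS, "width": width, "height": height}
```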

Other workflows for this model

Text to Video

AISVIT pricing details

Without audio: 20 credits per second
With audio: 40 credits per second
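Because pricing is a flat per-second rate, the cost of a render is duration times rate. This is a minimal sketch that assumes credits are billed exactly as duration × rate, with no rounding or minimum charge. `render_cost` is a hypothetical helper.

```python
# Credits per second, keyed by whether Generate audio is on (rates from the list above).
CREDITS_PER_SECOND = {False: 20, True: 40}


def render_cost(duration: int, audio: bool = False) -> int:
    """Hypothetical helper: credits for one render, assuming duration x rate billing."""
    if duration not in (4, 6, 8):
        raise ValueError("duration must be 4, 6, or 8 seconds")
    return duration * CREDITS_PER_SECOND[audio]


if __name__ == "__main__":
    # Print every supported duration with and without audio.
    for seconds in (4, 6, 8):
        print(f"{seconds} s: {render_cost(seconds)} credits silent, "
              f"{render_cost(seconds, audio=True)} credits with audio")
```

At these rates, costs range from 80 credits (4 seconds, silent) to 320 credits (8 seconds with audio).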