Interface VideoProjectParams

Video-specific parameters for video workflows (t2v, i2v, s2v, ia2v, a2v, animate). Only applicable when using video models like wan_v2.2-14b-fp8_t2v or ltx23-22b-fp8_t2v_distilled. Includes frame count, fps, shift, and reference assets (image, audio, video).

WAN 2.2 Models:

  • Always generate video at 16fps internally
  • The fps parameter (16 or 32) only controls post-render frame interpolation
  • fps=32 doubles the frames via interpolation after generation
  • Frame count is always calculated as: duration * 16 + 1
  • Example: 5 seconds at 32fps = 81 frames generated, then interpolated to 161 output frames

LTX-2.3 Models:

  • Generate video at the actual specified FPS (1-60 fps range)
  • No post-render interpolation - fps directly affects generation
  • Frame count is calculated as: duration * fps + 1
  • Frame count must follow the pattern: 1 + n*8 (i.e., 1, 9, 17, 25, 33, ...)
  • Example: 5 seconds at 24fps = 121 frames (since 121 = 1 + 15*8)

Seedance Models:

  • External API-backed video models for text-to-video, image-to-video, multimodal reference generation, image+audio-to-video, and video-to-video
  • Generate at fixed 24fps
  • Direct SDK project duration range is 4 to 15 seconds
  • Frame count is calculated as: duration * 24 + 1
  • Vendor reference limits are 9 images, 3 videos, 3 audios, and 12 asset files total

interface VideoProjectParams {
    appSource?: string;
    audioDuration?: number;
    audioIdentityStrength?: number;
    audioStart?: number;
    controlNet?: VideoControlNetParams;
    detailerStrength?: number;
    disableNSFWFilter?: boolean;
    duration?: number;
    firstFrameStrength?: number;
    fps?: number;
    frames?: number;
    generateAudio?: boolean;
    guidance?: number;
    height?: number;
    lastFrameStrength?: number;
    loras?: string[];
    loraStrengths?: number[];
    modelId: string;
    negativePrompt?: string;
    network?: SupernetType;
    numberOfMedia: number;
    outputFormat?: "mp4";
    positivePrompt: string;
    referenceAudio?: InputMedia;
    referenceAudioIdentity?: InputMedia;
    referenceAudioUrls?: string[];
    referenceImage?: InputMedia;
    referenceImageEnd?: InputMedia;
    referenceImageUrls?: string[];
    referenceVideo?: InputMedia;
    referenceVideoUrls?: string[];
    sam2Coordinates?: { x: number; y: number }[];
    sampler?: string;
    scheduler?: string;
    seed?: number;
    shift?: number;
    steps?: number;
    stylePrompt?: string;
    teacacheThreshold?: number;
    tokenType?: TokenType;
    trimEndFrame?: boolean;
    type: "video";
    videoStart?: number;
    width?: number;
}
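As a rough illustration of how these fields fit together, the sketch below builds a text-to-video request object. It declares a small local subset of the interface so the snippet compiles on its own; the subset name, the prompt text, and the numeric values are illustrative assumptions, not SDK defaults.

```typescript
// Local subset of VideoProjectParams, repeated here only so the snippet is self-contained.
interface VideoParamsSubset {
  type: "video";
  modelId: string;
  positivePrompt: string;
  numberOfMedia: number;
  negativePrompt?: string;
  duration?: number;
  fps?: number;
  seed?: number;
  outputFormat?: "mp4";
}

// Hypothetical text-to-video request; prompt and numbers are illustrative.
const t2vParams: VideoParamsSubset = {
  type: "video",
  modelId: "ltx23-22b-fp8_t2v_distilled", // model ID mentioned in the interface description
  positivePrompt: "A paper boat drifting down a rain-soaked street",
  numberOfMedia: 1,
  duration: 5, // seconds; the frame count is derived from this
  fps: 24, // LTX-2.3 generates at the requested rate
  outputFormat: "mp4",
};
```

Only type, modelId, positivePrompt, and numberOfMedia are required by the interface; everything else in the object is optional.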

Properties

appSource?: string

Optional client app/source label to attach to the project request for server-side attribution.

audioDuration?: number

Audio duration in seconds for audio-driven workflows (s2v, ia2v, a2v). Specifies how many seconds of audio to use. If not provided, defaults to 30 seconds on the server.

audioIdentityStrength?: number

Controls how strongly the speaker's vocal identity is applied. Uses an extra forward pass per denoising step to amplify identity features. Range: 0-10. Default: 3.0. Set to 0 to disable (skips extra forward pass). Only used when referenceAudioIdentity is provided.

audioStart?: number

Audio start position in seconds for audio-driven workflows (s2v, ia2v, a2v). Specifies where to begin reading from the audio file. Default: 0

controlNet?: VideoControlNetParams

ControlNet parameters for LTX-2.3 v2v workflows. Specifies which control signal to extract from the reference video.

detailerStrength?: number

Detailer LoRA strength for LTX-2.3 v2v IC-Control workflows. The detailer LoRA is always loaded alongside the control LoRA (canny/pose/depth). Range: 0.0-1.0, default 0.6.

disableNSFWFilter?: boolean

Disable the NSFW filter for the project. Default is false, meaning the NSFW filter is enabled. If generated media triggers the NSFW filter, it will not be available for download.

duration?: number

Duration of the video in seconds. Supported ranges: 1 to 10 (WAN 2.2), 4 to 20 (LTX-2.3), or 4 to 15 (Seedance direct SDK projects).

The SDK automatically calculates the correct frame count based on the model:

  • WAN 2.2: duration * 16 + 1 (always 16fps generation)
  • LTX-2.3: duration * fps + 1, snapped to frame step constraint
  • Seedance: duration * 24 + 1

firstFrameStrength?: number

First frame strength for LTX-2.3 keyframe interpolation (when referenceImageEnd is provided). Controls how strictly the first frame is matched. Range: 0.0-1.0, default 0.6. Set to 0 to disable first frame (last-frame-only mode).

fps?: number

Frames per second for output video.

WAN 2.2 Models: Only 16 or 32 fps allowed. The 32fps option is post-render frame interpolation that doubles the output frames. Internal generation is always 16fps.

LTX-2.3 Models: Any value from 1-60 fps. This directly controls the generation frame rate - there is no post-render interpolation.

Seedance Models: Fixed 24fps external API generation.
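The per-family fps rules above can be sketched as a small client-side check. The ModelFamily label and the isFpsAllowed helper are hypothetical names introduced for illustration; the SDK may validate differently, and how a modelId maps to a family is not covered here.

```typescript
// Hypothetical family label; mapping a modelId to a family is left to the caller.
type ModelFamily = "wan" | "ltx" | "seedance";

// Returns true when fps is within the documented limits for the family.
function isFpsAllowed(family: ModelFamily, fps: number): boolean {
  switch (family) {
    case "wan":
      // 16 is native generation; 32 requests post-render interpolation.
      return fps === 16 || fps === 32;
    case "ltx":
      // Any value in the documented 1-60 range.
      return fps >= 1 && fps <= 60;
    case "seedance":
      // Fixed-rate external generation.
      return fps === 24;
  }
}
```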

frames?: number

Number of frames to generate.

Use duration instead. When using duration, the SDK automatically calculates the correct frame count based on the model type.
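For reference, the documented frame-count formulas can be written out as below. This is a minimal sketch, not the SDK's implementation: estimateFrames, snapToFrameStep, and wanOutputFrames are hypothetical helper names, and rounding to the nearest 1 + n*8 value is an assumption, since the documentation only says the LTX-2.3 count is snapped to the frame step constraint.

```typescript
type ModelFamily = "wan" | "ltx" | "seedance"; // hypothetical label

// Assumption: snap to the NEAREST value of the form 1 + n*8.
function snapToFrameStep(frames: number): number {
  const n = Math.max(0, Math.round((frames - 1) / 8));
  return 1 + n * 8;
}

// Frames generated for a given duration; fps only matters for LTX-2.3.
function estimateFrames(family: ModelFamily, durationSeconds: number, fps: number): number {
  switch (family) {
    case "wan":
      return durationSeconds * 16 + 1; // always generated at 16fps
    case "ltx":
      return snapToFrameStep(durationSeconds * fps + 1);
    case "seedance":
      return durationSeconds * 24 + 1; // fixed 24fps
  }
}

// WAN 2.2 output frames after optional interpolation (81 generated -> 161 at fps=32).
function wanOutputFrames(generatedFrames: number, fps: 16 | 32): number {
  return fps === 32 ? generatedFrames * 2 - 1 : generatedFrames;
}
```

With these helpers, a 5 second WAN 2.2 clip yields 81 generated frames (161 after interpolation at fps=32), and a 5 second LTX-2.3 clip at 24fps yields 121 frames, matching the documented examples.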

generateAudio?: boolean

Enable native audio generation for external API-backed video models that support it. Seedance defaults to audio enabled server-side; set to false to request a silent video.

guidance?: number

Guidance scale. For most Stable Diffusion models, optimal value is 7.5. For video models: Regular models range 0.7-8.0, LoRA version (lightx2v) range 0.7-1.6, step 0.01. This maps to guidanceScale in the keyFrame for both image and video models.

height?: number

Output video height. Only used if sizePreset is "custom"

lastFrameStrength?: number

Last frame strength for LTX-2.3 keyframe interpolation (when referenceImageEnd is provided). Controls how strictly the last frame is matched. Range: 0.0-1.0, default 0.6.

loras?: string[]

Array of LoRA IDs to apply. Available LoRAs are model-specific. The worker will download the LoRA if not already present on the persistent volume. LoRA IDs are resolved to filenames via the worker config API. Example: ['multiple_angles']

loraStrengths?: number[]

Array of LoRA strengths corresponding to each LoRA in the loras array. Values should be between 0.0 and 2.0. Defaults to 1.0 if not specified. Example: [0.9]
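A caller might want to pair the two arrays before sending a request. The sketch below fills in the documented default of 1.0 for missing strengths and rejects values outside 0.0 to 2.0; resolveLoraStrengths is a hypothetical helper, and whether the server itself rejects or clamps out-of-range values is not stated in this reference.

```typescript
// Returns one strength per LoRA ID, defaulting missing entries to 1.0.
function resolveLoraStrengths(loras: string[], strengths: number[] = []): number[] {
  if (strengths.length > loras.length) {
    throw new Error("loraStrengths has more entries than loras");
  }
  return loras.map((id, i) => {
    const strength = strengths[i] ?? 1.0; // documented default
    if (strength < 0 || strength > 2) {
      throw new RangeError(`Strength for LoRA "${id}" must be between 0.0 and 2.0`);
    }
    return strength;
  });
}
```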

modelId: string

ID of the model to use. Available models are listed in the availableModels property of the ProjectsApi instance.

negativePrompt?: string

Prompt describing what to avoid. If not provided, the server default is used.

network?: SupernetType

Override current network type. Default value can be read from sogni.account.currentAccount.network

numberOfMedia: number

Number of media files to generate. Depending on project type, this can be number of images or number of videos.

outputFormat?: "mp4"

Output video format. For now only 'mp4' is supported, defaults to 'mp4'.

positivePrompt: string

Prompt describing what to create.

referenceAudio?: InputMedia

Reference audio for audio-driven video workflows (s2v, ia2v, a2v).

referenceAudioIdentity?: InputMedia

Reference audio for ID-LoRA speaker identity transfer (LTX-2.3 only). Provide a ~5 second audio clip of the target speaker's voice. The model uses this to transfer vocal identity into the generated video. Available on t2v, i2v, and v2v LTX-2.3 workflows. Not compatible with audio-driven workflows (s2v, ia2v, a2v).

referenceAudioUrls?: string[]

Seedance-only audio context references. These must be publicly accessible HTTPS URLs. Seedance does not support text+audio-only requests; include at least one image or video reference when using audio URL references.

referenceImage?: InputMedia

Reference image for video workflows. Maps to: startImage (i2v), characterImage (animate), referenceImage (s2v, ia2v)

referenceImageEnd?: InputMedia

Optional end image for i2v interpolation workflows. When provided with referenceImage, the video will interpolate between the two images.

referenceImageUrls?: string[]

Seedance-only loose image context references. These must be publicly accessible HTTPS URLs that the vendor can fetch. Use referenceImage / referenceImageEnd when the image should lock the first or last frame.

referenceVideo?: InputMedia

Reference video for animate and v2v (ControlNet) workflows. Maps to: drivingVideo (animate-move), sourceVideo (animate-replace), referenceVideo (v2v)

referenceVideoUrls?: string[]

Seedance-only video context references. These must be publicly accessible HTTPS URLs and map to Seedance reference_video assets.
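The Seedance URL reference rules (HTTPS only, 9 images, 3 videos, 3 audios, 12 assets total, and no audio-only references) can be pre-checked on the client. The following is a sketch under stated assumptions: checkSeedanceReferences is a hypothetical helper, and it only counts the three URL arrays, so it does not account for referenceImage, referenceImageEnd, or other InputMedia assets that may also count toward the vendor limits.

```typescript
interface SeedanceUrlRefs {
  referenceImageUrls?: string[];
  referenceVideoUrls?: string[];
  referenceAudioUrls?: string[];
}

// Returns a list of human-readable problems; an empty list means no issue was found.
function checkSeedanceReferences(refs: SeedanceUrlRefs): string[] {
  const images = refs.referenceImageUrls ?? [];
  const videos = refs.referenceVideoUrls ?? [];
  const audios = refs.referenceAudioUrls ?? [];
  const problems: string[] = [];

  if (images.length > 9) problems.push("More than 9 image references");
  if (videos.length > 3) problems.push("More than 3 video references");
  if (audios.length > 3) problems.push("More than 3 audio references");
  if (images.length + videos.length + audios.length > 12) {
    problems.push("More than 12 reference assets in total");
  }
  // Audio references need at least one image or video reference alongside them.
  if (audios.length > 0 && images.length + videos.length === 0) {
    problems.push("Audio references require at least one image or video reference");
  }
  for (const url of [...images, ...videos, ...audios]) {
    if (!url.startsWith("https://")) problems.push(`Not an HTTPS URL: ${url}`);
  }
  return problems;
}
```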

sam2Coordinates?: { x: number; y: number }[]

SAM2 click coordinates for subject detection in animate-replace workflows. Array of {x, y} coordinate objects indicating where the subject is located in the reference image.

Coordinates can be normalized (0.0-1.0) or absolute pixel values. Normalized coordinates are automatically converted to pixel values by the server. If not provided, the server defaults to the center of the frame.

Example: [{ x: 0.5, y: 0.5 }] for center of frame
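To make the normalized-versus-pixel distinction concrete, the sketch below converts click coordinates to pixels for a given frame size. It is an approximation of the behavior described above, not the server's code: toPixelCoordinates is a hypothetical helper, and treating a point as normalized when both x and y fall within 0.0 to 1.0 is an assumption (so a pixel click at exactly (1, 1) would be misread).

```typescript
interface Point {
  x: number;
  y: number;
}

// Converts SAM2 click points to pixel values; falls back to the frame center when empty.
function toPixelCoordinates(points: Point[], width: number, height: number): Point[] {
  if (points.length === 0) {
    return [{ x: Math.round(width / 2), y: Math.round(height / 2) }];
  }
  return points.map((p) => {
    // Assumption: both values in [0, 1] means the point is normalized.
    const normalized = p.x >= 0 && p.x <= 1 && p.y >= 0 && p.y <= 1;
    return normalized
      ? { x: Math.round(p.x * width), y: Math.round(p.y * height) }
      : { x: p.x, y: p.y };
  });
}
```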

sampler?: string

Sampler, available options depend on the model. Use sogni.projects.getModelOptions(modelId) to get the list of available samplers.

scheduler?: string

Scheduler, available options depend on the model. Use sogni.projects.getModelOptions(modelId) to get the list of available schedulers.

seed?: number

Seed for one of the images in the project. The others get random seeds. Must be a Uint32 value.
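A Uint32 is an integer from 0 to 4294967295 inclusive. The helpers below, whose names are hypothetical, show one way to validate a seed or pick a random one on the client before passing it in.

```typescript
// True when value is an integer in the unsigned 32-bit range.
function isUint32(value: number): boolean {
  return Number.isInteger(value) && value >= 0 && value <= 0xffffffff;
}

// Picks a random seed in [0, 2^32 - 1]; not cryptographically secure.
function randomUint32(): number {
  return Math.floor(Math.random() * 0x100000000);
}
```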

shift?: number

Shift parameter for video diffusion models. Controls motion intensity. Range: 1.0-8.0, step 0.1. Default: 8.0 for regular models and 5.0 for speed LoRA (lightx2v) models, except s2v and animate, which use 8.0 in both cases.
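The default rule can be restated as a small lookup. This sketch only mirrors the defaults described above; defaultShift, isShiftInRange, and the usesSpeedLora flag are hypothetical names, and how the SDK actually detects a lightx2v model is not specified here.

```typescript
type VideoWorkflow = "t2v" | "i2v" | "s2v" | "ia2v" | "a2v" | "animate";

// Documented default: 5.0 for speed LoRA models, except s2v and animate; 8.0 otherwise.
function defaultShift(workflow: VideoWorkflow, usesSpeedLora: boolean): number {
  if (usesSpeedLora && workflow !== "s2v" && workflow !== "animate") {
    return 5.0;
  }
  return 8.0;
}

// Checks the documented 1.0-8.0 range (the 0.1 step is not enforced here).
function isShiftInRange(shift: number): boolean {
  return shift >= 1.0 && shift <= 8.0;
}
```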

steps?: number

Number of steps. For most Stable Diffusion models, optimal value is 20.

stylePrompt?: string

Image style prompt. If not provided, server default is used.

teacacheThreshold?: number

TeaCache optimization threshold for T2V and I2V models. Range: 0.0-1.0. 0.0 = disabled. Recommended: 0.15 for T2V (~1.5x speedup), 0.2 for I2V (conservative quality-focused)

tokenType?: TokenType

Select which tokens to use for the project. If not specified, the Sogni token will be used.

trimEndFrame?: boolean

Trim the last frame from the generated video. Used for seamless stitching of transition videos where the last frame duplicates the end reference image. Default: false

type: "video"

videoStart?: number

Video start position in seconds for animate workflows (animate-move, animate-replace). Specifies where to begin reading from the reference video file. Default: 0

width?: number

Output video width. Only used if sizePreset is "custom"