How it works
Seven AI-powered steps from story to screen. No video editing required.
Paste text in, get a video out. Here's exactly what happens in between.
The story editor
The story editor is where every video starts. It's intentionally simple — a title field, a text area, and a generate button.
Title (optional)
Give your video a name, or leave it blank. The title appears on your dashboard but doesn't affect the AI pipeline.
Story text
Paste up to 15,000 characters. A soft warning appears at 10,000. Minimum is 100 characters. The AI works best with narrative text — characters, dialogue, and a progression of events.
Auto-save
Your draft saves to your browser every 5 seconds. Close the tab, come back later — your story will be waiting. The save indicator shows a green dot when saved.
Drop estimate
As you type, a live cost estimate appears showing the expected Drop range, a visual bar against your balance, and a per-component breakdown (AI, images, narration).
The pipeline
When you hit "Generate video," your story enters a seven-step pipeline. Each step runs in sequence, and you can watch progress in real time from your dashboard.
Paste your story
Drop in a Reddit post, original fiction, or any narrative text. The editor accepts up to 15,000 characters (with a soft warning at 10,000) and auto-saves your draft every 5 seconds, so your work is never lost.
Longer stories produce more scenes and use more Drops. The AI works best with clear narrative structure — characters, dialogue, and a progression of events.
AI story analysis (Pass 1)
Claude reads your story and performs a deep analysis. It segments the narrative into cinematic scenes, identifies every character by name and appearance, maps each unique location with a detailed visual description, extracts the emotional tone of each scene, and determines the visual genre.
Pass 1 outputs a structured breakdown: scene narration text, character descriptions with physical details, a location bible (canonical environment descriptions reused across scenes for visual consistency), per-scene emotion tags (from 14 recognized emotions), and a genre classification (from 10 genres like noir, sci-fi, fantasy) that tints the visual style of the entire video. Scene count scales with story length — short stories get 2–8 scenes, while longer stories (1,500+ words) can produce up to 20.
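As a rough sketch, the Pass 1 breakdown might be shaped like the dictionary below. The field names and sample values are illustrative, not the actual schema; the scene-count function is inferred from the worked example later on this page (roughly one scene per 75 words, clamped to the 2–20 range):

```python
# Illustrative sketch of a Pass 1 analysis result. Field names and
# values are hypothetical -- the real schema is not published.
EMOTIONS = {"tense", "whimsical", "somber"}  # 3 of the 14 recognized tags

pass1_result = {
    "genre": "noir",  # one of 10 genres that tints the whole video
    "characters": [
        {"name": "Mara", "appearance": "silver hair, long grey coat"},
    ],
    "location_bible": {
        # canonical environment descriptions, reused across scenes
        "diner": "neon-lit roadside diner, rain-streaked windows",
    },
    "scenes": [
        {
            "narration": "Mara slid into the corner booth...",
            "location": "diner",
            "emotion": "tense",
        },
    ],
}

def scene_count(word_count: int) -> int:
    """Scenes scale with story length: roughly one per 75 words,
    clamped to the 2-20 range described above (an assumption based
    on the worked example, not a published formula)."""
    return max(2, min(20, round(word_count / 75)))
```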
Character reference portraits
Nano Banana 2 generates a reference portrait for each character identified in Pass 1. These portraits capture the physical details described in your story and serve as the visual anchor for that character across every scene.
On Creator plans and above, all characters get reference portraits. Free and Starter plans include the protagonist only. The reference images are stored and passed to the scene illustrator so faces and features stay consistent.
Narration generation
Inworld TTS-1.5-Max generates expressive, emotion-aware speech for each scene using your chosen voice. The emotion tag from Pass 1 is injected directly into the TTS request, so a "tense" scene sounds different from a "whimsical" one. Choose from 12 curated voices — 4 available on Free, all 12 on paid plans.
The audio duration of each scene drives the visual timing — every clip is shown for exactly as long as its narration lasts. This means the final video has no dead air and no rushing.
Scene illustration (Pass 2)
A second Claude pass takes each scene and optimizes its image prompt. It adds camera angles, lighting direction, atmosphere, genre-tinted style cues, and a motion prompt describing how the still image should be animated. Nano Banana 2 then generates the final 9:16 scene image using the character reference portraits and the canonical location description for cross-scene environment consistency.
On Creator plans and above, a vision judge compares the generated scene against the character reference and scores face consistency and artifact quality to ensure the character looks right.
Scene animation
Each scene image is animated into a short video clip using the motion prompt from Pass 2. On Starter plans and above, Wan 2.2 image-to-video AI generates real motion — subtle head turns, fabric ripple, camera push-ins, and ambient environmental movement. On Free, a smooth Ken Burns zoom-pan effect adds motion to the still image.
The motion prompt is under 25 words and describes only observable physical movement — never narration or abstract concepts. This keeps the animation subtle, realistic, and compatible with the short clip length (~5–15 seconds per scene).
Video assembly
FFmpeg composites each animated scene clip with its narration audio, then concatenates all clips into a single 9:16 MP4 video. The result is ready for YouTube Shorts, TikTok, or Instagram Reels.
The output is a vertical (1080×1920) MP4 with AAC audio. Each clip runs for exactly the length of its narration. A short story (4 scenes) produces roughly a 60-second video, while longer stories with 15–20 scenes can produce multi-minute videos.
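Conceptually, the assembly step resembles ffmpeg's concat demuxer. The sketch below builds (but doesn't run) such a command; the filenames, exact flags, and the choice of the concat demuxer are illustrative assumptions, not the pipeline's actual invocation:

```python
# Sketch of assembling per-scene clips into one vertical MP4 with
# ffmpeg's concat demuxer. Filenames and flags are illustrative.
def build_assembly_commands(scene_clips: list[str], output: str = "final.mp4"):
    # the concat demuxer reads a manifest listing each input file
    manifest = "\n".join(f"file '{clip}'" for clip in scene_clips)
    concat_cmd = [
        "ffmpeg", "-f", "concat", "-safe", "0", "-i", "list.txt",
        "-c:v", "libx264", "-c:a", "aac",  # H.264 video, AAC audio
        "-vf", "scale=1080:1920",          # enforce the 9:16 vertical frame
        output,
    ]
    return manifest, concat_cmd

manifest, cmd = build_assembly_commands(["scene_01.mp4", "scene_02.mp4"])
```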
How Drops work
Every video costs Drops. The cost depends on your plan, how many scenes the AI creates, and how long the narration is — plus a flat fee for the AI analysis. Here's the exact formula.
AI analysis
170 Drops
Flat cost per video. Covers both Claude passes — story analysis (Pass 1) and scene prompt optimization (Pass 2).
Scene images
670 Drops / scene
One image per scene. The number of scenes scales with your story length — shorter stories get 2–8 scenes, longer stories up to 20.
Narration
80 Drops / 500 chars
Text-to-speech is billed per 500-character chunk, rounded up. A 2,500-character story costs 5 chunks = 400 Drops for narration.
Scene animation (Starter+)
100 Drops / scene
Wan 2.2 AI image-to-video animation, one clip per scene. Free plans use a Ken Burns zoom-pan effect at no additional Drop cost.
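Putting the four components together, the per-video cost works out as below. This is a sketch of the published component prices, not the billing implementation itself:

```python
import math

def video_cost(scenes: int, narration_chars: int, ai_animation: bool) -> int:
    """Drop cost for one video, from the component prices above."""
    analysis = 170                                      # flat, both Claude passes
    images = scenes * 670                               # one image per scene
    narration = math.ceil(narration_chars / 500) * 80   # per 500-char chunk, rounded up
    animation = scenes * 100 if ai_animation else 0     # Wan 2.2 on Starter+ only
    return analysis + images + narration + animation
```

For example, 2,500 characters of narration bills as `ceil(2500 / 500) = 5` chunks, i.e. 400 Drops, matching the narration example above.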
How the estimate works
Before you generate, the editor shows a range estimate. The AI might produce one scene more or fewer than expected, so we calculate costs for (base scenes − 1) through (base scenes + 1) and show both ends. A 10% buffer is added to each.
The visual bar shows your low estimate as a proportion of your remaining balance. If the high end exceeds your balance, the bar turns amber. If even the low end exceeds it, the bar turns red and the generate button is disabled.
Drops are reserved when you hit generate (using the high estimate) and finalized when the pipeline completes. If the actual cost was less than the reservation, the difference is refunded. If the pipeline fails, the full reservation is refunded.
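The range-and-bar behavior described above can be sketched as follows. This is a Free-plan sketch (no animation component), and the rounding details are assumptions; the production formula may differ:

```python
import math

def estimate_range(base_scenes: int, narration_chars: int) -> tuple[int, int]:
    """Low/high Drop estimate: cost at base_scenes - 1 and base_scenes + 1,
    each with a 10% buffer (Free plan: no animation component)."""
    def cost(scenes: int) -> int:
        return 170 + scenes * 670 + math.ceil(narration_chars / 500) * 80
    low = round(cost(base_scenes - 1) * 1.10)
    high = round(cost(base_scenes + 1) * 1.10)
    return low, high

def bar_state(low: int, high: int, balance: int) -> str:
    """Amber when the high end exceeds the balance; red (generate
    disabled) when even the low end does."""
    if low > balance:
        return "red"
    if high > balance:
        return "amber"
    return "green"
```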
Worked example (Free plan)
A 375-word Reddit story (~1,900 characters) on Free (zoom animation, no extra Drop cost):
- Scenes: round(375 / 75) = 5 (range: 4–6)
- AI analysis: 170
- Images: 5 scenes × 670 = 3,350
- Narration: ceil(1,900 / 500) × 80 = 4 × 80 = 320
- Base total: 3,840
- Low (4 scenes, +10%): ~3,487
- High (6 scenes, +10%): ~4,961
On Starter+ (AI animation): add 5 scenes × 100 = 500 Drops to each figure above.
What to expect on Free
The free plan gives you 4,000 Drops every month — enough to get a real feel for the platform. Here's what that means in practice.
~1–2 videos per month
A short story (150–300 words, 2–3 scenes) costs roughly 1,700–2,700 Drops. You can comfortably make one video per month, or two if you keep your stories short.
Protagonist references only
Free and Starter plans generate a reference portrait for the main character only. Supporting characters are illustrated without a reference, so they may look slightly different across scenes.
No quality gate
The vision judge (which scores face consistency and artifact quality) is available on Creator plans and above. Free videos skip this step — images are used as-is from the first generation.
No carryover
Unused free Drops reset at the start of each calendar month. Paid plans allow carryover up to 3× your monthly allocation. Drop Packs, purchased separately, never expire on any plan.
Zoom animation (not AI video)
Free plans animate scenes using a smooth Ken Burns zoom-pan effect — no extra Drops. Starter plans and above use Wan 2.2 AI image-to-video, which generates real motion: subtle head turns, camera push-ins, fabric ripple, and ambient environmental movement.
Watermark on videos
Videos on Free, Starter, Creator, and Pro plans include a small "creatordrop.ai" watermark in the bottom-right corner. It's semi-transparent and designed to be unobtrusive. Business plan videos are watermark-free.
The free tier is the full pipeline — same AI models, same video quality, same output format. The limits are on volume, animation quality, the post-processing quality gate, and watermark removal — not on the core experience.
Try it yourself
Paste a story, watch the pipeline run, download your video.
Get started free
4,000 free Drops every month. No credit card required.
See pricing →