Tiny AI Companion Video Prompt Set
This is a three-shot concept sequence for a tiny AI companion: a small tangible desk object that represents an AI assistant through blinking eyes, gentle light patterns, and optional voice presence. The clips should be treated as an honest product vision, not proof of a finished device.
Suggested edit order: Clip 1 introduces the object, Clip 2 shows interaction, Clip 3 ends with emotional product desire and a concept-stage callout.
Clip 1 — The Object Wakes Up
Purpose: introduce the product as a tiny physical AI presence on a desk.
Clip 2 — The Assistant Thinks
Purpose: show the object as an embodied interface for an AI assistant.
Clip 3 — The Moment People Want One
Purpose: turn the concept into desire and invite interest.
Generated concept clips
These are the three generated 8-second concept clips, embedded in the intended sequence. The files were created from the prompts above and ordered by creation time, with the newest clip used as Clip 3.
The clips were first described by extracting still frames using an ffmpeg binary pulled from the imageio-ffmpeg wheel, then analyzing those still frames with the image-analysis tool. Afterward, OpenClaw's built-in video description path was discovered and tested: openclaw infer video describe --file <video-path> --json, which used Google's gemini-3-flash-preview video understanding through OpenClaw's video.describe capability.
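The frame-extraction step mentioned above could look roughly like the following. This is a minimal sketch, not the exact commands that were run: the function names, the one-frame-per-second sampling rate, and the frame_%03d.png naming are assumptions chosen for illustration. The only library call relied on is imageio_ffmpeg.get_ffmpeg_exe(), which returns the path to the ffmpeg binary bundled with the wheel.

```python
import subprocess
from pathlib import Path


def build_frame_command(ffmpeg_exe: str, video: str, out_dir: str, fps: float = 1) -> list[str]:
    """Build an ffmpeg command that writes `fps` still frames per second as PNGs.

    The sampling rate and output naming are illustrative assumptions.
    """
    pattern = str(Path(out_dir) / "frame_%03d.png")
    return [ffmpeg_exe, "-hide_banner", "-y", "-i", video, "-vf", f"fps={fps}", pattern]


def extract_frames(video: str, out_dir: str, fps: float = 1) -> list[Path]:
    """Extract still frames using the ffmpeg binary bundled with imageio-ffmpeg."""
    # Imported lazily so the command builder works without the package installed.
    import imageio_ffmpeg

    Path(out_dir).mkdir(parents=True, exist_ok=True)
    cmd = build_frame_command(imageio_ffmpeg.get_ffmpeg_exe(), video, out_dir, fps)
    subprocess.run(cmd, check=True)  # raises CalledProcessError if ffmpeg fails
    return sorted(Path(out_dir).glob("frame_*.png"))
```

Calling extract_frames("clip1.mp4", "frames") with a hypothetical file name would write the stills into a frames directory, ready to pass to an image-analysis step. Sampling at a fixed rate is what makes this approach liable to miss motion between frames.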
Clip 1 — The Object Wakes Up
Frame-extraction description: A small rounded white desktop AI companion sits on a warm wooden desk beside a laptop, notebook, and steaming mug. Its black face screen shifts from sleepy closed eyes to bright circular awake eyes, with a soft underglow. The clip reads as a cozy morning product-introduction shot and keeps the device design fairly consistent.
OpenClaw video-description output: A small, white, rounded robot sits on a wooden desk next to a steaming white mug. To its left is a laptop and a notebook. Initially, the robot’s black screen shows closed eyes represented by two horizontal lines. As the video progresses, its eyes light up into two bright yellow circles, giving it an awake and friendly appearance. The background is softly blurred, creating a cozy and modern atmosphere. The camera zooms in slightly on the robot's face as it activates.
Clip 2 — The Assistant Thinks
Frame-extraction description: The same general white companion device sits near a laptop while a person types. The device appears to respond or think: its eyes change, a glowing rim appears around the face/body, and it ends with a friendlier expression. The interaction concept is clear, though the face proportions and glow placement shift slightly between moments.
OpenClaw video-description output: An elderly woman types on a white laptop. Next to her sits a small, white companion robot. As she types, the robot’s dark screen lights up, displaying two glowing white dots for eyes surrounded by a light ring. The dots then turn into a friendly, smiling face. This interaction highlights a warm, supportive relationship between technology and the elderly in a cozy home environment.
Clip 3 — The Moment People Want One
Frame-extraction description: This clip shifts to a more premium glass-orb companion on a black base in an evening desk setup. A hand picks it up, and glowing eyes plus a smile appear inside the sphere. It feels more magical and high-end than the first two clips, but it is less visually consistent with the original white-device concept.
OpenClaw video-description output: A small glass sphere rests on a glowing base on a wooden desk, with a blurred laptop and lamp in the background. A hand reaches out and picks up the sphere, which then displays a simple, glowing smiley face inside. The face blinks and smiles warmly as the person holds the sphere in their palm. The base continues to emit a soft blue light, creating a high-tech and friendly atmosphere for this interactive digital assistant.
What this taught us
The frame-extraction method works when no direct video-understanding command is known, but it is a workaround: it depends on pulling still frames, can miss motion between them, and here required a temporary bundled ffmpeg binary because system ffmpeg was not installed.
The better default for future workspace videos is OpenClaw's native video description command: openclaw infer video describe --file <video-path> --json. It is a cleaner method to document, describes motion directly, and avoids improvised tooling. Installing system ffmpeg would still be useful later for thumbnails, compression, frame extraction, audio extraction, and editing support.
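For repeated use, the native command could be wrapped in a small script. The sketch below only assembles the command exactly as quoted above and parses whatever JSON it prints. The function names are hypothetical, and the shape of the JSON output is not documented here, so the code assumes only that stdout is a single JSON document and returns it unmodified.

```python
import json
import subprocess
from typing import Any


def build_describe_command(video_path: str) -> list[str]:
    """Assemble the OpenClaw video description command for one file."""
    return ["openclaw", "infer", "video", "describe", "--file", video_path, "--json"]


def describe_video(video_path: str) -> Any:
    """Run the command and parse its output.

    Assumption: stdout contains one JSON document; its fields are not
    specified in these notes, so nothing is extracted from it here.
    """
    result = subprocess.run(
        build_describe_command(video_path),
        capture_output=True,
        text=True,
        check=True,  # raises CalledProcessError on a non-zero exit code
    )
    return json.loads(result.stdout)
```

Looping describe_video over the three clip files would reproduce the OpenClaw video-description outputs in one pass, without any frame extraction.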