Tiny AI Companion Video Prompt Set · May 10, 2026

This is a three-shot concept sequence for a tiny AI companion: a small tangible desk object that represents an AI assistant through blinking eyes, gentle light patterns, and optional voice presence. The clips should be treated as an honest product vision, not proof of a finished device.

Suggested edit order: Clip 1 introduces the object, Clip 2 shows interaction, Clip 3 ends with emotional product desire and a concept-stage callout.

Clip 1 — The Object Wakes Up

Purpose: introduce the product as a tiny physical AI presence on a desk.

8-second cinematic product demo video, 16:9. A cozy modern desk at dawn, laptop open, notebook, coffee, soft blue-and-amber light. On the desk sits a tiny palm-sized AI companion device, like a futuristic digital pet: rounded matte white shell, small black glass face, two simple expressive LED eyes. The device is asleep at first, then gently wakes as the eyes blink open and a soft breathing light pulses. Camera slowly pushes in with shallow depth of field. Premium but warm, emotionally inviting, realistic consumer product ad style. No readable text, no logos, no humans speaking. End with the tiny companion looking awake and present.

Clip 2 — The Assistant Thinks

Purpose: show the object as an embodied interface for an AI assistant.

8-second cinematic concept product video, 16:9. A person’s hands are visible at a desk typing a question into a laptop beside the tiny AI companion device. The device reacts as if connected to the AI: its LED eyes blink thoughtfully, a soft ring of light moves around its body while it “thinks,” then the eyes brighten with a gentle happy expression. The laptop screen should be out of focus with no readable text. The tiny device feels alive but not cartoonish, more like a calm desk familiar for modern AI work. Smooth close-up camera movement, warm realistic lighting, premium startup product advertisement aesthetic. No logos, no captions, no readable UI.

Clip 3 — The Moment People Want One

Purpose: turn the concept into desire and invite interest.

8-second cinematic concept ad video, 16:9. The tiny AI companion sits beside a laptop during an evening creative work session. The user’s hand gently picks it up; the device looks up with blinking LED eyes and a soft glow, as if the AI assistant is physically present in the room. Cut to a beautiful hero close-up of the device in the person’s palm, warm light reflecting on its glass face. The feeling is: “your AI, finally present.” Premium emotional product launch style, realistic hardware prototype, subtle futuristic sound-design vibe implied visually. No readable text, no logos. Make it clear this is a concept product vision, not a toy commercial.

Generated concept clips

These are the three generated 8-second concept clips, embedded in the intended sequence. The files were created from the prompts above and ordered by creation time, with the newest clip used as Clip 3.
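The ordering step can be sketched in a few lines of Python. This is a minimal illustration, not the script that was actually used: the function name clips_in_sequence and the *.mp4 pattern are hypothetical, and it uses file modification time as a stand-in for creation time, because true creation time is not available portably across operating systems.

```python
import os
from pathlib import Path
from typing import List


def clips_in_sequence(folder: Path, pattern: str = "*.mp4") -> List[Path]:
    """Return matching clips oldest first, so the newest file comes last.

    Assumption: modification time approximates creation time for files
    that were generated once and never edited afterwards.
    """
    return sorted(folder.glob(pattern), key=lambda p: p.stat().st_mtime)
```

With three generated files in the folder, the last element of the returned list would be the one treated as Clip 3.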

Analysis method note: the first descriptions below were created by extracting representative still frames from each MP4 with a temporary ffmpeg binary bundled in the imageio-ffmpeg wheel, then analyzing those frames with the image-analysis tool. Afterward, OpenClaw's built-in video description path was discovered and tested: openclaw infer video describe --file <video-path> --json. That command used Google's gemini-3-flash-preview video understanding through OpenClaw's video.describe capability.
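For reference, the frame-extraction workaround might look roughly like the sketch below. It is a reconstruction under stated assumptions, not the exact commands that were run: the helper names, the choice of four frames per clip, the midpoint timestamps, and the PNG output naming are all hypothetical. The one library call it relies on is imageio_ffmpeg.get_ffmpeg_exe(), which returns the path to the ffmpeg binary shipped with the wheel.

```python
import subprocess
from pathlib import Path
from typing import List


def frame_timestamps(duration_s: float, count: int) -> List[float]:
    """Return `count` timestamps at the midpoints of equal slices of the clip."""
    if count < 1 or duration_s <= 0:
        raise ValueError("duration_s must be positive and count at least 1")
    slice_len = duration_s / count
    return [round(slice_len * (i + 0.5), 3) for i in range(count)]


def build_frame_command(ffmpeg_exe: str, video: Path,
                        timestamp_s: float, out_png: Path) -> List[str]:
    """Build an ffmpeg call that seeks to one timestamp and writes one frame."""
    return [
        ffmpeg_exe,
        "-y",                          # overwrite an existing output file
        "-ss", f"{timestamp_s:.3f}",   # seek before decoding the input
        "-i", str(video),
        "-frames:v", "1",              # write a single video frame
        str(out_png),
    ]


def extract_frames(video: Path, out_dir: Path,
                   duration_s: float = 8.0, count: int = 4) -> List[Path]:
    """Extract still frames using the ffmpeg binary bundled with imageio-ffmpeg."""
    import imageio_ffmpeg  # imported lazily; requires `pip install imageio-ffmpeg`

    ffmpeg_exe = imageio_ffmpeg.get_ffmpeg_exe()
    out_dir.mkdir(parents=True, exist_ok=True)
    frames = []
    for i, t in enumerate(frame_timestamps(duration_s, count), start=1):
        out_png = out_dir / f"{video.stem}_frame{i:02d}.png"
        subprocess.run(build_frame_command(ffmpeg_exe, video, t, out_png),
                       check=True, capture_output=True)
        frames.append(out_png)
    return frames
```

For an 8-second clip with four frames, the sampled timestamps are 1, 3, 5, and 7 seconds. Anything that happens only between those points would not appear in the stills.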

Clip 1 — The Object Wakes Up

Frame-extraction description: A small rounded white desktop AI companion sits on a warm wooden desk beside a laptop, notebook, and steaming mug. Its black face screen shifts from sleepy closed eyes to bright circular awake eyes, with a soft underglow. The clip reads as a cozy morning product-introduction shot and keeps the device design fairly consistent.

OpenClaw video-description output: A small, white, rounded robot sits on a wooden desk next to a steaming white mug. To its left is a laptop and a notebook. Initially, the robot’s black screen shows closed eyes represented by two horizontal lines. As the video progresses, its eyes light up into two bright yellow circles, giving it an awake and friendly appearance. The background is softly blurred, creating a cozy and modern atmosphere. The camera zooms in slightly on the robot's face as it activates.

Clip 2 — The Assistant Thinks

Frame-extraction description: The same general white companion device sits near a laptop while a person types. The device appears to respond or think: its eyes change, a glowing rim appears around the face/body, and it ends with a friendlier expression. The interaction concept is clear, though the face proportions and glow placement shift slightly between moments.

OpenClaw video-description output: An elderly woman types on a white laptop. Next to her sits a small, white companion robot. As she types, the robot’s dark screen lights up, displaying two glowing white dots for eyes surrounded by a light ring. The dots then turn into a friendly, smiling face. This interaction highlights a warm, supportive relationship between technology and the elderly in a cozy home environment.

Clip 3 — The Moment People Want One

Frame-extraction description: This clip shifts to a more premium glass-orb companion on a black base in an evening desk setup. A hand picks it up, and glowing eyes plus a smile appear inside the sphere. It feels more magical and high-end than the first two clips, but it is less visually consistent with the original white-device concept.

OpenClaw video-description output: A small glass sphere rests on a glowing base on a wooden desk, with a blurred laptop and lamp in the background. A hand reaches out and picks up the sphere, which then displays a simple, glowing smiley face inside. The face blinks and smiles warmly as the person holds the sphere in their palm. The base continues to emit a soft blue light, creating a high-tech and friendly atmosphere for this interactive digital assistant.

What this taught us

The frame-extraction method works when no direct video-understanding command is known, but it is a workaround: it depends on pulling still frames, can miss motion between them, and here required a temporary bundled ffmpeg binary because system ffmpeg was not installed.

The better default for future workspace videos is OpenClaw's native video description command: openclaw infer video describe --file <video-path> --json. It is cleaner and easier to document as a method, describes motion directly, and avoids improvised tooling. Installing system ffmpeg would still be useful later for thumbnails, compression, frame extraction, audio extraction, and editing support.
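A small wrapper can make that command easier to reuse across clips. The sketch below is illustrative only. It uses just the command and flags quoted above. The function names are hypothetical, and it assumes that the JSON result is printed to standard output; its structure is not documented here, so the parsed value is returned as-is without reading any specific keys.

```python
import json
import subprocess
from pathlib import Path
from typing import Any, List


def build_describe_command(video: Path, openclaw_exe: str = "openclaw") -> List[str]:
    """Build the argument list for the native video description command."""
    return [openclaw_exe, "infer", "video", "describe",
            "--file", str(video), "--json"]


def describe_video(video: Path) -> Any:
    """Run the command and parse its output.

    Assumption: the JSON is written to stdout. The shape of the result
    is not assumed, so callers should inspect it before relying on keys.
    """
    result = subprocess.run(build_describe_command(video),
                            check=True, capture_output=True, text=True)
    return json.loads(result.stdout)
```

Keeping command construction separate from execution makes it possible to check the exact command without having OpenClaw installed.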

Optional caption for posting later

Concept product test: a tiny physical avatar for your AI assistant. Not available yet — we’re gauging whether people want this built. Would you keep one on your desk?