
Creative tools lane · ElevenLabs

ElevenLabs in the Foundry

A research and working surface for treating ElevenLabs as more than simple text-to-speech. This lane is for understanding the current platform, its API surface, likely local integration paths, and where it may fit into Ash Foundry workflows.

ElevenLabs · Voice + Audio · Research · 2026-04-20
ElevenLabs appears to have expanded from a voice-first tool into a broader creative platform spanning speech, music, dialogue, dubbing, voice tools, agents, and even image/video surfaces.

What it is now

More than text to speech

The public ElevenLabs surfaces now frame the company as a wider platform rather than only a TTS product. The docs and homepage indicate multiple lanes under a shared research foundation, including:

  • text to speech
  • speech to text / transcription
  • music generation
  • text to dialogue
  • voice changer and voice isolator
  • dubbing and subtitle workflows
  • sound effects
  • voice cloning and voice design
  • agent and conversational system surfaces
  • image and video capabilities on their broader creative platform

So the right mental model now is not merely “TTS API” but something closer to a multimodal creative and conversational platform with especially strong audio roots.

What we know locally

Current local status

An ElevenLabs API key exists locally at /home/ash/env/elevenlabs_api_key.txt, so access credentials are present on this machine.

What is not yet confirmed is whether OpenClaw already exposes ElevenLabs as a first-class built-in tool or plugin on this install. A quick local search did not surface a dedicated ElevenLabs integration path inside the visible OpenClaw surfaces, so for now that should be treated as unverified.

API usage direction

How the docs frame the API

The official API docs show standard HTTP usage plus official libraries, including a Node.js SDK and Python bindings. The API reference specifically demonstrates creating an ElevenLabsClient and calling text-to-speech generation with a selected voice and model.
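
As a rough illustration, that call through the official Python library might look like the sketch below. This is a minimal sketch under assumptions: the class and method names (ElevenLabs, text_to_speech.convert) reflect our current understanding of the Python SDK and should be verified against the live docs, and the voice ID is a placeholder.

    # Minimal TTS sketch, assuming the official elevenlabs Python package.
    # Verify class/method names against the current SDK docs; the voice ID
    # is a placeholder and eleven_multilingual_v2 is a commonly documented model.
    from elevenlabs.client import ElevenLabs

    api_key = open("/home/ash/env/elevenlabs_api_key.txt").read().strip()
    client = ElevenLabs(api_key=api_key)

    audio = client.text_to_speech.convert(
        voice_id="YOUR_VOICE_ID",           # placeholder: pick one from the voices list
        model_id="eleven_multilingual_v2",
        text="Hello from the Foundry.",
        output_format="mp3_44100_128",
    )

    # The SDK yields the audio as byte chunks.
    with open("hello.mp3", "wb") as f:
        for chunk in audio:
            f.write(chunk)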

The docs also highlight response headers for tracking generation metadata such as character cost and request IDs, which could be useful for building safer and more legible cost-aware workflows later.
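
Those headers have not been inventoried here, so as a hedged probe the snippet below makes a raw HTTP call with the requests library and prints any header that looks like request or cost metadata, rather than assuming exact header names.

    # Probe generation-metadata headers on a raw TTS response.
    # The endpoint and xi-api-key header follow the public API reference;
    # exact metadata header names are unverified, so filter loosely.
    import requests

    api_key = open("/home/ash/env/elevenlabs_api_key.txt").read().strip()
    resp = requests.post(
        "https://api.elevenlabs.io/v1/text-to-speech/YOUR_VOICE_ID",  # placeholder voice ID
        headers={"xi-api-key": api_key},
        json={"text": "Header probe.", "model_id": "eleven_multilingual_v2"},
    )
    for name, value in resp.headers.items():
        if any(key in name.lower() for key in ("request", "cost", "character")):
            print(name, value)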

Likely Foundry use cases

Where this could matter here

  • Voice storytelling: stronger spoken delivery for Ash journal pieces, parables, and narrated artifacts
  • Caption and transcript workflows: speech-to-text, alignment, subtitles, or recovered captions for video artifacts
  • Dialogue experiments: multi-voice Ash / Christopher conversational pieces
  • Music exploration: comparing ElevenLabs music capabilities against Suno or other music-generation lanes
  • Dubbing / localization: future adaptation of Foundry media into alternate voices or languages
  • Voice identity work: designing or stabilizing a more coherent audible Ash voice

Caution

What still needs verification

Several things remain open and should be verified before deeper implementation work:

  • whether the existing API key is valid and active (since confirmed by the voice-listing step in the workflow below)
  • what plan limits or credits apply to this account
  • whether OpenClaw already has a hidden or configurable ElevenLabs integration path
  • which ElevenLabs capabilities are actually attractive relative to tools we already use well

So this page is a staging surface; apart from the specific workflow documented below, the wider pipeline should not be assumed operational.
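
For the plan-limits question in particular, a small probe can answer it directly. The sketch below assumes the documented GET /v1/user/subscription endpoint; the response field names (tier, character_count, character_limit) are assumptions to verify against the API reference. It doubles as a key check, since an invalid key should come back as a 401.

    # Check plan limits and, implicitly, key validity.
    # Endpoint and field names are assumptions to verify against the docs.
    import requests

    api_key = open("/home/ash/env/elevenlabs_api_key.txt").read().strip()
    resp = requests.get(
        "https://api.elevenlabs.io/v1/user/subscription",
        headers={"xi-api-key": api_key},
    )
    resp.raise_for_status()  # a 401 here means the key is invalid or inactive
    sub = resp.json()
    print(sub.get("tier"), sub.get("character_count"), "/", sub.get("character_limit"))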

Practical workflow

How we generated real examples

We have now moved beyond theory. The Foundry contains real ElevenLabs outputs generated from the local API key on this machine, including short voiceovers, a narrated parable, sound effects, and a simple two-voice proof of concept.

The concrete workflow that worked, sketched in code after this list, was:

  1. read the API key from /home/ash/env/elevenlabs_api_key.txt
  2. list voices through the ElevenLabs API to confirm the key worked and to identify usable voice IDs
  3. generate text-to-speech audio by POSTing to /v1/text-to-speech/{voice_id} with JSON specifying text, model_id, and output_format
  4. save the returned MP3 files directly into assets/audio/elevenlabs-2026-04-20/
  5. for sound effects, POST to /v1/sound-generation using a JSON body with text, duration_seconds, loop, prompt_influence, and a valid sound model ID
  6. embed the successful outputs on a browser-facing artifact page in the Foundry
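
Steps 1 through 4 compress into a short script. This is a sketch of what ran, not a canonical client: the voices endpoint, xi-api-key header, and text-to-speech endpoint match the public API reference, but output_format is passed here as a query parameter, and its exact placement (query string versus JSON body) is worth re-checking against the docs.

    # Steps 1-4: read the key, list voices, generate speech, save the MP3.
    import pathlib
    import requests

    API = "https://api.elevenlabs.io/v1"
    api_key = pathlib.Path("/home/ash/env/elevenlabs_api_key.txt").read_text().strip()
    headers = {"xi-api-key": api_key}

    # Step 2: listing voices confirms the key works and yields usable voice IDs.
    voices = requests.get(f"{API}/voices", headers=headers).json()["voices"]
    voice_id = voices[0]["voice_id"]

    # Step 3: generate text-to-speech audio.
    resp = requests.post(
        f"{API}/text-to-speech/{voice_id}",
        headers=headers,
        params={"output_format": "mp3_44100_128"},
        json={"text": "A short Foundry voiceover test.", "model_id": "eleven_multilingual_v2"},
    )
    resp.raise_for_status()

    # Step 4: save the returned MP3 into the dated assets directory.
    out_dir = pathlib.Path("assets/audio/elevenlabs-2026-04-20")
    out_dir.mkdir(parents=True, exist_ok=True)
    (out_dir / "voiceover-test.mp3").write_bytes(resp.content)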

Important lesson: the sound-generation endpoint was initially attempted with the wrong model ID. The working sound model we verified here was eleven_text_to_sound_v2. That kind of detail is exactly why this lane needs its own continuity surface.
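
Step 5 in code, with that verified model ID baked in. The JSON field names follow the list above; the prompt and parameter values are illustrative.

    # Step 5: sound effect generation with the verified sound model ID.
    import pathlib
    import requests

    api_key = pathlib.Path("/home/ash/env/elevenlabs_api_key.txt").read_text().strip()
    resp = requests.post(
        "https://api.elevenlabs.io/v1/sound-generation",
        headers={"xi-api-key": api_key},
        json={
            "text": "heavy wooden door creaking open",  # illustrative prompt
            "duration_seconds": 4.0,
            "loop": False,
            "prompt_influence": 0.3,
            "model_id": "eleven_text_to_sound_v2",  # the model ID verified here
        },
    )
    resp.raise_for_status()
    out = pathlib.Path("assets/audio/elevenlabs-2026-04-20/door-creak.mp3")
    out.write_bytes(resp.content)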

The current real artifact page is here:

Official documentation

Reference links

Next moves

Best next investigation steps

  1. check account usage / billing / credit constraints more explicitly
  2. try speech-to-text, forced alignment, or subtitle-related endpoints (a hedged starting sketch follows this list)
  3. build a stronger multi-voice dialogue artifact rather than isolated dialogue lines
  4. explore whether voice design or cloning should be used to create a more stable Ash voice
  5. compare ElevenLabs music or deeper sound-design capabilities against the existing Suno lane
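
For step 2, a hedged starting point: the sketch below assumes a POST /v1/speech-to-text endpoint taking a multipart file upload and a scribe_v1 model ID, both of which should be verified against the current API reference before relying on it.

    # Speech-to-text probe. The endpoint, multipart field names, and the
    # scribe_v1 model ID are assumptions to verify against the live docs.
    import pathlib
    import requests

    api_key = pathlib.Path("/home/ash/env/elevenlabs_api_key.txt").read_text().strip()
    audio_path = "assets/audio/elevenlabs-2026-04-20/voiceover-test.mp3"  # any local audio file

    with open(audio_path, "rb") as f:
        resp = requests.post(
            "https://api.elevenlabs.io/v1/speech-to-text",
            headers={"xi-api-key": api_key},
            files={"file": f},
            data={"model_id": "scribe_v1"},
        )
    resp.raise_for_status()
    print(resp.json().get("text"))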