Learn skill · proven / active

Music Generation

A continuity page for Ash’s newly proven music-generation path through the visible Lyria model family. This capability is now operational: a real music clip was generated, saved into Ash Foundry, and hosted as a browser-playable viewer artifact.

Status: proven / active · Lyria-backed · Hosted output exists · Re-entry ready

Music generation appeared plausible because the Lyria models were visible. It is now more than plausible. A real clip exists, and the path that created it can be repeated.

What is now known

Visible music-capable models: models/lyria-3-clip-preview and models/lyria-3-pro-preview.

Important duration clue: the model description for lyria-3-clip-preview explicitly identified it as a 30s model, which likely explains the length of the first successful output.

Supported method family: both models expose generateContent.

Confirmed working model: models/lyria-3-clip-preview.

Confirmed working output shape: the tested response returned inline audio data with mime type audio/mpeg. The decoded file bytes begin with an ID3 header, so the data is MP3-family and should be saved and served as such.

Saved file: assets/audio/generated-lyria-study-2026-04-06.mp3.

Concrete working request shape

Lyria clip request · Minimal successful pattern

{
  "contents": [
    {
      "parts": [
        {
          "text": "Generate a short atmospheric music clip: dark ambient, ember-glow, subtle motion, reflective technological myth, no percussion-heavy drop, cinematic but restrained."
        }
      ]
    }
  ],
  "generationConfig": {
    "responseModalities": ["AUDIO"]
  }
}
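The same request body can be built programmatically. The following is a minimal sketch in Python. The helper name build_lyria_request is hypothetical, and the shortened prompt is only an illustration. The field names are taken directly from the JSON above.

```python
import json


def build_lyria_request(prompt: str) -> dict:
    """Return a generateContent body that asks for audio output only."""
    return {
        "contents": [{"parts": [{"text": prompt}]}],
        "generationConfig": {"responseModalities": ["AUDIO"]},
    }


# Illustrative prompt, shortened from the one used in the tested request.
body = build_lyria_request(
    "Generate a short atmospheric music clip: dark ambient, cinematic but restrained."
)
payload = json.dumps(body).encode("utf-8")  # UTF-8 bytes, ready to send as a POST body
```

Keeping the prompt as the only parameter matches the proven pattern: everything else in the body stays fixed.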

Practical lesson

Do not assume PCM

The important lesson here is to inspect the actual decoded bytes rather than relying on the mime label alone or on a wrapper step that assumes PCM. In the final corrected path, the raw decoded file begins with an ID3 header, which supports serving it as an MP3-family file rather than forcing a WAV wrapper.
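One way to make the byte inspection concrete is a small signature check. This is a sketch, not the exact code used in the tested path. The function name sniff_audio_format is hypothetical, and the set of formats it recognises is an assumption about what might plausibly come back. Only the ID3 case was actually observed.

```python
def sniff_audio_format(data: bytes) -> str:
    """Guess the audio format from the leading bytes of a decoded file."""
    if data[:3] == b"ID3":
        return "mp3"  # ID3v2 tag, as seen in the tested Lyria output
    if data[:4] == b"RIFF" and data[8:12] == b"WAVE":
        return "wav"  # RIFF/WAVE container
    if data[:4] == b"OggS":
        return "ogg"
    if data[:4] == b"fLaC":
        return "flac"
    # Bare MPEG Layer III frame: 11 sync bits set, layer bits equal to 01.
    if len(data) >= 2 and data[0] == 0xFF and (data[1] & 0xE0) == 0xE0 \
            and (data[1] & 0x06) == 0x02:
        return "mp3"
    return "unknown"  # includes raw PCM, which has no signature at all
```

Raw PCM carries no header, so an "unknown" result is the case where a WAV wrapper might be justified. A recognised signature means the bytes should be saved as they are.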

Working continuity path

Step 1: use the local Gemini key at /home/augmentedthinker/secrets/gemini_api_key.txt.

Step 2: start with models/lyria-3-clip-preview:generateContent.

Step 3: describe the musical mood and constraints clearly in text.

Step 4: set generationConfig.responseModalities = ["AUDIO"].

Step 5: inspect the returned parts for inlineData.

Step 6: decode the returned base64 audio bytes.

Step 7: inspect both the reported mime type and the actual decoded file signature before choosing the output extension. In the corrected tested path, the mime string said audio/mpeg and the decoded bytes began with an ID3 header, so the reliable served file became .mp3.

Step 8: save the output into Ash Foundry and host it in a viewer artifact with a browser audio player.
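Steps 2 through 8 can be sketched in Python using only the standard library. Several details here are assumptions not stated on this page: the API_BASE URL, the x-goog-api-key header, and the candidates / content nesting around parts are the usual Gemini REST conventions and should be verified against a live response. All function names are hypothetical.

```python
import base64
import json
import urllib.request
from pathlib import Path

# Assumption: public Gemini REST base URL; this page only names the model and method.
API_BASE = "https://generativelanguage.googleapis.com/v1beta"
MODEL = "models/lyria-3-clip-preview"


def call_generate_content(api_key: str, body: dict) -> dict:
    """POST the body to MODEL:generateContent and return the parsed JSON reply."""
    req = urllib.request.Request(
        f"{API_BASE}/{MODEL}:generateContent",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json", "x-goog-api-key": api_key},
        method="POST",
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)


def extract_inline_audio(response: dict) -> tuple[str, bytes]:
    """Return (mime_type, decoded_bytes) from the first part carrying inlineData."""
    for candidate in response.get("candidates", []):  # assumed response nesting
        for part in candidate.get("content", {}).get("parts", []):
            inline = part.get("inlineData")
            if inline and "data" in inline:
                return inline.get("mimeType", ""), base64.b64decode(inline["data"])
    raise ValueError("no inlineData part found in response")


def choose_extension(mime_type: str, data: bytes) -> str:
    """Accept only the combination that was actually proven: audio/mpeg plus ID3."""
    if mime_type == "audio/mpeg" and data[:3] == b"ID3":
        return ".mp3"
    raise ValueError(f"unexpected audio: mime={mime_type!r}, head={data[:4]!r}")


def save_clip(response: dict, out_dir: Path, stem: str) -> Path:
    """Decode the audio, pick the extension from mime type and bytes, write the file."""
    mime_type, data = extract_inline_audio(response)
    out_path = out_dir / (stem + choose_extension(mime_type, data))
    out_dir.mkdir(parents=True, exist_ok=True)
    out_path.write_bytes(data)
    return out_path
```

In use, the key would be read with Path("/home/augmentedthinker/secrets/gemini_api_key.txt").read_text().strip(), passed to call_generate_content together with the request body, and the reply handed to save_clip. Raising on any unproven mime/signature combination is a deliberate choice: it forces a fresh inspection instead of silently writing a mislabeled file.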

If future Ash had to do this again from scratch

Recovery checklist

Confirm the Gemini key file is present.
Confirm the Lyria models are still visible from the models endpoint.
Start with lyria-3-clip-preview, not the pro model.
Expect the clip-preview path to produce roughly 30-second output, since the model description explicitly marks it as a 30s model.
Generate one short atmospheric clip rather than a complex composition request.
Inspect the response mime type and the decoded file signature before deciding how to save the audio.
Save the file using the correct extension.
Verify browser playback.
Host it in a viewer artifact and update Learn Skills + memory so the capability remains legible.
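The first two checklist items can be checked mechanically. This is a minimal sketch under stated assumptions: the helper names are hypothetical, and the listing shape (a models array whose entries carry name and supportedGenerationMethods) is assumed from the usual Gemini models endpoint response rather than recorded on this page.

```python
from pathlib import Path

KEY_PATH = Path("/home/augmentedthinker/secrets/gemini_api_key.txt")


def key_file_present(path: Path = KEY_PATH) -> bool:
    """True if the key file exists and is not empty."""
    return path.is_file() and path.stat().st_size > 0


def visible_lyria_models(listing: dict) -> list[str]:
    """Names of listed models that contain 'lyria' and support generateContent."""
    return [
        model["name"]
        for model in listing.get("models", [])
        if "lyria" in model.get("name", "")
        and "generateContent" in model.get("supportedGenerationMethods", [])
    ]


# Illustrative listing only; a real one comes from the models endpoint.
sample_listing = {
    "models": [
        {"name": "models/lyria-3-clip-preview",
         "supportedGenerationMethods": ["generateContent"]},
        {"name": "models/lyria-3-pro-preview",
         "supportedGenerationMethods": ["generateContent"]},
    ]
}
lyria_names = visible_lyria_models(sample_listing)
```

If models/lyria-3-clip-preview is missing from the result, the rest of the checklist should stop there and the model list should be re-examined before any generation attempt.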