There's a recurring fantasy in the AI agent world: what if your agent could just sit in on meetings? Not summarize a recording after the fact, but actually be there, watching the captions scroll by, ready to pull context the moment someone says something relevant.

OpenUtter makes that real. It's a headless Google Meet bot built specifically for OpenClaw, and it does exactly three things well: join meetings, capture live captions into a running transcript, and take on-demand screenshots.

One Command, In the Meeting

The setup is deliberately minimal:

npx openutter

That installs the OpenUtter skill into your OpenClaw skills directory and pulls down Chromium via Playwright. From there, join a meeting as a guest:

npx openutter join https://meet.google.com/abc-defg-hij --anon --bot-name "OpenUtter Bot"

Or authenticate once with npx openutter auth, then pass --auth when joining to skip the lobby entirely.

Under the hood, OpenUtter launches a headless Chromium instance, enables Google Meet's built-in live captions, and watches the DOM for new caption elements. Every line gets deduplicated and written to disk at ~/.openclaw/workspace/openutter/transcripts/<meeting-id>.txt in a clean timestamped format:

[14:30:05] Alice: Hey everyone, let's get started
[14:30:12] Bob: Sounds good, I have the updates ready
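OpenUtter's exact deduplication logic isn't described here, so the following is only a hypothetical sketch of one way it could work. It assumes Meet delivers each caption as a series of snapshots in which the active speaker's line grows as they keep talking, and it collapses those snapshots into finished lines in the timestamped format above. The class CaptionDeduper and its observe and flush methods are invented for illustration and are not OpenUtter's API.

```python
from datetime import datetime
from typing import List, Optional, Tuple


def format_line(ts: datetime, speaker: str, text: str) -> str:
    """Render one transcript line as [HH:MM:SS] Speaker: text."""
    return f"[{ts:%H:%M:%S}] {speaker}: {text}"


class CaptionDeduper:
    """Hypothetical collapser for incremental caption snapshots (not OpenUtter's code).

    Assumption: each snapshot holds the full current text of the active
    speaker's caption, which either grows by extension or is replaced
    by a new utterance.
    """

    def __init__(self) -> None:
        # Open utterance as (first-seen timestamp, speaker, latest text)
        self.current: Optional[Tuple[datetime, str, str]] = None
        # Last (speaker, text) written out, used to drop re-rendered repeats
        self.last_emitted: Tuple[str, str] = ("", "")

    def observe(self, ts: datetime, speaker: str, text: str) -> List[str]:
        """Feed one caption snapshot; return any lines that are now finished."""
        text = text.strip()
        if not text:
            return []
        if self.current is not None:
            start, cur_speaker, cur_text = self.current
            if speaker == cur_speaker and text.startswith(cur_text):
                # Exact repeat or the same utterance growing: keep first timestamp
                self.current = (start, speaker, text)
                return []
        # A new utterance began, so finalize the previous one (if any)
        finished = self.flush()
        self.current = (ts, speaker, text)
        return finished

    def flush(self) -> List[str]:
        """Emit the open utterance, e.g. when the bot leaves the meeting."""
        if self.current is None:
            return []
        start, speaker, text = self.current
        self.current = None
        if (speaker, text) == self.last_emitted:
            return []  # identical to the line already written
        self.last_emitted = (speaker, text)
        return [format_line(start, speaker, text)]
```

Feeding it "Hey everyone" followed by "Hey everyone, let's get started" from Alice yields a single line stamped with the first snapshot's time. One limitation of this prefix rule: a caption revised mid-utterance (say, a corrected word) would be treated as a new line, so a real implementation might compare snapshots more loosely.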

Why This Matters for Agents

The transcript isn't just a log; it's context. An OpenClaw agent with access to a live meeting transcript can answer questions about what was discussed, flag action items, or pipe meeting content into downstream workflows. Combined with the --channel and --target flags, which send status updates through OpenClaw's messaging, you get a meeting assistant that reports back to you in real time.
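Because every transcript line has the same [HH:MM:SS] Speaker: text shape, an agent-side consumer can parse it with a single regular expression and poll the file for new lines as the meeting goes on. The sketch below assumes exactly that format and a file at the transcripts path given above; the helper names (parse_line, read_new, flag_action_items), the byte-offset polling approach, and the keyword list used to flag action items are illustrative assumptions, not part of OpenUtter or OpenClaw.

```python
import re
from pathlib import Path
from typing import List, NamedTuple, Optional, Tuple

# Matches lines like: [14:30:05] Alice: Hey everyone, let's get started
LINE_RE = re.compile(r"^\[(\d{2}:\d{2}:\d{2})\] ([^:]+): (.*)$")

# Hypothetical cue phrases for spotting commitments or requests
ACTION_CUES = ("action item", "follow up", "can you", "i will", "todo")


class Utterance(NamedTuple):
    time: str
    speaker: str
    text: str


def parse_line(line: str) -> Optional[Utterance]:
    """Return an Utterance, or None if the line doesn't match the format."""
    m = LINE_RE.match(line.rstrip("\r\n"))
    return Utterance(*m.groups()) if m else None


def read_new(path: Path, offset: int) -> Tuple[List[Utterance], int]:
    """Parse complete lines appended after byte `offset`; return them and the new offset."""
    with path.open("rb") as f:
        f.seek(offset)
        data = f.read()
    # Stop at the last newline so a half-written line is picked up next time
    end = data.rfind(b"\n") + 1
    parsed = (parse_line(line) for line in data[:end].decode("utf-8").splitlines())
    return [u for u in parsed if u is not None], offset + end


def flag_action_items(utterances: List[Utterance]) -> List[Utterance]:
    """Crude keyword heuristic: keep lines whose text contains an action cue."""
    return [u for u in utterances if any(c in u.text.lower() for c in ACTION_CUES)]
```

An agent would call read_new in a loop, passing back the returned offset each time so it only sees fresh lines; holding back a trailing partial line avoids parsing a caption that is still being written. The keyword heuristic is deliberately crude, and in practice you would more likely hand the parsed utterances to the agent's model.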

The --duration flag handles auto-leaving, so you can fire and forget: join, record for 30 minutes, leave. No babysitting.

Screenshots add a visual dimension: npx openutter screenshot captures whatever the bot currently sees in the meeting, which is useful for grabbing shared slides or whiteboard content.

The Limits

OpenUtter supports Google Meet only: no Zoom, no Teams. Guest joins can be flaky. The host has to admit the bot manually, and Meet occasionally blocks repeated join attempts (OpenUtter retries up to three times). Authentication solves most of these issues but requires a Google account session.
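The retry behavior can be pictured as a bounded loop. This is a hypothetical sketch, not OpenUtter's code: attempt_join stands in for whatever actually drives the browser, JoinBlocked is an invented exception for a rejected join, and both the exponential backoff and the reading of "up to three times" as three total attempts are assumptions.

```python
import time
from typing import Callable


class JoinBlocked(Exception):
    """Hypothetical error for a join attempt that Meet rejected."""


def join_with_retry(
    attempt_join: Callable[[], None],
    max_attempts: int = 3,
    base_delay: float = 5.0,
    sleep: Callable[[float], None] = time.sleep,
) -> int:
    """Call attempt_join until it succeeds; return the attempt number that worked.

    attempt_join is a hypothetical callable that raises JoinBlocked on rejection.
    The wait doubles after each failure (5s, 10s, ... with the defaults).
    """
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    for attempt in range(1, max_attempts + 1):
        try:
            attempt_join()
            return attempt
        except JoinBlocked:
            if attempt == max_attempts:
                raise  # out of attempts: surface the last rejection
            sleep(base_delay * 2 ** (attempt - 1))
    raise AssertionError("unreachable")  # the loop always returns or raises
```

Injecting sleep as a parameter keeps the loop testable without real waiting.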

The captions depend entirely on Google Meet's built-in captioning, which means accuracy varies with speaker clarity and language. There's no local speech-to-text happening here.

The Bigger Picture

OpenUtter is a community skill, built by @Alfredxia and installable via npx. It's a clean example of what the OpenClaw skill ecosystem enables: someone identified a gap (agents can't attend meetings), built a focused tool to fill it, and packaged it so any OpenClaw user can install it with one command.

The source is on GitHub. If your agent needs ears in Google Meet, this is it.