Workbench - AI interface R&D

Voice-to-project-planning workbench

A simulated planning surface for comparing transcript-first, timeline-first, and command-first AI operations interfaces.

Generator output defines the lab shape: deterministic ui-fixtures, a reusable ui-interactive planning-session organism with colocated presentation projection, and a composition-only pattern route.

§01

Variant comparison

The main scenario turns a spoken foundry brief into a structured project plan while preserving stream, tool, confidence, and repair evidence.

AI interface R&D

Planning session interface comparison

Transcript-first emphasizes repairable structure, timeline-first emphasizes stream evidence, and command-first emphasizes user control.

complete

Happy path with final structured plan

create-plan

Realtime session projection

91% confidence

Tool calls: 1
Failures: 0
Interruptions: 0

Transcript

Create a project plan for turning stackmat.es into the R&D foundry for AI-first business ops interfaces. Start with voice to project planning, compare transcript-first, timeline-first, and command-first versions.

  1. Session opened (system, 0 ms)

    Realtime-style session state initialized.

  2. Speech started (user, 130 ms)

    VAD detected the user starting a planning instruction.

  3. Transcript final (user, 3180 ms)

    Two final user turns captured with high confidence.

  4. Plan draft started (model, 3400 ms)

    Model started a planning response.

  5. Plan draft delta (model, 3800 ms)

    Scaffold generator repair first, then compose workbench evidence.

  6. Tool call (model, 4200 ms)

    create_project_plan called with goals, milestones, and risks.

    Tool: create_project_plan

  7. Tool result (tool, 4550 ms)

    Structured plan returned for review.

    Tool: create_project_plan

  8. Final plan (model, 5100 ms)

    Plan committed to the simulated workbench state.
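
The session counters shown above the transcript (tool calls, failures, interruptions) could plausibly be derived from the event stream itself. The sketch below is a minimal illustration of that idea for the happy-path events; the `StreamEvent` shape, the `kind` values, and `projectSession` are hypothetical names chosen for this example, not the workbench's actual fixture types.

```typescript
// Hypothetical fixture shape: field names and kinds are assumptions for illustration.
type Actor = "system" | "user" | "model" | "tool";

type EventKind =
  | "session"
  | "speech"
  | "transcript"
  | "draft"
  | "tool-call"
  | "tool-result"
  | "tool-failure"
  | "interruption"
  | "final-plan";

interface StreamEvent {
  label: string;
  actor: Actor;
  atMs: number;
  kind: EventKind;
  tool?: string;
}

interface SessionProjection {
  toolCalls: number;
  failures: number;
  interruptions: number;
  lastEventMs: number;
}

// Fold a deterministic event list into the counters a summary panel would show.
function projectSession(events: readonly StreamEvent[]): SessionProjection {
  const count = (kind: EventKind): number =>
    events.filter((e) => e.kind === kind).length;
  return {
    toolCalls: count("tool-call"),
    failures: count("tool-failure"),
    interruptions: count("interruption"),
    lastEventMs: events.reduce((max, e) => Math.max(max, e.atMs), 0),
  };
}

// The eight happy-path events, with labels and timestamps taken from the timeline.
const happyPath: StreamEvent[] = [
  { label: "Session opened", actor: "system", atMs: 0, kind: "session" },
  { label: "Speech started", actor: "user", atMs: 130, kind: "speech" },
  { label: "Transcript final", actor: "user", atMs: 3180, kind: "transcript" },
  { label: "Plan draft started", actor: "model", atMs: 3400, kind: "draft" },
  { label: "Plan draft delta", actor: "model", atMs: 3800, kind: "draft" },
  { label: "Tool call", actor: "model", atMs: 4200, kind: "tool-call", tool: "create_project_plan" },
  { label: "Tool result", actor: "tool", atMs: 4550, kind: "tool-result", tool: "create_project_plan" },
  { label: "Final plan", actor: "model", atMs: 5100, kind: "final-plan" },
];

const projection = projectSession(happyPath);
console.log(projection);
```

Keeping the projection a pure function of the fixture means the same event list always yields the same counters, which is the property a deterministic scenario needs.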

Governed AI interface foundry

Build stackmat.es into an R&D foundry for AI-first business operations interfaces, starting with voice-to-project-planning.

Generator truth baseline

Day 1
  • Dry-run lab-exploration and validate output paths (Platform, done)
  • Move deterministic scenarios into ui-fixtures (Interface R&D, done)

Voice planning workbench

Day 2
  • Render transcript-first structured plan builder (Design Systems, doing)
  • Render event timeline with tool and repair states (Design Systems, doing)
  • Render command-first control surface (Interface R&D, todo)

Variation workbench

Variant comparison

Compare three interface shapes against voice, streaming, tool-call, repair, and user-control evidence.

Transcript-first plan builder

input-flow

A structured plan builder where transcript turns become editable goals, milestones, and tasks.

Hypothesis: Score 88. Best repair path for low-confidence transcript phrases.

promote

Transcript

Create a project plan for turning stackmat.es into the R&D foundry for AI-first business ops interfaces. Start with voice to project planning, compare transcript-first, timeline-first, and command-first versions.

Governed AI interface foundry

  • Repair the lab generator so scaffolded output is architectural truth.
  • Compare three AI planning interface variants with deterministic evidence.
  • Keep provider integration simulated until the interface contract is proven.

Timeline-first stream view

input-flow

A streaming event timeline that shows transcript, model deltas, tool calls, errors, and plan patches.

Hypothesis: Score 82. Best evidence model for debugging streamed planning sessions.

revise

  1. Session opened (system, 0 ms)

    Realtime-style session state initialized.

  2. Speech started (user, 130 ms)

    VAD detected the user starting a planning instruction.

  3. Transcript final (user, 3180 ms)

    Two final user turns captured with high confidence.

  4. Plan draft started (model, 3400 ms)

    Model started a planning response.

  5. Plan draft delta (model, 3800 ms)

    Scaffold generator repair first, then compose workbench evidence.

  6. Tool call (model, 4200 ms)

    create_project_plan called with goals, milestones, and risks.

    Tool: create_project_plan

  7. Tool result (tool, 4550 ms)

    Structured plan returned for review.

    Tool: create_project_plan

  8. Final plan (model, 5100 ms)

    Plan committed to the simulated workbench state.

Command-first hybrid

hybrid

A push-to-talk and command hybrid where voice drafts the plan and slash commands revise it.

Hypothesis: Score 77. Strong user control for interruption, retry, and targeted edits.

lab-only
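
The three variant cards each carry a kind, a score, and a closing label (promote, revise, lab-only). A minimal sketch of how such records might be ranked and filtered follows; the `Variant` interface and the reading of the closing label as a "disposition" are assumptions made for this example, while the names and scores are copied from the cards.

```typescript
// Hypothetical record shape: the field names are assumptions for illustration.
type Disposition = "promote" | "revise" | "lab-only";

interface Variant {
  name: string;
  kind: "input-flow" | "hybrid";
  score: number;
  disposition: Disposition;
}

// Values copied from the three variant cards, deliberately listed out of order.
const variants: Variant[] = [
  { name: "Command-first hybrid", kind: "hybrid", score: 77, disposition: "lab-only" },
  { name: "Transcript-first plan builder", kind: "input-flow", score: 88, disposition: "promote" },
  { name: "Timeline-first stream view", kind: "input-flow", score: 82, disposition: "revise" },
];

// Sort a copy by descending score so the input fixture stays untouched.
function rankVariants(vs: readonly Variant[]): Variant[] {
  return [...vs].sort((a, b) => b.score - a.score);
}

const ranked = rankVariants(variants);
const promoted = ranked
  .filter((v) => v.disposition === "promote")
  .map((v) => v.name);

console.log(ranked.map((v) => `${v.score} ${v.name}`));
console.log(promoted);
```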

Evidence and disposition notes

6 signals

Focused variant

  • strength: Transcript-first plan builder (Stream trust)

    Best repair path for low-confidence transcript phrases. Keeps final project structure visible before commit.

  • risk: Transcript-first plan builder (Planning throughput)

    Can hide tool latency and event provenance from operators.

  • strength: Timeline-first stream view (Stream trust)

    Best evidence model for debugging streamed planning sessions. Makes tool calls and recoverable errors inspectable.

  • risk: Timeline-first stream view (Planning throughput)

    Can feel operationally dense for non-technical planners.

  • strength: Command-first hybrid (Stream trust)

    Strong user control for interruption, retry, and targeted edits. Clear escape hatch when VAD or transcript confidence fails.

  • risk: Command-first hybrid (Planning throughput)

    Requires users to learn a small command vocabulary.

§02

State coverage

The supporting deterministic scenarios exercise interruption, low-confidence repair, recoverable tool failure, and empty-state behavior.

AI interface R&D

Interrupted model response with recoverable repair

Focused state proof for the same reusable workbench contract.

interrupted

interrupted

revise-plan

Realtime session projection

86% confidence

Tool calls: 0
Failures: 0
Interruptions: 1

Transcript

Build a plan for the foundry launch. Actually pause that and make the first milestone generator repair.

  1. Model speaking (model, 900 ms)

    Drafting the initial plan aloud.

  2. User interruption (user, 1710 ms)

    User starts speaking while the model response is in progress.

  3. Response cancelled (system, 1730 ms)

    Unplayed response audio is truncated before the new instruction.
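
The interruption sequence above reads naturally as a small state machine: the model is speaking, the user barges in, and the system cancels the response. The following sketch shows one way a reducer could track that; the `SessionState` fields and event type names are hypothetical, and only the timestamps come from the scenario.

```typescript
// Hypothetical state and event shapes: all names here are assumptions for illustration.
interface SessionState {
  modelSpeaking: boolean;
  interruptions: number;
  cancelledAtMs: number | null;
}

type SessionEvent =
  | { type: "model-speaking"; atMs: number }
  | { type: "user-speech-started"; atMs: number }
  | { type: "response-cancelled"; atMs: number };

const initialState: SessionState = {
  modelSpeaking: false,
  interruptions: 0,
  cancelledAtMs: null,
};

function reduceSession(state: SessionState, event: SessionEvent): SessionState {
  switch (event.type) {
    case "model-speaking":
      return { ...state, modelSpeaking: true };
    case "user-speech-started":
      // User speech only counts as an interruption while a response is in progress.
      return state.modelSpeaking
        ? { ...state, interruptions: state.interruptions + 1 }
        : state;
    case "response-cancelled":
      return { ...state, modelSpeaking: false, cancelledAtMs: event.atMs };
  }
}

// Timestamps taken from the interrupted scenario's three events.
const interruptedEvents: SessionEvent[] = [
  { type: "model-speaking", atMs: 900 },
  { type: "user-speech-started", atMs: 1710 },
  { type: "response-cancelled", atMs: 1730 },
];

const interruptedState = interruptedEvents.reduce(reduceSession, initialState);
console.log(interruptedState);
```

Counting the interruption on the user event rather than on the cancellation keeps the counter correct even if a cancellation event were never delivered.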

Structured plan

No final plan has been emitted for this scenario.

Variation workbench

Variant comparison

Compare three interface shapes against voice, streaming, tool-call, repair, and user-control evidence.

Transcript-first plan builder

input-flow

A structured plan builder where transcript turns become editable goals, milestones, and tasks.

Hypothesis: Score 88. Best repair path for low-confidence transcript phrases.

promote

Transcript

Build a plan for the foundry launch. Actually pause that and make the first milestone generator repair.

No plan yet.

Timeline-first stream view

input-flow

A streaming event timeline that shows transcript, model deltas, tool calls, errors, and plan patches.

Hypothesis: Score 82. Best evidence model for debugging streamed planning sessions.

revise

  1. Model speaking (model, 900 ms)

    Drafting the initial plan aloud.

  2. User interruption (user, 1710 ms)

    User starts speaking while the model response is in progress.

  3. Response cancelled (system, 1730 ms)

    Unplayed response audio is truncated before the new instruction.

Command-first hybrid

hybrid

A push-to-talk and command hybrid where voice drafts the plan and slash commands revise it.

Hypothesis: Score 77. Strong user control for interruption, retry, and targeted edits.

lab-only

AI interface R&D

Low-confidence transcript phrase needing confirmation

Focused state proof for the same reusable workbench contract.

low-confidence

low-confidence

create-plan

Realtime session projection

58% confidence

Tool calls: 0
Failures: 0
Interruptions: 0

Transcript

Plan the generator truth law and maybe the workbench evidence.

  1. Low confidence transcript (user, 740 ms)

    Plan the generator truth law and maybe...

  2. Repair prompt (system, 920 ms)

    Confirm whether the second milestone is workbench evidence.

Repair prompt

Confirm whether the second milestone is workbench evidence.
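
One way to picture the repair step is a confidence gate: a transcript turn below some threshold yields a confirmation prompt instead of a plan request. The sketch below assumes a threshold of 0.7, which is not stated anywhere on this page; `TranscriptTurn`, `NextStep`, and `nextStep` are likewise hypothetical names.

```typescript
// Hypothetical shapes and threshold: none of these names or values come from the workbench.
interface TranscriptTurn {
  text: string;
  confidence: number; // 0..1
}

type NextStep =
  | { kind: "proceed" }
  | { kind: "repair"; prompt: string };

// Assumed cut-off; the page shows 58% triggering repair and 91% proceeding.
const REPAIR_THRESHOLD = 0.7;

function nextStep(turn: TranscriptTurn, repairPrompt: string): NextStep {
  return turn.confidence < REPAIR_THRESHOLD
    ? { kind: "repair", prompt: repairPrompt }
    : { kind: "proceed" };
}

const lowConfidenceTurn: TranscriptTurn = {
  text: "Plan the generator truth law and maybe the workbench evidence.",
  confidence: 0.58,
};

const repairStep = nextStep(
  lowConfidenceTurn,
  "Confirm whether the second milestone is workbench evidence.",
);
const confidentStep = nextStep({ text: "Create a project plan.", confidence: 0.91 }, "");

console.log(repairStep, confidentStep);
```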

Structured plan

No final plan has been emitted for this scenario.

Variation workbench

Variant comparison

Compare three interface shapes against voice, streaming, tool-call, repair, and user-control evidence.

Transcript-first plan builder

input-flow

A structured plan builder where transcript turns become editable goals, milestones, and tasks.

Hypothesis: Score 88. Best repair path for low-confidence transcript phrases.

promote

Transcript

Plan the generator truth law and maybe the workbench evidence.

No plan yet.

Timeline-first stream view

input-flow

A streaming event timeline that shows transcript, model deltas, tool calls, errors, and plan patches.

Hypothesis: Score 82. Best evidence model for debugging streamed planning sessions.

revise

  1. Low confidence transcript (user, 740 ms)

    Plan the generator truth law and maybe...

  2. Repair prompt (system, 920 ms)

    Confirm whether the second milestone is workbench evidence.

Command-first hybrid

hybrid

A push-to-talk and command hybrid where voice drafts the plan and slash commands revise it.

Hypothesis: Score 77. Strong user control for interruption, retry, and targeted edits.

lab-only

AI interface R&D

Recoverable tool failure while creating a plan

Focused state proof for the same reusable workbench contract.

tool-failure

tool-failure

create-plan

Realtime session projection

90% confidence

Tool calls: 0
Failures: 1
Interruptions: 0

Transcript

Create the project planning workbench plan.

  1. Tool call (model, 1200 ms)

    create_project_plan called.

    Tool: create_project_plan

  2. Tool failure (tool, 1560 ms)

    Plan template registry unavailable.

    Tool: create_project_plan

Tool failure

Retry create_project_plan or continue with a manual planning note.
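
The recovery choice above (retry the tool, or fall back to a manual planning note) can be sketched as a bounded retry loop. This is an illustrative, synchronous simulation only: `ToolResult`, `Recovery`, `callWithRecovery`, and the flaky stand-in for create_project_plan are hypothetical, and a real tool call would likely be asynchronous.

```typescript
// Hypothetical result and recovery shapes: all names are assumptions for illustration.
type ToolResult<T> =
  | { ok: true; value: T }
  | { ok: false; error: string };

type Recovery<T> =
  | { kind: "plan"; value: T; attempts: number }
  | { kind: "manual-note"; note: string; attempts: number };

// Retry up to maxAttempts times, then fall back to a manual planning note.
function callWithRecovery<T>(
  tool: () => ToolResult<T>,
  maxAttempts: number,
): Recovery<T> {
  let lastError = "unknown error";
  for (let attempt = 1; attempt <= maxAttempts; attempt++) {
    const result = tool();
    if (result.ok) {
      return { kind: "plan", value: result.value, attempts: attempt };
    }
    lastError = result.error;
  }
  return {
    kind: "manual-note",
    note: `create_project_plan failed: ${lastError}`,
    attempts: maxAttempts,
  };
}

// Simulated tool: fails once with the scenario's error message, then succeeds.
let calls = 0;
const flakyCreateProjectPlan = (): ToolResult<string> => {
  calls += 1;
  return calls === 1
    ? { ok: false, error: "Plan template registry unavailable." }
    : { ok: true, value: "Governed AI interface foundry" };
};

const alwaysFailing = (): ToolResult<string> => ({
  ok: false,
  error: "Plan template registry unavailable.",
});

const recovered = callWithRecovery(flakyCreateProjectPlan, 2);
const gaveUp = callWithRecovery(alwaysFailing, 2);

console.log(recovered, gaveUp);
```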

Structured plan

No final plan has been emitted for this scenario.

Variation workbench

Variant comparison

Compare three interface shapes against voice, streaming, tool-call, repair, and user-control evidence.

Transcript-first plan builder

input-flow

A structured plan builder where transcript turns become editable goals, milestones, and tasks.

Hypothesis: Score 88. Best repair path for low-confidence transcript phrases.

promote

Transcript

Create the project planning workbench plan.

No plan yet.

Timeline-first stream view

input-flow

A streaming event timeline that shows transcript, model deltas, tool calls, errors, and plan patches.

Hypothesis: Score 82. Best evidence model for debugging streamed planning sessions.

revise

  1. Tool call (model, 1200 ms)

    create_project_plan called.

    Tool: create_project_plan

  2. Tool failure (tool, 1560 ms)

    Plan template registry unavailable.

    Tool: create_project_plan

Command-first hybrid

hybrid

A push-to-talk and command hybrid where voice drafts the plan and slash commands revise it.

Hypothesis: Score 77. Strong user control for interruption, retry, and targeted edits.

lab-only

AI interface R&D

Empty state before audio input

Focused state proof for the same reusable workbench contract.

empty

empty

unclear

Realtime session projection

100% confidence

Tool calls: 0
Failures: 0
Interruptions: 0

Transcript

No transcript captured yet.

No stream events yet.

Structured plan

No final plan has been emitted for this scenario.

Variation workbench

Variant comparison

Compare three interface shapes against voice, streaming, tool-call, repair, and user-control evidence.

Transcript-first plan builder

input-flow

A structured plan builder where transcript turns become editable goals, milestones, and tasks.

Hypothesis: Score 88. Best repair path for low-confidence transcript phrases.

promote

Transcript

No transcript captured yet.

No plan yet.

Timeline-first stream view

input-flow

A streaming event timeline that shows transcript, model deltas, tool calls, errors, and plan patches.

Hypothesis: Score 82. Best evidence model for debugging streamed planning sessions.

revise

No stream events yet.

Command-first hybrid

hybrid

A push-to-talk and command hybrid where voice drafts the plan and slash commands revise it.

Hypothesis: Score 77. Strong user control for interruption, retry, and targeted edits.

lab-only
