Workbench - AI interface R&D

DESIGN.md — paper

Voice-to-project-planning workbench

A simulated planning surface for comparing transcript-first, timeline-first, and command-first AI operations interfaces.

Generator output defines the lab shape: deterministic ui-fixtures, a reusable ui-interactive planning-session organism with colocated presentation projection, and a composition-only pattern route.
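To make the fixture layer concrete, the TypeScript below is a minimal sketch of what one deterministic scenario could look like as plain data. The type, field, and function names (`ScenarioFixture`, `StreamEvent`, `isOrdered`) are hypothetical and are not taken from the ui-fixtures package. The values come from the interrupted-response scenario in §02.

```typescript
// Hypothetical fixture types; names are illustrative, not the ui-fixtures API.
type SessionStatus = "complete" | "interrupted" | "low-confidence" | "tool-failure" | "empty";
type PlanIntent = "create-plan" | "revise-plan" | "unclear";
type Actor = "system" | "user" | "model" | "tool";

interface StreamEvent {
  label: string;
  actor: Actor;
  atMs: number; // offset from session start, in milliseconds
  detail: string;
  tool?: string; // set only for tool events
}

interface ScenarioFixture {
  id: string;
  status: SessionStatus;
  intent: PlanIntent;
  confidence: number; // 0..1, displayed as a percentage in the projection
  transcript: string[];
  events: StreamEvent[];
}

// Plain data: rendering the same fixture always yields the same state.
const interruptedScenario: ScenarioFixture = {
  id: "interrupted",
  status: "interrupted",
  intent: "revise-plan",
  confidence: 0.86,
  transcript: [
    "Build a plan for the foundry launch. Actually pause that and make the first milestone generator repair.",
  ],
  events: [
    { label: "Model speaking", actor: "model", atMs: 900, detail: "Drafting the initial plan aloud." },
    { label: "User interruption", actor: "user", atMs: 1710, detail: "User starts speaking while the model response is in progress." },
    { label: "Response cancelled", actor: "system", atMs: 1730, detail: "Unplayed response audio is truncated before the new instruction." },
  ],
};

// A check a fixture can be held to: event offsets never go backwards.
function isOrdered(fixture: ScenarioFixture): boolean {
  return fixture.events.every((e, i, all) => i === 0 || all[i - 1].atMs <= e.atMs);
}
```

Keeping fixtures as inert data is what lets the organism and the pattern route stay free of scenario logic; the route only chooses which fixture to render.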

§01

Tags: voice, planning, tool calls

Variant comparison

The main scenario turns a spoken foundry brief into a structured project plan while preserving stream, tool, confidence, and repair evidence.

AI interface R&D

Planning session interface comparison

Transcript-first emphasizes repairable structure, timeline-first emphasizes stream evidence, and command-first emphasizes user control.

complete · Happy path with final structured plan · create-plan

Realtime session projection (91% confidence). Counters: tool calls, failures, interruptions.

Transcript

Create a project plan for turning stackmat.es into the R&D foundry for AI-first business ops interfaces. Start with voice to project planning, compare transcript-first, timeline-first, and command-first versions.

- Session opened (system, 0 ms): Realtime-style session state initialized.
- Speech started (user, 130 ms): VAD detected the user starting a planning instruction.
- Transcript final (user, 3180 ms): Two final user turns captured with high confidence.
- Plan draft started (model, 3400 ms): Model started a planning response.
- Plan draft delta (model, 3800 ms): Scaffold generator repair first, then compose workbench evidence.
- Tool call (model, 4200 ms; tool: create_project_plan): create_project_plan called with goals, milestones, and risks.
- Tool result (tool, 4550 ms; tool: create_project_plan): Structured plan returned for review.
- Final plan (model, 5100 ms): Plan committed to the simulated workbench state.
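The projection's counters can, in principle, be derived from the event log alone. Below is a minimal sketch assuming a hypothetical `kind` field on each event; which kinds feed each counter (`tool-call`, `tool-failure`, `user-interruption`) is also an assumption. Confidence is deliberately left out because the document does not say how it is computed.

```typescript
// Hypothetical event kinds; the workbench's real event schema is not shown here.
type EventKind =
  | "session-opened" | "speech-started" | "transcript-final" | "plan-draft-started"
  | "plan-draft-delta" | "tool-call" | "tool-result" | "tool-failure"
  | "user-interruption" | "response-cancelled" | "final-plan";

interface SessionEvent {
  kind: EventKind;
  actor: "system" | "user" | "model" | "tool";
  atMs: number;
}

interface SessionProjection {
  toolCalls: number;
  failures: number;
  interruptions: number;
  hasFinalPlan: boolean;
  durationMs: number; // first event to last event
}

// Pure projection: same events in, same counters out.
function projectSession(events: readonly SessionEvent[]): SessionProjection {
  const count = (kind: EventKind) => events.filter((e) => e.kind === kind).length;
  return {
    toolCalls: count("tool-call"),
    failures: count("tool-failure"),
    interruptions: count("user-interruption"),
    hasFinalPlan: count("final-plan") > 0,
    durationMs: events.length === 0 ? 0 : events[events.length - 1].atMs - events[0].atMs,
  };
}

// The happy-path stream above, reduced to kinds, actors, and offsets.
const happyPath: SessionEvent[] = [
  { kind: "session-opened", actor: "system", atMs: 0 },
  { kind: "speech-started", actor: "user", atMs: 130 },
  { kind: "transcript-final", actor: "user", atMs: 3180 },
  { kind: "plan-draft-started", actor: "model", atMs: 3400 },
  { kind: "plan-draft-delta", actor: "model", atMs: 3800 },
  { kind: "tool-call", actor: "model", atMs: 4200 },
  { kind: "tool-result", actor: "tool", atMs: 4550 },
  { kind: "final-plan", actor: "model", atMs: 5100 },
];
```

Because the projection is a pure function of the log, each variant can render the same counters without owning any session state.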

Governed AI interface foundry

Build stackmat.es into an R&D foundry for AI-first business operations interfaces, starting with voice-to-project-planning.

Generator truth baseline (Day 1)

- Dry-run lab-exploration and validate output paths (Platform, done)
- Move deterministic scenarios into ui-fixtures (Interface R&D, done)

Voice planning workbench (Day 2)

- Render transcript-first structured plan builder (Design Systems, doing)
- Render event timeline with tool and repair states (Design Systems, doing)
- Render command-first control surface (Interface R&D, todo)
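The committed plan is a small tree of milestones and owned tasks. A minimal sketch of that shape, with hypothetical type names (`ProjectPlan`, `Milestone`, `PlanTask`) and the milestone data copied from the plan above:

```typescript
// Hypothetical plan types; only the data values come from the document.
type TaskStatus = "todo" | "doing" | "done";

interface PlanTask { title: string; owner: string; status: TaskStatus; }
interface Milestone { title: string; day: number; tasks: PlanTask[]; }
interface ProjectPlan { title: string; milestones: Milestone[]; }

const foundryPlan: ProjectPlan = {
  title: "Governed AI interface foundry",
  milestones: [
    {
      title: "Generator truth baseline", day: 1, tasks: [
        { title: "Dry-run lab-exploration and validate output paths", owner: "Platform", status: "done" },
        { title: "Move deterministic scenarios into ui-fixtures", owner: "Interface R&D", status: "done" },
      ],
    },
    {
      title: "Voice planning workbench", day: 2, tasks: [
        { title: "Render transcript-first structured plan builder", owner: "Design Systems", status: "doing" },
        { title: "Render event timeline with tool and repair states", owner: "Design Systems", status: "doing" },
        { title: "Render command-first control surface", owner: "Interface R&D", status: "todo" },
      ],
    },
  ],
};

// Done/total for one milestone, e.g. for a progress badge.
function milestoneProgress(m: Milestone): { done: number; total: number } {
  return { done: m.tasks.filter((t) => t.status === "done").length, total: m.tasks.length };
}

// Task counts by status across the whole plan.
function statusCounts(plan: ProjectPlan): Record<TaskStatus, number> {
  const counts: Record<TaskStatus, number> = { todo: 0, doing: 0, done: 0 };
  for (const m of plan.milestones) for (const t of m.tasks) counts[t.status] += 1;
  return counts;
}
```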

Variation workbench

Variant comparison

Compare three interface shapes against voice, streaming, tool-call, repair, and user-control evidence.

Transcript-first plan builder (input-flow; disposition: promote)

A structured plan builder where transcript turns become editable goals, milestones, and tasks.

Hypothesis (score 88): best repair path for low-confidence transcript phrases.

Transcript

Governed AI interface foundry

- Repair the lab generator so scaffolded output is architectural truth.
- Compare three AI planning interface variants with deterministic evidence.
- Keep provider integration simulated until the interface contract is proven.


Timeline-first stream view (input-flow; disposition: revise)

A streaming event timeline that shows transcript, model deltas, tool calls, errors, and plan patches.

Hypothesis (score 82): best evidence model for debugging streamed planning sessions.

Stream: the same eight events as the realtime session projection above, from Session opened (0 ms) to Final plan (5100 ms).

Command-first hybrid (hybrid; disposition: lab-only)

A push-to-talk and command hybrid where voice drafts the plan and slash commands revise it.

Hypothesis (score 77): strong user control for interruption, retry, and targeted edits.
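The document does not list the command-first variant's slash commands, so the vocabulary below (`/retry`, `/stop`, `/rename <n> <title>`) is entirely hypothetical. The sketch only shows the shape of the idea: voice produces the draft, and a small parsed command makes a targeted edit without re-running the whole plan.

```typescript
// Hypothetical command vocabulary; not taken from the workbench.
type Command =
  | { kind: "retry" }
  | { kind: "stop" }
  | { kind: "rename"; index: number; title: string } // 1-based milestone index
  | { kind: "unknown"; raw: string };

function parseCommand(input: string): Command {
  const trimmed = input.trim();
  if (!trimmed.startsWith("/")) return { kind: "unknown", raw: input };
  const [name, ...rest] = trimmed.slice(1).split(/\s+/);
  switch (name) {
    case "retry":
      return { kind: "retry" };
    case "stop":
      return { kind: "stop" };
    case "rename": {
      const index = Number(rest[0]);
      const title = rest.slice(1).join(" ");
      if (Number.isInteger(index) && index >= 1 && title.length > 0) {
        return { kind: "rename", index, title };
      }
      return { kind: "unknown", raw: input };
    }
    default:
      return { kind: "unknown", raw: input };
  }
}

// Targeted edit: rename one milestone, returning a new array (input untouched).
function applyRename(titles: readonly string[], cmd: Command): string[] {
  if (cmd.kind !== "rename" || cmd.index > titles.length) return [...titles];
  const { index, title } = cmd;
  return titles.map((t, i) => (i === index - 1 ? title : t));
}
```

Unrecognized input falls through to `unknown` rather than throwing, which keeps a misheard or mistyped command from disturbing the draft.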

Evidence and disposition notes (6 signals; focused variant: Transcript-first plan builder)

- Transcript-first plan builder, strength (stream trust): Best repair path for low-confidence transcript phrases. Keeps final project structure visible before commit.
- Transcript-first plan builder, risk (planning throughput): Can hide tool latency and event provenance from operators.
- Timeline-first stream view, strength (stream trust): Best evidence model for debugging streamed planning sessions. Makes tool calls and recoverable errors inspectable.
- Timeline-first stream view, risk (planning throughput): Can feel operationally dense for non-technical planners.
- Command-first hybrid, strength (stream trust): Strong user control for interruption, retry, and targeted edits. Clear escape hatch when VAD or transcript confidence fails.
- Command-first hybrid, risk (planning throughput): Requires users to learn a small command vocabulary.
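The comparison pairs a hypothesis score and a disposition with one strength and one risk per variant. A minimal sketch of that record, with hypothetical type names; note that the dispositions here are copied from the cards, not computed from the scores, since the document gives no rule linking the two.

```typescript
// Hypothetical types; scores, dispositions, and signal polarity come from the document.
type Disposition = "promote" | "revise" | "lab-only";
interface Variant { name: string; score: number; disposition: Disposition; }
interface Signal { variant: string; polarity: "strength" | "risk"; lens: string; }

const variants: Variant[] = [
  { name: "Command-first hybrid", score: 77, disposition: "lab-only" },
  { name: "Transcript-first plan builder", score: 88, disposition: "promote" },
  { name: "Timeline-first stream view", score: 82, disposition: "revise" },
];

const signals: Signal[] = [
  { variant: "Transcript-first plan builder", polarity: "strength", lens: "stream trust" },
  { variant: "Transcript-first plan builder", polarity: "risk", lens: "planning throughput" },
  { variant: "Timeline-first stream view", polarity: "strength", lens: "stream trust" },
  { variant: "Timeline-first stream view", polarity: "risk", lens: "planning throughput" },
  { variant: "Command-first hybrid", polarity: "strength", lens: "stream trust" },
  { variant: "Command-first hybrid", polarity: "risk", lens: "planning throughput" },
];

// Highest hypothesis score first; a reading aid, not a decision rule.
function rankVariants(list: readonly Variant[]): Variant[] {
  return [...list].sort((a, b) => b.score - a.score);
}

// Tally strengths and risks per variant to spot one-sided evidence.
function signalsByVariant(list: readonly Signal[]): Map<string, { strengths: number; risks: number }> {
  const out = new Map<string, { strengths: number; risks: number }>();
  for (const s of list) {
    const entry = out.get(s.variant) ?? { strengths: 0, risks: 0 };
    if (s.polarity === "strength") entry.strengths += 1;
    else entry.risks += 1;
    out.set(s.variant, entry);
  }
  return out;
}
```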

§02

Tags: interruption, repair, empty

State coverage

The supporting deterministic scenarios exercise interruption, low-confidence repair, recoverable tool failure, and empty-state behavior.
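Coverage of this kind can be checked mechanically. The sketch below assumes a hypothetical required-state list made of the main happy path plus the four supporting states named above; the status strings match the scenario chips in this section.

```typescript
// Hypothetical required-state list; the strings match the scenario chips.
type SessionStatus = "complete" | "interrupted" | "low-confidence" | "tool-failure" | "empty";

const REQUIRED_STATES: readonly SessionStatus[] = [
  "complete",
  "interrupted",
  "low-confidence",
  "tool-failure",
  "empty",
];

// Return required states with no fixture yet, in REQUIRED_STATES order.
function missingStates(covered: readonly SessionStatus[]): SessionStatus[] {
  const seen = new Set(covered);
  return REQUIRED_STATES.filter((s) => !seen.has(s));
}
```

Running this over the fixture list in a test would flag a removed scenario before the pattern route silently stops rendering that state.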

AI interface R&D

Interrupted model response with recoverable repair

Focused state proof for the same reusable workbench contract.

interrupted · interrupted · revise-plan

Realtime session projection (86% confidence). Counters: tool calls, failures, interruptions.

Transcript

Build a plan for the foundry launch. Actually pause that and make the first milestone generator repair.

- Model speaking (model, 900 ms): Drafting the initial plan aloud.
- User interruption (user, 1710 ms): User starts speaking while the model response is in progress.
- Response cancelled (system, 1730 ms): Unplayed response audio is truncated before the new instruction.
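The interruption above can be read as a barge-in rule: if user speech starts while response audio is still playing, cancel the response and drop the unplayed remainder. A minimal sketch under stated assumptions: `audioDurationMs` is a hypothetical field, the 20 ms cancel latency is chosen only to reproduce the fixture's 1710 ms to 1730 ms gap, and "played" is measured at speech onset, whereas a real integration would truncate to whatever audio the client actually played.

```typescript
// Hypothetical shapes; none of these names come from the workbench.
interface SpeakingResponse {
  startedAtMs: number;
  audioDurationMs: number; // total length of the response audio
}

interface Cancellation {
  cancelledAtMs: number;
  playedMs: number; // audio heard before the user spoke
  truncatedMs: number; // unplayed audio that is dropped
}

// Returns null when the user speaks outside the playback window (no barge-in).
function cancelOnBargeIn(
  resp: SpeakingResponse,
  userSpeechAtMs: number,
  cancelLatencyMs = 20,
): Cancellation | null {
  const endMs = resp.startedAtMs + resp.audioDurationMs;
  if (userSpeechAtMs < resp.startedAtMs || userSpeechAtMs >= endMs) return null;
  const playedMs = userSpeechAtMs - resp.startedAtMs;
  return {
    cancelledAtMs: userSpeechAtMs + cancelLatencyMs,
    playedMs,
    truncatedMs: resp.audioDurationMs - playedMs,
  };
}
```

With a response starting at 900 ms and user speech at 1710 ms, this yields a cancellation at 1730 ms and 810 ms of played audio, matching the scenario's timeline.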

Structured plan

No final plan has been emitted for this scenario.

Variation workbench

Variant comparison

Compare three interface shapes against voice, streaming, tool-call, repair, and user-control evidence.

Transcript-first plan builder (input-flow; disposition: promote)

A structured plan builder where transcript turns become editable goals, milestones, and tasks.

Hypothesis (score 88): best repair path for low-confidence transcript phrases.

Transcript

Build a plan for the foundry launch. Actually pause that and make the first milestone generator repair.

No plan yet.


Timeline-first stream view (input-flow; disposition: revise)

A streaming event timeline that shows transcript, model deltas, tool calls, errors, and plan patches.

Hypothesis (score 82): best evidence model for debugging streamed planning sessions.

Stream: the same three events as the session projection above, from Model speaking (900 ms) to Response cancelled (1730 ms).

Command-first hybrid (hybrid; disposition: lab-only)

A push-to-talk and command hybrid where voice drafts the plan and slash commands revise it.

Hypothesis (score 77): strong user control for interruption, retry, and targeted edits.

Evidence and disposition notes: the same six signals as in §01 (focused variant: Transcript-first plan builder).

AI interface R&D

Low-confidence transcript phrase needing confirmation

Focused state proof for the same reusable workbench contract.

low-confidence · low-confidence · create-plan

Realtime session projection (58% confidence). Counters: tool calls, failures, interruptions.

Transcript

Plan the generator truth law and maybe the workbench evidence.

- Low confidence transcript (user, 740 ms): Plan the generator truth law and maybe...
- Repair prompt (system, 920 ms): Confirm whether the second milestone is workbench evidence.

Repair prompt

Confirm whether the second milestone is workbench evidence.
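The repair prompt implies a confidence gate: below some cut-off, the interface asks for confirmation instead of committing the phrase to the plan. A minimal sketch; the 0.7 threshold and the names `decideRepair` and `TranscriptTurn` are hypothetical, since the document shows only that 58% triggers a repair prompt.

```typescript
// Hypothetical gate; only the 58% example and the prompt wording come from the document.
interface TranscriptTurn {
  text: string;
  confidence: number; // 0..1
}

type RepairDecision = { kind: "accept" } | { kind: "confirm"; prompt: string };

// Illustrative cut-off, not a value stated by the workbench.
const CONFIRM_BELOW = 0.7;

function decideRepair(
  turn: TranscriptTurn,
  slot: string,
  candidate: string,
  threshold = CONFIRM_BELOW,
): RepairDecision {
  if (turn.confidence >= threshold) return { kind: "accept" };
  // Ask about the specific slot rather than re-prompting for the whole turn.
  return { kind: "confirm", prompt: `Confirm whether ${slot} is ${candidate}.` };
}
```

Scoping the prompt to one slot keeps the repair targeted, which is the behavior the transcript-first variant is hypothesized to do best.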

Structured plan

No final plan has been emitted for this scenario.

Variation workbench

Variant comparison

Compare three interface shapes against voice, streaming, tool-call, repair, and user-control evidence.

Transcript-first plan builder (input-flow; disposition: promote)

A structured plan builder where transcript turns become editable goals, milestones, and tasks.

Hypothesis (score 88): best repair path for low-confidence transcript phrases.

Transcript

Plan the generator truth law and maybe the workbench evidence.

No plan yet.


Timeline-first stream view (input-flow; disposition: revise)

A streaming event timeline that shows transcript, model deltas, tool calls, errors, and plan patches.

Hypothesis (score 82): best evidence model for debugging streamed planning sessions.

Stream: the same two events as the session projection above (Low confidence transcript at 740 ms, Repair prompt at 920 ms).

Command-first hybrid (hybrid; disposition: lab-only)

A push-to-talk and command hybrid where voice drafts the plan and slash commands revise it.

Hypothesis (score 77): strong user control for interruption, retry, and targeted edits.

Evidence and disposition notes: the same six signals as in §01 (focused variant: Transcript-first plan builder).

AI interface R&D

Recoverable tool failure while creating a plan

Focused state proof for the same reusable workbench contract.

tool-failure · tool-failure · create-plan

Realtime session projection (90% confidence). Counters: tool calls, failures, interruptions.

Transcript

Create the project planning workbench plan.

- Tool call (model, 1200 ms; tool: create_project_plan): create_project_plan called.
- Tool failure (tool, 1560 ms; tool: create_project_plan): Plan template registry unavailable.

Tool failure

Retry create_project_plan or continue with a manual planning note.
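The recovery choice above (retry `create_project_plan`, or fall back to a manual planning note) can be sketched as a bounded retry with an explicit fallback result. The names `callWithRecovery`, `ToolOutcome`, and `Recovery` are hypothetical, and the attempt limit of 2 is an illustrative default.

```typescript
// Hypothetical result types for a simulated tool call.
type ToolOutcome<T> = { ok: true; value: T } | { ok: false; error: string };

type Recovery<T> =
  | { kind: "plan"; value: T; attempts: number }
  | { kind: "manual-note"; lastError: string; attempts: number };

// Try the tool up to maxAttempts times; if every attempt fails,
// surface the last error so the UI can offer a manual planning note.
function callWithRecovery<T>(call: () => ToolOutcome<T>, maxAttempts = 2): Recovery<T> {
  let lastError = "no attempts made";
  for (let attempt = 1; attempt <= maxAttempts; attempt++) {
    const outcome = call();
    if (outcome.ok) return { kind: "plan", value: outcome.value, attempts: attempt };
    lastError = outcome.error;
  }
  return { kind: "manual-note", lastError, attempts: maxAttempts };
}
```

Retries here are immediate; a provider-backed version would likely add backoff, which matters little while provider integration stays simulated.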

Structured plan

No final plan has been emitted for this scenario.

Variation workbench

Variant comparison

Compare three interface shapes against voice, streaming, tool-call, repair, and user-control evidence.

Transcript-first plan builder (input-flow; disposition: promote)

A structured plan builder where transcript turns become editable goals, milestones, and tasks.

Hypothesis (score 88): best repair path for low-confidence transcript phrases.

Transcript

Create the project planning workbench plan.

No plan yet.


Timeline-first stream view (input-flow; disposition: revise)

A streaming event timeline that shows transcript, model deltas, tool calls, errors, and plan patches.

Hypothesis (score 82): best evidence model for debugging streamed planning sessions.

Stream: the same two events as the session projection above (Tool call at 1200 ms, Tool failure at 1560 ms).

Command-first hybrid (hybrid; disposition: lab-only)

A push-to-talk and command hybrid where voice drafts the plan and slash commands revise it.

Hypothesis (score 77): strong user control for interruption, retry, and targeted edits.

Evidence and disposition notes: the same six signals as in §01 (focused variant: Transcript-first plan builder).

AI interface R&D

Empty state before audio input

Focused state proof for the same reusable workbench contract.

empty · empty · unclear

Realtime session projection (100% confidence). Counters: tool calls, failures, interruptions.

Transcript

No transcript captured yet.

No stream events yet.

Structured plan

No final plan has been emitted for this scenario.

Variation workbench

Variant comparison

Compare three interface shapes against voice, streaming, tool-call, repair, and user-control evidence.

Transcript-first plan builder (input-flow; disposition: promote)

A structured plan builder where transcript turns become editable goals, milestones, and tasks.

Hypothesis (score 88): best repair path for low-confidence transcript phrases.

Transcript

No transcript captured yet.

No plan yet.


Timeline-first stream view (input-flow; disposition: revise)

A streaming event timeline that shows transcript, model deltas, tool calls, errors, and plan patches.

Hypothesis (score 82): best evidence model for debugging streamed planning sessions.

No stream events yet.

Command-first hybrid (hybrid; disposition: lab-only)

A push-to-talk and command hybrid where voice drafts the plan and slash commands revise it.

Hypothesis (score 77): strong user control for interruption, retry, and targeted edits.

Evidence and disposition notes: the same six signals as in §01 (focused variant: Transcript-first plan builder).