PassLane “Lane” — Interaction & Wiring Spec

Run 2 · code-grounded to the real app · plan only · 2026-06-19

PassLane AI Coach ("Lane") — In-App Interaction & Wiring Spec

Status: definitive. Verified live this session against /Users/arizona/CLAUDE CODE/passlane/app/index.html (the single 287 KB app) and /Users/arizona/CLAUDE CODE/passlane/app/questions.json (323-question AZ default bank). The team split on two facts; the code settles both:

The speech engine is cloud-backed today. L3554 calls speechPlugin.start({ language:'en-US', maxResults:5, partialResults:true, popup:false }) with no requiresOnDeviceRecognition / offline flag (grep clean). The designs that corrected for this are right; "audio never leaves the phone" is overruled as a launch claim.
There is exactly one explanation string per question, and it is already on screen. Schema = id/mode/category/question/choices/correct/explanation/difficulty; feedback-expl.textContent = q.explanation at L2927 (revealUnanswered) and L3003 (submitAnswer). No second "elaboration" corpus exists. Therefore Phase 1 Lane does not generate or auto-open new prose. It frames the explanation already rendered (source chip + capped ask). The only alternative, duplicating the sentence already on screen, is forbidden. This is the one change from earlier drafts, and it ripples through §1, §3, §8, §9.
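The single-explanation fact is cheap to re-check whenever a bank changes. The following is a minimal sketch, assuming each bank parses to an array of flat question objects. auditBank and EXPECTED_KEYS are hypothetical names, not code in index.html.

```javascript
// Hypothetical audit: flags any key outside the eight verified ones and any
// record whose single explanation is missing or empty.
const EXPECTED_KEYS = new Set([
  'id', 'mode', 'category', 'question',
  'choices', 'correct', 'explanation', 'difficulty',
]);

function auditBank(questions) {
  const extraKeys = new Set();
  const badExplanation = [];
  for (const q of questions) {
    for (const key of Object.keys(q)) {
      if (!EXPECTED_KEYS.has(key)) extraKeys.add(key);
    }
    // One explanation per question: a plain, non-empty string.
    if (typeof q.explanation !== 'string' || q.explanation.trim() === '') {
      badExplanation.push(q.id);
    }
  }
  return { count: questions.length, extraKeys: [...extraKeys], badExplanation };
}
```

In Node, run auditBank(JSON.parse(fs.readFileSync('questions.json', 'utf8'))). If the verified schema holds, extraKeys should come back empty. A future per-distractor field (§9) would surface there first.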

Every wiring claim below is code-grounded to a line.

1. The Recommendation

Ship all three modalities as one strict ladder with one default — never a menu.

The default, and the entire first release, is a silent TEXT moment that wraps the explanation already shown in the feedback card. The grounded "why" sentence (q.explanation) renders today at L2927/L3003; Phase 1 Lane adds, as siblings beneath it, (a) a source chip naming the actual corpus and (b) a calm "Ask Lane" affordance offering 2–3 capped, pre-authored follow-up chips drawn from the same single explanation field. Lane does not reprint or paraphrase the on-screen sentence — duplication is the failure mode the content reality forces us to design around, so Lane's job in Phase 1 is navigation and provenance of existing grounded text, plus capped clarifiers, not a second essay.

Press-and-hold on that same "Ask Lane" affordance is push-to-talk (the eyes-busy tier). "Talk it through", a true hands-free conversation, is the deepest opt-in. Free = text only and genuinely transmits nothing. All voice (push-to-talk and hands-free) is Pro + consent. It is spike-gated as a unit because the recognizer is cloud-backed today. So "on-device, sends nothing" is removed as a launch claim until a spike proves requiresOnDeviceRecognition=true on iOS and an offline pack on Android.

Why this is still ~80% of the value at near-zero risk: the highest-intent beat (the moment right after answering) gains provenance ("this came from the AZ bank, Disability income"), a path to capped clarifiers, and a single front door for the later voice ladder — all local, all $0, all without touching the frozen answer loop. Richer authored elaboration and per-distractor rebuttals are real content milestones (§9), not Phase-1 wiring.

2. Interaction Principles

Least-friction. The highest-value content (the grounded explanation) already costs zero taps — Lane keeps it that way and never gates it behind a tap. Voice is a hold on a control already on screen — no mode-switch tax. The eyes-busy user gets an explicit-turn gesture; the silent-in-public majority never has to speak.
Additive. Lane is two new sibling nodes and one Home row. Delete every cx- line and the tap+voice study loop is byte-for-byte unchanged. Lane never touches the answer mic's classes, the 5s/12s answer window, processVoiceMatches, speechReady, or harness.js.
Quiet by default. Lane appears unprompted in exactly one place — as the framing + chips around the explanation in the feedback card — and is pull-only everywhere else. "The AI talks too much" is structurally impossible: of the six appState values, only feedback is an in-loop human-available beat, and it gets quiet text affordances, never unbidden audio.
Provenance over prose. Because only one grounded sentence exists per question, Lane earns trust by showing where that sentence comes from and never inventing beyond it, not by generating more words. Absence of a source is shown as loudly as a source (§6 error state).
Eyes-free-friendly, honestly. Push-to-talk gives explicit turn control (release = done) so a long question isn't guillotined by the answer window's silence clock. But on-device accuracy is not promised; the transcript is always shown and "tap to type" is always one tap away.

3. The Modes, Defined Precisely

Tier 1 — TEXT "Ask Lane" (DEFAULT · ships first · Plane A · transmits nothing).

On feedback, the grounded sentence renders as it does today in #feedback-expl. Phase 1 Lane adds, in a sibling cx-lane block directly beneath it:

a source chip scoped to the loaded STATE_FILE (the actual corpus, see §6 — only AZ + 5 states have live banks);
an "Ask Lane" control exposing 2–3 capped chips ("Show an example", "Put it simply", "When does this apply?") whose answers are authored/served from the single explanation field or a small authored clarifier set — never a freeform box on the free plane (that is what keeps it offline, grounded, honest), and never a per-wrong-choice "Why not B?" in Phase 1 (no per-distractor data exists — see §9 OQ2).

Lane must not restate the sentence already in #feedback-expl. Works in silence, in public, with a denied mic, offline, for free. This is the only tier in the first release.
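The no-restatement rule can be enforced at the point where a chip's answer is chosen. The following is a minimal sketch, assuming authored clarifiers are keyed by question id and chip id. resolveChip, CHIP_IDS and the clarifiers map are hypothetical.

```javascript
// Hypothetical Tier-1 chip resolver. Everything is local: no model, no network.
const CHIP_IDS = ['example', 'simple', 'when'];

// Lowercase, strip punctuation and collapse whitespace so trivial echoes compare equal.
function normalize(text) {
  return text.toLowerCase().replace(/[^a-z0-9\s]+/g, '').replace(/\s+/g, ' ').trim();
}

function resolveChip(question, chipId, clarifiers) {
  if (!CHIP_IDS.includes(chipId)) return { kind: 'refuse' }; // e.g. no "Why not B?" in Phase 1
  const authored = clarifiers[question.id]?.[chipId];
  if (typeof authored !== 'string' || authored.trim() === '') {
    return { kind: 'refuse' }; // no sourced answer: render the "No source" tag
  }
  if (normalize(authored) === normalize(question.explanation)) {
    return { kind: 'refuse' }; // would merely restate #feedback-expl
  }
  return { kind: 'answer', text: authored };
}
```

An exact match after normalization only catches verbatim echoes. Near-paraphrases still need the human-passed authoring described in §8.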

Tier 2 — PUSH-TO-TALK "Hold to ask" (Pro + consent · spike-gated).

Press-and-hold the same "Ask Lane" control; releasing it ends the utterance. A separate startCoachListening() window streams interim text into a Lane-owned transcript node. On release, the finalized transcript is shown ("here's what I heard") and then matched against the grounded bank. Available whether or not read-aloud is on, because the commuter it targets studies in silent mode by default. Until the STT spike proves on-device recognition, the press-in disclosure says the system speech service is used; no "sends nothing" copy here.
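Matching the finalized transcript against the grounded bank can start as a plain keyword pass over the capped chip intents. A miss falls through to the refuse state in §6. The following is a minimal sketch. matchCoachIntent and the keyword lists are hypothetical placeholders, not a tuned vocabulary.

```javascript
// Hypothetical coach matcher over the capped chip intents. Keyword lists are
// placeholders; a real list would be authored alongside the clarifier set.
const COACH_INTENTS = [
  { id: 'example', keywords: ['example', 'instance'] },
  { id: 'simple', keywords: ['simply', 'simple', 'simpler', 'plain'] },
  { id: 'when', keywords: ['when', 'apply', 'applies'] },
];

function matchCoachIntent(transcript) {
  // Collapse non-letters to spaces and pad, so keywords only match whole words.
  const text = ` ${transcript.toLowerCase().replace(/[^a-z]+/g, ' ')} `;
  for (const intent of COACH_INTENTS) {
    if (intent.keywords.some((k) => text.includes(` ${k} `))) return intent.id;
  }
  return null; // grounding miss: show the refuse state, never free-generate
}
```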

Tier 3 — HANDS-FREE "Talk it through" (Pro + consent · the only transmitting tier).

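A deliberately entered, half-duplex spoken conversation between questions. By design, it is the only path that sends data: text, never audio, and only after the consent gate. Until the §5 spike lands, recognition itself still runs through the cloud-backed system speech service, as in Tier 2. Entered from Home, never auto-launched mid-study.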

Default for a new user = read the explanation, optionally tap a capped chip. Default for an eyes-busy user = hold-to-ask. Live conversation is a chosen destination, never the front door.

4. Screen-by-Screen Wiring

Mapped to the real screenshots; every gate already exists in code.

HOME / #screen-mode (home.png, readiness.png) — config, untimed.

The #vx-hf → #vx-reveal → #vx-voice ladder (L1331–1352) stays as read aloud → answer aloud. Tier 3 does NOT nest as a third fold inside it — .vx-reveal is hard-capped at max-height:120px (L968, verified), which clips a third row, and a Pro upsell hidden until you flip an unrelated free toggle converts poorly. Instead, "Talk it through" is its own always-visible cx- row directly beneath the Hands-Free card, carrying a PRO chip → openPaywall('voice'|'listen') for free users, consent gate for Pro. The readiness/weak-topic chips gain "Drill this with Lane." Onboarding + the Plane-B consent toggle live in the Voice & Pacing accordion — never mid-session.

ANSWERING — reading / listening (voice-listen.png) — most timing-sensitive.

Lane is suppressed here. reading = TTS live + mic about to open; listening = the hot 5s/12s answer window tuned for "A/B/C/D." The bottom #mic-btn keeps meaning "answer," and processVoiceMatches (L3784) already early-returns true in feedback and owns a dense answer+command vocab — a conversational utterance misfires there. No 4th top-bar icon (the .hdr-actions 🎙/🔊/⏸ cluster is at its 380 px width limit). Do not build a "stuck detector" that speaks during listening; the existing silence / "I don't know" → revealUnanswered already lands in feedback one beat later, where Lane's text is allowed for free.

FEEDBACK reveal — #feedback-bar (explained.png) — the one in-loop seam.

Insert cx-lane (containing the source chip + "Ask Lane" + capped chips) as new flex children of .feedback-bar, SIBLINGS, between #feedback-expl (L1633) and .feedback-next (L1634). Never children of #feedback-expl — both revealUnanswered (L2927) and submitAnswer (L3003) do feedback-expl.textContent = q.explanation, which destroys any appended children every question. Tear cx-lane down in displayQuestion() so stale output never bleeds to the next card.
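The sibling placement and per-question teardown can be sketched with standard DOM calls. This assumes, per L1633/L1634, that #feedback-expl and .feedback-next are adjacent direct children of .feedback-bar. mountLane and teardownLane are hypothetical names.

```javascript
// Hypothetical mount/teardown for the cx-lane block. Selectors come from the
// spec; the function names are illustrative, not existing index.html code.
function mountLane(feedbackBar, laneEl) {
  const expl = feedbackBar.querySelector('#feedback-expl');
  const next = feedbackBar.querySelector('.feedback-next');
  if (!expl || !next) return false; // markup drifted: fail closed, show nothing
  // Sibling, never a child of #feedback-expl: revealUnanswered and submitAnswer
  // both assign its textContent, which would wipe appended children.
  feedbackBar.insertBefore(laneEl, next);
  return true;
}

// Called from displayQuestion() so stale Lane output never reaches the next card.
// querySelectorAll returns a static NodeList, so removing while iterating is safe.
function teardownLane(root) {
  for (const el of root.querySelectorAll('.cx-lane')) el.remove();
}
```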

Coexistence with the 5s autoadvance: see §5/§7 — the timer is frozen the instant Lane is engaged (a tap on a chip or a press-in). The bare source chip rendering does not itself freeze the timer (it is passive provenance, not engagement); only an explicit Ask interaction calls clearAdvanceTimer(). This keeps the auto-advance UX intact for the silent-skimming majority while still protecting anyone who reaches for Lane.
Coexistence with the bottom mic: while .feedback-bar.show, demote #mic-btn (drop the glowing ring → flat/dim, status "🔇 Silent study") so "Ask Lane" is the one obvious place to ask; it restores its full ring on the next question's listening state.
Vertical budget: #screen-study is height:100dvh; overflow:hidden and choice D already clips — so cx-lane gets its own max-height:~30vh; overflow-y:auto; overscroll-behavior:contain (mirroring the deliberate 32vh cap on #feedback-expl), and the redundant "Silent study" voice-bar collapses while Lane is engaged to reclaim space. .feedback-next stays a flex-shrink:0 row, always visible.
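The autoadvance invariant (passive chip = no freeze; first explicit Ask = clearAdvanceTimer()) reduces to a small gate. The following is a minimal sketch in which the app's clearAdvanceTimer is injected. createLaneEngagement and the event names are hypothetical.

```javascript
// Hypothetical engagement gate for the autoadvance invariant. The app's own
// clearAdvanceTimer() is injected so the rule can be exercised in isolation.
const ENGAGING_EVENTS = new Set(['chip-tap', 'press-in']);

function createLaneEngagement(clearAdvanceTimer) {
  let engaged = false;
  return {
    // Returns whether Lane is engaged after handling `event`.
    handle(event) {
      // 'source-chip-rendered' is passive provenance and never freezes the timer.
      if (engaged || !ENGAGING_EVENTS.has(event)) return engaged;
      engaged = true;
      clearAdvanceTimer(); // freeze the 5s autoadvance on the first real Ask
      return engaged;
    },
    // Called from displayQuestion() alongside the cx-lane teardown.
    reset() {
      engaged = false;
    },
  };
}
```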

HANDS-FREE mode — "Talk it through" (Tier 3).

No separate screen (ChatGPT + Gemini abandoned the full-screen orb Nov 2025; PassLane's deleted Drive Mode was right to delete). The question card stays on screen as the anchor; Lane's listening/thinking/answer all render in one region (cx-lane + a small cx-status line), not split between the bottom bar and mid-screen. The two-sided transcript also appends as reviewable text.

EXAM — isExam=true (exam.png) + RESULTS.

Hard-disabled in-loop. No #feedback-bar exists in exam; startListening early-returns on isExam (L3501). startCoachListening() must re-implement that guard at its own entry — never assume it inherits the chokepoint. The EXAM RESULTS screen (untimed) is a legitimate right-moment: "Review Missed" → Lane-assisted review of the weakest section.
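The entry guard can be written as one function that reports why a coach listen must not open, checked before anything touches the plugin. The following is a minimal sketch. The state snapshot and reason strings are hypothetical; isExam, appState, answerListening, micManuallyOff and voicePermDenied are names this spec already uses.

```javascript
// Hypothetical guard run first inside startCoachListening(). It does not rely
// on startListening()'s isExam check (L3501); it repeats every gate itself.
function coachListenBlockReason(s) {
  if (s.isExam) return 'exam';                                  // hard wall during any exam
  if (s.modalOpen) return 'modal';                              // any .modal-overlay.show
  if (s.answerListening) return 'answer-mic-live';              // never overlap the answer window
  if (s.appState !== 'feedback') return 'wrong-state';
  if (s.micManuallyOff || s.voicePermDenied) return 'mic-off';  // degrade to text
  if (!s.isPro || !s.hasConsent) return 'not-entitled';         // paywall or consent gate
  return null; // clear to open the coach mic
}
```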

Dedicated Coach surface: none. Lane lives entirely inside #feedback-bar, the Home row, Voice & Pacing, and #screen-terms. Building a Coach screen would resurrect the orb the industry just abandoned.

5. Voice Mechanics

On-device mic→text — design intent vs. shipping reality. Intent: audio never leaves the device, only Tier-3 text ever transmits. Not true in code today — the frozen start() (L3554) carries no on-device flag, so the answer mic already round-trips audio to Apple/Google. Therefore: the answer mic's existing transmission gets disclosed in the same pass, and "on-device / sends nothing" copy is forbidden until a gated pre-Phase-2 spike proves requiresOnDeviceRecognition=true (iOS) + offline pack/EXTRA_PREFER_OFFLINE (Android) on real devices — almost certainly a plugin fork. Tier 1's text path is genuinely 100% local (it ships no audio at all) and is the honest launch privacy story.

ASK vs. the frozen answer window — non-negotiable isolation. The answer window (VOICE_SILENCE_BUDGET_MS=5000, VOICE_HARD_CAP_MS=12000, cut from 14s to stop reopen-cycling) is frozen. Lane ASK is a physically separate control + a separate startCoachListening() entry + separate constants (CX_HARD_CAP_MS ≈ 30000; Tier-3 sliding silence ≈ 1200–1500ms for thinking pauses), in a clearly separate block so harness.js stays provably untouched. Reuse only NATIVE_RESTART_COOLDOWN_MS=400 on every stop→start.
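The separate Lane timing block might look like the sketch below. CX_HARD_CAP_MS follows the ≈30000 figure above. CX_SILENCE_MS = 1350 is an assumed midpoint of the 1200–1500ms range, and the helper names are hypothetical. In index.html, NATIVE_RESTART_COOLDOWN_MS would be referenced, not redeclared; it is declared here only to keep the sketch self-contained.

```javascript
// Lane-only timing, kept in its own block away from the frozen answer constants.
const CX_HARD_CAP_MS = 30000; // ≈30s cap on any coach listen
const CX_SILENCE_MS = 1350;   // Tier-3 trailing silence; assumed midpoint of 1200–1500ms
// Mirrors the app's existing value; index.html would reference, not redeclare, it.
const NATIVE_RESTART_COOLDOWN_MS = 400;

// Milliseconds still to wait before a stop→start may call the plugin again.
function restartDelay(lastStopAt, now) {
  return Math.max(0, lastStopAt + NATIVE_RESTART_COOLDOWN_MS - now);
}

// Tier-3 endpoint check (Tier 2 ends on release instead).
function coachShouldEndpoint(startedAt, lastVoiceAt, now) {
  return now - lastVoiceAt >= CX_SILENCE_MS || now - startedAt >= CX_HARD_CAP_MS;
}
```

The delay is meant to be awaited (for example with setTimeout) before every stop→start. That wait keeps the CANCELLED-bail storm out of the coach path.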

The shared single-bind event problem — route EVERY shared event, not just partialResults (the deep one). The plugin binds its listeners once (speechPlugin.addListener(...)), and those handlers self-guard on appState. But appState is 'feedback' for both the autoadvance and a Tier-3 listen, so appState cannot disambiguate Lane from the answer path. Two such single-bind handlers exist and both must branch on an explicit posture flag (answerListening vs cxListening), not on appState:

partialResults (L3373) — currently guards if (!isListening || appState !== 'listening' || …) return; and calls noteVoiceActivity() + processVoiceMatches(list). Under cxListening it must instead stream into Lane's transcript and run the coach matcher, never processVoiceMatches.
speechReady (L3386) — currently guards if (appState !== 'listening' || …) return;, sets speechReadyFired=true, fires the answer haptic, sets the answer status, and re-anchors the 5s silence clock via noteVoiceActivity(). This is the critic's catch: reusing speechReady for the Tier-3 "your turn" cue would re-anchor the answer clock and fire the answer haptic inside a coach listen. Fix: branch it — when cxListening is true, speechReady must run the Lane cue path (no noteVoiceActivity(), no answer status) and leave the frozen answer constants untouched; or give startCoachListening() its own dedicated cue and have speechReady early-return whenever cxListening is true. Either way, speechReady must never touch the 5s clock during a coach listen.

The governing rule: every speech event the plugin binds once — partialResults, speechReady, and any onError/onEnd/restart handler that reads appState or calls noteVoiceActivity()/the answer haptic — gets an explicit answerListening/cxListening branch at its top. When cxListening is false, all of them are byte-for-byte the current answer behavior, so harness.js stays green.
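One way to honor the governing rule uses the early-return option from the speechReady bullet. A posture object is checked at the top of each single-bind handler, and today's handler bodies are passed in unchanged. The following is a minimal sketch; posture, setPosture and the answer/coach wrappers are hypothetical.

```javascript
// Hypothetical posture router for the plugin's single-bind speech events.
// `answer` wraps today's handler bodies; `coach` is the new Lane path.
const posture = { answerListening: false, cxListening: false };

function setPosture(next) {
  const merged = { ...posture, ...next };
  // Invariant: the answer window and a coach listen are never open together.
  if (merged.answerListening && merged.cxListening) {
    throw new Error('answer and coach listening cannot overlap');
  }
  Object.assign(posture, merged);
}

function onPartialResults(matches, answer, coach) {
  if (posture.cxListening) {
    coach.streamTranscript(matches); // Lane transcript + coach matcher, never processVoiceMatches
    return;
  }
  answer.partialResults(matches); // cxListening false: today's handler, with its own guards
}

function onSpeechReady(answer) {
  // Early-return option: the Lane-owned cue fires from the turn loop instead,
  // so a coach listen never re-anchors the 5s clock or fires the answer haptic.
  if (posture.cxListening) return;
  answer.speechReady();
}
```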

The hands-free turn loop (half-duplex, explicit turns). Read question → answer as today → on feedback, first call clearAdvanceTimer() (in hands-free nobody taps, so the loop itself must cancel the countdown) → Lane speaks the "why" (the grounded sentence) → fire the Lane-owned "your turn" haptic+earcon once (NOT the answer-path speechReady cue; see above) → open cx-listen → user asks → endpoint on ~1.2–1.5s trailing silence or the ~30s cap → "thinking" earcon ≤300ms → Lane answers → hold the card until Lane is silent, then re-arm the Lane cue once and re-arm advance via the user's advanceSeconds. No wake word in-session. No barge-in over Lane's first answer.
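The loop has three ordering constraints. The countdown is cancelled before Lane speaks. The Lane cue fires before the coach mic opens. Re-arming happens only after Lane is silent. These can be pinned down as a step table. The following is a minimal sketch; step and effect names are hypothetical labels for the sequence above, not functions in index.html.

```javascript
// Hypothetical step table for one hands-free turn, in execution order.
// Each effect names what the app would do on entering that step.
const TURN_STEPS = {
  feedback: { effect: 'clearAdvanceTimer', next: 'explain' },  // nobody taps in hands-free
  explain: { effect: 'speakExplanation', next: 'yourTurn' },   // the grounded sentence
  yourTurn: { effect: 'laneCueOnce', next: 'coachListen' },    // Lane-owned haptic + earcon
  coachListen: { effect: 'openCoachMic', next: 'thinking' },   // ends on silence or ~30s cap
  thinking: { effect: 'thinkingEarcon', next: 'laneAnswer' },  // ≤300 ms
  laneAnswer: { effect: 'speakLaneAnswer', next: 'rearm' },    // card held until Lane is silent
  rearm: { effect: 'rearmCueAndAdvance', next: null },         // uses the user's advanceSeconds
};

function turnEffects(start = 'feedback') {
  const effects = [];
  for (let step = start; step !== null; step = TURN_STEPS[step].next) {
    effects.push(TURN_STEPS[step].effect);
  }
  return effects;
}
```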

Barge-in. Tap-to-stop is the guaranteed path on a defined hit-target set (Lane control / dedicated stop = halt; "Next →" = halt AND advance so a user who wanted to move on isn't stranded; disabled choices inert). Voice "stop" (already in the ≤2-word vocab) is a bonus — voice barge-in is documented-fragile (Gemini needs a toggle; ChatGPT's tap-to-interrupt once froze input). Every stop→listen honors the 400 ms cooldown or it reproduces the CANCELLED-bail storm.
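The defined hit-target set can be encoded as a lookup, so that every tap resolves to halt, halt-and-advance, or nothing. The following is a minimal sketch; bargeInAction and the target names are hypothetical.

```javascript
// Hypothetical tap-to-stop policy. Target names are illustrative labels for
// the hit-target set above.
function bargeInAction(target) {
  switch (target) {
    case 'lane-control':
    case 'stop-button':
    case 'voice-stop': // bonus path; tap remains the guaranteed one
      return { halt: true, advance: false };
    case 'next':
      return { halt: true, advance: true }; // don't strand a user who wanted to move on
    default:
      return { halt: false, advance: false }; // disabled choices and all else stay inert
  }
}
```

Any halt that leads back into listening would still wait out the 400 ms cooldown before reopening the mic.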

Every fallback. STT empty/garbled/offline → show partial transcript + "tap to type instead" (focus a minimal cx- field, reusing the #fb-text idiom) — never a dead end. Mic denied/sticky (micManuallyOff/voicePermDenied) → degrade silently to text; never auto-open a mic. Not Pro / past the 12-Q taste → openPaywall, never a silent failed call. Any .modal-overlay.show up → listening refuses (L3505).
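Mapping each failure to a visible next step makes it easy to check that no path ends in silence. The following is a minimal sketch; fallbackFor, the outcome strings and the returned action names are hypothetical.

```javascript
// Hypothetical fallback table: each failure maps to a visible next step.
function fallbackFor(outcome) {
  switch (outcome) {
    case 'stt-empty':
    case 'stt-garbled':
    case 'stt-offline':
      return { show: 'partial-transcript', offer: 'type-instead' };
    case 'mic-denied':
    case 'mic-manually-off':
      return { show: 'text-path', offer: null }; // never auto-open the mic
    case 'not-pro':
    case 'taste-exhausted':
      return { show: 'paywall', offer: null };   // openPaywall, never a silent failed call
    case 'modal-open':
      return { show: 'none', offer: null };      // listening refuses (L3505)
    default:
      return { show: 'text-path', offer: null }; // unknown: fail to the local text tier
  }
}
```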

Feasible-now vs. spike-gated. Now: Tier 1 text — but note its content is "frame the existing explanation + capped authored chips," not generated prose (local, $0, harness-safe). Spike-gated together: Tier 2 + Tier 3 (both depend on the same unproven on-device capability; plus the half-duplex loop timing, the all-events channel router, the key-holding proxy, per-state grounding scope).

6. Visual & State Design

Look in brand. Lane wears the app's existing green→cyan voice family (#10d98c → #13b3c9, mic ring rgba(39,211,168,.55), CSS L949/956/965) — the "voice = go" color that already reads questions aloud — never the wordmark teal (logo only) and never the answer mic's red (red = "you are answering A/B/C/D," exclusively). Signature glyph: a small filled #13b3c9 soundwave/speech-dot, never 🎤. Namespace cx- (grep-clean). The shipped light theme already maps .vx green→cyan to #4E7A3F→#6E7A4F; every cx- rule reuses those exact light mappings so Lane is never a blue island in either mode.

States (all via a cx-status line at the top of cx-lane; never a new setMicState class on #mic-btn):

idle / resting (default Phase-1 view) — the explanation shows (existing), with the source chip + "Ask Lane" chip beneath at rest, teal hairline ring, no animation (de-cheesed). No second block of prose is rendered. First-feedback-only hint "hold to ask out loud."
listening (PTT) — chip fills teal, 3-bar cx-wave, status "Listening…", interim words stream italic; haptic('light') on press-in; no earcon (avoid startle).
thinking — 2-dot breathing pulse (1.6s ease-in-out, echoes the existing ~2.4s micBreath), status "Thinking…", ≤300 ms to first acknowledgement.
speaking — Lane's answer to a chip/question streams below the explanation (augments, never overwrites it), 2 px green→cyan left rule; haptic('light') at first token; one soft earcon only if soundOn.
citing — the source chip. Foot of cx-lane: 11 px teal, <i class="ti ti-shield-check">, e.g. "AZ question bank · Disability income," scoped to the loaded STATE_FILE. Only AZ (questions.json) + Texas, Florida, California, New York, North Carolina have live banks (L1756–1762); every other state falls back to questions.json (L1765) — so the chip names the actual loaded corpus, never the user's selected-but-unbacked state, and never implies authority it lacks.
error / refuse — absence shown as loudly as presence. Grounding miss / non-AZ topic with no backing: "I don't have a sourced answer for that — see the explanation above," with a muted "No source — not answered" tag. Lane never free-generates to fill the gap.
done — status clears; advanceSeconds re-arms (or waits for Next); soft "settle" earcon only if soundOn.
offline / mic-denied / non-Pro — silently degrades to the text path; Tier 1 fully works offline.
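Scoping the citing chip to the corpus that actually loaded, rather than the state the user picked, comes down to labeling from the resolved STATE_FILE. The following is a minimal sketch; sourceChipLabel is hypothetical. bankNames must be filled from the real STATE_FILE values at L1756–1765; only the questions.json → AZ entry is known from this spec.

```javascript
// Hypothetical source-chip label. `loadedFile` is the STATE_FILE that actually
// loaded (after the questions.json fallback), never the selected state.
function sourceChipLabel(loadedFile, category, bankNames) {
  const bank = bankNames[loadedFile];
  if (!bank || !category) {
    // No backing corpus: the caller renders the muted no-source tag instead.
    return { text: null, sourced: false };
  }
  return { text: `${bank} question bank · ${category}`, sourced: true };
}
```

Consider a user who selected a state without a live bank, and therefore loaded questions.json. That user sees the AZ label, which is the truthful one.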

Motion / earcons. Breathing-slow only — no bounce, no celebratory chimes. Earcons suppressed while listening; chime only on "done" and only when sound is already on. All haptics honor the Haptics toggle.

Accessibility. Lane's streaming target is aria-hidden while streaming; the finalized string is announced once into an aria-live="polite" node (no per-token flood). Status words stay out of the assertive #feedback-result. "Ask Lane" is a real <button> with aria-expanded; PTT exposes aria-pressed. A tap-to-toggle voice variant (tap start / tap stop) is offered for users who can speak but not sustain a press (under prefers-reduced-motion or an explicit preference), routed through the same startCoachListening() window; hold stays default but is never the only route to voice. The chip sets touch-action:none; user-select:none; -webkit-touch-callout:none and suppresses contextmenu so mobile-Safari long-press won't fire selection. Focus order: result → explanation → Lane → Next. Hit targets ≥44 px. Color is never the sole signal — every state carries a status word.
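The hold default and the tap-to-toggle variant can share one controller that calls the same start/stop pair. The following is a minimal sketch; createAskController is hypothetical, and start/stop stand in for startCoachListening() and its matching stop.

```javascript
// Hypothetical "Ask Lane" voice gesture. `start` / `stop` stand in for
// startCoachListening() and its matching stop; hold and toggle share them.
function createAskController({ mode = 'hold', start, stop }) {
  let active = false;
  const begin = () => {
    if (active) return;
    active = true;
    start();
  };
  const end = () => {
    if (!active) return;
    active = false;
    stop();
  };
  return {
    get active() { return active; },               // drives aria-pressed
    onPressStart() { if (mode === 'hold') begin(); },
    onPressEnd() { if (mode === 'hold') end(); },  // also bind to pointercancel
    onTap() {
      if (mode !== 'toggle') return;
      if (active) end(); else begin();
    },
  };
}
```

In the app these would be bound to pointerdown, pointerup and pointercancel on the chip, with contextmenu suppressed as described above. Treating pointercancel as a release is a design choice: an interrupted press never leaves the mic open.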

Mockup frames (render inline alongside this spec): frame A (resting Phase-1: existing explanation, source chip, capped chips, demoted mic — no duplicated prose); frame B (engaged: "Paused — reading," Lane's chip-answer below the explanation in its own 30vh scroller, Next pinned); frame C (the Home ladder: tap=default, hold=spike-gated, talk=Pro).

7. Right-Place / Wrong-Place Map

feedback after a correct answer
Should Lane appear? Yes: source chip + "Ask Lane" + capped chips beneath the existing explanation.
Wiring rule: Source chip is passive; a chip tap / press-in calls clearAdvanceTimer(). Stay 'feedback'. Do not reprint q.explanation.

feedback after an incorrect answer
Should Lane appear? Yes: same affordances; the existing line already leads with the correct answer (L2898).
Wiring rule: Phase 1 ships ONE generic "Why?"/clarifier set on q.explanation. Per-wrong-choice "Why not C?" is NOT Phase 1; no per-distractor field exists (OQ2).

After revealUnanswered (silence / "I don't know")
Should Lane appear? Yes: same affordances.
Wiring rule: Already lands in feedback (L2927); text allowed.

mode_select / HOME
Should Lane appear? Yes: config, "Talk it through" row, "drill weak topic".
Wiring rule: Untimed; no live mic.

EXAM RESULTS (post-exam)
Should Lane appear? Yes: "Review Missed" handoff.
Wiring rule: Exam over; hard-disable lifted.

Voice & Pacing / Settings
Should Lane appear? Yes: verbosity + Plane-B consent.
Wiring rule: Untimed config only.

Idle guardrail (consecutiveNonAnswers ≥ 6 → "still there?")
Should Lane appear? One calm line, never nag.
Wiring rule: Only because pauseSession already stopped the loop.

reading (TTS live)
Should Lane appear? No.
Wiring rule: Collides with speech + imminent mic open.

listening (answer mic hot, 5s/12s)
Should Lane appear? No.
Wiring rule: Corrupts the tuned silence clock; reopen-storm. Note speechReady/partialResults stay answer-only here (§5).

advancing / the async _next gap
Should Lane appear? No.
Wiring rule: State locked; double-advance risk.

During any exam (isExam, timer ticking)
Should Lane appear? No (hard wall).
Wiring rule: startCoachListening() re-checks isExam at its own entry (L3501 guards only startListening).

Behind any .modal-overlay.show
Should Lane appear? No.
Wiring rule: Listening refuses (L3505).

Mic denied / micManuallyOff sticky
Should Lane appear? No mic; text only.
Wiring rule: Never auto-open a silenced mic.

Free user past 12-Q taste / non-Pro for live
Should Lane appear? No silent call → paywall.
Wiring rule: openPaywall('voice'|'listen').

8. Build Order

Tied to run-1 Crawl/Walk/Run, easiest → richest, least risk first.

Phase 1 — CRAWL: TEXT teach moment in #feedback-bar, scoped to existing content. Sibling cx-lane beneath #feedback-expl carrying the source chip + "Ask Lane" + 2–3 capped clarifier chips served from the single explanation field — explicitly not a regenerated elaboration (no second corpus exists). Plus the 30vh containment, voice-bar collapse, the autoadvance invariant (passive chip = no freeze; engaged = clearAdvanceTimer()), the demoted mic, and the corpus-accurate source chip. Fully local, $0, one safe seam. Ship the honest privacy story (text path sends nothing). Run voice-sandbox/harness.js green before and after; QA the autoadvance invariant and the no-duplication rule. This alone is ~80% of the value at near-zero risk.
Phase 1.5 — GATE: on-device-STT spike (iOS + Android). Until green, no "sends nothing" copy and PTT stays out of release. Land the corrected answer-mic disclosure here.
Phase 2 — WALK: hold-to-ask on the same control: startCoachListening() with its own constants + guards + the all-events channel router (answerListening/cxListening) branching partialResults, speechReady, and every other single-bind speech handler (§5). Only after the spike proves on-device (or ships with truthful system-speech-service consent).
Phase 3 — RUN: Home "Talk it through" behind Pro + consent — the half-duplex loop, the Lane-owned cue (not speechReady), tap-to-stop, the proxy, per-state grounding. Do not build before Phase 1 is verified.

Prerequisite before any wiring: lock the name. User-facing = "Lane"; CSS/code = cx- (both grep-clean). The shipped silence path keeps its internal Coach Reveal/coachCtx name (L2903/L3628) — branding the AI "Coach" would collide in the exact functions (revealUnanswered L2902–2951) that write the feedback region. UI copy leads with the verb ("Why?", "Hold to ask") since "Lane" reads oddly next to the "PassLane" wordmark.

Content prerequisite for Phase 1 (new, from the schema reality): decide and produce the capped clarifier chip set. The single explanation field supports a generic "Put it simply / Show an example / When does this apply?" set; if even those need authored, judged, human-passed copy beyond restating the sentence, that authoring is a Phase-1 content task, not wiring. If no authored clarifiers are funded for v1, Phase 1 ships source chip + "Ask Lane" disabled-with-waitlist rather than chips that merely echo q.explanation.

9. Open Questions

Only the genuinely unresolved:

On-device STT feasibility (the one true voice blocker). Can the frozen Capacitor plugin be configured for on-device recognition on both iOS and Android, or does it require a fork / a separate Lane-only recognizer? This gates the entire voice ladder (Tiers 2 + 3) and every "sends nothing" claim. Resolve with the Phase-1.5 spike on real devices before any voice copy ships.
Per-distractor corpus (the Phase-1 content blocker for "Why not B?"). The bank stores one explanation blob per question (verified keys: id/mode/category/question/choices/correct/explanation/difficulty; no per-wrong-choice field). Per-choice rebuttals need authored + judged + human-passed copy for every (question, wrong-choice) pair across 323 AZ Qs (and each state bank), a real content cost. Phase 1 ships only the generic clarifier set; per-choice chips are a corpus-schema milestone. Decide the schema (rebuttals: {B: "...", C: "..."}?) + production budget.
Will the capped clarifiers add value beyond the on-screen sentence? Because the only grounded text is the sentence already displayed, "Show an example"/"Put it simply" must be authored to be non-redundant. Decide v1 scope: (a) author a small clarifier set now, or (b) ship Lane as source-chip-only provenance + voice front-door and defer chips. Cheap to settle with 2–3 sample questions.
Persona name vs. product name. Does user-facing "Lane" inside "PassLane" read as "ask a tutor" or "ask the app"? Cheap to settle: pressure-test "Hold to ask Lane" on the real screen with 2–3 users before code.
Tier-3 endpoint tuning in road noise. The ~1.2–1.5s trailing-silence endpoint may cut off a thinking learner mid-sentence in exactly the commute case. Pilot on real devices and tune from data; the app's terms already (correctly) state it is not a driving app.
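The per-distractor item floats a schema shape. If the decision lands on that shape, the reader side stays small. The following is a minimal sketch. The rebuttals field does not exist in questions.json today. Its shape, and the assumption that `correct` uses the same letter keys, are both hypothetical.

```javascript
// Hypothetical reader for a proposed `rebuttals` field, e.g. { B: '...', C: '...' }.
// questions.json has no such field today, so this returns null for every
// current question and no per-choice chip renders.
function rebuttalFor(question, choiceKey) {
  if (choiceKey === question.correct) return null; // rebuttals are for wrong choices only
  const text = question.rebuttals?.[choiceKey];
  return typeof text === 'string' && text.trim() !== '' ? text : null;
}
```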

Relevant files: /Users/arizona/CLAUDE CODE/passlane/app/index.html (the single 287 KB app; all line refs above) and /Users/arizona/CLAUDE CODE/passlane/app/questions.json (323-Q AZ default bank; one explanation per question — the content fact behind §1/§3/§8/§9). The frozen voice harness that must stay green: /Users/arizona/CLAUDE CODE/passlane/voice-sandbox/harness.js.

Private working document — unlisted, not indexed. PassLane / Somos LLC.