Status: definitive. Verified live against /Users/arizona/CLAUDE CODE/passlane/app/index.html (the single 287 KB app) and /Users/arizona/CLAUDE CODE/passlane/app/questions.json (323-question AZ default bank) this session. Two facts were settled by reading the code, both on points where the team had split:
Speech recognition is cloud-backed today: speechPlugin.start({ language:'en-US', maxResults:5, partialResults:true, popup:false }) is called with no requiresOnDeviceRecognition / offline flag (grep clean). The designs that corrected for this are right; "audio never leaves the phone" is overruled as a launch claim.
Each question carries exactly one explanation. Verified keys: id/mode/category/question/choices/correct/explanation/difficulty; feedback-expl.textContent = q.explanation at L2927 (revealUnanswered) and L3003 (submitAnswer). No second "elaboration" corpus exists. Therefore Phase 1 Lane does not generate or auto-open new prose: it frames the explanation already rendered (source chip + capped ask), because the alternative is duplicating the sentence on screen, which we forbid. This is the one change from earlier drafts and it ripples through §1, §3, §8, §9.
Every wiring claim below is code-grounded to a line.
Ship all three modalities as one strict ladder with one default — never a menu.
The default, and the entire first release, is a silent TEXT moment that wraps the explanation already shown in the feedback card. The grounded "why" sentence (q.explanation) renders today at L2927/L3003; Phase 1 Lane adds, as siblings beneath it, (a) a source chip naming the actual corpus and (b) a calm "Ask Lane" affordance offering 2–3 capped, pre-authored follow-up chips drawn from the same single explanation field. Lane does not reprint or paraphrase the on-screen sentence — duplication is the failure mode the content reality forces us to design around, so Lane's job in Phase 1 is navigation and provenance of existing grounded text, plus capped clarifiers, not a second essay.
Press-and-hold that same "Ask Lane" affordance = push-to-talk (eyes-busy tier), and "Talk it through" is a true hands-free conversation as the deepest opt-in. Free = text only and genuinely transmits nothing; all voice (push-to-talk and hands-free) is Pro + consent and is spike-gated together, because the recognizer is cloud-backed today — so "on-device, sends nothing" is removed as a launch claim until a spike proves requiresOnDeviceRecognition=true on iOS and an offline pack on Android.
Why this is still ~80% of the value at near-zero risk: the highest-intent beat (the moment right after answering) gains provenance ("this came from the AZ bank, Disability income"), a path to capped clarifiers, and a single front door for the later voice ladder — all local, all $0, all without touching the frozen answer loop. Richer authored elaboration and per-distractor rebuttals are real content milestones (§9), not Phase-1 wiring.
cx- line and the tap+voice study loop is byte-for-byte unchanged. Lane never touches the answer mic's classes, the 5s/12s answer window, processVoiceMatches, speechReady, or harness.js.
appState values, only feedback is an in-loop human-available beat, and it gets quiet text affordances, never unbidden audio.
Tier 1 — TEXT "Ask Lane" (DEFAULT · ships first · Plane A · transmits nothing).
On feedback, the grounded sentence renders as it does today in #feedback-expl. Phase 1 Lane adds, in a sibling cx-lane block directly beneath it:
a source chip naming the loaded STATE_FILE (the actual corpus, see §6; only AZ + 5 states have live banks);
an "Ask Lane" affordance offering 2–3 capped chips drawn from the explanation field or a small authored clarifier set: never a freeform box on the free plane (that is what keeps it offline, grounded, honest), and never a per-wrong-choice "Why not B?" in Phase 1 (no per-distractor data exists; see §9 OQ2).
Lane must not restate the sentence already in #feedback-expl. Works in silence, in public, with a denied mic, offline, for free. This is the only tier in the first release.
Tier 2 — PUSH-TO-TALK "Hold to ask" (Pro + consent · spike-gated).
Press-and-hold the same "Ask Lane" control; release endpoints. A separate startCoachListening() window streams interim text into a Lane-owned transcript node; on release the finalized transcript shows ("here's what I heard"), then matches the grounded bank. Available whether or not read-aloud is on (the commuter it targets studies silent by default). Until the STT spike proves on-device, the press-in disclosure says the system speech service is used; no "sends nothing" copy here.
Tier 3 — HANDS-FREE "Talk it through" (Pro + consent · the only transmitting tier).
A deliberately-entered, half-duplex spoken conversation between questions. The only path that sends data — text, never audio, and only after the consent gate. Entered from Home, never auto-launched mid-study.
Default for a new user = read the explanation, optionally tap a capped chip. Default for an eyes-busy user = hold-to-ask. Live conversation is a chosen destination, never the front door.
Mapped to the real screenshots; every gate already exists in code.
HOME / #screen-mode (home.png, readiness.png) — config, untimed.
The #vx-hf → #vx-reveal → #vx-voice ladder (L1331–1352) stays as read aloud → answer aloud. Tier 3 does NOT nest as a third fold inside it — .vx-reveal is hard-capped at max-height:120px (L968, verified), which clips a third row, and a Pro upsell hidden until you flip an unrelated free toggle converts poorly. Instead, "Talk it through" is its own always-visible cx- row directly beneath the Hands-Free card, carrying a PRO chip → openPaywall('voice'|'listen') for free users, consent gate for Pro. The readiness/weak-topic chips gain "Drill this with Lane." Onboarding + the Plane-B consent toggle live in the Voice & Pacing accordion — never mid-session.
ANSWERING — reading / listening (voice-listen.png) — most timing-sensitive.
Lane is suppressed here. reading = TTS live + mic about to open; listening = the hot 5s/12s answer window tuned for "A/B/C/D." The bottom #mic-btn keeps meaning "answer," and processVoiceMatches (L3784) already early-returns true in feedback and owns a dense answer+command vocab — a conversational utterance misfires there. No 4th top-bar icon (the .hdr-actions 🎙/🔊/⏸ cluster is at its 380 px width limit). Do not build a "stuck detector" that speaks during listening; the existing silence / "I don't know" → revealUnanswered already lands in feedback one beat later, where Lane's text is allowed for free.
FEEDBACK reveal — #feedback-bar (explained.png) — the one in-loop seam.
Insert cx-lane (containing the source chip + "Ask Lane" + capped chips) as new flex children of .feedback-bar, SIBLINGS, between #feedback-expl (L1633) and .feedback-next (L1634). Never children of #feedback-expl — both revealUnanswered (L2927) and submitAnswer (L3003) do feedback-expl.textContent = q.explanation, which destroys any appended children every question. Tear cx-lane down in displayQuestion() so stale output never bleeds to the next card.
Autoadvance invariant: a passive chip does not freeze the countdown; engaging Lane calls clearAdvanceTimer(). This keeps the auto-advance UX intact for the silent-skimming majority while still protecting anyone who reaches for Lane.
On .feedback-bar.show, demote #mic-btn (drop the glowing ring → flat/dim, status "🔇 Silent study") so "Ask Lane" is the one obvious place to ask; it restores its full ring on the next question's listening state.
#screen-study is height:100dvh; overflow:hidden and choice D already clips, so cx-lane gets its own max-height:~30vh; overflow-y:auto; overscroll-behavior:contain (mirroring the deliberate 32vh cap on #feedback-expl), and the redundant "Silent study" voice-bar collapses while Lane is engaged to reclaim space. .feedback-next stays a flex-shrink:0 row, always visible.
HANDS-FREE mode — "Talk it through" (Tier 3).
No separate screen (ChatGPT + Gemini abandoned the full-screen orb Nov 2025; deleting PassLane's Drive Mode was the right call). The question card stays on screen as the anchor; Lane's listening/thinking/answer all render in one region (cx-lane + a small cx-status line), not split between the bottom bar and mid-screen. The two-sided transcript also appends as reviewable text.
EXAM — isExam=true (exam.png) + RESULTS.
Hard-disabled in-loop. No #feedback-bar exists in exam; startListening early-returns on isExam (L3501). startCoachListening() must re-implement that guard at its own entry — never assume it inherits the chokepoint. The EXAM RESULTS screen (untimed) is a legitimate right-moment: "Review Missed" → Lane-assisted review of the weakest section.
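The entry guard described above could be sketched roughly as follows. This is a minimal illustration, not the shipped code: canStartCoachListening and the session object shape are hypothetical, and only isExam and the 'feedback' appState value come from the spec.

```javascript
// Hypothetical entry guard for startCoachListening(). The function name and the
// shape of `session` are assumptions; isExam and appState mirror the spec.
function canStartCoachListening(session) {
  // Re-check the exam flag here: the existing isExam guard protects only
  // startListening, so the coach entry cannot rely on inheriting it.
  if (session.isExam) return false;
  // Lane is suppressed during reading/listening; feedback is the only in-loop seam.
  return session.appState === 'feedback';
}
```

Keeping the check at the top of the coach entry point means a future refactor of startListening cannot silently open a coach mic during a timed exam.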
Dedicated Coach surface: none. Lane lives entirely inside #feedback-bar, the Home row, Voice & Pacing, and #screen-terms. Building a Coach screen would resurrect the orb the industry just abandoned.
On-device mic→text — design intent vs. shipping reality. Intent: audio never leaves the device, only Tier-3 text ever transmits. Not true in code today — the frozen start() (L3554) carries no on-device flag, so the answer mic already round-trips audio to Apple/Google. Therefore: the answer mic's existing transmission gets disclosed in the same pass, and "on-device / sends nothing" copy is forbidden until a gated pre-Phase-2 spike proves requiresOnDeviceRecognition=true (iOS) + offline pack/EXTRA_PREFER_OFFLINE (Android) on real devices — almost certainly a plugin fork. Tier 1's text path is genuinely 100% local (it ships no audio at all) and is the honest launch privacy story.
ASK vs. the frozen answer window — non-negotiable isolation. The answer window (VOICE_SILENCE_BUDGET_MS=5000, VOICE_HARD_CAP_MS=12000, cut from 14s to stop reopen-cycling) is frozen. Lane ASK is a physically separate control + a separate startCoachListening() entry + separate constants (CX_HARD_CAP_MS ≈ 30000; Tier-3 sliding silence ≈ 1200–1500ms for thinking pauses), in a clearly separate block so harness.js stays provably untouched. Reuse only NATIVE_RESTART_COOLDOWN_MS=400 on every stop→start.
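One way to keep the two constant sets visibly apart is sketched below. The frozen values are quoted from this section; CX_TRAILING_SILENCE_MS and cxShouldEndpoint are hypothetical names, and 1500 is just one point in the stated 1200–1500 ms range.

```javascript
// Frozen answer-window constants (values quoted from the spec; Lane never edits these).
const VOICE_SILENCE_BUDGET_MS = 5000;
const VOICE_HARD_CAP_MS = 12000;
const NATIVE_RESTART_COOLDOWN_MS = 400; // the only constant Lane reuses

// Lane-owned constants, kept in their own block.
const CX_HARD_CAP_MS = 30000;
const CX_TRAILING_SILENCE_MS = 1500; // hypothetical name; the spec gives 1200-1500 ms

// Hypothetical helper: decides whether a coach listen should end.
// Returns 'hard-cap', 'silence', or null (keep listening).
function cxShouldEndpoint(nowMs, startedMs, lastVoiceMs) {
  if (nowMs - startedMs >= CX_HARD_CAP_MS) return 'hard-cap';
  if (nowMs - lastVoiceMs >= CX_TRAILING_SILENCE_MS) return 'silence';
  return null;
}
```

Because the helper reads only the CX_ constants, tuning the coach window cannot shift the 5s/12s answer timings that harness.js exercises.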
The shared single-bind event problem — route EVERY shared event, not just partialResults (the deep one). The plugin binds its listeners once (speechPlugin.addListener(...)), and those handlers self-guard on appState. But appState is 'feedback' for both the autoadvance and a Tier-3 listen, so appState cannot disambiguate Lane from the answer path. Two such single-bind handlers exist and both must branch on an explicit posture flag (answerListening vs cxListening), not on appState:
partialResults (L3373): currently guards if (!isListening || appState !== 'listening' || …) return; and calls noteVoiceActivity() + processVoiceMatches(list). Under cxListening it must instead stream into Lane's transcript and run the coach matcher, never processVoiceMatches.
speechReady (L3386): currently guards if (appState !== 'listening' || …) return;, sets speechReadyFired=true, fires the answer haptic, sets the answer status, and re-anchors the 5s silence clock via noteVoiceActivity(). This is the critic's catch: reusing speechReady for the Tier-3 "your turn" cue would re-anchor the answer clock and fire the answer haptic inside a coach listen. Fix: branch it. When cxListening is true, speechReady must run the Lane cue path (no noteVoiceActivity(), no answer status) and leave the frozen answer constants untouched; or give startCoachListening() its own dedicated cue and have speechReady early-return whenever cxListening is true. Either way, speechReady must never touch the 5s clock during a coach listen.
The governing rule: every speech event the plugin binds once (partialResults, speechReady, and any onError/onEnd/restart handler that reads appState or calls noteVoiceActivity()/the answer haptic) gets an explicit answerListening/cxListening branch at its top. When cxListening is false, all of them are byte-for-byte the current answer behavior, so harness.js stays green.
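The governing rule can be illustrated with a small router. This is a sketch under assumptions: makeSpeechRouter, the posture strings, and the handler names are hypothetical stand-ins for the answerListening/cxListening flags and the existing handler bodies.

```javascript
// Hypothetical router for single-bind speech events. `posture` encodes the
// answerListening / cxListening flags as 'answer' | 'coach' | 'idle'.
function makeSpeechRouter(handlers) {
  let posture = 'idle';
  return {
    setPosture(next) { posture = next; },
    // Bound once to the plugin's partialResults event.
    onPartialResults(list) {
      if (posture === 'coach') return handlers.coachPartial(list);   // Lane transcript + coach matcher
      if (posture === 'answer') return handlers.answerPartial(list); // unchanged answer path
      return undefined;                                              // idle: ignore
    },
    // Bound once to the plugin's speechReady event.
    onSpeechReady() {
      if (posture === 'coach') return handlers.coachCue();    // never re-anchors the 5s clock
      if (posture === 'answer') return handlers.answerReady(); // unchanged answer path
      return undefined;
    },
  };
}
```

The branch lives at the top of each handler, so with posture 'answer' the call falls straight through to the existing behavior.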
The hands-free turn loop (half-duplex, explicit turns). read question → answer as today → on feedback first call clearAdvanceTimer() (in hands-free nobody taps, so the loop itself must cancel the countdown) → Lane speaks the "why" (the grounded sentence) → fire the Lane-owned "your turn" haptic+earcon once (NOT the answer-path speechReady cue — see above) → open cx-listen → user asks → endpoint on ~1.0–1.5s trailing silence or ~30s cap → "thinking" earcon ≤300ms → Lane answers → hold the card until Lane is silent, then re-arm the Lane cue once and re-arm advance via the user's advanceSeconds. No wake word in-session. No barge-in over Lane's first answer.
Barge-in. Tap-to-stop is the guaranteed path on a defined hit-target set (Lane control / dedicated stop = halt; "Next →" = halt AND advance so a user who wanted to move on isn't stranded; disabled choices inert). Voice "stop" (already in the ≤2-word vocab) is a bonus — voice barge-in is documented-fragile (Gemini needs a toggle; ChatGPT's tap-to-interrupt once froze input). Every stop→listen honors the 400 ms cooldown or it reproduces the CANCELLED-bail storm.
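The cooldown requirement on every stop→listen could be enforced with a small gate like the one below. The gate object and its method names are hypothetical; only the 400 ms value comes from the spec.

```javascript
// Hypothetical restart gate: callers report each stop and ask how long to wait
// before the next start. The default mirrors NATIVE_RESTART_COOLDOWN_MS (400 ms).
function makeRestartGate(cooldownMs = 400) {
  let lastStopMs = -Infinity; // no stop recorded yet
  return {
    noteStop(nowMs) { lastStopMs = nowMs; },
    // Milliseconds the caller must still wait before starting; 0 means go now.
    msUntilStartAllowed(nowMs) {
      return Math.max(0, lastStopMs + cooldownMs - nowMs);
    },
  };
}
```

A caller would pass the returned delay to setTimeout before reopening the mic, so tap-to-stop followed by an immediate ask cannot restart inside the cooldown.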
Every fallback. STT empty/garbled/offline → show partial transcript + "tap to type instead" (focus a minimal cx- field, reusing the #fb-text idiom) — never a dead end. Mic denied/sticky (micManuallyOff/voicePermDenied) → degrade silently to text; never auto-open a mic. Not Pro / past the 12-Q taste → openPaywall, never a silent failed call. Any .modal-overlay.show up → listening refuses (L3505).
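The fallback list above can be read as one ordered decision. The sketch below is an assumption about that ordering, with a hypothetical function name and context shape; micManuallyOff and voicePermDenied are the only identifiers taken from the spec.

```javascript
// Hypothetical resolver for a voice request to Lane. The returned strings are
// illustrative labels; the check order is an assumption, not verified behavior.
function resolveLaneInput(ctx) {
  if (ctx.modalOpen) return 'refuse';                             // any .modal-overlay.show blocks listening
  if (ctx.micManuallyOff || ctx.voicePermDenied) return 'text';   // degrade silently, never auto-open a mic
  if (!ctx.isPro) return 'paywall';                               // openPaywall, never a silent failed call
  if (ctx.sttFailed) return 'type-instead';                       // partial transcript + "tap to type instead"
  return 'voice';
}
```

Every branch returns a usable outcome, which is the point of the section: no path ends in a dead control.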
Feasible-now vs. spike-gated. Now: Tier 1 text — but note its content is "frame the existing explanation + capped authored chips," not generated prose (local, $0, harness-safe). Spike-gated together: Tier 2 + Tier 3 (both depend on the same unproven on-device capability; plus the half-duplex loop timing, the all-events channel router, the key-holding proxy, per-state grounding scope).
Look in brand. Lane wears the app's existing green→cyan voice family (#10d98c → #13b3c9, mic ring rgba(39,211,168,.55), CSS L949/956/965) — the "voice = go" color that already reads questions aloud — never the wordmark teal (logo only) and never the answer mic's red (red = "you are answering A/B/C/D," exclusively). Signature glyph: a small filled #13b3c9 soundwave/speech-dot, never 🎤. Namespace cx- (grep-clean). The shipped light theme already maps .vx green→cyan to #4E7A3F→#6E7A4F; every cx- rule reuses those exact light mappings so Lane is never a blue island in either mode.
States (all via a cx-status line at the top of cx-lane; never a new setMicState class on #mic-btn):
cx-wave, status "Listening…", interim words stream italic; haptic('light') on press-in; no earcon (avoid startle).
micBreath), status "Thinking…", ≤300 ms to first acknowledgement.
haptic('light') at first token; one soft earcon only if soundOn.
cx-lane: 11 px teal, <i class="ti ti-shield-check">, e.g. "AZ question bank · Disability income," scoped to the loaded STATE_FILE. Only AZ (questions.json) + Texas, Florida, California, New York, North Carolina have live banks (L1756–1762); every other state falls back to questions.json (L1765), so the chip names the actual loaded corpus, never the user's selected-but-unbacked state, and never implies authority it lacks.
advanceSeconds re-arms (or waits for Next); soft "settle" earcon only if soundOn.
Motion / earcons. Breathing-slow only: no bounce, no celebratory chimes. Earcons suppressed while listening; chime only on "done" and only when sound is already on. All haptics honor the Haptics toggle.
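The source-chip rule in the states list (name the corpus actually loaded, never the selected-but-unbacked state) might look like this in code. Both function names and the liveBanks table are hypothetical; only the questions.json fallback and the chip wording come from the spec, and state-bank file names are deliberately left to the caller.

```javascript
// Hypothetical lookup: `liveBanks` maps a state code to its bank file. Only the
// AZ default (questions.json) is named in the spec; other entries are caller-supplied.
function resolveCorpus(selectedState, liveBanks) {
  const hasBank = Object.prototype.hasOwnProperty.call(liveBanks, selectedState);
  return hasBank
    ? { state: selectedState, file: liveBanks[selectedState] }
    : { state: 'AZ', file: 'questions.json' }; // fallback: label what is really loaded
}

// Builds the chip text in the spec's "AZ question bank · Disability income" form.
function sourceChipLabel(corpus, category) {
  return `${corpus.state} question bank · ${category}`;
}
```

A user who selected a state without a live bank would therefore see the AZ label, which keeps the chip from implying authority the content lacks.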
Accessibility. Lane's streaming target is aria-hidden while streaming; the finalized string is announced once into an aria-live="polite" node (no per-token flood). Status words stay out of the assertive #feedback-result. "Ask Lane" is a real <button> with aria-expanded; PTT exposes aria-pressed. A tap-to-toggle voice variant (tap start / tap stop) is offered for users who can speak but not sustain a press (under prefers-reduced-motion or an explicit preference), routed through the same startCoachListening() window; hold stays default but is never the only route to voice. The chip sets touch-action:none; user-select:none; -webkit-touch-callout:none and suppresses contextmenu so mobile-Safari long-press won't fire selection. Focus order: result → explanation → Lane → Next. Hit targets ≥44 px. Color is never the sole signal — every state carries a status word.
Mockup frames (render inline alongside this spec): frame A (resting Phase-1: existing explanation, source chip, capped chips, demoted mic — no duplicated prose); frame B (engaged: "Paused — reading," Lane's chip-answer below the explanation in its own 30vh scroller, Next pinned); frame C (the Home ladder: tap=default, hold=spike-gated, talk=Pro).
feedback after a correct answer: clearAdvanceTimer(). Stay 'feedback'. Do not reprint q.explanation.
feedback after an incorrect answer: q.explanation. Per-wrong-choice "Why not C?" is NOT Phase 1; no per-distractor field exists (OQ2).
revealUnanswered (silence / "I don't know")
mode_select / HOME
consecutiveNonAnswers ≥ 6 → "still there?")
pauseSession already stopped the loop
reading (TTS live)
listening (answer mic hot, 5s/12s): speechReady/partialResults stay answer-only here (§5).
advancing / the async _next gap
isExam, timer ticking): startCoachListening() re-checks isExam at its own entry (L3501 guards only startListening).
modal-overlay.show
micManuallyOff sticky
Tied to run-1 Crawl/Walk/Run, easiest → richest, least risk first.
#feedback-bar, scoped to existing content. Sibling cx-lane beneath #feedback-expl carrying the source chip + "Ask Lane" + 2–3 capped clarifier chips served from the single explanation field, explicitly not a regenerated elaboration (no second corpus exists). Plus the 30vh containment, voice-bar collapse, the autoadvance invariant (passive chip = no freeze; engaged = clearAdvanceTimer()), the demoted mic, and the corpus-accurate source chip. Fully local, $0, one safe seam. Ship the honest privacy story (text path sends nothing). Run voice-sandbox/harness.js green before and after; QA the autoadvance invariant and the no-duplication rule. This alone is ~80% of the value at near-zero risk.
startCoachListening() with its own constants + guards + the all-events channel router (answerListening/cxListening) branching partialResults, speechReady, and every other single-bind speech handler (§5). Only after the spike proves on-device (or ships with truthful system-speech-service consent).
speechReady), tap-to-stop, the proxy, per-state grounding. Do not build before Phase 1 is verified.
Prerequisite before any wiring: lock the name. User-facing = "Lane"; CSS/code = cx- (both grep-clean). The shipped silence path keeps its internal Coach Reveal/coachCtx name (L2903/L3628); branding the AI "Coach" would collide in the exact functions (revealUnanswered L2902–2951) that write the feedback region. UI copy leads with the verb ("Why?", "Hold to ask") since "Lane" reads oddly next to the "PassLane" wordmark.
Content prerequisite for Phase 1 (new, from the schema reality): decide and produce the capped clarifier chip set. The single explanation field supports a generic "Put it simply / Show an example / When does this apply?" set; if even those need authored, judged, human-passed copy beyond restating the sentence, that authoring is a Phase-1 content task, not wiring. If no authored clarifiers are funded for v1, Phase 1 ships source chip + "Ask Lane" disabled-with-waitlist rather than chips that merely echo q.explanation.
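The chips-or-waitlist decision above could be expressed as a small check. This is a sketch only: phase1AskLaneState, the { label, answer } clarifier shape, and the whitespace/case normalization are assumptions; the cap of three and the no-echo rule follow the spec.

```javascript
// Hypothetical Phase-1 decision: show authored clarifier chips, or fall back to
// "Ask Lane" disabled-with-waitlist when every clarifier merely echoes q.explanation.
function phase1AskLaneState(question, authoredClarifiers) {
  // Normalize case and whitespace so trivial formatting differences still count as an echo.
  const norm = (s) => s.trim().toLowerCase().replace(/\s+/g, ' ');
  const usable = authoredClarifiers
    .filter((c) => norm(c.answer) !== norm(question.explanation))
    .slice(0, 3); // capped set: at most three chips
  return usable.length > 0
    ? { mode: 'chips', chips: usable }
    : { mode: 'waitlist', chips: [] };
}
```

An exact-match echo test is the weakest possible form of the no-duplication rule; near-paraphrases would still need the human review the section calls for.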
Only the genuinely unresolved:
explanation blob per question (verified keys: id/mode/category/question/choices/correct/explanation/difficulty; no per-wrong-choice field). Per-choice rebuttals need authored + judged + human-passed copy for every (question, wrong-choice) pair across 323 AZ Qs (and each state bank), a real content cost. Phase 1 ships only the generic clarifier set; per-choice chips are a corpus-schema milestone. Decide the schema (rebuttals: {B: "...", C: "..."}?) + production budget.
Relevant files: /Users/arizona/CLAUDE CODE/passlane/app/index.html (the single 287 KB app; all line refs above) and /Users/arizona/CLAUDE CODE/passlane/app/questions.json (323-Q AZ default bank; one explanation per question, the content fact behind §1/§3/§8/§9). The frozen voice harness that must stay green: /Users/arizona/CLAUDE CODE/passlane/voice-sandbox/harness.js.