PassLane “Lane” — Interaction & Wiring Spec
Run 2 · code-grounded to the real app · plan only · 2026-06-19

PassLane AI Coach ("Lane") — In-App Interaction & Wiring Spec

Status: definitive. Verified live against /Users/arizona/CLAUDE CODE/passlane/app/index.html (the single 287 KB app) and /Users/arizona/CLAUDE CODE/passlane/app/questions.json (323-question AZ default bank) this session. The team was split on two facts; the code decides both:

  1. The speech engine is cloud-backed today. L3554 calls speechPlugin.start({ language:'en-US', maxResults:5, partialResults:true, popup:false }) with no requiresOnDeviceRecognition / offline flag (grep clean). The designs that corrected for this are right; "audio never leaves the phone" is overruled as a launch claim.
  2. There is exactly one explanation string per question, and it is already on screen. Schema = id/mode/category/question/choices/correct/explanation/difficulty; feedback-expl.textContent = q.explanation at L2927 (revealUnanswered) and L3003 (submitAnswer). No second "elaboration" corpus exists. Therefore Phase 1 Lane does not generate or auto-open new prose: it frames the explanation already rendered (source chip + capped ask). Anything more would duplicate the sentence on screen, which we forbid. This is the one change from earlier drafts and it ripples through §1, §3, §8, §9.
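
The schema fact in point 2 can be checked mechanically. The sketch below is a hypothetical validator, not code from index.html: the function name checkQuestionShape, the value types, and the sample record are illustrative assumptions, and only the eight key names come from the verified schema.

```javascript
// Key names verified in questions.json per this spec; everything else is illustrative.
const QUESTION_KEYS = [
  'id', 'mode', 'category', 'question',
  'choices', 'correct', 'explanation', 'difficulty',
];

// Returns a list of problems. An empty list means the record carries exactly
// the expected keys and one non-empty explanation string.
function checkQuestionShape(q) {
  const problems = [];
  const keys = Object.keys(q);
  for (const k of QUESTION_KEYS) {
    if (!keys.includes(k)) problems.push(`missing key: ${k}`);
  }
  for (const k of keys) {
    if (!QUESTION_KEYS.includes(k)) problems.push(`unexpected key: ${k}`);
  }
  if (typeof q.explanation !== 'string' || q.explanation.trim() === '') {
    problems.push('explanation must be one non-empty string');
  }
  return problems;
}

// Hypothetical sample record (not taken from the real bank; value types are guesses).
const sample = {
  id: 'az-001', mode: 'study', category: 'Disability income',
  question: 'Sample question?', choices: ['A', 'B', 'C', 'D'],
  correct: 1, explanation: 'Sample explanation.', difficulty: 2,
};
console.log(checkQuestionShape(sample)); // []
```

Running a check like this over the whole bank would flag any record that grows a second prose field, which is the condition that would reopen the Phase 1 content decision.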

Every wiring claim below is code-grounded to a line.


1. The Recommendation

Ship all three modalities as one strict ladder with one default — never a menu.

The default, and the entire first release, is a silent TEXT moment that wraps the explanation already shown in the feedback card. The grounded "why" sentence (q.explanation) renders today at L2927/L3003; Phase 1 Lane adds, as siblings beneath it, (a) a source chip naming the actual corpus and (b) a calm "Ask Lane" affordance offering 2–3 capped, pre-authored follow-up chips drawn from the same single explanation field. Lane does not reprint or paraphrase the on-screen sentence — duplication is the failure mode the content reality forces us to design around, so Lane's job in Phase 1 is navigation and provenance of existing grounded text, plus capped clarifiers, not a second essay.

Press-and-hold that same "Ask Lane" affordance = push-to-talk (eyes-busy tier), and "Talk it through" is a true hands-free conversation as the deepest opt-in. Free = text only and genuinely transmits nothing; all voice (push-to-talk and hands-free) is Pro + consent and is spike-gated together, because the recognizer is cloud-backed today — so "on-device, sends nothing" is removed as a launch claim until a spike proves requiresOnDeviceRecognition=true on iOS and an offline pack on Android.

Why this is still ~80% of the value at near-zero risk: the highest-intent beat (the moment right after answering) gains provenance ("this came from the AZ bank, Disability income"), a path to capped clarifiers, and a single front door for the later voice ladder — all local, all $0, all without touching the frozen answer loop. Richer authored elaboration and per-distractor rebuttals are real content milestones (§9), not Phase-1 wiring.


2. Interaction Principles


3. The Modes, Defined Precisely

Tier 1 — TEXT "Ask Lane" (DEFAULT · ships first · Plane A · transmits nothing).

On feedback, the grounded sentence renders as it does today in #feedback-expl. Phase 1 Lane adds, in a sibling cx-lane block directly beneath it:

Lane must not restate the sentence already in #feedback-expl. Works in silence, in public, with a denied mic, offline, for free. This is the only tier in the first release.

Tier 2 — PUSH-TO-TALK "Hold to ask" (Pro + consent · spike-gated).

Press and hold the same "Ask Lane" control; releasing it ends the utterance. A separate startCoachListening() window streams interim text into a Lane-owned transcript node; on release the finalized transcript is shown ("here's what I heard") and then matched against the grounded bank. Available whether or not read-aloud is on (the commuter it targets studies silent by default). Until the STT spike proves on-device recognition, the press-in disclosure says the system speech service is used; no "sends nothing" copy here.

Tier 3 — HANDS-FREE "Talk it through" (Pro + consent · the only transmitting tier).

A deliberately-entered, half-duplex spoken conversation between questions. The only path that sends data — text, never audio, and only after the consent gate. Entered from Home, never auto-launched mid-study.

Default for a new user = read the explanation, optionally tap a capped chip. Default for an eyes-busy user = hold-to-ask. Live conversation is a chosen destination, never the front door.


4. Screen-by-Screen Wiring

Mapped to the real screenshots; every gate already exists in code.

HOME / #screen-mode (home.png, readiness.png) — config, untimed.

The #vx-hf → #vx-reveal → #vx-voice ladder (L1331–1352) stays as read aloud → answer aloud. Tier 3 does NOT nest as a third fold inside it: #vx-reveal is hard-capped at max-height:120px (L968, verified), which clips a third row, and a Pro upsell hidden until you flip an unrelated free toggle converts poorly. Instead, "Talk it through" is its own always-visible cx- row directly beneath the Hands-Free card, carrying a PRO chip → openPaywall('voice'|'listen') for free users, consent gate for Pro. The readiness/weak-topic chips gain "Drill this with Lane." Onboarding + the Plane-B consent toggle live in the Voice & Pacing accordion, never mid-session.

ANSWERING — reading / listening (voice-listen.png) — most timing-sensitive.

Lane is suppressed here. reading = TTS live + mic about to open; listening = the hot 5s/12s answer window tuned for "A/B/C/D." The bottom #mic-btn keeps meaning "answer," and processVoiceMatches (L3784) already early-returns true in feedback and owns a dense answer+command vocab — a conversational utterance misfires there. No 4th top-bar icon (the .hdr-actions 🎙/🔊/⏸ cluster is at its 380 px width limit). Do not build a "stuck detector" that speaks during listening; the existing silence / "I don't know" → revealUnanswered already lands in feedback one beat later, where Lane's text is allowed for free.

FEEDBACK reveal — #feedback-bar (explained.png) — the one in-loop seam.

Insert cx-lane (containing the source chip + "Ask Lane" + capped chips) as new flex children of .feedback-bar, SIBLINGS, between #feedback-expl (L1633) and .feedback-next (L1634). Never children of #feedback-expl — both revealUnanswered (L2927) and submitAnswer (L3003) do feedback-expl.textContent = q.explanation, which destroys any appended children every question. Tear cx-lane down in displayQuestion() so stale output never bleeds to the next card.
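
A minimal sketch of the sibling rule above, assuming .feedback-next is a direct child of .feedback-bar. The helper names mountLane and teardownLane and the module-level currentLane variable are hypothetical; only the selectors, the cx-lane class, and the placement rule come from this spec.

```javascript
// The one live cx-lane block, if any (hypothetical module state).
let currentLane = null;

// Called from displayQuestion() so stale Lane output never reaches the next card.
function teardownLane() {
  if (currentLane) {
    currentLane.remove();
    currentLane = null;
  }
}

// Inserts cx-lane as a flex child of .feedback-bar, directly before .feedback-next.
// Because it is a sibling of #feedback-expl, the textContent write that
// revealUnanswered/submitAnswer perform on #feedback-expl cannot destroy it.
function mountLane(feedbackBar, feedbackNext, doc) {
  teardownLane(); // never stack two blocks
  const lane = doc.createElement('div');
  lane.className = 'cx-lane';
  feedbackBar.insertBefore(lane, feedbackNext);
  currentLane = lane;
  return lane;
}
```

In the browser this would be called as mountLane(document.querySelector('.feedback-bar'), document.querySelector('.feedback-next'), document) when the feedback state is entered.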

HANDS-FREE mode — "Talk it through" (Tier 3).

No separate screen (ChatGPT + Gemini abandoned the full-screen orb Nov 2025; PassLane was right to delete its own Drive Mode). The question card stays on screen as the anchor; Lane's listening/thinking/answer all render in one region (cx-lane + a small cx-status line), not split between the bottom bar and mid-screen. The two-sided transcript also appends as reviewable text.

EXAM — isExam=true (exam.png) + RESULTS.

Hard-disabled in-loop. No #feedback-bar exists in exam; startListening early-returns on isExam (L3501). startCoachListening() must re-implement that guard at its own entry — never assume it inherits the chokepoint. The EXAM RESULTS screen (untimed) is a legitimate right-moment: "Review Missed" → Lane-assisted review of the weakest section.
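
The entry guard described above could look roughly like this. It is a sketch under stated assumptions: the ctx object and its field names stand in for the app's real globals, and the requirement that appState be 'feedback' is inferred from §5 rather than read from index.html.

```javascript
// Hypothetical guard evaluated at the top of startCoachListening().
// Returns null when listening may open, otherwise a short reason string.
function coachListenBlockReason(ctx) {
  if (ctx.isExam) return 'exam';        // re-checked here, not inherited from startListening
  if (ctx.modalOpen) return 'modal';    // any .modal-overlay.show is up
  if (ctx.appState !== 'feedback') return 'wrong-state'; // assumption: Lane listens only in feedback
  return null;
}

console.log(coachListenBlockReason({ isExam: true, modalOpen: false, appState: 'feedback' }));  // exam
console.log(coachListenBlockReason({ isExam: false, modalOpen: false, appState: 'feedback' })); // null
```

Checking isExam first keeps the exam wall independent of every other condition, so a later change to modal or state handling cannot weaken it.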

Dedicated Coach surface: none. Lane lives entirely inside #feedback-bar, the Home row, Voice & Pacing, and #screen-terms. Building a Coach screen would resurrect the orb the industry just abandoned.


5. Voice Mechanics

On-device mic→text — design intent vs. shipping reality. Intent: audio never leaves the device, only Tier-3 text ever transmits. Not true in code today — the frozen start() (L3554) carries no on-device flag, so the answer mic already round-trips audio to Apple/Google. Therefore: the answer mic's existing transmission gets disclosed in the same pass, and "on-device / sends nothing" copy is forbidden until a gated pre-Phase-2 spike proves requiresOnDeviceRecognition=true (iOS) + offline pack/EXTRA_PREFER_OFFLINE (Android) on real devices — almost certainly a plugin fork. Tier 1's text path is genuinely 100% local (it ships no audio at all) and is the honest launch privacy story.
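
One way to keep the forbidden copy out of the build until the spike lands is to derive the privacy claim from recorded spike results. This is a hypothetical sketch: privacyClaimFor, the spike object, and its two field names are assumptions for illustration and are not plugin API.

```javascript
// Hypothetical copy gate. `spike` records what the on-device spike proved on
// real hardware; both field names are assumed, not part of any plugin.
function privacyClaimFor(tier, spike) {
  if (tier === 1) return 'local-only'; // the text path ships no audio at all
  const proven = spike.iosOnDevice === true && spike.androidOffline === true;
  // Until both platforms are proven, voice tiers must disclose the system service.
  return proven ? 'on-device-stt' : 'system-speech-service';
}

console.log(privacyClaimFor(1, {}));                                          // local-only
console.log(privacyClaimFor(2, { iosOnDevice: true }));                       // system-speech-service
console.log(privacyClaimFor(2, { iosOnDevice: true, androidOffline: true })); // on-device-stt
```

The returned key would select the disclosure string, so "sends nothing" wording can only appear for a voice tier once both platform results are recorded as true.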

ASK vs. the frozen answer window — non-negotiable isolation. The answer window (VOICE_SILENCE_BUDGET_MS=5000, VOICE_HARD_CAP_MS=12000, cut from 14s to stop reopen-cycling) is frozen. Lane ASK is a physically separate control + a separate startCoachListening() entry + separate constants (CX_HARD_CAP_MS ≈ 30000; Tier-3 sliding silence ≈ 1200–1500ms for thinking pauses), in a clearly separate block so harness.js stays provably untouched. Reuse only NATIVE_RESTART_COOLDOWN_MS=400 on every stop→start.
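
The separation of constants might be laid out as below. The three frozen values and CX_HARD_CAP_MS are quoted from this section; CX_TRAILING_SILENCE_MS and restartDelayMs are hypothetical names, and the Lane numbers are the spec's approximate targets rather than tuned values.

```javascript
// Frozen answer-window values, quoted from the spec. Not to be edited.
const VOICE_SILENCE_BUDGET_MS = 5000;
const VOICE_HARD_CAP_MS = 12000;
const NATIVE_RESTART_COOLDOWN_MS = 400;

// Lane-only values, kept in their own block so the answer path stays untouched.
const CX_HARD_CAP_MS = 30000;
const CX_TRAILING_SILENCE_MS = 1500; // assumed name; upper end of the 1200-1500 ms range

// How long to wait before the next start() so every stop→start honors the cooldown.
function restartDelayMs(lastStopAt, now) {
  const elapsed = now - lastStopAt;
  return Math.max(0, NATIVE_RESTART_COOLDOWN_MS - elapsed);
}

console.log(restartDelayMs(1000, 1150)); // 250
console.log(restartDelayMs(1000, 1600)); // 0
```

Only the cooldown is shared between the two paths; the Lane timings never read or write the answer-window constants.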

The shared single-bind event problem — route EVERY shared event, not just partialResults (the deep one). The plugin binds its listeners once (speechPlugin.addListener(...)), and those handlers self-guard on appState. But appState is 'feedback' for both the autoadvance and a Tier-3 listen, so appState cannot disambiguate Lane from the answer path. Two such single-bind handlers exist and both must branch on an explicit posture flag (answerListening vs cxListening), not on appState:

The governing rule: every speech event the plugin binds once — partialResults, speechReady, and any onError/onEnd/restart handler that reads appState or calls noteVoiceActivity()/the answer haptic — gets an explicit answerListening/cxListening branch at its top. When cxListening is false, all of them are byte-for-byte the current answer behavior, so harness.js stays green.
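
A minimal sketch of that governing rule, assuming each single-bind plugin listener forwards to one router. makeSpeechRouter and its handler tables are hypothetical; only the event names and the cxListening posture flag come from this spec.

```javascript
// Routes every shared speech event by explicit posture, never by appState.
function makeSpeechRouter(handlers) {
  let cxListening = false; // false means the frozen answer path owns the mic
  return {
    setCoachListening(on) { cxListening = Boolean(on); },
    // Call this from each listener the plugin binds once.
    route(eventName, payload) {
      const table = cxListening ? handlers.coach : handlers.answer;
      const fn = table[eventName];
      return fn ? fn(payload) : undefined;
    },
  };
}

const seen = [];
const router = makeSpeechRouter({
  answer: { partialResults: (p) => seen.push(['answer', p]) },
  coach: { partialResults: (p) => seen.push(['coach', p]) },
});
router.route('partialResults', 'B');          // goes to the answer path
router.setCoachListening(true);
router.route('partialResults', 'why not C');  // goes to the coach path
```

With cxListening left false, every event reaches the same answer handler it reaches today, which is the property harness.js depends on.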

The hands-free turn loop (half-duplex, explicit turns). read question → answer as today → on feedback first call clearAdvanceTimer() (in hands-free nobody taps, so the loop itself must cancel the countdown) → Lane speaks the "why" (the grounded sentence) → fire the Lane-owned "your turn" haptic+earcon once (NOT the answer-path speechReady cue; see above) → open cx-listen → user asks → endpoint on ~1.2–1.5s trailing silence or ~30s cap → "thinking" earcon ≤300ms → Lane answers → hold the card until Lane is silent, then re-arm the Lane cue once and re-arm advance via the user's advanceSeconds. No wake word in-session. No barge-in over Lane's first answer.
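
The loop reads naturally as an ordered list of beats plus one endpoint rule. In this sketch the step labels, nextTurnStep, and shouldEndpoint are illustrative names rather than identifiers from index.html, and the default timings are the approximate values quoted in this section.

```javascript
// Ordered beats of one hands-free turn, transcribed from the loop above.
const TURN_STEPS = [
  'clearAdvanceTimer',   // the loop cancels the countdown itself
  'speakWhy',            // Lane speaks the grounded sentence
  'laneCue',             // Lane-owned haptic + earcon, fired once
  'cxListen',            // user asks
  'thinkingEarcon',
  'laneAnswer',
  'holdUntilSilent',
  'rearmCueAndAdvance',  // advance re-armed from the user's advanceSeconds
];

// Returns the step after `current`, or null once the turn is finished.
function nextTurnStep(current) {
  const i = TURN_STEPS.indexOf(current);
  if (i === -1) throw new Error(`unknown step: ${current}`);
  return i + 1 < TURN_STEPS.length ? TURN_STEPS[i + 1] : null;
}

// Endpoint rule for the cxListen step: trailing silence or the hard cap.
function shouldEndpoint(msSinceLastSpeech, msSinceOpen, silenceMs = 1500, capMs = 30000) {
  return msSinceLastSpeech >= silenceMs || msSinceOpen >= capMs;
}
```

Keeping the beats in one array makes the half-duplex order explicit: no step can open the mic while Lane is still speaking, because cxListen is only reachable from laneCue.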

Barge-in. Tap-to-stop is the guaranteed path on a defined hit-target set (Lane control / dedicated stop = halt; "Next →" = halt AND advance so a user who wanted to move on isn't stranded; disabled choices inert). Voice "stop" (already in the ≤2-word vocab) is a bonus — voice barge-in is documented-fragile (Gemini needs a toggle; ChatGPT's tap-to-interrupt once froze input). Every stop→listen honors the 400 ms cooldown or it reproduces the CANCELLED-bail storm.
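
The hit-target set can be expressed as a small lookup. The target labels below are hypothetical names for the controls listed above; the halt/advance outcomes follow the text.

```javascript
// Maps a tapped hit target to what tap-to-stop should do while Lane is speaking.
function bargeInAction(target) {
  switch (target) {
    case 'lane-control':
    case 'stop':
      return { halt: true, advance: false };
    case 'next':
      return { halt: true, advance: true };   // halt AND advance, so the user is not stranded
    default:
      return { halt: false, advance: false }; // disabled choices and anything else stay inert
  }
}

console.log(bargeInAction('next').advance); // true
```

Whatever the outcome, the following listen would still need to wait out the 400 ms restart cooldown before start() is called again.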

Every fallback. STT empty/garbled/offline → show partial transcript + "tap to type instead" (focus a minimal cx- field, reusing the #fb-text idiom) — never a dead end. Mic denied/sticky (micManuallyOff/voicePermDenied) → degrade silently to text; never auto-open a mic. Not Pro / past the 12-Q taste → openPaywall, never a silent failed call. Any .modal-overlay.show up → listening refuses (L3505).
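
A sketch of the fallback chain as two pure decisions, one before the mic opens and one after it closes. micManuallyOff and voicePermDenied are the flags named above; isPro, the function names, and the ordering of the checks are assumptions, and detecting a garbled (as opposed to empty) transcript is left out.

```javascript
// Before opening the mic: decide whether voice may start at all.
function voicePreflight(ctx) {
  if (ctx.micManuallyOff || ctx.voicePermDenied) return 'text'; // never auto-open a silenced mic
  if (!ctx.isPro) return 'paywall';                              // never a silent failed call
  return 'listen';
}

// After the mic closes: decide what to do with whatever was heard.
function afterListen(transcript) {
  const heard = (transcript || '').trim();
  return heard === ''
    ? { action: 'offer-typing', partial: heard } // "tap to type instead"
    : { action: 'match', partial: heard };
}

console.log(voicePreflight({ isPro: false }));   // paywall
console.log(afterListen('   ').action);          // offer-typing
```

Every branch returns a concrete next step, so no path ends in a dead control.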

Feasible-now vs. spike-gated. Now: Tier 1 text — but note its content is "frame the existing explanation + capped authored chips," not generated prose (local, $0, harness-safe). Spike-gated together: Tier 2 + Tier 3 (both depend on the same unproven on-device capability; plus the half-duplex loop timing, the all-events channel router, the key-holding proxy, per-state grounding scope).


6. Visual & State Design

Look in brand. Lane wears the app's existing green→cyan voice family (#10d98c → #13b3c9, mic ring rgba(39,211,168,.55), CSS L949/956/965) — the "voice = go" color that already reads questions aloud — never the wordmark teal (logo only) and never the answer mic's red (red = "you are answering A/B/C/D," exclusively). Signature glyph: a small filled #13b3c9 soundwave/speech-dot, never 🎤. Namespace cx- (grep-clean). The shipped light theme already maps .vx green→cyan to #4E7A3F→#6E7A4F; every cx- rule reuses those exact light mappings so Lane is never a blue island in either mode.

States (all via a cx-status line at the top of cx-lane; never a new setMicState class on #mic-btn):

Motion / earcons. Breathing-slow only — no bounce, no celebratory chimes. Earcons suppressed while listening; chime only on "done" and only when sound is already on. All haptics honor the Haptics toggle.

Accessibility. Lane's streaming target is aria-hidden while streaming; the finalized string is announced once into an aria-live="polite" node (no per-token flood). Status words stay out of the assertive #feedback-result. "Ask Lane" is a real <button> with aria-expanded; PTT exposes aria-pressed. A tap-to-toggle voice variant (tap start / tap stop) is offered for users who can speak but not sustain a press (under prefers-reduced-motion or an explicit preference), routed through the same startCoachListening() window; hold stays default but is never the only route to voice. The chip sets touch-action:none; user-select:none; -webkit-touch-callout:none and suppresses contextmenu so mobile-Safari long-press won't fire selection. Focus order: result → explanation → Lane → Next. Hit targets ≥44 px. Color is never the sole signal — every state carries a status word.
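
The streaming rule can be sketched with two nodes. This is an assumption-laden illustration: showInterim and announceFinal are hypothetical helpers, streamNode and liveNode are assumed elements, and liveNode is expected to carry aria-live="polite" in the markup and to be the visible home of the final text.

```javascript
// While streaming: keep interim tokens out of the accessibility tree.
function showInterim(streamNode, text) {
  streamNode.setAttribute('aria-hidden', 'true');
  streamNode.textContent = text;
}

// On finalize: clear the hidden stream and write the whole string once into
// the polite live region, so assistive tech hears one announcement.
function announceFinal(streamNode, liveNode, text) {
  streamNode.textContent = '';
  liveNode.textContent = text;
}
```

A typical call pattern would be showInterim on every partial result and a single announceFinal when the transcript or answer is complete.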

Mockup frames (render inline alongside this spec): frame A (resting Phase-1: existing explanation, source chip, capped chips, demoted mic — no duplicated prose); frame B (engaged: "Paused — reading," Lane's chip-answer below the explanation in its own 30vh scroller, Next pinned); frame C (the Home ladder: tap=default, hold=spike-gated, talk=Pro).


7. Right-Place / Wrong-Place Map

| Moment | Should Lane appear? | Wiring rule |
| --- | --- | --- |
| feedback after a correct answer | Yes: source chip + "Ask Lane" + capped chips beneath the existing explanation | Source chip is passive; a chip tap / press-in calls clearAdvanceTimer(). Stay 'feedback'. Do not reprint q.explanation. |
| feedback after an incorrect answer | Yes: same affordances; the existing line already leads with the correct answer (L2898) | Phase 1 ships ONE generic "Why?"/clarifier set on q.explanation. Per-wrong-choice "Why not C?" is NOT Phase 1: no per-distractor field exists (OQ2). |
| After revealUnanswered (silence / "I don't know") | Yes: same affordances | Already lands in feedback (L2927); text allowed |
| mode_select / HOME | Yes: config, "Talk it through" row, "drill weak topic" | Untimed; no live mic |
| EXAM RESULTS (post-exam) | Yes: "Review Missed" handoff | Exam over; hard-disable lifted |
| Voice & Pacing / Settings | Yes: verbosity + Plane-B consent | Untimed config only |
| Idle guardrail (consecutiveNonAnswers ≥ 6 → "still there?") | One calm line, never nag | Only because pauseSession already stopped the loop |
| reading (TTS live) | No | Collides with speech + imminent mic open |
| listening (answer mic hot, 5s/12s) | No | Corrupts the tuned silence clock; reopen-storm. Note speechReady/partialResults stay answer-only here (§5). |
| advancing / the async _next gap | No | State locked; double-advance risk |
| During any exam (isExam, timer ticking) | No (hard wall) | startCoachListening() re-checks isExam at its own entry (L3501 guards only startListening) |
| Behind any .modal-overlay.show | No | Listening refuses (L3505) |
| Mic denied / micManuallyOff sticky | No mic; text only | Never auto-open a silenced mic |
| Free user past 12-Q taste / non-Pro for live | No silent call → paywall | openPaywall('voice'\|'listen') |

8. Build Order

Tied to run-1 Crawl/Walk/Run, easiest → richest, least risk first.

Prerequisite before any wiring: lock the name. User-facing = "Lane"; CSS/code = cx- (both grep-clean). The shipped silence path keeps its internal Coach Reveal/coachCtx name (L2903/L3628) — branding the AI "Coach" would collide in the exact functions (revealUnanswered L2902–2951) that write the feedback region. UI copy leads with the verb ("Why?", "Hold to ask") since "Lane" reads oddly next to the "PassLane" wordmark.

Content prerequisite for Phase 1 (new, from the schema reality): decide and produce the capped clarifier chip set. The single explanation field supports a generic "Put it simply / Show an example / When does this apply?" set; if even those need authored, judged, human-passed copy beyond restating the sentence, that authoring is a Phase-1 content task, not wiring. If no authored clarifiers are funded for v1, Phase 1 ships source chip + "Ask Lane" disabled-with-waitlist rather than chips that merely echo q.explanation.


9. Open Questions

Only the genuinely unresolved:

  1. On-device STT feasibility (the one true voice blocker). Can the frozen Capacitor plugin be configured on-device on both iOS and Android, or does it require a fork / a separate Lane-only recognizer? This gates the entire voice ladder (Tiers 2 + 3) and every "sends nothing" claim. Resolve with the Phase-1.5 spike on real devices before any voice copy ships.
  2. Per-distractor corpus (the Phase-1 content blocker for "Why not B?"). The bank stores one explanation blob per question (verified keys: id/mode/category/question/choices/correct/explanation/difficulty; no per-wrong-choice field). Per-choice rebuttals need authored + judged + human-passed copy for every (question, wrong-choice) pair across 323 AZ Qs (and each state bank), a real content cost. Phase 1 ships only the generic clarifier set; per-choice chips are a corpus-schema milestone. Decide the schema (rebuttals: {B: "...", C: "..."}?) + production budget.
  3. Will the capped clarifiers add value beyond the on-screen sentence? Because the only grounded text is the sentence already displayed, "Show an example"/"Put it simply" must be authored to be non-redundant. Decide v1 scope: (a) author a small clarifier set now, or (b) ship Lane as source-chip-only provenance + voice front-door and defer chips. Cheap to settle with 2–3 sample questions.
  4. Persona name vs. product name. Does user-facing "Lane" inside "PassLane" read as "ask a tutor" or "ask the app"? Cheap to settle: pressure-test "Hold to ask Lane" on the real screen with 2–3 users before code.
  5. Tier-3 endpoint tuning in road noise. The ~1.2–1.5s trailing-silence endpoint may cut off a thinking learner mid-sentence in exactly the commute case. Pilot on real devices and tune from data; the app's terms already (correctly) state it is not a driving app.
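
For open question 2, a lookup against the proposed rebuttals field might behave as follows. This is purely a sketch of one candidate schema: the rebuttals field does not exist in today's bank, and whyNot and the sample records are hypothetical.

```javascript
// Returns the authored rebuttal for the chosen wrong letter, or null when none
// exists. It deliberately does NOT fall back to q.explanation, because
// reprinting the on-screen sentence is the failure mode Phase 1 forbids.
function whyNot(q, chosenLetter) {
  const r = q.rebuttals && q.rebuttals[chosenLetter];
  return typeof r === 'string' && r.trim() !== '' ? r : null;
}

// Hypothetical records: one in today's shape, one with the proposed field.
const today = { explanation: 'Sample explanation.' };
const proposed = { explanation: 'Sample explanation.', rebuttals: { B: 'Sample rebuttal for B.' } };

console.log(whyNot(today, 'B'));    // null
console.log(whyNot(proposed, 'B')); // Sample rebuttal for B.
console.log(whyNot(proposed, 'C')); // null
```

A null result would mean the per-choice chip is simply not offered, so a partially authored bank could ship without echoing the explanation.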

Relevant files: /Users/arizona/CLAUDE CODE/passlane/app/index.html (the single 287 KB app; all line refs above) and /Users/arizona/CLAUDE CODE/passlane/app/questions.json (323-Q AZ default bank; one explanation per question — the content fact behind §1/§3/§8/§9). The frozen voice harness that must stay green: /Users/arizona/CLAUDE CODE/passlane/voice-sandbox/harness.js.

Private working document — unlisted, not indexed. PassLane / Somos LLC.