> speech
Use when implementing speech-to-text, live transcription, or audio transcription. Covers SpeechAnalyzer (iOS 26+), SpeechTranscriber, volatile/finalized results, AssetInventory model management, audio format handling.
curl "https://skillshub.wtf/CharlesWiltgen/Axiom/speech?format=md"
Speech-to-Text with SpeechAnalyzer
Overview
SpeechAnalyzer is Apple's new speech-to-text API introduced in iOS 26. It powers Notes, Voice Memos, Journal, and Call Summarization. The on-device model is faster, more accurate, and better for long-form/distant audio than SFSpeechRecognizer.
Key principle: SpeechAnalyzer is modular—add transcription modules to an analysis session. Results stream asynchronously using Swift's AsyncSequence.
Decision Tree - SpeechAnalyzer vs SFSpeechRecognizer
Need speech-to-text?
├─ iOS 26+ only?
│ └─ Yes → SpeechAnalyzer (preferred)
├─ Need iOS 10-25 support?
│ └─ Yes → SFSpeechRecognizer (DictationTranscriber is iOS 26+ only)
├─ Long-form audio (meetings, lectures)?
│ └─ Yes → SpeechAnalyzer
├─ Distant audio (across room)?
│ └─ Yes → SpeechAnalyzer
└─ Short dictation commands?
   └─ Either works
SpeechAnalyzer advantages:
- Better for long-form and conversational audio
- Works well with distant speakers (meetings)
- On-device, private
- Model managed by system (no app size increase)
- Powers Notes, Voice Memos, Journal
DictationTranscriber (iOS 26+): Same languages as SFSpeechRecognizer, but doesn't require user to enable Siri/dictation in Settings.
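The decision tree above can be read as a small pure function. The sketch below is one possible reading, not part of the Speech framework: `SpeechAPIChoice`, `TranscriptionNeeds`, and `chooseSpeechAPI(for:)` are hypothetical names introduced only for illustration.

```swift
// Hypothetical types for illustration; none of these exist in the Speech framework.
enum SpeechAPIChoice: Equatable {
    case speechAnalyzer
    case sfSpeechRecognizer
    case either
}

struct TranscriptionNeeds {
    var minimumOSMajorVersion: Int   // lowest iOS major version the app must support
    var isLongForm = false           // meetings, lectures
    var isDistantAudio = false       // speaker across the room
}

func chooseSpeechAPI(for needs: TranscriptionNeeds) -> SpeechAPIChoice {
    // SpeechAnalyzer is only available on iOS 26 and later.
    guard needs.minimumOSMajorVersion >= 26 else { return .sfSpeechRecognizer }
    // Long-form and distant audio are the cases where the tree picks SpeechAnalyzer.
    if needs.isLongForm || needs.isDistantAudio { return .speechAnalyzer }
    // Short dictation commands: either API works.
    return .either
}
```

In a real app the OS check would normally be an `if #available(iOS 26, *)` branch rather than a stored version number; the integer here just keeps the sketch self-contained.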
Red Flags
Use this skill when you see:
- "Live transcription"
- "Transcribe audio"
- "Speech-to-text"
- "SpeechAnalyzer" or "SpeechTranscriber"
- "Volatile results"
- Building Notes-like or Voice Memos-like features
Pattern 1 - File Transcription (Simplest)
Transcribe an audio file to text in one function.
import Speech
func transcribe(file: URL, locale: Locale) async throws -> AttributedString {
// Set up transcriber
let transcriber = SpeechTranscriber(
locale: locale,
preset: .offlineTranscription
)
// Collect results asynchronously
async let transcriptionFuture = try transcriber.results
.reduce(AttributedString()) { str, result in
str + result.text
}
// Set up analyzer with transcriber module
let analyzer = SpeechAnalyzer(modules: [transcriber])
// Analyze the file
if let lastSample = try await analyzer.analyzeSequence(from: file) {
try await analyzer.finalizeAndFinish(through: lastSample)
} else {
await analyzer.cancelAndFinishNow()
}
return try await transcriptionFuture
}
Key points:
- analyzeSequence(from:) reads the file and feeds audio to the analyzer
- finalizeAndFinish(through:) ensures all results are finalized
- Results are AttributedString with timing metadata
Pattern 2 - Live Transcription Setup
For real-time transcription from microphone.
Step 1 - Configure SpeechTranscriber
import Speech
import Combine // ObservableObject and @Published
class TranscriptionManager: ObservableObject {
private var transcriber: SpeechTranscriber?
private var analyzer: SpeechAnalyzer?
private var analyzerFormat: AudioFormatDescription?
private var inputBuilder: AsyncStream<AnalyzerInput>.Continuation?
@Published var finalizedTranscript = AttributedString()
@Published var volatileTranscript = AttributedString()
func setUp() async throws {
// Create transcriber with options
transcriber = SpeechTranscriber(
locale: Locale.current,
transcriptionOptions: [],
reportingOptions: [.volatileResults], // Enable real-time updates
attributeOptions: [.audioTimeRange] // Include timing
)
guard let transcriber else { throw TranscriptionError.setupFailed }
// Create analyzer with transcriber module
analyzer = SpeechAnalyzer(modules: [transcriber])
// Get required audio format
analyzerFormat = await SpeechAnalyzer.bestAvailableAudioFormat(
compatibleWith: [transcriber]
)
// Ensure model is available
try await ensureModel(for: transcriber)
// Create input stream
let (stream, builder) = AsyncStream<AnalyzerInput>.makeStream()
inputBuilder = builder
// Start analyzer
try await analyzer?.start(inputSequence: stream)
}
}
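The snippets in this skill throw `TranscriptionError` and `RecordingError` values without defining them. A minimal sketch of those types follows; the case names are taken from the snippets, while the messages are illustrative assumptions.

```swift
import Foundation

// Assumed definitions: case names match the snippets, messages are placeholders.
enum TranscriptionError: Error, LocalizedError {
    case setupFailed
    case localeNotSupported
    case notSetUp
    case conversionFailed

    var errorDescription: String? {
        switch self {
        case .setupFailed: return "Could not create the transcriber."
        case .localeNotSupported: return "The current language is not supported."
        case .notSetUp: return "Call setUp() before streaming audio."
        case .conversionFailed: return "Could not convert audio to the analyzer format."
        }
    }
}

enum RecordingError: Error {
    case permissionDenied
}
```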
Step 2 - Ensure Model Availability
func ensureModel(for transcriber: SpeechTranscriber) async throws {
let locale = Locale.current
// Check if language is supported
let supported = await SpeechTranscriber.supportedLocales
guard supported.contains(where: {
$0.identifier(.bcp47) == locale.identifier(.bcp47)
}) else {
throw TranscriptionError.localeNotSupported
}
// Check if model is installed
let installed = await SpeechTranscriber.installedLocales
if installed.contains(where: {
$0.identifier(.bcp47) == locale.identifier(.bcp47)
}) {
return // Already installed
}
// Download model
if let downloader = try await AssetInventory.assetInstallationRequest(
supporting: [transcriber]
) {
// Track progress if needed
let progress = downloader.progress
try await downloader.downloadAndInstall()
}
}
Note: Models are stored in system storage, not app storage. Only a limited number of languages can be allocated at once.
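The three outcomes of ensureModel (unsupported, already installed, needs download) can be isolated as a pure function, which makes them easy to unit test. This is a sketch under the assumption that locales have already been reduced to BCP-47 strings such as "en-US"; `ModelState` and `modelState(for:supported:installed:)` are hypothetical names, not Speech framework API.

```swift
// Hypothetical helper for illustration; operates on BCP-47 identifier strings.
enum ModelState: Equatable {
    case unsupported     // the API does not offer this language
    case installed       // model already on device
    case needsDownload   // supported, but the model must be fetched first
}

func modelState(for identifier: String,
                supported: Set<String>,
                installed: Set<String>) -> ModelState {
    guard supported.contains(identifier) else { return .unsupported }
    return installed.contains(identifier) ? .installed : .needsDownload
}
```

Only the `.needsDownload` case should lead to an installation request; `.unsupported` maps to the localeNotSupported error thrown above.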
Step 3 - Handle Results
func startResultHandling() {
Task {
guard let transcriber else { return }
do {
for try await result in transcriber.results {
let text = result.text
if result.isFinal {
// Finalized result - won't change
finalizedTranscript += text
volatileTranscript = AttributedString()
// Access timing info
for run in text.runs {
if let timeRange = run.audioTimeRange {
print("Time: \(timeRange)")
}
}
} else {
// Volatile result - will be replaced
volatileTranscript = text
}
}
} catch {
print("Transcription failed: \(error)")
}
}
}
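The volatile/finalized bookkeeping above is independent of the Speech framework, so it can be sketched with plain strings. `TranscriptBuffer` is a hypothetical type for illustration: a volatile result replaces the previous guess, and a final result is appended and clears the guess.

```swift
// Hypothetical stand-in for the two published transcript properties.
struct TranscriptBuffer {
    private(set) var finalized = ""
    private(set) var volatile = ""

    /// Text to show the user: committed text followed by the current best guess.
    var display: String { finalized + volatile }

    mutating func apply(text: String, isFinal: Bool) {
        if isFinal {
            finalized += text   // committed, will not change
            volatile = ""       // the guess it replaces is discarded
        } else {
            volatile = text     // newer guess replaces the older one
        }
    }
}

var demo = TranscriptBuffer()
demo.apply(text: "Hello wor", isFinal: false)
demo.apply(text: "Hello world", isFinal: false)
demo.apply(text: "Hello, world. ", isFinal: true)
demo.apply(text: "How", isFinal: false)
// demo.display is now "Hello, world. How"
```

Rendering `finalized` and `volatile` separately (for example, volatile text at reduced opacity) lets users see which words may still change.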
Pattern 3 - Audio Recording and Streaming
Connect AVAudioEngine to SpeechAnalyzer.
import AVFoundation
class AudioRecorder {
private let audioEngine = AVAudioEngine()
private var outputContinuation: AsyncStream<AVAudioPCMBuffer>.Continuation?
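private let transcriptionManager: TranscriptionManager
// A class with an uninitialized `let` property needs an explicit initializer
init(transcriptionManager: TranscriptionManager) {
self.transcriptionManager = transcriptionManager
}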
func startRecording() async throws {
// Request permission
guard await AVAudioApplication.requestRecordPermission() else {
throw RecordingError.permissionDenied
}
// Configure audio session (iOS)
#if os(iOS)
let session = AVAudioSession.sharedInstance()
try session.setCategory(.playAndRecord, mode: .spokenAudio)
try session.setActive(true, options: .notifyOthersOnDeactivation)
#endif
// Set up transcriber
try await transcriptionManager.setUp()
transcriptionManager.startResultHandling()
// Stream audio to transcriber
for await buffer in try audioStream() {
try await transcriptionManager.streamAudio(buffer)
}
}
private func audioStream() throws -> AsyncStream<AVAudioPCMBuffer> {
let inputNode = audioEngine.inputNode
let format = inputNode.outputFormat(forBus: 0)
inputNode.installTap(
onBus: 0,
bufferSize: 4096,
format: format
) { [weak self] buffer, time in
self?.outputContinuation?.yield(buffer)
}
audioEngine.prepare()
try audioEngine.start()
return AsyncStream { continuation in
outputContinuation = continuation
}
}
}
Stream Audio with Format Conversion
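// Swift extensions cannot contain stored properties, so declare the converter
// in the TranscriptionManager class body (same file, so `private` stays visible here):
//     private var converter: AVAudioConverter?
extension TranscriptionManager {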
func streamAudio(_ buffer: AVAudioPCMBuffer) async throws {
guard let inputBuilder, let analyzerFormat else {
throw TranscriptionError.notSetUp
}
// Convert to analyzer's required format
let converted = try convertBuffer(buffer, to: analyzerFormat)
// Send to analyzer
let input = AnalyzerInput(buffer: converted)
inputBuilder.yield(input)
}
private func convertBuffer(
_ buffer: AVAudioPCMBuffer,
to format: AudioFormatDescription
) throws -> AVAudioPCMBuffer {
// Lazy initialize converter
if converter == nil {
let sourceFormat = buffer.format
let destFormat = AVAudioFormat(cmAudioFormatDescription: format)!
converter = AVAudioConverter(from: sourceFormat, to: destFormat)
}
guard let converter else {
throw TranscriptionError.conversionFailed
}
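// Size the output for the sample-rate ratio between source and analyzer formats
let ratio = converter.outputFormat.sampleRate / converter.inputFormat.sampleRate
let capacity = AVAudioFrameCount((Double(buffer.frameLength) * ratio).rounded(.up))
guard let outputBuffer = AVAudioPCMBuffer(
pcmFormat: converter.outputFormat,
frameCapacity: capacity
) else {
throw TranscriptionError.conversionFailed
}
// convert(to:from:) does not support sample-rate conversion, so use the input-block API
var conversionError: NSError?
var bufferProvided = false
let status = converter.convert(to: outputBuffer, error: &conversionError) { _, inputStatus in
defer { bufferProvided = true }
inputStatus.pointee = bufferProvided ? .noDataNow : .haveData
return bufferProvided ? nil : buffer
}
guard status != .error else {
throw TranscriptionError.conversionFailed
}
return outputBuffer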
}
}
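When the microphone and analyzer formats use different sample rates, the output buffer has to be sized by the rate ratio rather than copied from the input length. The arithmetic is sketched below; `outputFrameCapacity` is a hypothetical helper, and the 48 kHz and 16 kHz rates are illustrative assumptions (the real target rate comes from bestAvailableAudioFormat).

```swift
/// Frames needed in the output buffer when resampling `inputFrames`
/// from `inputRate` to `outputRate`, rounded up so nothing is truncated.
func outputFrameCapacity(inputFrames: UInt32,
                         inputRate: Double,
                         outputRate: Double) -> UInt32 {
    let scaled = Double(inputFrames) * outputRate / inputRate
    return UInt32(scaled.rounded(.up))
}

// A 4096-frame tap buffer at 48 kHz shrinks to 1366 frames at 16 kHz.
let downsampled = outputFrameCapacity(inputFrames: 4096, inputRate: 48_000, outputRate: 16_000)
// Going the other way, the same buffer would need 12288 frames.
let upsampled = outputFrameCapacity(inputFrames: 4096, inputRate: 16_000, outputRate: 48_000)
```

Reusing the input's frame length as the capacity only happens to work when downsampling; when upsampling it would leave the output buffer too small.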
Pattern 4 - Stopping Transcription
Finalize properly so that any remaining volatile results are delivered as finalized results.
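// Sketch: assumes audioEngine, inputBuilder, analyzer, and recognizerTask are all reachable from one type
func stopRecording() async {
// Stop audio capture
audioEngine.stop()
audioEngine.inputNode.removeTap(onBus: 0)
outputContinuation?.finish()
// End the analyzer's input stream so it can reach end of input
inputBuilder?.finish()
// Finalize transcription (converts remaining volatile results to final)
try? await analyzer?.finalizeAndFinishThroughEndOfInput()
// Cancel the result-handling task, assuming the Task from startResultHandling() was stored as recognizerTask
recognizerTask?.cancel()
}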
Critical: Always call finalizeAndFinishThroughEndOfInput() to ensure volatile results are finalized.
Pattern 5 - Model Asset Management
Check Supported Languages
// Languages the API supports
let supported = await SpeechTranscriber.supportedLocales
// Languages currently installed on device
let installed = await SpeechTranscriber.installedLocales
Deallocate Languages
Only a limited number of languages can be allocated at once, so deallocate the ones you no longer need.
func deallocateLanguages() async {
let allocated = await AssetInventory.allocatedLocales
for locale in allocated {
await AssetInventory.deallocate(locale: locale)
}
}
Pattern 6 - Displaying Results with Timing
Highlight text during audio playback using timing metadata.
import SwiftUI
import CoreMedia
import Speech // provides the audioTimeRange attribute
struct TranscriptView: View {
let transcript: AttributedString
@Binding var playbackTime: CMTime
var body: some View {
Text(highlightedTranscript)
}
var highlightedTranscript: AttributedString {
var result = transcript
// Iterating `runs` yields Run values, not tuples; each run carries its own range
for run in transcript.runs {
guard let timeRange = run.audioTimeRange else { continue }
if timeRange.containsTime(playbackTime) {
result[run.range].backgroundColor = .yellow
}
}
return result
}
}
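The highlight lookup above boils down to finding the segment whose time range contains the playback position. A framework-free sketch using seconds as `Double` values follows; `TimedSegment` and `activeSegmentIndex(in:at:)` are hypothetical names, and the range is treated as start-inclusive and end-exclusive so adjacent segments never both match.

```swift
// Hypothetical stand-in for an attributed run with an audio time range.
struct TimedSegment {
    let text: String
    let start: Double   // seconds, inclusive
    let end: Double     // seconds, exclusive
}

/// Index of the segment being spoken at `time`, or nil during silence.
func activeSegmentIndex(in segments: [TimedSegment], at time: Double) -> Int? {
    segments.firstIndex { $0.start <= time && time < $0.end }
}

let segments = [
    TimedSegment(text: "Hello", start: 0.0, end: 0.5),
    TimedSegment(text: "world", start: 0.5, end: 1.1),
]
let current = activeSegmentIndex(in: segments, at: 0.5)   // second segment
```

For long transcripts a binary search over the sorted start times would avoid rescanning every segment on each playback tick.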
Anti-Patterns
Don't - Forget to finalize
// BAD - volatile results lost
func stopRecording() {
audioEngine.stop()
// Missing finalize!
}
// GOOD - volatile results become finalized
func stopRecording() async {
audioEngine.stop()
try? await analyzer?.finalizeAndFinishThroughEndOfInput()
}
Don't - Ignore format conversion
// BAD - format mismatch may fail silently
inputBuilder.yield(AnalyzerInput(buffer: rawBuffer))
// GOOD - convert to analyzer's format
let format = await SpeechAnalyzer.bestAvailableAudioFormat(compatibleWith: [transcriber])
let converted = try convertBuffer(rawBuffer, to: format)
inputBuilder.yield(AnalyzerInput(buffer: converted))
Don't - Skip model availability check
// BAD - may crash if model not installed
let transcriber = SpeechTranscriber(locale: locale, ...)
// Start using immediately
// GOOD - ensure model is ready
let transcriber = SpeechTranscriber(locale: locale, ...)
try await ensureModel(for: transcriber)
// Now safe to use
Presets Reference
| Preset | Use Case |
|---|---|
| .offlineTranscription | File transcription, no real-time feedback needed |
| .progressiveLiveTranscription | Live transcription with volatile updates |
Options Reference
TranscriptionOptions
- Default: None (standard transcription)
ReportingOptions
- .volatileResults: Enable real-time approximate results
AttributeOptions
- .audioTimeRange: Include CMTimeRange for each text segment
Platform Availability
| Platform | SpeechTranscriber | DictationTranscriber |
|---|---|---|
| iOS 26+ | Yes | Yes |
| macOS Tahoe+ | Yes | Yes |
| watchOS 26+ | No | Yes |
| tvOS 26+ | TBD | TBD |
Hardware requirements: Vary by device. Check supportedLocales at runtime to confirm support.
Integration with Apple Intelligence
Combine with Foundation Models for summarization:
import FoundationModels
func generateTitle(for transcript: String) async throws -> String {
let session = LanguageModelSession()
let prompt = "Generate a short, clever title for this story: \(transcript)"
let response = try await session.respond(to: prompt)
return response.content
}
See axiom-ios-ai skill for Foundation Models details.
Checklist
Before shipping speech-to-text:
- Check locale support with supportedLocales
- Ensure model with AssetInventory.assetInstallationRequest
- Handle download progress for user feedback
- Convert audio to bestAvailableAudioFormat
- Enable .volatileResults for live transcription
- Call finalizeAndFinishThroughEndOfInput() on stop
- Handle timing with .audioTimeRange if needed
- Clear volatile results when finalized result arrives
- Request microphone permission before recording
Resources
WWDC: 2025-277
Docs: /speech, /speech/speechanalyzer, /speech/speechtranscriber
Skills: coreml (on-device ML), axiom-ios-ai (Foundation Models)