Configuration
'elevenlabs' => [
'api_key' => env('ELEVENLABS_API_KEY', ''),
'url' => env('ELEVENLABS_URL', 'https://api.elevenlabs.io/v1/'),
]
Speech-to-Text
ElevenLabs provides speech-to-text through its Scribe model, with support for speaker diarization and audio event tagging.
Basic Usage
use Prism\Prism\Facades\Prism;
use Prism\Prism\ValueObjects\Media\Audio;
$audioFile = Audio::fromPath('/path/to/recording.mp3');
$response = Prism::audio()
->using('elevenlabs', 'scribe_v1')
->withInput($audioFile)
->asText();
echo $response->text;
Provider-Specific Options
Language Detection
If you already know the spoken language, pass its language code to improve transcription accuracy:
$response = Prism::audio()
->using('elevenlabs', 'scribe_v1')
->withInput($audioFile)
->withProviderOptions([
'language_code' => 'en',
])
->asText();
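When the language code comes from user input, it can help to normalize and sanity-check it before passing it to `withProviderOptions()`. The following is a minimal sketch, assuming the ElevenLabs API accepts two-letter ISO 639-1 or three-letter ISO 639-3 codes; `normalizeLanguageCode()` and `looksLikeLanguageCode()` are hypothetical helpers, not part of Prism.

```php
<?php

// Hypothetical helper: checks only the *shape* of the code (two or three
// lower-case letters). It does not verify that the code names a real language.
function looksLikeLanguageCode(string $code): bool
{
    return preg_match('/\A[a-z]{2,3}\z/', $code) === 1;
}

// Hypothetical helper: trims and lower-cases input such as " EN ",
// returning null when the result is not shaped like a language code.
function normalizeLanguageCode(string $input): ?string
{
    $code = strtolower(trim($input));

    return looksLikeLanguageCode($code) ? $code : null;
}

var_dump(normalizeLanguageCode(' EN '));    // string(2) "en"
var_dump(normalizeLanguageCode('english')); // NULL
```

If the helper returns `null`, one option is to omit `language_code` entirely rather than send a malformed value.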
Speaker Diarization
ElevenLabs can identify and separate different speakers in the audio:
$response = Prism::audio()
->using('elevenlabs', 'scribe_v1')
->withInput($audioFile)
->withProviderOptions([
'diarize' => true,
'num_speakers' => 2,
])
->asText();
// Access speaker information
$segments = $response->additionalContent['segments'] ?? [];
foreach ($segments as $segment) {
echo "Speaker {$segment['speaker']}: {$segment['text']}\n";
}
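Diarized output often contains several consecutive segments from the same speaker. The sketch below merges them into single turns for more readable output. It assumes each segment is an array with `speaker` and `text` keys, as in the loop above; `mergeSpeakerTurns()` is a hypothetical helper, and the sample array stands in for `$response->additionalContent['segments']`.

```php
<?php

/**
 * Merge consecutive segments spoken by the same speaker into one turn.
 *
 * @param array<int, array{speaker: int|string, text: string}> $segments
 * @return array<int, array{speaker: int|string, text: string}>
 */
function mergeSpeakerTurns(array $segments): array
{
    $turns = [];

    foreach ($segments as $segment) {
        $last = count($turns) - 1;

        if ($last >= 0 && $turns[$last]['speaker'] === $segment['speaker']) {
            // Same speaker as the previous turn: append the text.
            $turns[$last]['text'] .= ' ' . $segment['text'];
            continue;
        }

        // New speaker: start a new turn.
        $turns[] = ['speaker' => $segment['speaker'], 'text' => $segment['text']];
    }

    return $turns;
}

// Sample data (assumed shape) in place of a real API response.
$segments = [
    ['speaker' => 1, 'text' => 'Hello.'],
    ['speaker' => 1, 'text' => 'Can you hear me?'],
    ['speaker' => 2, 'text' => 'Yes, loud and clear.'],
];

foreach (mergeSpeakerTurns($segments) as $turn) {
    echo "Speaker {$turn['speaker']}: {$turn['text']}\n";
}
```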
Audio Event Tagging
Detect non-speech audio events like laughter, applause, or background noise:
$response = Prism::audio()
->using('elevenlabs', 'scribe_v1')
->withInput($audioFile)
->withProviderOptions([
'tag_audio_events' => true,
])
->asText();
// Events are included in the transcription
echo $response->text;
// Example: "Hello [LAUGHTER] how are you? [APPLAUSE]"
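Because events are embedded in the text, you may want to list them or strip them out for a clean transcript. The sketch below assumes tags appear as upper-case words in square brackets, as in the example above; the format the API actually returns may differ, so adjust the pattern accordingly. Both functions are hypothetical helpers, not part of Prism.

```php
<?php

// Assumed tag format: upper-case letters, spaces, or underscores in square brackets.
const AUDIO_EVENT_PATTERN = '/\[([A-Z][A-Z_ ]*)\]/';

// Return the event names found in the transcript, in order of appearance.
function extractAudioEvents(string $text): array
{
    preg_match_all(AUDIO_EVENT_PATTERN, $text, $matches);

    return $matches[1];
}

// Return the transcript with event tags removed and whitespace collapsed.
function stripAudioEvents(string $text): string
{
    $withoutTags = preg_replace(AUDIO_EVENT_PATTERN, '', $text);

    return trim(preg_replace('/\s+/', ' ', $withoutTags));
}

$text = 'Hello [LAUGHTER] how are you? [APPLAUSE]';

print_r(extractAudioEvents($text)); // LAUGHTER, APPLAUSE
echo stripAudioEvents($text), "\n"; // Hello how are you?
```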
Use Cases
Meeting Transcription with Speaker Identification
$meetingAudio = Audio::fromPath('/path/to/meeting.mp3');
$response = Prism::audio()
->using('elevenlabs', 'scribe_v1')
->withInput($meetingAudio)
->withProviderOptions([
'diarize' => true,
'num_speakers' => 4,
'language_code' => 'en',
'tag_audio_events' => true,
])
->asText();
// Process segments with speaker labels
$segments = $response->additionalContent['segments'] ?? [];
foreach ($segments as $segment) {
echo "[Speaker {$segment['speaker']}] {$segment['text']}\n";
}
Podcast Transcription
$podcastAudio = Audio::fromUrl('https://example.com/podcast.mp3');
$response = Prism::audio()
->using('elevenlabs', 'scribe_v1')
->withInput($podcastAudio)
->withProviderOptions([
'diarize' => true,
'num_speakers' => 2, // Host and guest
'tag_audio_events' => true, // Capture laughter, music, etc.
])
->asText();
Interview Transcription
$interviewAudio = Audio::fromPath('/path/to/interview.wav');
$response = Prism::audio()
->using('elevenlabs', 'scribe_v1')
->withInput($interviewAudio)
->withProviderOptions([
'diarize' => true,
'num_speakers' => 2,
'language_code' => 'en',
])
->asText();
// Generate formatted transcript
$segments = $response->additionalContent['segments'] ?? [];
$speakers = ['Interviewer', 'Guest'];
foreach ($segments as $segment) {
// Assumes $segment['speaker'] is a 1-based integer
$speakerLabel = $speakers[$segment['speaker'] - 1] ?? "Speaker {$segment['speaker']}";
echo "{$speakerLabel}: {$segment['text']}\n\n";
}
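The loop above relies on speaker identifiers being 1-based integers. If you are unsure what format the identifiers take, a more defensive option is to assign labels in order of first appearance. This is a sketch under that assumption: `labelSpeakers()` is a hypothetical helper, and the string identifiers in the sample data are illustrative only.

```php
<?php

/**
 * Assign labels to speakers in order of first appearance.
 *
 * @param array<int, array{speaker: int|string, text: string}> $segments
 * @param array<int, string> $labels Labels to hand out, first speaker first.
 * @return array<int|string, string> Map of speaker identifier to label.
 */
function labelSpeakers(array $segments, array $labels): array
{
    $map = [];

    foreach ($segments as $segment) {
        $id = $segment['speaker'];

        if (!array_key_exists($id, $map)) {
            // Fall back to a generic label once the provided labels run out.
            $map[$id] = $labels[count($map)] ?? "Speaker {$id}";
        }
    }

    return $map;
}

// Sample data (assumed shape and identifier format).
$segments = [
    ['speaker' => 'speaker_0', 'text' => 'Thanks for joining us.'],
    ['speaker' => 'speaker_1', 'text' => 'Happy to be here.'],
    ['speaker' => 'speaker_0', 'text' => 'Let us get started.'],
];

$labels = labelSpeakers($segments, ['Interviewer', 'Guest']);

foreach ($segments as $segment) {
    echo "{$labels[$segment['speaker']]}: {$segment['text']}\n\n";
}
```

Note that this gives the first label to whoever speaks first, so it only fits recordings where the interviewer opens the conversation.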
Audio File Handling
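ElevenLabs Scribe accepts common audio formats such as MP3, WAV, and M4A. Prism can load audio from a local path, a remote URL, base64-encoded data, or raw binary content: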
use Prism\Prism\ValueObjects\Media\Audio;
// From local file path
$audio = Audio::fromPath('/path/to/audio.mp3');
$audio = Audio::fromPath('/path/to/audio.wav');
$audio = Audio::fromPath('/path/to/audio.m4a');
// From remote URL
$audio = Audio::fromUrl('https://example.com/recording.mp3');
// From base64 encoded data
$audio = Audio::fromBase64($base64AudioData, 'audio/mpeg');
// From binary content
$audioContent = file_get_contents('/path/to/audio.wav');
$audio = Audio::fromContent($audioContent, 'audio/wav');
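`Audio::fromBase64()` and `Audio::fromContent()` take a MIME type as their second argument. When the audio originates from a named file, the type can be derived from the extension. A minimal sketch follows; `guessAudioMimeType()` is a hypothetical helper, and its extension list is illustrative rather than the set of formats ElevenLabs accepts.

```php
<?php

// Hypothetical helper: map a file extension to an audio MIME type.
// Returns null for extensions that are not in the (illustrative) list.
function guessAudioMimeType(string $path): ?string
{
    $types = [
        'mp3'  => 'audio/mpeg',
        'wav'  => 'audio/wav',
        'm4a'  => 'audio/mp4',
        'ogg'  => 'audio/ogg',
        'flac' => 'audio/flac',
    ];

    // pathinfo() returns an empty string when the path has no extension.
    $extension = strtolower(pathinfo($path, PATHINFO_EXTENSION));

    return $types[$extension] ?? null;
}

var_dump(guessAudioMimeType('/path/to/audio.WAV')); // string(9) "audio/wav"
var_dump(guessAudioMimeType('/path/to/notes.txt')); // NULL
```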
Features
- ✅ Speech-to-Text with high accuracy
- ✅ Speaker Diarization (identify multiple speakers)
- ✅ Audio Event Tagging (detect non-speech sounds)
- ✅ Multi-language support
- ❌ Text-to-Speech (not yet implemented)
Best Practices
For Best Diarization Results
- Ensure clear audio quality
- Minimize background noise
- Specify the correct number of speakers
- Use a sample rate of at least 16 kHz
For Accurate Transcription
- Use the correct language code
- Ensure good audio quality (clear speech, minimal noise)
- Use an appropriate audio format (WAV or high-quality MP3)
- For long recordings, consider splitting them into segments
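For the last point, the split boundaries can be planned before any audio is cut. The sketch below computes start and end times in seconds, with an optional overlap so words at a boundary are not lost. `planChunks()` is a hypothetical helper, and the chunk length and overlap values are arbitrary examples. Cutting the audio itself requires an external tool and is not shown.

```php
<?php

/**
 * Plan time ranges (in seconds) for splitting a long recording.
 *
 * @return array<int, array{start: float, end: float}>
 */
function planChunks(float $duration, float $chunkLength, float $overlap = 0.0): array
{
    if ($chunkLength <= 0 || $overlap < 0 || $overlap >= $chunkLength) {
        throw new InvalidArgumentException('Require chunkLength > overlap >= 0.');
    }

    $chunks = [];
    $step = $chunkLength - $overlap; // Distance between consecutive chunk starts.

    for ($start = 0.0; $start < $duration; $start += $step) {
        $end = min($start + $chunkLength, $duration);
        $chunks[] = ['start' => $start, 'end' => $end];

        if ($end >= $duration) {
            break; // The final chunk already reaches the end of the recording.
        }
    }

    return $chunks;
}

// A 25-minute recording in 10-minute chunks with 5 seconds of overlap.
foreach (planChunks(1500.0, 600.0, 5.0) as $chunk) {
    echo "{$chunk['start']}s to {$chunk['end']}s\n";
}
```

Each chunk can then be transcribed separately and the results joined, keeping in mind that overlapping regions will be transcribed twice.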
Limitations
Text-to-Speech
ElevenLabs text-to-speech is not yet implemented in Prism. Use OpenAI or Groq for TTS functionality.
File Size
Check the ElevenLabs documentation for current file size limits before processing audio files.
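Once you know the applicable limit, a local check can reject oversized files before any upload is attempted. This is a sketch only: `assertWithinSizeLimit()` is a hypothetical helper, and `$maxBytes` is a placeholder you would set from the ElevenLabs documentation. No real limit is assumed here.

```php
<?php

// Hypothetical guard: returns the file size in bytes, or throws if the file
// is missing, unreadable, or larger than the caller-supplied limit.
function assertWithinSizeLimit(string $path, int $maxBytes): int
{
    if (!is_file($path)) {
        throw new InvalidArgumentException("Audio file not found: {$path}");
    }

    $size = filesize($path);

    if ($size === false) {
        throw new RuntimeException("Could not read size of: {$path}");
    }

    if ($size > $maxBytes) {
        throw new RuntimeException("Audio file is {$size} bytes, limit is {$maxBytes}: {$path}");
    }

    return $size;
}
```

Calling this before `Audio::fromPath()` lets the application report a clear error instead of waiting for the API to reject the request.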