ElevenLabs - Prism

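Configuration

Add an elevenlabs entry to the providers array in config/prism.php, and set ELEVENLABS_API_KEY in your .env file: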

'elevenlabs' => [
    'api_key' => env('ELEVENLABS_API_KEY', ''),
    'url' => env('ELEVENLABS_URL', 'https://api.elevenlabs.io/v1/'),
]

Speech-to-Text

ElevenLabs provides speech-to-text through their Scribe model with support for diarization and audio event tagging.

Basic Usage

use Prism\Prism\Facades\Prism;
use Prism\Prism\ValueObjects\Media\Audio;

$audioFile = Audio::fromPath('/path/to/recording.mp3');

$response = Prism::audio()
    ->using('elevenlabs', 'scribe_v1')
    ->withInput($audioFile)
    ->asText();

echo $response->text;

Provider-Specific Options

Language Detection

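If you know the spoken language, pass its ISO 639 language code (for example, en) as language_code, which can improve accuracy. If you omit it, ElevenLabs detects the language automatically: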

$response = Prism::audio()
    ->using('elevenlabs', 'scribe_v1')
    ->withInput($audioFile)
    ->withProviderOptions([
        'language_code' => 'en',
    ])
    ->asText();
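Applications often store locales such as en_US or pt-BR rather than bare language codes. The hypothetical helper below is a minimal sketch of deriving a value for language_code from such a locale. It assumes that the primary subtag (the part before "_" or "-") is an ISO 639 code ElevenLabs accepts, and it returns null when no usable code can be derived.

```php
<?php

/**
 * Hypothetical helper (not part of Prism): derive a 'language_code' value
 * from an application locale such as "en_US" or "pt-BR".
 * Assumes the primary subtag is an ISO 639 code accepted by ElevenLabs.
 */
function languageCodeFromLocale(string $locale): ?string
{
    // Keep only the primary subtag: "en_US" -> "en", "pt-BR" -> "pt"
    $primary = strtolower(preg_split('/[-_]/', trim($locale))[0]);

    // ISO 639-1 codes have two letters, ISO 639-3 codes have three
    return preg_match('/^[a-z]{2,3}$/', $primary) === 1 ? $primary : null;
}
```

In a Laravel app you might call languageCodeFromLocale(app()->getLocale()) and add language_code to the provider options only when the result is not null.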

Speaker Diarization

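Set diarize to true to have ElevenLabs identify the different speakers in the audio and attribute each part of the transcript to one of them. num_speakers tells the model the maximum number of speakers to expect: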

$response = Prism::audio()
    ->using('elevenlabs', 'scribe_v1')
    ->withInput($audioFile)
    ->withProviderOptions([
        'diarize' => true,
        'num_speakers' => 2,
    ])
    ->asText();

// Access speaker information
$segments = $response->additionalContent['segments'] ?? [];
foreach ($segments as $segment) {
    echo "Speaker {$segment['speaker']}: {$segment['text']}\n";
}
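Diarized output can split one person's speech into several consecutive segments. The sketch below merges those runs into single speaker turns. It assumes each segment is an array with 'speaker' and 'text' keys, which is the shape the loop above reads. Check the actual contents of additionalContent in your Prism version before relying on it.

```php
<?php

/**
 * Merge consecutive segments spoken by the same speaker into single turns.
 * Assumes each segment has 'speaker' and 'text' keys.
 *
 * @param array<int, array{speaker: int|string, text: string}> $segments
 * @return array<int, array{speaker: int|string, text: string}>
 */
function mergeSpeakerTurns(array $segments): array
{
    $turns = [];

    foreach ($segments as $segment) {
        $last = array_key_last($turns);

        if ($last !== null && $turns[$last]['speaker'] === $segment['speaker']) {
            // Same speaker is still talking: append to the current turn
            $turns[$last]['text'] .= ' ' . trim($segment['text']);
        } else {
            // First segment, or the speaker changed: start a new turn
            $turns[] = ['speaker' => $segment['speaker'], 'text' => trim($segment['text'])];
        }
    }

    return $turns;
}
```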

Audio Event Tagging

Detect non-speech audio events like laughter, applause, or background noise:

$response = Prism::audio()
    ->using('elevenlabs', 'scribe_v1')
    ->withInput($audioFile)
    ->withProviderOptions([
        'tag_audio_events' => true,
    ])
    ->asText();

// Events are included in the transcription
echo $response->text;
// Example: "Hello [LAUGHTER] how are you? [APPLAUSE]"
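If you need the speech and the audio events separately (for example, to search only the spoken words), you can post-process the text. This sketch assumes events appear as square-bracketed tags like those in the example above. The tag format ElevenLabs actually returns may differ, so adjust the pattern to match your transcripts.

```php
<?php

/**
 * Split a transcript into plain speech and a list of audio event tags.
 * Assumes events are written as bracketed tags such as "[LAUGHTER]".
 *
 * @return array{speech: string, events: list<string>}
 */
function separateAudioEvents(string $text): array
{
    // Collect the contents of every bracketed tag: "[LAUGHTER]" -> "LAUGHTER"
    preg_match_all('/\[([^\]]+)\]/', $text, $matches);

    // Remove the tags, then collapse the whitespace they leave behind
    $speech = preg_replace('/\s*\[[^\]]+\]\s*/', ' ', $text);
    $speech = trim(preg_replace('/\s+/', ' ', $speech));

    return ['speech' => $speech, 'events' => $matches[1]];
}
```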

Use Cases

Meeting Transcription with Speaker Identification

$meetingAudio = Audio::fromPath('/path/to/meeting.mp3');

$response = Prism::audio()
    ->using('elevenlabs', 'scribe_v1')
    ->withInput($meetingAudio)
    ->withProviderOptions([
        'diarize' => true,
        'num_speakers' => 4,
        'language_code' => 'en',
        'tag_audio_events' => true,
    ])
    ->asText();

// Process segments with speaker labels
$segments = $response->additionalContent['segments'] ?? [];
foreach ($segments as $segment) {
    echo "[Speaker {$segment['speaker']}] {$segment['text']}\n";
}
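For meetings it can be useful to see how much each participant spoke. The sketch below totals whitespace-separated words per speaker, most talkative first. It assumes the same 'speaker' and 'text' segment keys as the loop above. Word count is only a rough proxy for speaking time.

```php
<?php

/**
 * Estimate how much each participant spoke, by word count.
 * Assumes segments carry 'speaker' and 'text' keys.
 *
 * @return array<int|string, int> word count keyed by speaker, largest first
 */
function wordsPerSpeaker(array $segments): array
{
    $counts = [];

    foreach ($segments as $segment) {
        $speaker = $segment['speaker'];
        // Count whitespace-separated tokens (str_word_count would skip numbers)
        $words = preg_split('/\s+/', trim($segment['text']), -1, PREG_SPLIT_NO_EMPTY);
        $counts[$speaker] = ($counts[$speaker] ?? 0) + count($words);
    }

    arsort($counts); // most talkative speaker first, keys preserved
    return $counts;
}
```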

Podcast Transcription

$podcastAudio = Audio::fromUrl('https://example.com/podcast.mp3');

$response = Prism::audio()
    ->using('elevenlabs', 'scribe_v1')
    ->withInput($podcastAudio)
    ->withProviderOptions([
        'diarize' => true,
        'num_speakers' => 2,  // Host and guest
        'tag_audio_events' => true,  // Capture laughter, music, etc.
    ])
    ->asText();

Interview Transcription

$interviewAudio = Audio::fromPath('/path/to/interview.wav');

$response = Prism::audio()
    ->using('elevenlabs', 'scribe_v1')
    ->withInput($interviewAudio)
    ->withProviderOptions([
        'diarize' => true,
        'num_speakers' => 2,
        'language_code' => 'en',
    ])
    ->asText();

// Generate formatted transcript
$segments = $response->additionalContent['segments'] ?? [];
$speakers = ['Interviewer', 'Guest'];

foreach ($segments as $segment) {
    $speakerLabel = $speakers[$segment['speaker'] - 1] ?? "Speaker {$segment['speaker']}";
    echo "{$speakerLabel}: {$segment['text']}\n\n";
}
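The loop above assumes speaker IDs start at 1 and are numeric. A more defensive sketch assigns names in the order speakers first appear, so it works whether IDs are 0-based, 1-based, or strings. It assumes the first person to speak is the interviewer. Any speaker beyond the supplied names falls back to a generic label.

```php
<?php

/**
 * Build a readable transcript, naming speakers in the order they first speak.
 *
 * @param array<int, array{speaker: int|string, text: string}> $segments
 * @param list<string> $names labels to hand out, e.g. ['Interviewer', 'Guest']
 */
function formatTranscript(array $segments, array $names): string
{
    $labels = [];
    $lines = [];

    foreach ($segments as $segment) {
        $speaker = $segment['speaker'];

        // First time this speaker appears: give them the next unused name
        if (! array_key_exists($speaker, $labels)) {
            $labels[$speaker] = $names[count($labels)] ?? "Speaker {$speaker}";
        }

        $lines[] = "{$labels[$speaker]}: {$segment['text']}";
    }

    return implode("\n\n", $lines);
}
```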

Audio File Handling

Supported Formats

ElevenLabs Scribe accepts common audio formats such as MP3, WAV, and M4A. Prism's Audio value object can be created from several sources:

use Prism\Prism\ValueObjects\Media\Audio;

// From local file path
$audio = Audio::fromPath('/path/to/audio.mp3');
$audio = Audio::fromPath('/path/to/audio.wav');
$audio = Audio::fromPath('/path/to/audio.m4a');

// From remote URL
$audio = Audio::fromUrl('https://example.com/recording.mp3');

// From base64-encoded data
$base64AudioData = base64_encode(file_get_contents('/path/to/audio.mp3'));
$audio = Audio::fromBase64($base64AudioData, 'audio/mpeg');

// From binary content
$audioContent = file_get_contents('/path/to/audio.wav');
$audio = Audio::fromContent($audioContent, 'audio/wav');
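Audio::fromBase64() and Audio::fromContent() take an explicit MIME type. The hypothetical helper below picks one from the file extension. The mapping covers a few common formats and is an assumption on our part, not a list of formats ElevenLabs accepts. Unknown extensions fall back to a generic binary type.

```php
<?php

/**
 * Hypothetical helper (not part of Prism): choose a MIME type for
 * Audio::fromContent() or Audio::fromBase64() from a file extension.
 */
function audioMimeType(string $path): string
{
    $types = [
        'mp3' => 'audio/mpeg',
        'wav' => 'audio/wav',
        'm4a' => 'audio/mp4',
        'ogg' => 'audio/ogg',
        'flac' => 'audio/flac',
        'webm' => 'audio/webm',
    ];

    // pathinfo() returns '' when the path has no extension
    $extension = strtolower(pathinfo($path, PATHINFO_EXTENSION));

    return $types[$extension] ?? 'application/octet-stream';
}
```

For example: $audio = Audio::fromContent($audioContent, audioMimeType('/path/to/audio.wav'));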

Features

✅ Speech-to-Text with high accuracy
✅ Speaker Diarization (identify multiple speakers)
✅ Audio Event Tagging (detect non-speech sounds)
✅ Multi-language support
❌ Text-to-Speech (not yet implemented)

Best Practices

For Best Diarization Results

Ensure clear audio quality
Minimize background noise
Set num_speakers to the number of speakers you expect (ElevenLabs treats it as a maximum)
Use a sample rate of at least 16kHz
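To check the sample-rate recommendation before uploading a WAV file, you can read it from the file header. This sketch assumes a canonical RIFF/WAVE file whose "fmt " chunk starts at byte 12. Files laid out differently are reported as unknown (null) rather than parsed.

```php
<?php

/**
 * Read the sample rate from a canonical WAV header.
 * Returns null if the bytes do not look like a canonical RIFF/WAVE header.
 */
function wavSampleRate(string $bytes): ?int
{
    if (strlen($bytes) < 28
        || substr($bytes, 0, 4) !== 'RIFF'
        || substr($bytes, 8, 4) !== 'WAVE'
        || substr($bytes, 12, 4) !== 'fmt ') {
        return null;
    }

    // The sample rate is a little-endian unsigned 32-bit integer at offset 24
    return unpack('V', substr($bytes, 24, 4))[1];
}
```

Only the first few dozen bytes are needed, for example: $rate = wavSampleRate(file_get_contents('/path/to/interview.wav', false, null, 0, 64)); then warn if $rate is below 16000.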

For Accurate Transcription

Use the correct language code
Ensure good audio quality (clear speech, minimal noise)
Use an appropriate audio format (WAV or high-quality MP3)
For long recordings, consider splitting into segments
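When splitting a long recording, a small overlap between chunks makes it more likely that a word cut at a boundary appears whole in one of them. The sketch below only plans the [start, end] time ranges in seconds. The default chunk length and overlap are illustrative values, not ElevenLabs limits. The actual cutting would be done with an external tool such as ffmpeg.

```php
<?php

/**
 * Plan overlapping time ranges for splitting a long recording.
 *
 * @return list<array{0: float, 1: float}> [start, end] pairs in seconds
 */
function planChunks(float $duration, float $chunkLength = 600.0, float $overlap = 5.0): array
{
    if ($chunkLength <= $overlap) {
        throw new InvalidArgumentException('Chunk length must exceed the overlap.');
    }

    $chunks = [];
    $start = 0.0;

    while ($start < $duration) {
        $end = min($start + $chunkLength, $duration);
        $chunks[] = [$start, $end];

        if ($end >= $duration) {
            break; // the last chunk reaches the end of the recording
        }

        // Step back by the overlap so consecutive chunks share a few seconds
        $start = $end - $overlap;
    }

    return $chunks;
}
```

Because consecutive chunks overlap, text near each boundary may appear twice when you join the transcripts, so trim the duplicated words when merging.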

Limitations

Text-to-Speech

ElevenLabs text-to-speech is not yet implemented in Prism. Use OpenAI or Groq for TTS functionality.

File Size

Check ElevenLabs documentation for current file size limits when processing audio files.
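A simple guard can reject oversized files before an upload is attempted. In the hypothetical sketch below, $maxBytes is a value you supply after checking the current ElevenLabs limit. No actual limit is assumed here.

```php
<?php

/**
 * Hypothetical pre-upload check: throw if the file cannot be read or is
 * larger than the caller-supplied limit.
 */
function ensureWithinUploadLimit(string $path, int $maxBytes): void
{
    $size = @filesize($path); // false (without a warning) if the file is unreadable

    if ($size === false) {
        throw new RuntimeException("Cannot read size of {$path}");
    }

    if ($size > $maxBytes) {
        throw new RuntimeException(sprintf(
            '%s is %d bytes, above the configured limit of %d bytes',
            basename($path),
            $size,
            $maxBytes
        ));
    }
}
```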
