ElevenLabs streaming text-to-speech over a WebSocket connection. Text is sent in chunks and audio is received in real time; each audio chunk is pushed as a TTSAudioRawFrame as soon as it arrives.
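
The receive side of this flow can be illustrated with a minimal sketch. It makes several assumptions that are not stated above: that each incoming WebSocket message is a JSON object with a base64-encoded `audio` field and an optional `isFinal` end-of-stream flag, and that a 16 kHz mono sample rate is used. `AudioFrame` and `frames_from_messages` are hypothetical names for illustration only; `AudioFrame` stands in for TTSAudioRawFrame and is not the real class. The network connection is left out so the decoding logic can be run on its own.

```python
import base64
import json
from dataclasses import dataclass
from typing import Iterable, Iterator


@dataclass
class AudioFrame:
    """Hypothetical stand-in for TTSAudioRawFrame (not the real class)."""
    audio: bytes
    sample_rate: int
    num_channels: int = 1


def frames_from_messages(
    messages: Iterable[str], sample_rate: int = 16000
) -> Iterator[AudioFrame]:
    """Yield one frame per message that carries audio, in arrival order."""
    for raw in messages:
        msg = json.loads(raw)
        audio_b64 = msg.get("audio")  # assumed field name
        if audio_b64:
            # Decode and emit immediately so playback can start before
            # the rest of the stream has arrived.
            yield AudioFrame(base64.b64decode(audio_b64), sample_rate)
        if msg.get("isFinal"):  # assumed end-of-stream marker
            break
```

Because the function is a generator, a consumer receives each frame as soon as its message has been decoded rather than after the whole utterance has been synthesized, which mirrors the "as chunks arrive" behavior described above.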