This is a submission for the AssemblyAI Voice Agents Challenge
What I Built
Introducing Rhetoric Arena, an immersive, voice-driven, AI-powered debate platform designed for solo warriors, curious thinkers, and anyone obsessed with persuasion, logic, or public speaking. In this arena, you don’t just talk to an AI; you face off against it.
Users choose a debate topic, take a stance, and select from 9 wildly different AI personas, each with a unique voice, personality, and argument style. Then the battle begins. The user speaks, AssemblyAI’s streaming transcription converts speech to text in real time, and a Groq- or Gemini-powered AI opponent counters with wit, facts, charm, or fury, depending on the persona you chose.
Built using Flask, Python, JavaScript, and AssemblyAI’s Universal-Streaming API, this project fits the Domain Expert Voice Agent category. It enables users to build real-life communication and persuasion skills, adapt to intellectual pressure, and face the emotional intensity of an actual debate, no other human needed. Just a fact-obsessed, smart, and ruthless 24×7 AI companion 🤖🤺.
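The flow described above (choose a topic, take a stance, pick a persona, then trade rounds) maps naturally onto a small session object. Here is a minimal sketch under that assumption; `DebateSession`, its fields, and the `"for"`/`"against"` stance labels are hypothetical names, not code from the repository.

```python
from dataclasses import dataclass, field
from typing import Dict, List

# Hypothetical stance labels; the real app may name the sides differently.
OPPOSITE = {"for": "against", "against": "for"}


@dataclass
class DebateSession:
    """Hypothetical container for one debate: topic, stances, persona, rounds."""
    topic: str
    user_side: str  # "for" or "against"
    persona: str    # e.g. "sassy", "stoic"
    rounds: List[Dict[str, str]] = field(default_factory=list)

    def __post_init__(self) -> None:
        # Reject stances the opponent logic cannot invert
        if self.user_side not in OPPOSITE:
            raise ValueError(f"user_side must be one of {sorted(OPPOSITE)}")

    @property
    def ai_side(self) -> str:
        # The AI opponent always argues the opposite stance.
        return OPPOSITE[self.user_side]

    def record_round(self, user_text: str, ai_text: str) -> None:
        # Each round stores what the user said and how the AI answered.
        self.rounds.append({"user": user_text, "ai": ai_text})
```

Storing the AI’s stance explicitly means a prompt can state it directly instead of deriving it from the user’s side.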
🎭🤺⚔️ AI Personas (Debate Opponents)
- 😏 Sassy Coach – Teases and roasts you into better, if frustrating, arguments
- 💥 Ruthless Veteran – Logical, cold, unforgiving
- 🌸 Sweet Friend – Supportive and constructive
- 🕊️ Sweet Angel – Always fair, but challenges weak logic
- ✨ Your Bestie – Fun, casual, empowering
- 💘 Charming Rival – Flirty and sarcastic
- 🧠 Objective AI – Emotionless, purely logical
- 📚 Strict Teacher – Grades you, gives feedback
- 🌌 Deep Philosopher – Questions everything
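Nine personas are easiest to manage as data rather than as branching logic. A minimal sketch, assuming each persona boils down to a display label plus a one-line style instruction for the LLM; the dictionary keys, `Persona`, and `system_prompt()` are hypothetical names, and the styles paraphrase the list above.

```python
from typing import Dict, NamedTuple


class Persona(NamedTuple):
    label: str
    style: str  # one-line style instruction fed to the LLM


# Hypothetical registry keys; styles paraphrase the persona list.
PERSONAS: Dict[str, Persona] = {
    "sassy_coach": Persona("Sassy Coach", "Tease and roast the user into sharper arguments."),
    "ruthless_veteran": Persona("Ruthless Veteran", "Be logical, cold, and unforgiving."),
    "sweet_friend": Persona("Sweet Friend", "Be supportive and constructive."),
    "sweet_angel": Persona("Sweet Angel", "Stay fair, but challenge weak logic."),
    "bestie": Persona("Your Bestie", "Be fun, casual, and empowering."),
    "charming_rival": Persona("Charming Rival", "Be flirty and sarcastic."),
    "objective_ai": Persona("Objective AI", "Be emotionless and purely logical."),
    "strict_teacher": Persona("Strict Teacher", "Grade the argument and give feedback."),
    "deep_philosopher": Persona("Deep Philosopher", "Question every assumption."),
}


def system_prompt(persona_key: str, topic: str, ai_side: str) -> str:
    """Build a system prompt; unknown keys fall back to the Objective AI persona."""
    persona = PERSONAS.get(persona_key, PERSONAS["objective_ai"])
    return (
        f"You are {persona.label}. {persona.style}\n"
        f'Debate topic: "{topic}". You argue {ai_side} the motion.'
    )
```

Adding a tenth persona then means adding one dictionary entry, with no new branches in the prompt code.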
🌐 Technologies Used
Python + Flask - Powers the backend logic, routes, and manages the debate lifecycle.
AssemblyAI - Handles real-time transcription with Universal-Streaming, plus batch transcription as a fallback.
Groq API - Provides blazing-fast LLM responses for instant debate replies.
Google Gemini - Acts as a fallback AI model when Groq is slow or unavailable.
JavaScript - Controls audio streaming, recording, and interactive UI events.
HTML/CSS/Tailwind CSS - Craft the responsive front-end layout and animations to enhance user experience.
Why I Built It
Because I’m an introvert. A loner.
I struggle with expressing myself in real life, especially under pressure, confrontation, or arguments.
I’m the most senior one in my hostel right now. Sounds awesome, unless you’re me: stuck alone, not wanting to chat too much with juniors. But yup, I am alone.
I panic during interviews.
I can’t always practice speaking because… who would I even talk to?
Rhetoric Arena is my rebellion against silence.
I built it so I could train, grow, and evolve, all on my own terms: a 24×7 intelligent, responsive, and emotionally complex AI that pushes me not just to talk it out, but to think, express, and defend.
It’s not just my personal solution anymore.
I believe it can help anyone who needs a fearless debate companion, a way to sharpen their voice, and the will to speak boldly.
Demo
The project isn’t deployed yet, but I’ve recorded a full walkthrough video showcasing its power and flow. Check it out here:
GitHub Repository
Here's my GitHub repo for this project:
🎯 Rhetoric Arena
Speak your mind and challenge the AI, because only through resistance do your ideas sharpen and your voice truly grow.
An advanced, open-source debate platform featuring intelligent AI opponents with distinct personalities, real-time voice transcription, dynamic text-to-speech, and comprehensive argument analysis. Built for developers, debaters, and anyone passionate about the art of persuasion.
About The Project
Rhetoric Arena revolutionizes how we practice and perfect the art of argumentation. Unlike static debate tools, this platform provides dynamic, personality-driven AI opponents that adapt to your arguments, challenge your reasoning, and help you develop stronger persuasive skills.
Whether you're a student preparing for competitions, a professional honing presentation skills, or simply someone who loves intellectual discourse, this product offers an immersive experience that makes learning argumentation engaging and effective.
Why Rhetoric Arena?
- 9 Unique AI Personalities: From supportive friends to ruthless critics
- Real-time Voice Processing…
This is an open-source project. Anyone can fork, clone, and run it locally. The README provides clear instructions, setup steps, and troubleshooting guidance.
Technical Implementation & AssemblyAI Integration
This is the beating heart 💓💓 of Rhetoric Arena, a beautifully layered system that listens, thinks, and speaks, just like a human debater. Let’s break down the major components and how they all fit together.
The core innovation is built on AssemblyAI’s Universal-Streaming technology. I implemented both real-time transcription and batch fallback transcription, ensuring users always get their words transcribed, whether live or delayed.
1️⃣ 🔁 Dual-Mode Transcription (AssemblyAI: Real-Time + Batch)
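```python
from typing import Callable

import assemblyai as aai
from assemblyai.streaming.v3 import (
    StreamingClient,
    StreamingClientOptions,
    StreamingEvents,
    StreamingParameters,
    TurnEvent,
)


class DebateTranscriber:
    def __init__(self, api_key: str):
        self.api_key = api_key
        self.assemblyai_available = True
        aai.settings.api_key = api_key  # also used by the batch fallback below

    # 1. Real-time transcription (Universal-Streaming)
    def start_realtime_transcription(self, on_transcript: Callable[[str], None]) -> StreamingClient:
        client = StreamingClient(
            StreamingClientOptions(
                api_key=self.api_key,
                api_host="streaming.assemblyai.com",  # Universal-Streaming (v3) host
            )
        )

        def on_turn(_client: StreamingClient, event: TurnEvent) -> None:
            # Fires on every transcript update for the current speaking turn
            if event.transcript:
                on_transcript(event.transcript)

        client.on(StreamingEvents.Turn, on_turn)
        # The sample rate is a connection parameter, not part of the host URL
        client.connect(StreamingParameters(sample_rate=16000))
        # Caller streams audio with client.stream(...) and ends with client.disconnect(terminate=True)
        return client
```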
✨ Why it matters:
This opens a live WebSocket connection to AssemblyAI. As the user speaks, their audio is streamed in, transcribed on the fly, and sent straight to the debate engine. The callback acts like a “ping”: whenever new words are transcribed, it tells the system what was said, so there is no waiting for a full recording to upload.
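Streaming transcripts arrive as a series of growing partial results before a turn is final, so something has to decide when the user has actually finished a point before the AI rebuts it. A minimal sketch of that buffering step, assuming each update carries a turn id, the cumulative text of the turn so far, and a "final" flag; `TurnBuffer` and its methods are hypothetical.

```python
from typing import Callable, Dict, Set


class TurnBuffer:
    """Collects streaming transcript updates and forwards each finished turn once.

    Assumes every update carries a turn id, the full text of that turn so far,
    and a flag marking the turn as final. Mapping a real SDK's event fields
    onto these arguments is left to the caller.
    """

    def __init__(self, on_turn_complete: Callable[[str], None]):
        self.on_turn_complete = on_turn_complete
        self.partials: Dict[int, str] = {}  # latest text per still-open turn
        self.finished: Set[int] = set()     # turn ids already forwarded

    def update(self, turn_id: int, text: str, is_final: bool) -> None:
        if turn_id in self.finished:
            return  # ignore late duplicates of a turn that was already forwarded
        self.partials[turn_id] = text  # replace, don't append: text is cumulative
        if is_final:
            final_text = self.partials.pop(turn_id).strip()
            self.finished.add(turn_id)
            if final_text:
                self.on_turn_complete(final_text)

    def live_caption(self) -> str:
        # Text of any still-open turns, e.g. for an on-screen live caption
        return " ".join(self.partials[k] for k in sorted(self.partials))
```

With the AssemblyAI v3 client, a Turn handler could call `buf.update(event.turn_order, event.transcript, event.end_of_turn)`; treat those attribute names as assumptions to check against the SDK version in use. If formatted turns are enabled, passing `event.end_of_turn and event.turn_is_formatted` as the final flag would forward the punctuated version instead.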
```python
    # 2. Batch Upload Transcription (Fallback): another DebateTranscriber method
    def transcribe_audio(self, audio_path: str) -> str:
        if not self.assemblyai_available:
            return "AssemblyAI not available."
        try:
            # transcribe() uploads the file and blocks until the job finishes,
            # so no manual polling loop is needed
            transcript = aai.Transcriber().transcribe(audio_path)
            if transcript.status == aai.TranscriptStatus.error:
                return f"[Fallback Error]: {transcript.error}"
            return transcript.text or ""
        except Exception as e:
            return f"[Fallback Error]: {e}"
```
✨ Why it matters:
This ensures reliability. If the real-time pipeline is down or glitchy, the system falls back to traditional transcription. It uploads the audio, waits for AssemblyAI to finish, and retrieves the full text later. You never lose your voice, no matter what.
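The batch path needs an audio file on disk. If the audio captured for a turn is raw 16 kHz, 16-bit mono PCM (an assumption; the browser recorder may hand over a compressed format instead), the stdlib `wave` module is enough to wrap it in a WAV file, a format AssemblyAI accepts, whose path can then go to `transcribe_audio()`. `write_pcm16_wav()` is a hypothetical helper.

```python
import wave


def write_pcm16_wav(path: str, pcm: bytes, sample_rate: int = 16000, channels: int = 1) -> None:
    """Wrap raw little-endian 16-bit PCM in a WAV container so it can be uploaded as a file."""
    if len(pcm) % (2 * channels):
        raise ValueError("PCM byte length must be a whole number of 16-bit frames")
    with wave.open(path, "wb") as wav:
        wav.setnchannels(channels)
        wav.setsampwidth(2)  # 2 bytes per sample = 16-bit audio
        wav.setframerate(sample_rate)
        wav.writeframes(pcm)  # header sizes are finalized when the file closes
```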
2️⃣ 🗣️🎭 Voice + Persona Engine
```python
from typing import Dict, List


class PersonaResponder:
    def __init__(self):
        # Each theme pairs a personality prompt with a TTS voice id
        self.themes = {
            "sassy": {
                "personality": "You're witty, sarcastic, and emotionally expressive. Add spicy comebacks.",
                "voice": "en_sassy_female",
            },
            "stoic": {
                "personality": "You're calm, logical, firm. Never emotional. Use facts.",
                "voice": "en_deep_male",
            },
        }

    def generate_response(self, user_argument: str, topic: str, user_side: str,
                          theme: str, debate_history: List[Dict]) -> str:
        # Unknown themes fall back to the stoic persona
        theme_info = self.themes.get(theme, self.themes["stoic"])
        prompt = f"""
{theme_info['personality']}
Debate Topic: "{topic}"
User Said: "{user_argument}"
Your Position: Opposite of {user_side}
Past Rounds: {debate_history}
Respond in character. Focus on rhetoric, logic, emotion.
"""
        ai_response = self._mock_ai_call(prompt)
        return ai_response.strip()

    def _mock_ai_call(self, prompt: str) -> str:
        # Placeholder: the real app sends the prompt to Groq/Gemini here
        return f"[AI Rebuttal based on prompt: {prompt[:100]}...]"
```
✨ Why it matters:
This controls how the AI fights. Each persona is more than just a tone; it’s a full emotional and intellectual identity. Whether the AI is cool and calculated or wild and fiery, this engine keeps it in character, pushing hard and matching the vibe of the debate.
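Interpolating `debate_history` straight into the prompt prints raw Python dicts and grows with every round. One way to keep a persona focused is to render only the last few rounds as readable lines. A minimal sketch; `format_history()`, the `"user"`/`"ai"` round keys, and the default limits are hypothetical choices.

```python
from typing import Dict, List


def format_history(history: List[Dict[str, str]], max_rounds: int = 3, max_chars: int = 400) -> str:
    """Render the most recent rounds as a compact transcript for the prompt.

    Assumes (hypothetically) each round is a dict with "user" and "ai" keys.
    Older rounds are dropped and long turns are truncated to keep prompts short.
    """
    def clip(text: str) -> str:
        text = " ".join(text.split())  # collapse newlines and repeated spaces
        return text if len(text) <= max_chars else text[: max_chars - 3] + "..."

    # Guard max_rounds <= 0, since history[-0:] would return the whole list
    recent = history[-max_rounds:] if max_rounds > 0 else []
    lines = []
    # Number rounds by their position in the full history, not in the slice
    for n, rnd in enumerate(recent, start=len(history) - len(recent) + 1):
        lines.append(f"Round {n} | User: {clip(rnd.get('user', ''))}")
        lines.append(f"Round {n} | You: {clip(rnd.get('ai', ''))}")
    return "\n".join(lines) if lines else "(no previous rounds)"
```

The returned string could replace `{debate_history}` in the prompt template, keeping the prompt size roughly constant however long the debate runs.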
3️⃣ 🔊 Text-to-Speech with Persona Voice Switching
```python
import re


class TTSPlayer:
    def __init__(self):
        # TTSClient is the project's text-to-speech wrapper (not shown in this post)
        self.client = TTSClient()

    def text_to_speech(self, text: str, debate_id: str, theme: str = "stoic") -> str:
        cleaned_text = self._remove_emojis(text)
        # Map the persona theme to a voice; unknown themes get a default voice
        voice_id = {
            "sassy": "en_sassy_female",
            "stoic": "en_deep_male",
        }.get(theme, "en_default")
        audio_path = f"./audios/{debate_id}_{theme}.mp3"
        self.client.synthesize(text=cleaned_text, voice_id=voice_id, output_path=audio_path)
        return audio_path

    def _remove_emojis(self, text: str) -> str:
        emoji_pattern = re.compile(
            "["
            "\U0001F600-\U0001F64F"  # emoticons
            "\U0001F300-\U0001F5FF"  # symbols & pictographs
            "\U0001F680-\U0001F6FF"  # transport & map symbols
            "\U0001F1E0-\U0001F1FF"  # flags
            "\U0001F900-\U0001F9FF"  # supplemental symbols & pictographs
            "\u2600-\u27BF"          # misc symbols & dingbats
            "]+",
            flags=re.UNICODE,
        )
        return emoji_pattern.sub("", text)
```
✨ Why it matters:
Each persona sounds different. The stoic AI sounds deep and grounded, while the sassy one sounds expressive and spicy. This keeps the illusion alive: you’re debating a real character. Removing emojis avoids awkward TTS behavior (like reading “😎” as “smiling face with sunglasses”).
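Hand-maintained code-point ranges are easy to leave incomplete as Unicode keeps adding emoji blocks. A broader alternative, swapped in here as a different technique rather than the project’s own, is to filter by Unicode category with the stdlib `unicodedata` module. A minimal sketch; `strip_pictographs()` and the U+2000 cut-off (which keeps common symbols such as the degree sign) are illustrative choices.

```python
import unicodedata

# Invisible helpers that glue emoji sequences together (ZWJ, variation selectors)
_EMOJI_GLUE = {"\u200d", "\ufe0e", "\ufe0f"}


def strip_pictographs(text: str) -> str:
    """Drop emoji-like symbols before TTS, then tidy the leftover whitespace.

    Characters in category "So" (Symbol, other) at or above U+2000 cover most
    emoji, dingbats, and flag regional indicators. Symbols below U+2000, such
    as the degree sign and copyright sign, are kept.
    """
    kept = []
    for ch in text:
        cp = ord(ch)
        if ch in _EMOJI_GLUE:
            continue
        if 0x1F3FB <= cp <= 0x1F3FF:  # skin-tone modifiers (category "Sk")
            continue
        if unicodedata.category(ch) == "So" and cp >= 0x2000:
            continue
        kept.append(ch)
    # Collapse the gaps left behind by removed symbols
    return " ".join("".join(kept).split())
```

Because it relies on the Unicode database bundled with Python, newer emoji are covered as the interpreter is upgraded, with no range list to maintain.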
4️⃣ 🔄 Multi-Model AI Fallback (Groq ➡️⬅️ Gemini)
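```python
class AIAgent:
    def _get_ai_response(self, prompt: str, use_groq: bool = True) -> str:
        try:
            if use_groq:
                return self._call_groq(prompt)
            return self._call_gemini(prompt)
        except Exception:
            # Primary provider failed: retry once with the other provider
            return self._call_gemini(prompt) if use_groq else self._call_groq(prompt)

    def _call_groq(self, prompt: str) -> str:
        # Placeholder for the real Groq API call
        return f"Groq AI Response for: {prompt[:60]}..."

    def _call_gemini(self, prompt: str) -> str:
        # Placeholder for the real Gemini API call
        return f"Gemini AI Response for: {prompt[:60]}..."
```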
✨ Why it matters:
This adds resilience. If one AI provider fails, the system switches to the other. Groq is lightning fast, whereas Gemini is conversationally rich, so a single provider going down doesn’t stall your debate.
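Catching exceptions only handles a provider that fails outright; a provider that is merely slow would still stall the round. A minimal sketch of a timeout-based fallback using the stdlib `concurrent.futures` module; `respond_with_fallback()`, the shared pool, and the 5-second default are hypothetical, and the two callables stand in for the real Groq and Gemini clients.

```python
import concurrent.futures
from typing import Callable

# Shared pool: a `with ThreadPoolExecutor()` block would wait for a slow call
# on exit, which defeats the timeout. Timed-out calls keep running in their
# worker thread and their result is ignored, so size the pool accordingly.
_POOL = concurrent.futures.ThreadPoolExecutor(max_workers=4)


def respond_with_fallback(
    prompt: str,
    primary: Callable[[str], str],
    secondary: Callable[[str], str],
    timeout_s: float = 5.0,
) -> str:
    """Try the primary model; use the secondary if it raises or exceeds timeout_s."""
    future = _POOL.submit(primary, prompt)
    try:
        # Re-raises the primary's own exception, or raises TimeoutError
        return future.result(timeout=timeout_s)
    except Exception:
        return secondary(prompt)
```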
5️⃣ 🎤 JavaScript Audio Capture (Frontend Voice Recorder)
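```javascript
async function startRecording() {
  // Ask for the microphone with browser-side echo cancellation and noise suppression
  const stream = await navigator.mediaDevices.getUserMedia({
    audio: {
      echoCancellation: true,
      noiseSuppression: true,
    },
  });

  const mediaRecorder = new MediaRecorder(stream);
  const audioChunks = [];

  mediaRecorder.ondataavailable = (event) => {
    if (event.data.size > 0) audioChunks.push(event.data);
  };

  mediaRecorder.onstop = () => {
    // Keep the recorder's real MIME type (e.g. audio/webm) on the blob
    const audioBlob = new Blob(audioChunks, { type: mediaRecorder.mimeType });
    const audioUrl = URL.createObjectURL(audioBlob); // e.g. for an <audio> playback element
    stream.getTracks().forEach((track) => track.stop()); // release the microphone
  };

  mediaRecorder.start();
  return mediaRecorder; // call mediaRecorder.stop() to finish recording
}
```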
✨ Why it matters:
This lets users record their voice directly from the browser, with noise suppression and echo cancellation. In the full app, once recording ends, the audio gets uploaded and processed just like a live stream. The frontend becomes a clean, real-time stage for users to speak, hear themselves, and face off against the AI.
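A recorded take can be fed through the same streaming path by slicing it into short, equal-duration frames, which is roughly what a live microphone feed looks like. A minimal sketch, assuming the browser audio has already been decoded to 16 kHz, 16-bit mono PCM (a `MediaRecorder` blob is usually a compressed container such as WebM/Opus, so a decode step comes first); `pcm_frames()` and the 50 ms default are hypothetical choices.

```python
from typing import Iterator

SAMPLE_RATE = 16000   # Hz, matching the 16 kHz stream used by the transcriber
BYTES_PER_SAMPLE = 2  # 16-bit PCM, mono


def pcm_frames(pcm: bytes, frame_ms: int = 50) -> Iterator[bytes]:
    """Yield fixed-duration frames of mono 16-bit PCM; the last frame may be shorter."""
    # 16000 samples/s * 2 bytes * 50 ms / 1000 = 1600 bytes per 50 ms frame
    frame_bytes = SAMPLE_RATE * BYTES_PER_SAMPLE * frame_ms // 1000
    if frame_bytes <= 0:
        raise ValueError("frame_ms too small")
    for start in range(0, len(pcm), frame_bytes):
        yield pcm[start:start + frame_bytes]
```

Since `frame_bytes` is always a multiple of two here, frame boundaries never split a 16-bit sample.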
Final Thoughts
Rhetoric Arena isn’t just a project.
It’s a personal rebellion against silence, and a gift to every introvert, every curious learner, every job-seeker, every soul who just wants to be heard, argued with, and sharpened.
No more waiting for someone to challenge your ideas.
No more staying quiet because you’re afraid of not having the right words.
With this AI, you’ll always have:
- A worthy rival
- A smart coach
- A fierce friend
- And someone who will never let you stop growing.
Whether you want to become a better speaker, prepare for interviews, or just build courage, "step into the Rhetoric Arena". The AI awaits.
✨ Built with passion, persistence, and purpose by Divya.
For every quiet mind who just wants to speak up.
Thank you for reading till the end 🥹✨😊
Top comments (16)
Interesting! I built something similar. How do you find Groq responses?
I like it for the most part tbh, the personas are as per their style, the analysis and facts are correct.
Checked it out, I made this project on 23rd first. I made an x post about it. Not copied your idea 😅
Lol it’s cool.
Yup, dunno about style, but nice analysis
Great work!
Thank you 😊
another, great work 👍👍👍
all the very best ✨✨
Thank you 😁
Good work
Thank you
Great work mate!
Thank you for the support 🙂🙏
It's lovely, I saw a startup doing the same for interview preparation. They had a whole team working on it, and you did it on your own single-handedly.
Kudos!!!
Wait really 😳
Thank you so much for telling me about this, this was a huge morale booster.
But I am sure they were making it on a larger scale, and with a huge database, and many more functionalities 😅
nice
Thank you