The recent CB Insights* report on Voice AI consolidation is a clear sign of the sector’s heat. But while capital is flowing and M&A is looming, most of the startups being celebrated have one major flaw: they’re not telephony-first.
They treat voice like a data event, not a communications foundation. They analyze conversations after the fact but lack control during them. That’s not Voice AI; that’s post-call transcription with branding. And let’s not forget that in today’s world FaceTime, Signal, WhatsApp, Telegram, and services like them are really players in the telephony game too.
Why Many Voice AI Startups Miss the Mark
- AI Before Dial Tone: These players start with language models and bolt them onto a media stream. But without control of the call path (SIP signaling, RTP, jitter buffers), they’re building on shaky ground.
- Relying on CPaaS Middlemen: They don’t own the carrier layer. So when something breaks (poor call quality, missed DTMF, routing issues), they can’t fix it, let alone optimize it in real time.
- Detached from Real-Time Flow: Insights after the call are helpful. But insight during the call changes the outcome. Most of these tools can’t influence the conversation as it happens.
Why Telephony-First Still Wins
A telephony-native foundation enables:
- Live responsiveness—AI that acts, not just listens.
- Quality assurance—from PSTN or SIP ingress all the way through transcription.
- Real-time orchestration—routing, assistant engagement, or IVR redirection based on voice input.
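To make the orchestration point concrete, here is a minimal sketch of a decision step that runs while the call is still live. Everything in it is an assumption for illustration: the `TranscriptEvent` shape, the `Action` names, and the keyword rules are hypothetical and do not come from any vendor named in this post. A production system would more likely use an intent model than keyword matching.

```python
from dataclasses import dataclass
from enum import Enum


class Action(Enum):
    CONTINUE = "continue"                  # keep listening, change nothing
    ROUTE_TO_HUMAN = "route_to_human"      # transfer to a live agent
    ENGAGE_ASSISTANT = "engage_assistant"  # bring an AI assistant into the call
    REDIRECT_TO_IVR = "redirect_to_ivr"    # send the caller to an IVR menu


@dataclass
class TranscriptEvent:
    call_id: str
    text: str        # transcript of the caller's latest utterance
    is_final: bool   # False while the recognizer may still revise the text


# Hypothetical keyword rules, checked in order; first match wins.
RULES = [
    (("agent", "representative", "human"), Action.ROUTE_TO_HUMAN),
    (("billing", "invoice", "refund"), Action.REDIRECT_TO_IVR),
    (("help", "how do i"), Action.ENGAGE_ASSISTANT),
]


def decide(event: TranscriptEvent) -> Action:
    """Map one live transcript event to a call-control action."""
    if not event.is_final:
        # Do not act on text the recognizer might still change.
        return Action.CONTINUE
    lowered = event.text.lower()
    for keywords, action in RULES:
        if any(keyword in lowered for keyword in keywords):
            return action
    return Action.CONTINUE
```

The design point is where `decide` sits: it only matters if whatever calls it can actually transfer, bridge, or redirect the call, which is exactly what owning the call path buys you.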
As I’ve said before, “If you’re not moving down the AI path and you offer functions like IVR, voice recording, contact center, or conferencing—not to mention regular calling—you will be nothing more than PSTN 1.0, not VoIP 3.0.”
Companies That Get It: Telephony-First Voice AI
CarrierX (A Comunicano Client) Led by Dave Erickson, the pioneer behind FreeConferenceCall.com, and CTO Eugene Tcipnjatov, CarrierX brings a full-stack carrier-grade VoIP backbone with programmable infrastructure at its core. Their architecture isn’t an overlay; it’s native. That gives platforms built on CarrierX, like Phound.App, the foundation they need for call control, routing, media handling, and quality assurance. CarrierX’s infrastructure is SIP-native, scalable, and reliable at global scale, making it the kind of platform Voice AI needs to run at real-time speed and precision.
Phound + MagnumAI (A Comunicano Client) Phound uses CarrierX as its voice and telephony backbone—so it inherits all the carrier-grade call control and media reliability that others can’t touch. MagnumAI, layered on top, brings post-call conversation intelligence: indexing, search, summarization, and recall.
What’s important here is the separation of concerns:
- CarrierX handles the call infrastructure.
- Phound.App delivers the front-end experience.
- MagnumAI brings asynchronous intelligence after the call.
This is a properly layered Voice AI stack. It’s not retrofitted. It’s designed—with each piece doing what it does best, while preserving the integrity of the call.
Conveyr, from longtime comrade in arms, sometimes client, and always a friend, Erik Lagerway (Xten, CounterPath, Dialpad, Hookflash, SignalWire), is a developer-first Voice AI platform that lives where the call happens: in the session. It blends SIP-native telephony, transcription, and real-time logic. With over 300 external service integrations, including HubSpot, Slack, and Anthropic’s Model Context Protocol (MCP), Conveyr enables smart agents and assistants that can react within the call. It’s voice-aware and latency-sensitive, because it was built from the core up by people who’ve lived on the wire.
Dialpad + TalkIQ: A Full-Stack AI Deployment (A sometimes Comunicano Client, and a personal investment) When Dialpad acquired TalkIQ, they didn’t just integrate a feature—they redefined the company’s product DNA. Today, Dialpad’s Voice Intelligence (Vi™) is embedded across the entire product suite:
- Dialpad Connect: Full telephony suite with real-time transcription, smart call summaries, and live coaching.
- Dialpad Meetings: Instant action items, follow-up recommendations, and speaker tracking.
- Dialpad Contact Center (CCaaS): Sentiment scoring, agent assist, QA automation.
- Dialpad Sell: Live objection handling, script prompts, and performance coaching.
Because Dialpad owns its call infrastructure—from SIP to WebRTC to AI inference—they can execute these features live, with millisecond accuracy. That’s what true voice-native AI looks like.
SignalWire + FreeSWITCH DNA (a former Comunicano client, launch stage) SignalWire, born from the FreeSWITCH project, offers a programmable voice core and a low-latency media fabric. Their AI Agent solution puts conversational logic inside the SIP/media flow, not around it. That means businesses can design agents that don’t just “talk”; they interact in real time with routing engines, IVRs, and human agents. Voice AI becomes the operator, not just the stenographer.
vCon (emerging IETF standard) vCon, short for virtualized conversation, standardizes the packaging of transcripts, metadata, and contextual conversation data. The handiwork of longtime friend and sometimes client Thomas (Mr. Mashup) McCarthy-Howe, it enables systems to pass around post-call intelligence while preserving the telephony integrity that matters in regulated industries and enterprise workflows.
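For readers who want to picture what “packaging” means here, the sketch below bundles one finished call into a single JSON document that can be handed from system to system. It is deliberately simplified, and the field names (`id`, `parties`, `transcript`, `metadata`) are my own assumptions for illustration, not the schema defined in the IETF drafts. The phone numbers are placeholders.

```python
import json
import uuid
from datetime import datetime, timezone


def build_container(parties, transcript, metadata):
    """Bundle one finished call into a single JSON-serializable dict.

    parties:    list of phone numbers, one per participant
    transcript: list of (party_index, offset_seconds, text) tuples
    metadata:   free-form call details (direction, duration, and so on)
    """
    return {
        "id": str(uuid.uuid4()),  # unique handle for this conversation
        "created_at": datetime.now(timezone.utc).isoformat(),
        "parties": [{"index": i, "tel": tel} for i, tel in enumerate(parties)],
        "transcript": [
            {"party": party_index, "offset_s": offset, "text": text}
            for party_index, offset, text in transcript
        ],
        "metadata": dict(metadata),
    }


container = build_container(
    parties=["+15550100", "+15550101"],
    transcript=[
        (0, 0.0, "Hi, I am calling about my order."),
        (1, 2.4, "Sure, let me look that up."),
    ],
    metadata={"direction": "inbound", "duration_s": 42},
)

# Serialize for hand-off, then parse it back as a downstream system would.
payload = json.dumps(container)
restored = json.loads(payload)
```

The value is in the hand-off: because who spoke, what was said, and the call details travel as one unit, a downstream analytics or compliance tool does not need access to the telephony layer to make sense of the conversation.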
Final Word: Intelligence Is Only as Good as the Infrastructure
If you’re building Voice AI today, you have two choices:
1. Own the call: build on telephony-native infrastructure and shape the conversation in real time.
2. Watch the call: analyze it after the fact and hope it tells you something useful.
CarrierX, Phound.App with MagnumAI, Conveyr, SignalWire, and Dialpad all show that the first path leads to real outcomes, as does the vCon standard. Everyone else? They’re hoping a transcript will fix the problems they weren’t built to hear in the first place.