What an AI Answering Service Does in the First 90 Seconds

The first 90 seconds of an AI answering service call decide whether the caller stays on the line, gets routed correctly, and ends the call with their problem captured. Most of what determines that outcome happens in the first 30 seconds: the greeting, the first question, and the AI's listening pattern. This post walks through exactly what's happening, second by second.

I'll use OnCrew's call flow as the working example because it's the one I know intimately. The general pattern applies to most contractor-specific AI services.

Second 0: The handoff

The call rings the AI's number. The voice stack accepts the call within one ring. Latency to the first spoken word should be under 800 milliseconds; anything longer feels broken.

What's happening technically:

SIP or VoIP signaling completes.
Audio stream opens.
Greeting plays.
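
If your vendor exposes per-call timing data, the 800 millisecond target is easy to check. Here's a rough sketch, not any vendor's actual API: the HandoffTimings fields are hypothetical names, and I'm assuming all timestamps are measured in milliseconds from the moment the call is answered.

```python
from dataclasses import dataclass

FIRST_WORD_BUDGET_MS = 800  # the "feels broken" threshold from the text


@dataclass
class HandoffTimings:
    """Hypothetical timestamps, in ms since the call was answered."""
    signaling_done_ms: float   # SIP or VoIP signaling completes
    audio_open_ms: float       # audio stream opens
    greeting_start_ms: float   # first greeting audio plays


def first_word_latency_ok(t: HandoffTimings,
                          budget_ms: float = FIRST_WORD_BUDGET_MS) -> bool:
    """True when the greeting starts inside the latency budget."""
    return t.greeting_start_ms < budget_ms


def slowest_stage(t: HandoffTimings) -> str:
    """Name the handoff stage that consumed the most time."""
    stages = {
        "signaling": t.signaling_done_ms,
        "audio_open": t.audio_open_ms - t.signaling_done_ms,
        "greeting_start": t.greeting_start_ms - t.audio_open_ms,
    }
    return max(stages, key=stages.get)
```

When a call misses the budget, slowest_stage points at which of the three steps to ask the vendor about.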

Second 1-3: The greeting

The greeting is the first impression. It should:

Identify your business. "Thanks for calling [your business name]."
Set expectations. "This is [AI assistant name], how can I help you today?"
Be warm but not over-scripted. No "I value your call" canned phrases.

The greeting should run 2-3 seconds. Longer means the caller has to wait to speak, which feels slow at 2 AM.

Second 3-10: The first question lands

The AI asks the open-ended first question: "What's going on?" or "Can you tell me what's happening?" or "How can I help you?"

The wording is intentionally open. It doesn't anchor the caller into an "are you scheduling an appointment?" framing. It lets the customer describe the situation in their own words.

Second 10-30: The caller talks

The caller describes the situation. The AI listens, transcribes, and runs intent classification in real time.

What the AI is doing in this window:

Speech-to-text on the caller's audio.
Intent classification on the partial transcript. Is this an emergency? A routine call? An estimate request? A non-service call?
Keyword detection for safety branches: gas, smoke, fire, sparking, burning, water, flooding, sewage, panel, CO, alarm.
Sentiment analysis for urgency. A panicked caller gets prioritized differently from a calm one.

By the time the caller finishes their first sentence, the AI has a working theory of what kind of call this is.
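
The keyword step is the simplest of the four to picture. Below is a minimal sketch that assumes the speech-to-text layer hands over a plain-text partial transcript. The keyword list is the one above; the function name and the whole-word matching rule are my own illustration, not a description of how any particular vendor does it.

```python
import re

# Safety keywords from the list above, lowercased for matching.
SAFETY_KEYWORDS = frozenset({
    "gas", "smoke", "fire", "sparking", "burning", "water",
    "flooding", "sewage", "panel", "co", "alarm",
})


def safety_hits(partial_transcript: str) -> list[str]:
    """Return safety keywords found so far, in order of first appearance.

    Matching is on whole words, so "gasket" does not trigger "gas".
    Short tokens like "co" can still produce false positives; for a
    safety branch, that trade-off favors catching too much over too little.
    """
    seen: list[str] = []
    for token in re.findall(r"[a-z]+", partial_transcript.lower()):
        if token in SAFETY_KEYWORDS and token not in seen:
            seen.append(token)
    return seen


print(safety_hits("The CO alarm keeps going off"))  # ['co', 'alarm']
```

Because it runs on the partial transcript, a check like this can be repeated every time new words arrive rather than waiting for the caller to finish.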

Second 30-45: The first response

The AI responds. The response depends on the intent classification:

If safety branch fired:

"I'm hearing that [safety situation]. I want to walk you through the right steps right now."
Branch script runs (evacuate, call 911, shut off equipment, etc.).

If emergency intent (non-safety) fired:

"It sounds like this is urgent. Let me get you the right help fast. Can I get a callback number first in case we get disconnected?"

If routine intent fired:

"Got it, sounds like [paraphrase of situation]. Let me get a few details from you so we can get this scheduled."

If non-service intent fired:

"Thanks for calling. Let me make sure we route you to the right person, could you tell me a little more about what you need?"

The first response is the most important AI utterance of the call. It tells the caller whether they're being heard correctly. A wrong-paraphrased response loses the caller's trust immediately.
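
The branching above can be sketched as a small lookup. The response wording is taken from this post. The Intent enum, the function names, and the rule that a safety keyword overrides whatever the classifier returned are assumptions for illustration. Estimate requests are left out because no script is given for them here.

```python
from enum import Enum


class Intent(Enum):
    SAFETY = "safety"
    EMERGENCY = "emergency"
    ROUTINE = "routine"
    NON_SERVICE = "non_service"


def resolve_intent(classified: Intent, safety_keyword_hit: bool) -> Intent:
    """Assumption: a safety keyword always wins over the classifier's label."""
    return Intent.SAFETY if safety_keyword_hit else classified


def first_response(intent: Intent, paraphrase: str = "") -> str:
    """Pick the first utterance; `paraphrase` restates the caller's situation."""
    if intent is Intent.SAFETY:
        return (f"I'm hearing that {paraphrase}. "
                "I want to walk you through the right steps right now.")
    if intent is Intent.EMERGENCY:
        return ("It sounds like this is urgent. Let me get you the right help "
                "fast. Can I get a callback number first in case we get "
                "disconnected?")
    if intent is Intent.ROUTINE:
        return (f"Got it, sounds like {paraphrase}. Let me get a few details "
                "from you so we can get this scheduled.")
    return ("Thanks for calling. Let me make sure we route you to the right "
            "person, could you tell me a little more about what you need?")
```

The paraphrase slot is where trust is won or lost, so it is worth testing with deliberately messy caller descriptions.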

Second 45-60: Callback number capture

Before going deeper into intake, the AI captures the callback number. The reason is failover: if the call drops, the AI needs to be able to dial back.

Wording: "Quick, can you give me a callback number in case we get disconnected?"

Capture, confirm by repeating, move on. Should take 10-15 seconds.
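
Here's a sketch of the capture-and-confirm step, under two assumptions that may not hold for your setup: the transcript already contains digits rather than spelled-out words, and callers give 10-digit US numbers. Both function names are hypothetical.

```python
import re
from typing import Optional


def normalize_us_number(heard: str) -> Optional[str]:
    """Strip punctuation and return 10 digits, or None if the count is off."""
    digits = re.sub(r"\D", "", heard)
    if len(digits) == 11 and digits.startswith("1"):
        digits = digits[1:]  # drop the leading country code
    return digits if len(digits) == 10 else None


def read_back(number: str) -> str:
    """Format a 10-digit number for digit-by-digit confirmation."""
    groups = (number[:3], number[3:6], number[6:])
    return ", ".join(" ".join(group) for group in groups)


number = normalize_us_number("(512) 555-0142")
if number is None:
    print("Sorry, I didn't catch that. Could you repeat the number?")
else:
    print(f"I have {read_back(number)}. Is that right?")
```

Returning None for an incomplete number lets the script re-ask instead of storing something it can't dial.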

Second 60-90: Intake begins

By second 60, the AI is in full intake mode. The structured questions begin:

Service address.
Specific symptom (paraphrased back for confirmation).
When the symptom started.
Equipment details if applicable.
Anyone home?

The intake continues past the 90-second mark for the next 3-5 minutes, depending on call type.

What changes at second 90

By the 90-second mark, three things have been established:

Intent classified. Emergency, urgent, routine, estimate, or non-service.
Safety branches triggered if applicable. If gas smell or CO alarm, the customer is already being walked through evacuation steps.
Callback number captured. Disconnects are now recoverable.

This is the inflection point. Calls that get past 90 seconds cleanly almost always complete successfully. Calls that have problems at 90 seconds (misheard situation, wrong branch, latency issues) usually don't recover.

What good looks like in the first 90 seconds

Pull a recording from your AI answering service and listen for these markers:

Greeting under 3 seconds.
Open-ended first question.
No anchoring on appointment scheduling early.
Paraphrase of caller's situation for confirmation.
Safety branch fires within 15 seconds of safety keyword.
Callback number captured before deep intake.
Tone matches caller (calm if caller is calm, brisk if caller is urgent).

If all 7 are present in your recordings, the script is working. If 4-6 are present, you have refinement to do. If 3 or fewer are present, the vendor's script doesn't fit contractor work.
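
If you're reviewing more than a handful of recordings, it helps to score them the same way every time. This is a minimal sketch of the checklist as a scorecard. The marker identifiers are names I made up for the seven items above, and a score of 6 is grouped with the refinement tier here.

```python
# One hypothetical identifier per marker in the checklist above.
MARKERS = (
    "greeting_under_3s",
    "open_ended_first_question",
    "no_early_scheduling_anchor",
    "paraphrase_confirmation",
    "safety_branch_within_15s",
    "callback_before_intake",
    "tone_matches_caller",
)


def score_recording(observed: set[str]) -> tuple[int, str]:
    """Count the markers heard in one recording and map the count to a verdict."""
    unknown = observed - set(MARKERS)
    if unknown:
        raise ValueError(f"unknown markers: {sorted(unknown)}")
    score = len(observed)
    if score == len(MARKERS):
        verdict = "script is working"
    elif score >= 4:
        verdict = "needs refinement"
    else:
        verdict = "poor fit for contractor work"
    return score, verdict


heard = {"greeting_under_3s", "open_ended_first_question",
         "paraphrase_confirmation", "callback_before_intake"}
print(score_recording(heard))  # (4, 'needs refinement')
```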

What bad looks like

The failure patterns:

Slow greeting (5+ seconds). Feels like the call didn't connect properly.

Anchored first question ("Are you scheduling an appointment today?"). Excludes emergency intent from the customer's framing.

No paraphrase confirmation. Customer wonders if they were heard.

Safety branch doesn't fire on keyword. The AI didn't catch "I smell gas". This is a script gap.

Callback number captured at end of call instead of beginning. Disconnects are unrecoverable.

Tone mismatch. AI sounds chipper while customer is panicking.

If you hear these in your recordings, the AI script needs refinement or the vendor isn't the right fit.

How to test this on a demo

Place a test call. Use a real scenario. Listen for the 7 good markers above. Time the first 90 seconds.

If a vendor's demo line fails on more than 2 of the 7 markers, the production line will fail on more. Don't go live until the demo is clean.

For more, see the AI answering service product page, the emergency call routing setup guide, and the contractor virtual receptionist buyer's guide.

FAQs

What's the right voice: male, female, accent?

Modern AI voice models offer multiple options. The right choice depends on your customer base. Most contractor shops do well with a neutral-accent voice that sounds professional without being robotic. Some shops match their region (Texan, Northeastern, Southern). Test a few.

Can I hear my own AI's first 90 seconds before going live?

Yes. Any reputable vendor will let you place a test call and listen to the recording. If they won't, walk.

How does the AI handle a caller who pauses or interrupts?

Good voice stacks handle interruptions natively: if the caller starts talking while the AI is speaking, the AI stops and listens. Pauses up to 5-10 seconds are tolerated; longer pauses trigger a polite "are you still there?" prompt.
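
The turn-taking rule boils down to a small decision function. This sketch is illustrative only: the action names are invented, and the 8-second default is just one value inside the 5-10 second range mentioned above.

```python
def next_action(ai_speaking: bool, caller_speaking: bool,
                silence_s: float, max_silence_s: float = 8.0) -> str:
    """Decide what the voice loop does next.

    silence_s is how long neither side has spoken, in seconds.
    """
    if caller_speaking:
        # Interruption: the caller always gets the floor.
        return "stop_and_listen" if ai_speaking else "listen"
    if not ai_speaking and silence_s > max_silence_s:
        return "prompt_still_there"
    return "continue"


print(next_action(ai_speaking=True, caller_speaking=True, silence_s=0.0))
# stop_and_listen
```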

What if the caller has a strong accent or speech difficulty?

Modern AI models handle a wide range of accents in 2026. Severe speech difficulties (heavy slurring, breathing issues, distress) should trigger an escalation to a human. The AI should not try to push through with intake when the caller is clearly impaired.