Definitely Doug 3/8/24: Voice AI Goes Mainstream

by Doug Rice

Hotel contact centers have been around for at least 75 years. Some are large-scale centers with hundreds of agents in one place (or even thousands virtually), while others may consist of a few phones answering guest calls at a hotel front desk, PBX room, or sales office. Contact centers are less important than they were 30 years ago: customers now use websites, mobile apps, live chat, bots, or emails for many transactions, rather than speaking to an agent sitting in front of a screen.

But voice contact centers are not going away anytime soon. Every hotel business must accommodate customers who still want to use that curious legacy technology that is the voice telephone call, or whose needs or issues cannot be handled by antiquated self-service tech stacks. Live chat often requires the same knowledge and uses the same agents, if only for escalations that chatbots cannot handle.

And increasingly, contact centers must deal with a severely constrained labor force that is challenging to find and hire, difficult to train, hard to retain, and becoming more expensive every year.

This week’s topic is how Artificial Intelligence (AI) is making contact centers more efficient and effective, simultaneously addressing labor requirements, agent skills and turnover, and customer satisfaction. I spent 15 years of my career as a consultant with a deep specialization in contact center operations. Admittedly, this was 20+ years ago, but the basics had not changed much until recently. Today, I am truly amazed at how much the technology has evolved, mostly in the past few years, as represented by the products I reviewed for this article.

Some of these are targeted at big, complex contact centers or global operations, while others can make sense even for a single 100-room limited-service hotel. The better ones provide options to centralize calls to individual hotels without losing the local knowledge that is needed to address them professionally.

There has been a great deal of AI-led innovation in this space in the past few years, and some large (if not yet widespread) successes within the hotel industry. These solutions utilize automatic speech recognition (ASR), language models, and natural language generation to great effect, but not entirely in ways you might expect (hint: it’s not just voice-enabled ChatGPT). Early adopters are finding the biggest benefits include faster onboarding and training for staff, better coaching capabilities (especially for work-at-home agents), higher retention rates, shorter call times, a dramatic reduction in hold times, quicker problem resolution, and improved customer satisfaction.

These solutions are already changing the landscape of contact center management dramatically. And while they are solving tactical problems like labor shortages and agent retention, their long-term impact will be to enable business processes that better align with the 21st-century consumer. Some contact centers are already doing that, both in hospitality and elsewhere.

For this article, I spoke with six companies that all offer products with interactive voice recognition and voice response, and that are specifically working with hotels. The picture I will paint reflects the best of the best, with no single product doing everything mentioned (although a few come close). The companies each have different focuses and strengths, but every product I saw was impressive and worth a serious look.

The companies and platforms I reviewed for this article included Balto, BluIP’s AIVA Connect (AI Virtual Assistant), NICE’s CXone, PolyAI, SoundHound’s Smart Answering, and Tenyx.  My deepest thanks to the senior executives from these companies, who provided great insight into how the next generation of products has taken shape.

As usual, I will focus on best practices and potential benefits rather than trying to rate or compare specific companies. As you read on, I expect most hotel companies (and even some of their vendors) will see huge opportunities in these new products. All six companies have hotel experience, a few of them at serious scale. There is a reason for their success.

What’s Changed?

Voice bots are rapidly replacing traditional Interactive Voice Response (IVR) systems, which can understand specific words but can only deliver prerecorded responses and limited vocalization of certain data (such as your loyalty points balance). IVR has been around for decades (a classic example is “For reservations, press or say one”), but most deployments end in caller frustration because their ability to understand even single words is often poor, and they can only do a few (typically simple) things well.

How many times have you called a bank after finding the self-service options for a particular issue were not there or didn’t work, only to have to listen to a long list of options that have no relevance? You might hear your balances and minimum payment due four times, authenticate yourself twice, and repeatedly try to guess which magic word will get you to a person (who will then almost certainly have to transfer you to someone else). And when you get to a human, how much of the information you already provided do you now need to repeat?

IVRs in many environments typically handle only 10% to 20% of calls without human intervention. While they can save costs by offloading simpler calls that would otherwise need a human, they do so with the side effect of high customer frustration.

The newer AI voice bots flip this upside down. Depending on the application, training, interfaces, and other factors, some can fully address the needs of as many as 85% to 90% of callers once the implementation matures. The limit for many contact centers is likely to be lower (especially in early days, before integrations can be built to address certain common needs), but I believe 60% to 80% should be achievable by a typical hotel company that wants to automate and puts forth a serious effort.

I tested several of these products (most of them have demo numbers you can call). And while they are by no means perfect, they can do a lot more than I expected, and they generally work very well. I would go so far as to say that the best ones are better than the average contact center agent, but not as good as the best. With some, I was able to make a hotel reservation entirely by voice. I was offered the relevant products, I was not asked to spell my name (but was able to ask the bot to read it back to be sure it got it right), I didn’t have to strain to understand a heavy accent or poorly placed microphone, and my call was connected instantly. There are very few calls I have with human contact-center agents (hotel or otherwise) where I can say all these things.

To be sure, many guests prefer to speak with a human, and many hotels (particularly in the luxury segment) want that as well. These systems support whatever the hotel’s preference is, but even for luxury brands they create an option for those guests who prefer not to speak to a human (including many in today’s younger generation) or when no qualified human is available. They also fully support text chat, as discussed below.

I have plenty of experience staying at top luxury hotels (almost regardless of brand) and calling the front desk with a request only to be put on terminal hold or sent to voice mail. While a bot may only provide a “5” level of hospitality vs. the “10” of a good human, 5 is a lot better than the “0” level that terminal hold or voice mail represents. Many times, I had a simple request that a bot could have handled, but instead it ended up fulfilled too late or not at all. Is that the level of hospitality hotels want to deliver? I don’t think so!

The customer experience delivered by voice bots has gotten orders of magnitude better in the last few years, but the back-end improvements are equally impressive. These platforms often combine the phone system (typically an Automatic Call Distributor, or ACD), voice recordings, call transcripts, messaging (chat, email, WhatsApp, even Teams), and agent performance measurement and call analytics in a single, integrated platform. As a result, the company gets a single record of each customer interaction even if it involves multiple channels or agents, potentially even in different contact centers or countries. As discussed below (“Voice Bots Help Human Agents”), they provide rich resources to better train, equip, and manage a workforce of front-line contact center agents.

Coexistence of Old and New

Voice bots will not replace humans, at least not anytime soon. No AI bot will handle 100% of calls to a hotel’s satisfaction, and most hotels do not want them to (at least, not yet). However, they can get 100% of the calls started, and address many needs without human intervention.

Where human capabilities are needed, the bots can log the conversation up to the point of sending the call to an agent.  When that agent answers, a transcript of the call (and a brief written summary) can appear immediately on their screen. Information captured by the bot is also readily available to the human agent and can be used to auto-populate forms on the screen; the caller does not usually need to repeat anything and the agent should not need to re-enter information previously captured by the bot (e.g., name, loyalty account, reservation locator, target hotel, dates, etc.).

The better systems can direct the call to a particular agent or queue based on the type of request the caller is making (e.g., reservations vs. loyalty program inquiry); based on the persona of the caller and which agents handle that persona with the highest satisfaction scores; or based on customer value such as elite status. Calls with unusual requests can be routed to agents who specialize in handling them. For example, if a caller to the brand reservation center asks about booking a wedding at a particular hotel, they can be automatically transferred directly to that hotel’s sales department (and the bot can give the caller detailed information about who will receive the call and how to reach them, in case they get voice mail).
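For readers who like to see the logic spelled out, here is a minimal Python sketch of those three routing rules. The queue names, the `Call` fields, and the scoring data are all hypothetical; a real platform would pull them from its ACD configuration, loyalty lookup, and historical satisfaction data, and would weigh far more factors.

```python
from dataclasses import dataclass
from typing import Dict, Optional

# Hypothetical mapping of unusual request types to specialist destinations.
SPECIALIST_QUEUES = {"wedding": "hotel_sales", "group": "hotel_sales"}


@dataclass
class Call:
    intent: str                      # request type detected by the bot, e.g. "reservations"
    elite_status: bool               # looked up from the loyalty program
    hotel_id: Optional[str] = None   # target hotel, if the caller named one


def route(call: Call) -> str:
    """Pick a queue name from the request type and customer value."""
    if call.intent in SPECIALIST_QUEUES:
        queue = SPECIALIST_QUEUES[call.intent]
        # Property-specific requests go to that hotel's own team.
        return f"{queue}:{call.hotel_id}" if call.hotel_id else queue
    if call.elite_status:
        # High-value callers get a priority queue for the same request type.
        return f"{call.intent}_priority"
    return call.intent


def best_agent(persona: str, scores: Dict[str, Dict[str, float]]) -> str:
    """Among available agents, pick the one with the highest average
    satisfaction score for this caller persona (0.0 if no history)."""
    return max(scores, key=lambda agent: scores[agent].get(persona, 0.0))
```

For example, `route(Call("wedding", False, "H042"))` returns `"hotel_sales:H042"`, sending the wedding inquiry straight to that property's sales team.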

Voice bots can also divert calls to self-service or other channels where appropriate, for example by texting or emailing a deep link to a website, or by taking a message when no agent is available and notifying whoever should respond by email, text, or even a Teams channel.

All of this means fewer calls requiring human agents, as well as shorter call handling times by getting calls to the right agent on the first try, and by capturing the voice bot interaction and providing the information to the agent so they don’t need the customer to repeat it. Even if all the bot does is handle informational calls, it enables agents to focus on more important (e.g., revenue-producing) opportunities.

Configurations are highly customizable. Some higher-end hotels that value the human touch use voice bots to handle calls only when a human is not available. Some limited-service hotels that use outsourced contact centers choose only to transfer calls where the likely revenue is sufficient to pay the contact center’s fees; otherwise they force callers to use self-service options or to leave a message.
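The outsourced-center rule in the last example boils down to a simple expected-value test. The sketch below assumes “likely revenue” is estimated as booking probability times average booking value; both inputs and the per-call fee are hypothetical parameters, not figures from any vendor.

```python
def should_transfer(booking_probability: float, avg_booking_value: float,
                    fee_per_call: float) -> bool:
    """Transfer to the outsourced center only when the expected revenue from
    the call covers the per-call fee; otherwise the bot offers self-service
    options or takes a message."""
    expected_revenue = booking_probability * avg_booking_value
    return expected_revenue >= fee_per_call
```

With invented numbers, a caller the bot judges 30% likely to book a $150 stay is worth transferring against an $8 fee, while a purely informational call (probability near zero) is not.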

For hotels that want to maximize the human touch, the bots can help agents become much more effective by prompting them with the right information or cues at exactly the right time. This could easily save 20 seconds on a typical call by exposing information that the agent would otherwise have to look up. The bots also score every agent on every call and provide real-time suggestions and feedback, so that agents and their supervisors can focus on making every agent more like their best agent.

A bonus benefit of a good voice bot is that you typically get a very good chatbot almost for free. Voice bots work by first converting the customer’s voice to text, then by analyzing and acting on the text, then by producing a text response, and finally by converting that text response to voice. If you strip off the first and last steps, the voice bot basically becomes a text chatbot. The same training data and knowledge base supports both, and answers will always be consistent between voice and text.
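To make that four-step pipeline concrete, here is a minimal Python sketch. The speech-to-text and text-to-speech functions are hypothetical stand-ins for real ASR and TTS engines, and a toy keyword lookup replaces the language model and integrations; the point is only that the voice path and the chat path share one core, so they give the same answer.

```python
from typing import Callable, Dict

# Hypothetical stand-ins for the speech components: a real system would call
# an automatic speech recognition (ASR) engine and a text-to-speech (TTS) engine.
Transcribe = Callable[[bytes], str]
Synthesize = Callable[[str], bytes]


def answer_text(text: str, knowledge_base: Dict[str, str]) -> str:
    """The shared core: analyze the caller's text and compose a text reply.

    A toy keyword lookup stands in for the language model and integrations.
    """
    lowered = text.lower()
    for keyword, reply in knowledge_base.items():
        if keyword in lowered:
            return reply
    return "Let me connect you with a colleague who can help with that."


def chatbot_turn(text: str, knowledge_base: Dict[str, str]) -> str:
    # Text chat skips the first and last steps entirely.
    return answer_text(text, knowledge_base)


def voicebot_turn(audio: bytes, knowledge_base: Dict[str, str],
                  transcribe: Transcribe, synthesize: Synthesize) -> bytes:
    text = transcribe(audio)                    # step 1: voice to text
    reply = answer_text(text, knowledge_base)   # steps 2-3: analyze, reply in text
    return synthesize(reply)                    # step 4: text to voice


# Usage with fake speech components (the bytes simply carry the text here).
kb = {"pool": "Our pool is open from 7 a.m. to 10 p.m."}  # hypothetical hotel fact
fake_asr = lambda audio: audio.decode("utf-8")
fake_tts = lambda text: text.encode("utf-8")
spoken = voicebot_turn(b"Is the pool open late?", kb, fake_asr, fake_tts)
typed = chatbot_turn("Is the pool open late?", kb)
```

Because both `voicebot_turn` and `chatbot_turn` call the same `answer_text`, updating the knowledge base once updates both channels.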

Technical Challenges

Voice bots face several challenges compared with text chatbots, and there are differences in how well the solutions I saw address each one of them (as well as which ones will matter to you). There is no substitute for exhaustive testing during product evaluation, and for launching with a small-scale proof-of-concept to work out the kinks. Some of the key issues to consider are:

  • Latency: Long delays before a voice bot responds will make the call seem much less natural to the caller. There is a lot of computational power needed to interpret, analyze, compose a response, and deliver it. To sound natural, it all needs to happen in less than a second or so.
  • Languages: Voice bots need extensive training in each language they need to support. While every product I saw supports English, some handle more than 100 languages. If you need a global solution, this will be a differentiator.
  • Accents, Dialects, and Regional Terminology: Voice bots have gotten much better at understanding how different people speak. Look for ones that are trained on conversations and speakers that are relevant. Some of the systems will even run a caller’s voice through multiple models simultaneously to see which one understands it best, and then use that one.
  • Fillers, false starts, and repetitions: Natural language often includes ums, uhs, nonsense words (I mean, like, you know), repetitions, and decisions to rephrase after starting but not completing a thought. Look for systems that can detect and ignore words that don’t matter.
  • Background noise: The better algorithms can filter out background noise to understand conversations that others will struggle to comprehend.
  • Second speakers: Calls from speaker phones may have multiple people talking sequentially or even on top of each other. The better systems can handle these situations, at least to a degree.
  • Pauses: It is often difficult to know when a caller has finished a thought and stopped speaking (meaning it’s time for the bot to reply). Words, tonality, and other factors may provide clues. You want to respond quickly when they are done, but not to interrupt.
  • Interruptions: Callers get interrupted and may switch to speaking to someone else midsentence or even mid-word (“I’m on a call!”). And while any five-year-old child would understand what was happening, only the better voice bots will handle this in stride.
  • Context: Within a conversation, there may be segments where you are expecting the caller to provide a number, an alphanumeric (such as a record locator), an address, or something else. The better voice bots can recognize contexts where they are likely to get a specific type of response and can use a custom speech recognition module to improve understanding.
  • Homonyms: Soundalike words in general conversation can be problematic, and industry context matters. Industry-specific training is critical. If the caller asks if the hotel has any suites, and the bot hears “sweets,” it may provide the less-than-useful answer that the hotel sells candy in the grab-and-go and in vending machines.
  • Tone: Better voice bots can detect impatience, frustration, happiness, and other emotions. This can be used to decide whether to transfer the call to a human, and even to select the best available human.
  • Speech Synthesis: While most of the voice bots spoke quite naturally, it was not hard to tell they were bots. Most of the companies said this was intentional; they preferred that the caller know they were talking to a bot so that the conversation would better stay on track. Most callers understand that bots have limits and stay more focused in their requests.
  • Sensitive Data: Most of the solutions had at least some protections built in for sensitive data, for example using third-party capture-and-tokenize capabilities to collect payment card information. Based on applicable regulatory requirements, you will also want to review how other sensitive data is removed from call logs, summaries, and recordings.
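The “Pauses” challenge is a good example of the trade-offs involved. Below is a deliberately simplified Python sketch of an end-of-turn rule: wait only briefly after an utterance that sounds complete, but longer after one that trails off on a conjunction, article, or filler. The thresholds and word list are hypothetical, and the products I saw use richer signals (tonality, acoustic models) than a word list.

```python
# Hypothetical thresholds in milliseconds of silence.
SHORT_WAIT_MS = 500    # utterance sounds complete: reply quickly
LONG_WAIT_MS = 1200    # utterance sounds unfinished: give the caller time

# Words that rarely end a finished thought (conjunctions, articles, fillers).
TRAILING_WORDS = {"and", "but", "or", "so", "the", "a", "an", "to", "for", "um", "uh"}


def silence_needed_ms(partial_transcript: str) -> int:
    """How much silence to wait for, given what the caller has said so far."""
    words = partial_transcript.lower().rstrip(" .,?!").split()
    if not words or words[-1] in TRAILING_WORDS:
        return LONG_WAIT_MS
    return SHORT_WAIT_MS


def turn_is_over(partial_transcript: str, silence_ms: int) -> bool:
    """Decide whether the caller has finished speaking."""
    return silence_ms >= silence_needed_ms(partial_transcript)
```

After 600 ms of silence, “I’d like a room for” is treated as unfinished, while “I’d like a room for Friday” is treated as complete. Every millisecond of waiting also counts against the roughly one-second latency budget, so a shorter wait makes the bot feel snappier but more likely to interrupt.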

Voice bots also can suffer some of the same issues as text chatbots.

  • Hallucinations: Depending on how the AI model is designed and trained, these can be an issue. Some providers minimized the issue by using older LLMs rather than the newer ones popularized by ChatGPT in the past year and a half, which are more prone to hallucinations. Others used the latest tools but emphasized the need for special content training or add-ons, ingestion of specific databases, access to content management systems, and conversational guardrails.
  • Integration: Some voice bots support the interfaces needed to (for example) process reservations, book ancillaries (such as local restaurants via OpenTable), interact with public directories such as Yelp or Parkopedia to get current information, or open tickets in a work-order management system when a guest requests towels. Some of the solutions had many more integration options than others.


Voice Bots Help Human Agents

Perhaps the most exciting thing I saw (in different degrees in different products) was the ability of the voice platforms to improve the quality of calls handled by human agents when calls are transferred to them. When a single AI platform with speech recognition is listening to every call from beginning to end (whether the caller interacts with the bot only, a human agent only, the bot and one human, or the bot and multiple humans), it can act as an assistant to human agents.

I participated in several interactive calls, watching the agent’s screen as I listened in or acted as the caller. If the caller asked a question about a hotel’s restaurant, in near real time the information the agent needed to answer it popped up on their screen – with no action on their part to request it. One reservation caller mentioned wanting to play some golf, and immediately the bot suggested language for an available golf package. A caller was hesitating over whether to book something, and the screen displayed ideas for helping the agent to close the sale. The agents were spared a lot of typing: in some cases booking screens were automatically populated with information spoken by the caller, whether collected by the bot in advance, or spoken to the agent (and bot) during the call.

Suggestions to the agent were not random; they were based either on policies the hotel wanted to follow, or on statistical analysis of what worked best (as in, how do our best agents handle this situation?). They were also based on (and limited by) the integrations the hotel had implemented. If the hotel had four things they wanted every agent to do on every call, one system would list them in a corner of the screen and automatically check them off as the agent covered the point verbally (and they did not have to use exact language). One prompt even advised the agent to take a deep breath and slow down, because they were talking too fast.
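The auto-checked list can be approximated in a few lines. This Python sketch is a keyword-based stand-in under stated assumptions: the four talking points and their cue phrases are hypothetical, and the systems I saw match meaning rather than specific words, which this sketch only approximates.

```python
from typing import Dict, List, Set

# Hypothetical required talking points and cue phrases for each.
CHECKLIST: Dict[str, List[str]] = {
    "confirm_dates": ["arriving", "check-in", "check in"],
    "offer_loyalty": ["rewards", "loyalty", "points"],
    "explain_cancellation": ["cancel"],
    "offer_upgrade": ["upgrade", "suite"],
}


def update_checklist(done: Set[str], agent_utterance: str) -> Set[str]:
    """Return the checklist items covered so far, adding any whose cue
    phrase appears in the agent's latest (transcribed) utterance."""
    text = agent_utterance.lower()
    covered = {item for item, cues in CHECKLIST.items()
               if any(cue in text for cue in cues)}
    return done | covered


def remaining(done: Set[str]) -> Set[str]:
    """Talking points the agent has not yet covered on this call."""
    return set(CHECKLIST) - done
```

Run after each transcribed agent turn, `remaining` gives exactly the items still to display in the corner of the screen.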

The tools also benefit contact center managers. They can alert supervisors when something happens on a call that either the bot or the human agent thinks needs their immediate attention. At that point, the supervisor can join the call or have a side text chat with the agent. In either case they have full access to the transcript so far, a call summary, and the recording. Dashboards can identify calls where an agent might need coaching, enabling the supervisor to replay the call (or just the relevant section) in one-on-one training. And they can measure agent compliance with policy and even predict customer satisfaction ratings from individual calls, based on data from millions of past calls.

Instead of supervisors monitoring a couple of calls per agent per month, the AI platform monitors and rates every one, enabling the supervisor to really focus on the specific training needs of each agent, pulling up representative calls that they can review together.

Other Benefits

Many voice-bot-enabled contact centers provide additional benefits not available to contact centers using older technologies.

  • Call Analytics: By using data analytics on every historical call, these solutions can accurately measure the reasons for calls, as well as correlate agent behaviors with key metrics such as customer satisfaction, net promoter score, first call resolution, or likelihood of complaint. They can predict consumer satisfaction for each call with high accuracy. This can be reported to the agent for self-improvement and coaching and can also be provided on dashboards to supervisors. It can also be correlated with scores on hiring and candidate assessment screens to better identify who is most similar to high-performing agents.
  • A/B Testing: The best responses to particular situations (particularly in selling) can easily be A/B tested, since specific language can be prompted to both the voice bots and human agents. This is very difficult to do with legacy solutions, where agents typically make up the specific language based on their training or a quick scan of very terse descriptions from their screen (which are usually fixed in the transactional system and not changeable on a call-by-call basis).
  • Guest Mobile Calls: Through integration with hotel systems, the bots can recognize calls from in-house guests using their mobile phones and handle them as if they were internal calls.
  • Call Logging: Most of the products eliminate the need for manual note taking, call summaries, and (with integration) entries into Customer Relationship Management (CRM) systems.
  • Opportunity Identification: Because they can record customer requests, desires, and issues on every call, they can identify opportunities for business process improvements. One commercial package delivery service had no idea that 30% of their customers who had missed a home delivery wanted the option to pick up the package themselves at a company facility (which they did not offer). The analytics highlighted this number; by implementing at-facility pickups, they reduced redeliveries by 30% (with substantial cost savings), while improving customer satisfaction.
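To see why prompted language makes A/B testing practical, consider how the results would be read. The Python sketch below applies a standard two-proportion z-test to booking conversions for two scripted offers; the call counts are invented for illustration, and a real program would also fix sample sizes in advance and watch for seasonal effects.

```python
import math


def two_proportion_z(conv_a: int, n_a: int, conv_b: int, n_b: int) -> float:
    """z-statistic for the difference in conversion rates between
    script A (conv_a bookings in n_a calls) and script B."""
    p_a = conv_a / n_a
    p_b = conv_b / n_b
    pooled = (conv_a + conv_b) / (n_a + n_b)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    return (p_b - p_a) / se


def p_value_two_sided(z: float) -> float:
    """Two-sided p-value under the standard normal: 2 * (1 - Phi(|z|))."""
    return math.erfc(abs(z) / math.sqrt(2))


# Hypothetical results: 120 bookings in 1,000 calls vs. 150 in 1,000 calls.
z = two_proportion_z(120, 1000, 150, 1000)
p = p_value_two_sided(z)
```

Here script B’s 15% conversion vs. script A’s 12% gives z of about 1.96 and a p-value just under 0.05, which is borderline; a larger sample would settle it.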

Should You Stick Your Toe in the Water?

In this article, I have painted a picture of what today’s technology can do. It is already revolutionizing contact centers in ways I couldn’t have imagined five or ten years ago, and it will only continue to improve. To be clear, no single product does everything I have described (nor does any hotel likely need everything); I took the “best of the best” to try to convey how exciting a change this represents. But reimagining your contact center technologies and operations – and the customer and colleague journeys through the myriad of use cases – will not be an overnight journey.

Here are some thoughts (informed in part by the experience of companies already aboard this train) on how to approach implementing this technology.

  1. Look Broadly. Survey the field and see what best fits your operation and needs. Don’t start with an RFP based on what you do today with 20th-century technology; rather, see what others are doing that might fit your needs, and try to reimagine your contact center technology from the ground up for the 21st century. Establish your vision for five or ten years in the future. What should the customer experience be? The colleague experience?
  2. Decide the Maximum Scope. Understand as early as possible what the ultimate potential scope of the effort is: a single contact center, all worldwide contact centers, a single hotel, a cluster of hotels, or some combination (even all) of the above. While you will want to start small, you should be trying a tool or tools that can grow to your ultimate potential scope and size. Some systems can manage worldwide contact centers and remote agents as if they were all under one roof, all spoke the same language (almost), and were all connected to one ACD and supporting technologies. Others have significant limitations that may or may not matter to you.
  3. Start Small and Test Thoroughly. Run proofs of concept with interesting options (more than one if you can) at a small scale to see if the products perform to your expectations, what problems they solve, and what issues they create. Work with the vendor(s) to address key challenges and to assess their responsiveness and alignment.
  4. Expect Continual Improvement Over Time, Not Immediate Results. Whatever your goal, expect to fully automate only a small percentage of your calls at first. The bots require training and tuning. Perhaps more important, though, they require integrations with other systems (reservations, property management, loyalty, work-order management, etc.) to address many use cases. Use the analytics to quantify the reasons WHY calls could not be handled. If you find that 20% of your calls are for something that could be automated by integrating a new system or adding new content, you can assess the effort, cost, and timeline and make incremental decisions that improve the bot’s abilities over time.
  5. Get Your People Onboard. Make sure your call center managers, supervisors, trainers, and human resources staff are engaged, particularly where you are reviewing and piloting new technologies. This way of hiring, training, evaluating, and managing agents may (should!) ultimately change their jobs for the better, but the changes will be substantial, and resistance to change is always an issue. Make sure your staff own the solution and can help drive the necessary process changes. Give them objectives that can only be achieved through those process changes but let them choose the specifics as much as possible.
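The analytics step in point 4 is essentially a tally. This Python sketch assumes each call record has been reduced to a pair (was it fully handled by the bot, and if not, the reason it escalated); the reason labels and counts are hypothetical. It reports the bot’s containment rate and ranks escalation reasons by their share of all calls, which is the number you would weigh against the cost of a new integration.

```python
from collections import Counter
from typing import List, Tuple


def containment_rate(calls: List[Tuple[bool, str]]) -> float:
    """Share of calls the bot handled end to end without a human."""
    return sum(1 for handled, _ in calls if handled) / len(calls)


def unhandled_reason_shares(calls: List[Tuple[bool, str]]) -> List[Tuple[str, float]]:
    """Each escalation reason's share of ALL calls, largest first."""
    reasons = Counter(reason for handled, reason in calls if not handled)
    return [(reason, count / len(calls)) for reason, count in reasons.most_common()]


# Hypothetical month of 100 calls.
calls = ([(True, "hotel_info")] * 60
         + [(False, "modify_booking")] * 20
         + [(False, "billing_dispute")] * 15
         + [(False, "other")] * 5)
```

In this invented sample the bot contains 60% of calls, and modifying existing bookings accounts for 20% of all calls, so a reservation-system integration for modifications would be the obvious next candidate.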

A few hotel (and other travel industry) companies are well down the road in implementing voice bots, and their experiences have generally been good. If this is you, congratulations! If it’s not, then it’s time to start ensuring that you aren’t left behind.

Douglas Rice
