Read a former employee’s note on what was wrong with Alexa and how Amazon made a big mistake with its voice assistant

A former employee has called out the e-commerce giant Amazon for missing a big opportunity with its voice assistant, Alexa. Mihail Eric, a former senior machine learning scientist for Alexa AI, wrote in a lengthy post on X (formerly Twitter) about how Amazon missed the chance to turn Alexa into a leading product while competitors moved ahead with their own AI-infused voice assistants. “We had all the resources, talent, and momentum to become the clear market leader in conversational AI,” Eric lamented in the post.
According to LinkedIn, Eric left Amazon in 2021. He wrote that Amazon missed its chance to dominate the conversational AI space due to a “poor technical process”, a “fragmented organization”, and “product-science misalignment”.

Here is the longer post, titled ‘How Alexa dropped the ball on being the top conversational system on the planet’, written by Mihail Eric:

A few weeks ago, OpenAI released GPT-4o, which set a new standard for multimodal, conversational experiences with sophisticated reasoning capabilities.
Several days later, my good friends at PolyAI announced a Series C fundraise after seeing tremendous growth in usage of their enterprise voice assistant.
Amid this news, a former Alexa colleague messaged me: “You’d think voice assistants would have been our specialty at Alexa.”
For context, I joined Alexa AI as a research scientist in early 2019. By this time, the Alexa consumer device had been in existence for 5 years and was already in over 100 million homes worldwide.
In 2019, Alexa grew at a rapid pace. Dozens of new teams emerged every quarter, massive financial resources were invested, and senior leadership made it clear that Alexa was going to be one of Amazon’s big bets going forward.
My team was born out of all of this with a simple charter: bring the latest and greatest in AI research to the Alexa product and ecosystem. I have often described our group (later called the Conversational Modeling team) as Google Brain meets an Alexa AI SWAT team.
I was there for 2.5 years, during which time we grew from 2 to 20 and handled every part of conversational systems.
We built the organization’s first LLMs (though we didn’t call them LLMs back then), we built a knowledge-grounded response generator (though we didn’t call it RAG), and we pioneered the prototype for making Alexa a multimodal agent in your home.
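
To make “knowledge-grounded response generation” concrete: the pattern retrieves the passages most relevant to a query and conditions the generated reply on them. Below is a minimal sketch of the idea; `embed` and `generate` are hypothetical stand-ins for an embedding model and a text generator, not anything from Amazon’s internal stack.

```python
# Minimal sketch of knowledge-grounded response generation (the pattern
# now called RAG). `embed` and `generate` are hypothetical placeholders.
from math import sqrt

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    na, nb = sqrt(sum(x * x for x in a)), sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0

def grounded_reply(question, passages, embed, generate, k=3):
    """Rank passages by similarity to the question, then condition the reply."""
    q = embed(question)
    top = sorted(passages, key=lambda p: cosine(embed(p), q), reverse=True)[:k]
    prompt = "Context:\n" + "\n".join(top) + f"\n\nQuestion: {question}\nAnswer:"
    return generate(prompt)
```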
We had all the resources, talent, and momentum to become the clear market leader in conversational AI. But much of that technology never saw the light of day and never received any notable press.
Why?
The reality is that Alexa AI was plagued by technical and bureaucratic problems.
Poor technical process
Alexa put a lot of emphasis on protecting customer data, with guardrails to prevent leakage and unauthorized access. That is definitely an important practice, but one consequence was that working with the internal infrastructure was very painful for developers.
It took weeks to access any internal data for analysis or experiments. The data was not properly annotated. Documentation was either non-existent or out of date.
The experiments had to be run in a resource-limited compute environment. Imagine trying to train a Transformer model when you only have a CPU available. This is unacceptable for a company sitting on one of the largest collections of accelerated hardware in the world.
I remember one time our team ran an analysis demonstrating that the annotation scheme for some subset of utterance data was completely wrong, leading to inaccurate data labels.
This meant that for months our internal annotation team was mislabeling thousands of data points every day. When we asked the team to change their annotation taxonomy, we discovered that it would take a lot of effort to modify even the smallest thing.
We had to involve the team’s PM, then get their manager’s consent, then submit the initial change request, then get it approved (a process that took several months from start to finish).
And most importantly, the team’s PM had no immediate promotion story to be gained from fixing the issue, beyond “this is scientifically the right thing to do and may lead to better models for another team.” With no incentive, no action was taken.
Since it was not our responsibility and the effort from our side was not worthwhile, we closed that chapter and moved on.
As far as I know, they may still be mislabeling those utterances today.
Fragmented organizational structure
Alexa’s organizational structure was decentralized, which meant that many small teams worked on the same problems, sometimes in different geographic locations.
This gave the organizational dynamics an almost Darwinian feel, with teams scrambling to ship their work so they wouldn’t be reorganized and absorbed into a competing team.
The result was an organization full of hostile middle managers who had no interest in Alexa’s broader goals and only wanted to protect their own fiefdoms.
Part of my group’s job was to spread projects across the organization: we would find teams whose research/product interests matched ours and urge them to collaborate on ambitious efforts. The resistance and inaction we encountered was soul-crushing.
I remember once coordinating a project I was leading to scale up large Transformer model training. It was an ambitious effort that, done right, could have been the genesis of an Amazon ChatGPT (long before ChatGPT was released).
Our Alexa team met with an internal cloud team that was independently starting a similar venture. The goal was to find a way to collaborate on the training infrastructure, but several weeks of meetings produced only promises that never came to fruition.
In the end, our team did its thing and the other team did theirs. The duplicated effort, with no shared data, infrastructure, or lessons learned, inevitably hurt the quality of the models both sides produced.
As another example, the Alexa Skills ecosystem was Alexa’s attempt to apply Amazonian decentralization to the dialogue problem: different teams would own different skills.
But dialogue is not conducive to that degree of separation of concerns. How do you seamlessly hand conversational context from one skill to another? Doing so means endowing the system with multi-turn memory (a long-standing dream of dialogue research).
The internal design of the skills ecosystem made this impossible, because each skill acted like its own independent bot. It was conversational AI run by a committee of opinionated bots, each with its own agenda.
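
A toy sketch of the failure mode (my own illustration, not Alexa’s actual internals): if each skill is an isolated bot, a follow-up like “what about tomorrow?” has no antecedent unless a router threads shared multi-turn memory through every skill call.

```python
# Toy illustration, not Alexa's actual design: a router threads shared
# multi-turn memory through skills, which isolated per-skill bots lack.
class WeatherSkill:
    def handle(self, utterance, memory):
        if "tomorrow" in utterance:
            # Only answerable because the router carried the city over
            # from an earlier turn.
            return f"Tomorrow in {memory.get('city', '?')}: sunny."
        memory["city"] = utterance.split()[-1].strip("?")
        return f"Today in {memory['city']}: cloudy."

class Router:
    def __init__(self, skills):
        self.skills, self.memory = skills, {}  # memory shared across turns

    def turn(self, utterance):
        # Real routing would classify intent; one skill keeps the toy small.
        return self.skills["weather"].handle(utterance, self.memory)

router = Router({"weather": WeatherSkill()})
print(router.turn("weather in Seattle"))    # Today in Seattle: cloudy.
print(router.turn("what about tomorrow?"))  # Tomorrow in Seattle: sunny.
```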
Product-science misalignment
Alexa was relentlessly customer-centric, which I think is admirable and something every company should emulate. Within Alexa, it meant that every engineering and science effort had to be tied to some downstream product.
This created tension for our team, because our job was to make experimental bets on the future of the platform, bets that could not be folded into the product within a single quarter, as was expected of us, without hacks or shortcuts.
So we constantly had to justify our existence to senior leadership and frame our projects in metrics that would read as customer-facing.
For example, in one of our projects to build an open-domain chat system, the success metric imposed by senior leadership (i.e. a single integer value representing overall conversation quality) had no scientific basis and was nearly impossible to achieve.
This led to product/science conflicts at every weekly meeting to track project progress, resulting in a change of managers every few months and eventually the shutdown of the effort.
Looking ahead at the battle for the future of the conversational AI market, I still believe it’s anyone’s game.
Amazon has sold more than 500 million Alexa devices to date, which translates into a huge amount of user data. But that alone is not enough.
Here’s how I would organize the dialogue system effort from the start:
1. Invest in robust developer infrastructure, particularly around access to compute, data quality assurance, and streamlined data collection processes. Data and compute are the lifeblood of modern ML systems, so it’s imperative to proactively establish this foundation.
2. Make LLMs the fundamental building block of conversational flows. In hindsight, the Alexa Skills ecosystem was premature given the capabilities of conversational systems at the time; I compare it to Leap Motion shipping a developer SDK before the underlying hardware device was stable. But with the power of modern LLMs, I am optimistic about redesigning the conversational developer toolkit with LLMs as the foundation (see the sketch after these recommendations).
3. Make sure the product timeline doesn’t crowd out the science research timeline. With things moving so fast in the AI world, it’s hard not to feel pressure to ship quickly, but plenty of unsolved problems remain that will take time to crack. Pursue research aggressively, of course, but don’t measure the delivery cycle in quarters, or you’ll end up with inferior systems built to meet deadlines.
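
As a closing sketch of point 2, here is one possible shape for an LLM-centered dialogue loop, assuming a generic `llm` completion function and a dict of tool callables (both hypothetical, not any specific vendor API): one model owns the conversation state, and tools sit where skills used to.

```python
# Sketch of an LLM-centered dialogue loop: the inverse of the skills model,
# with one model owning conversation state and tools replacing skills.
# `llm` is an assumed generic completion function, not a real vendor API.
def dialogue_loop(llm, tools, user_turns):
    history = []
    for user in user_turns:
        history.append(f"User: {user}")
        # Ask the model to either answer or name one of the available tools.
        plan = llm("\n".join(history) +
                   f"\nReply, or answer with one tool name from {list(tools)}.")
        if plan in tools:
            history.append(f"Tool({plan}): {tools[plan](user)}")
            reply = llm("\n".join(history) + "\nAssistant:")
        else:
            reply = plan
        history.append(f"Assistant: {reply}")  # memory persists across turns
        yield reply
```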
If you’re thinking about the future of multimodal conversational systems and interfaces, I’d love to hear from you. We have work to do!



