Moneycab: Mr. Popov, Spitch is a Swiss company specializing in speech technologies. What is your vision, and how far are you in the realization of this vision?
Alexey Popov: Our vision is to respond effectively to a new reality of communications by providing our clients with customized speech analytics, voice biometrics, and semantic interpretation solutions. Our key differentiators are easy access and fast deployment, as well as the highest level of accuracy. Spitch achieves this by combining unique mathematics, bespoke technology adaptation, and Swiss-made precision work-flow.
„We have achieved a breakthrough in the past 2 years developing solutions that outperform those of our major competitors in specific areas, and it could set us on course to become a leading company in biometric identification and continuous verification.“ Alexey Popov, CEO Spitch
From the business angle, we believe that our speech technologies should not only deliver tangible business benefits to clients by meeting their present-day needs, but also anticipate the future and create preparedness for it. We have achieved a breakthrough in the past 2 years, developing solutions that outperform those of our major competitors in specific areas, and it could set us on course to become a leading company in biometric identification and continuous verification, for example. By outperforming I mean not only technological advantages and clear business value such as cost savings, but also the power of bright new ideas.
We would like to become the leading provider of applications in all kinds of speech technologies, introducing a new culture, in which speech solutions are delivered as a unified interface platform, built upon the principles of artificial intelligence (AI).
Which industries went further than others in deploying voice based applications, and where do you see the biggest potential for the immediate future?
In the past decade, voice-based solutions were mostly used in banking and telecom call centers as well as in healthcare, but this was largely an experimentation stage, given the issues of accuracy and business relevance. Only in the past few years have we noted a significant increase in demand and preparedness for speech technologies in financial services, insurance, and other sectors. There are many positive implementation examples across these industries, e.g. Barclays, Citibank, ING, Wells Fargo and others in banking. In the insurance segment, I like the examples of Manulife and American Family Insurance, which deploy speech recognition, voice biometrics and chat apps. Some major airlines, telecoms, and retail companies use voice-driven IVR solutions. Examples are plentiful.
Jointly with our Swiss partners Crealogix and ti&m we developed two different voice-operated mobile banking apps, for example. SBB is also planning to introduce a voice interface for its very successful mobile apps.
In the immediate future, I think that these apps will transform such services as getting legal advice, arranging insurance, booking travel and entertainment, healthcare services and many others. I would emphasize the importance of voice-driven apps for education too. The biggest potential now is in the sphere of customer service automation: processing of calls, voice-driven self-services, IVR, and mobile apps. We expect rapid growth in the deployment of biometric identification solutions in the immediate future. Finally, there are voice user interfaces for AI-based personal assistants and chatbots. The overall impression is that speech technologies are firmly on their way to becoming mainstream. Those who either wait too long or fail to capitalize on these disruptions may soon find themselves lagging behind the market leaders.
Software giants like Google or Apple are heavily investing in their speech and voice technology. In which areas can you outsmart them and bring better solutions to the market?
The main difference between medium-size vendors like us and Google, Apple, Baidu, and other major players is that those giants are almost entirely focused on a B2C model. They accumulate enormous amounts of customer data and increasingly emerge as fintech players. Some banks, therefore, may regard them as potential competitors.
Our model is B2B and we can install a bespoke solution on your servers, precisely tailored to your company’s needs and trained on your customers’ audio-data. Spitch pays special attention to ensuring that both statutory and contractual data protection requirements are rigorously observed. All the big American players forward all their data to the US or at least to an external data center for processing. This is an issue for most countries’ data protection experts and security-aware companies. Importantly, operating Spitch solutions does not require any cross-border transfer of sensitive personal data or other tricky procedures. Data stays in Switzerland in full accordance with the Swiss banking and personal data protection laws. This applies to other countries where we work as well.
These are the areas where we, as a B2B company, can outsmart and outperform the majors with our business model, our ability to offer clients bespoke solutions and precision, and unmatched flexibility in meeting their unique needs. In other words, we are not just supplying the technology, but also consulting on what should be done, where, and how to achieve outstanding business results.
One area where voice identification and voice-driven process automation could make a fast and substantial impact is call centers. Where do you see the best use of your technology in call centers, and do you already have reference cases in that area?
An impressive impact can be achieved by call centres that implement the entire package of our speech technologies: a voice-driven IVR, voice biometric identification followed by continuous verification where required, and subsequent automation of standard business processes such as replacing cards in banking or changing a tariff package in telecom. Just imagine automating most of the claim form filling for an insurance company that works 24/7, for example. We have some references from insurance companies, and cases in the healthcare sector, which we think are very promising.
We also have a good reference case with a telecom in the UK, where our solution is used to recognize credit card numbers no matter how a customer pronounces them. This makes it less costly for our client to observe PCI DSS requirements. Spitch is making a unique value proposition in the market: implementing our identification and verification technologies in combination, instead of a traditional security-questions procedure, reduces the average call handling time by 30%. The cost-saving effect increases dramatically, up to 86%, when the whole package of solutions offered by Spitch is implemented.
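To make the card-number case more concrete, here is a minimal Python sketch of one small part of the problem: turning a transcript of a spoken card number into a digit string when callers say "oh" for zero or use "double" and "triple". The function name, the word lists, and the handling of filler words are all illustrative assumptions; this is not Spitch's implementation, and it does not cover pairs such as "forty-two".

```python
# Hypothetical sketch: normalize a transcribed spoken number to digits.
# Word lists and behavior are assumptions, not any vendor's actual method.

DIGIT_WORDS = {
    "zero": "0", "oh": "0", "one": "1", "two": "2", "three": "3",
    "four": "4", "five": "5", "six": "6", "seven": "7",
    "eight": "8", "nine": "9",
}
REPEAT_WORDS = {"double": 2, "triple": 3}


def spoken_to_digits(transcript: str) -> str:
    """Return the digit string spoken in `transcript`, ignoring filler words."""
    digits = []
    repeat = 1  # how many times the next digit should be emitted
    for raw in transcript.lower().replace("-", " ").split():
        token = raw.strip(".,;:!?")
        if token in REPEAT_WORDS:
            repeat = REPEAT_WORDS[token]
        elif token in DIGIT_WORDS:
            digits.append(DIGIT_WORDS[token] * repeat)
            repeat = 1
        elif token.isdigit():
            # The recognizer may already have emitted numerals.
            digits.append(token * repeat)
            repeat = 1
        # Any other token (e.g. "um") is treated as filler and skipped.
    return "".join(digits)


if __name__ == "__main__":
    print(spoken_to_digits("four one double seven triple zero"))
```

A production system would need to handle many more speaking styles and would typically validate the result before using it; the sketch only shows why "no matter how they are pronounced" is a normalization problem as well as a recognition problem.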
What’s more, automatic but secure identification and verification open up an array of opportunities for automating even rather complex business processes such as issuing a bank guarantee. We have a reference with a medium-size bank, for example, where the process of issuing quotes for bank guarantees was fully automated by means of our solution. Machine dialogue in this case clearly outperforms human call centre agents in terms of speed and quality. Most of the references we have so far are from banking and telecom call centers.
What is your vision of a Call Center in 2020? Where is this industry going as far as speech technologies are concerned?
I believe omni-channel will become standard: customers will be able to use any suitable channel with a seamless switch between channels at any stage of the communication process. Voice user interfaces will appear in most mobile self-service apps and, in the majority of cases, customers will interact with the machine, not a call center agent. Chatbots will be used to find the answers in the relevant knowledge base. Only calls on new topics and highly specialized questions will be automatically referred to human operators. Our experts believe that the application of speech technologies will significantly improve the customer experience in call centers. Sentiment analysis will increasingly be used to evaluate customer satisfaction on a considerably larger number of calls than is possible at present.
We are now seeing an increasing demand for speech technologies in the enterprise segment in Switzerland, and not just in the financial services industry. There is an explosion of interest among insurance companies too.
Insurance industry professionals, by the way, will be pleasantly surprised by the futuristic solutions we are going to show and discuss on 29 June in Zurich during our event specifically dedicated to implementing new speech technologies in this industry. Besides Spitch experts, executives from Swisscom and other trusted Spitch industry partners are going to take part as speakers. We will start sending out the invitations soon.
Switzerland, with its many dialects and accents, can be either a fantastic place to test new voice applications or an impossible place to use voice applications in daily business. How do you deal with accents in your technology, and how far are we from seeing practical applications in the mass market?
Switzerland is interesting not only for its linguistic diversity; it also has very strong schools and academic achievements in computational linguistics and natural language processing (NLP). The Spitch R&D team was one of the first to produce a market-ready automatic speech recognition (ASR) solution in Swiss German, taking into account all the major dialects and accents. Our solutions that support Swiss German are fully ready for the mass market now. We were the first to offer a high-security voice biometrics solution in Swiss German as well.
The ability to “understand” the meaning and context of speech, as well as to identify a speaker or verify his or her identity, comes with the right combination of technological approaches and ample, well-structured audio data for training. Accents can be dealt with by drilling down to the phonemic level and understanding the peculiarities of how individual phonemes are pronounced, which also helps capture an individual speaker’s pronunciation peculiarities for highly accurate verification of identity. It is a challenging but very interesting scientific and business task to make it all work in real time too.
There is a great focus in the finance industry on Robo-Advising and bots in standard conversations. How can voice solutions help to make the robots more responsive and improve the whole customer experience?
Let me ask you something too. Would you prefer to answer security questions, use your bank card and token, or just talk – simply talk and be authenticated in seconds with the same or greater level of security, knowing that no fraudster can ever imitate over 100 individual characteristics of your voice? Or imagine that you would like to check whether your portfolio is well-balanced by asking your mobile app to do that, while having a cup of coffee on a terrace and watching your children play on the lawn. Our multiple “voice bank” cases work in this way and we are also planning to add voice user interfaces to some leading banks’ chatbots.
The lifestyle change makes it imperative that essential banking, travel arrangements, booking a table in a restaurant and many other day-to-day things should be available over the voice channel. Furthermore, it is increasingly important to offer opti-channel experience, where your customer uses the most convenient channel in accordance with his or her preferences, time of the day, mobility (e.g. driving or riding a train) etc.
Although the voice seems to be unique to every person, voice recognition is not yet widespread for identification purposes. Why is that and which developments could help to widen the usage in this field?
The use of voice biometrics is growing rapidly, and there is a clear explanation. To authenticate with your voice, all you need is the microphone already built into your phone. Your voice is an authentication factor, a biometric one; in other words, something you are. Combining voice with face recognition is also possible, but we do not receive such requests often and find that in most cases the voice factor is sufficient for secure authentication.
Identification of customers has so far been based only on determining their phone number. Obviously, the fact that someone is using your phone number does not guarantee that it is you calling, and in many cases an authentication procedure would be necessary. Identification by voice was problematic for many years, mainly due to the issue of latency (time delay) that occurred when the system had to compare the voice of the caller with thousands or even millions of samples stored in the database. Thanks to our new identification product, the latency problem has been eliminated. ‘Voiceprint’ matching occurs in real time with a very high degree of accuracy. It is followed by continuous verification of identity throughout the conversation to ensure the highest level of security.
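The two steps described here, matching a caller against many stored voiceprints and then re-checking identity during the call, can be sketched in a few lines of Python. This is a simplified illustration under stated assumptions: voiceprints are represented as plain numeric vectors, similarity is measured with cosine similarity, and the threshold of 0.8 is arbitrary. The function names are hypothetical, and nothing here reflects how Spitch actually builds or compares voiceprints, nor how it avoids the latency of a linear search.

```python
import math

# Hypothetical sketch: voiceprints as numeric vectors compared by cosine
# similarity. Vector contents and the 0.8 threshold are assumptions.


def cosine(a, b):
    """Cosine similarity of two equal-length vectors (0.0 if either is zero)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def identify(probe, enrolled, threshold=0.8):
    """1:N identification: best-matching customer id, or None below threshold."""
    best_id, best_score = None, -1.0
    for customer_id, voiceprint in enrolled.items():
        score = cosine(probe, voiceprint)
        if score > best_score:
            best_id, best_score = customer_id, score
    if best_score < threshold:
        return None, best_score
    return best_id, best_score


def verify_continuously(voiceprint, segments, threshold=0.8):
    """1:1 verification repeated for each later speech segment of the call."""
    return all(cosine(segment, voiceprint) >= threshold for segment in segments)


if __name__ == "__main__":
    enrolled = {"alice": [1.0, 0.0, 0.0], "bob": [0.0, 1.0, 0.0]}
    caller_id, score = identify([0.9, 0.1, 0.0], enrolled)
    print(caller_id, round(score, 3))
```

The linear scan in `identify` is exactly the kind of comparison that becomes slow with millions of enrolled speakers, which is why real-time identification at that scale needs a faster search strategy than this sketch uses.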
Switzerland, while a good place to start a company, represents only a very limited market for new high tech solutions. Where do you plan to grow your business and how do you plan to finance this growth?
Spitch is rapidly growing. Last year we opened an office in Milan. There is another one in London covering the UK and Ireland. We are also looking at the large markets in Brazil and some ASEAN countries. And certainly, Northern Europe as well as CEE are among our priorities. We have very clear expansion plans and sufficient revenues to finance our growth, but it could be realized faster with investors who share our vision.
Which technological developments do you see that will have a major impact on how voice based applications will drive the economy?
One of the macro-level developments currently unfolding is the automation of knowledge work, fueled by disruptive technologies. McKinsey predicts that the potential global economic impact of automating knowledge work will reach US$ 5.2-6.7 trillion in 2025. It is bound to make a significant impact on many industries in the next 10 years. Entire professions and business areas will be transformed, and a set of more creative tasks for businesses to tackle will emerge. We believe that the central role will be played by voice user interfaces, which will start replacing conventional input media as soon as the accuracy of ASR reaches that of keyboard input. It would not be an exaggeration to say that it will bring about a kind of new technological revolution. Remember ‘Windows’ replacing the command line on the black screen?
In healthcare services, a massive loss of doctors’ time could be eliminated by precise speech-to-text, patient voice biometrics, and automated form filling. Similarly, in the automotive sector speech technologies would re-define the meaning of “hands-free” by enabling control of virtually all functions by free speech commands through voice user interfaces. This would minimize driver distraction.
It seems that in many industries that speech technologies are currently penetrating, it is important to have the right kind of partnerships. What is your partnership model?
Indeed, mutually complementary partnerships are crucial, as they help draw on different types of cross-fertilizing expertise. Spitch is a ‘one-stop shop’ that offers its partners an ecosystem of tools and solutions clustered around the Spitch proprietary core and based on Spitch infrastructure. At any time, partners and customers have access to new and emerging products, as well as the technologies behind them, from one vendor. Our focus on R&D and our distributed business model allow Spitch to reduce time-to-market for new products to the minimum, which is critical in the rapidly growing speech technologies market.
We offer partners a wide range of opportunities, from flexible pricing and revenue-sharing models, in case they act as resellers, to the possibility of breathing new life into their legacy solutions by integrating our technologies. It is important to note that we provide 24/7 support and dedicated training programmes to clients’ and partners’ staff, as well as to third-party integrators. There is also our Lingware Development Portal, to which all partners get access. It serves as a toolbox for developing their own speech solutions based on our technologies and products.
At the end of the interview, can you share your thoughts on the future of speech technologies?
We can see a clear trend towards the adoption of speech technologies. In 2016, 20% of all internet search queries were performed by voice commands. Voice internet search queries grew nearly 300-fold over the last 5 years, with the maximum number of spoken queries processed by Google per day in 2016 reaching 1.5 billion. The use of voice assistants is bound to continue to grow. This reality stands behind the increasing demand for easy-to-use interfaces and a new level of customer experience.
For most businesses, getting it right will depend on finding appropriate solutions and optimizing business processes to fully utilize the potential of new technologies. This calls for a set of services that starts with business and technology consulting, to ensure that the right scope of products and services, in their most effective configuration, is implemented in line with the customer’s projected business needs.
Clients should also have a clear understanding from the very beginning of the best practices and regulatory frameworks for implementing speech recognition and voice biometrics solutions. It’s important to mention that many voice biometrics projects initiated lately have had to address the problem of low voiceprint collection rates, as well as customer complaints caused by the fact that people had not been informed in advance that their voiceprints were being collected. Spitch is very well aware of these issues around voiceprint collection, and our specialists suggest ways of addressing them to our clients. As I mentioned in the beginning, there is something that differentiates Spitch: our Swiss-made precision work-flow, fast deployment, and the highest accuracy.
About Spitch:
Spitch is a Swiss provider of solutions based on Automatic Speech Recognition (ASR), Voice User Interfaces (VUI), and natural language voice data analytics. Our technologies are focused on facilitating knowledge work automation — one of the most promising IT trends of the next decade.
The Spitch R&D team has over 50 years of combined experience in R&D for Spoken Language Technologies (SLT), including Automatic Speech Recognition (ASR), Text‐to‐Speech Synthesis (TTS), Information Retrieval (IR), and Natural Language Processing (NLP).