How multimodal learning is set to transform AI

15 Oct 2019

The total installed base of devices with Artificial Intelligence (AI) will grow from 2.7 billion in 2019 to 4.5 billion in 2024, according to a forecast from global tech market advisory firm ABI Research.

There are billions of petabytes of data flowing through these AI devices every day; the challenge now facing both technology companies and implementers is getting all these devices to learn, think, and work together. 

According to a recent whitepaper from ABI Research, Artificial Intelligence Meets Business Intelligence, multimodal learning is the key to making this happen, and it’s fast becoming one of the most exciting — and potentially transformative — fields of artificial intelligence.

“Multimodal learning consolidates disconnected, heterogeneous data from various sensors and data inputs into a single model,” says ABI Research chief research officer Stuart Carlaw. 

“Learning-based methods that combine signals from different modalities can generate more robust inference, or even new insights, which would be impossible in a unimodal system.”
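To make the idea of combining modalities concrete, here is a minimal sketch of one simple approach, often called late fusion, in which each modality produces its own class probabilities and the results are averaged. This is an illustration only and not a method described by ABI Research; the function name `late_fusion`, the two modalities, the equal weights, and the scores are all hypothetical.

```python
from typing import Dict, List


def late_fusion(predictions: Dict[str, List[float]],
                weights: Dict[str, float]) -> List[float]:
    """Weighted average of per-modality class probabilities."""
    total_weight = sum(weights[m] for m in predictions)
    n_classes = len(next(iter(predictions.values())))
    fused = [0.0] * n_classes
    for modality, probs in predictions.items():
        for i, p in enumerate(probs):
            fused[i] += weights[modality] * p / total_weight
    return fused


# Hypothetical scores for the classes ["alert", "drowsy"].
classes = ["alert", "drowsy"]
camera = [0.55, 0.45]    # the camera alone leans slightly towards "alert"
steering = [0.30, 0.70]  # steering behaviour points towards "drowsy"

fused = late_fusion(
    {"camera": camera, "steering": steering},
    {"camera": 1.0, "steering": 1.0},  # equal trust in both (an assumption)
)
label = classes[fused.index(max(fused))]
print(fused, label)
```

In this made-up case the camera signal on its own would pick "alert", while the fused result picks "drowsy". Production systems typically learn how to combine modalities inside a single model rather than averaging fixed scores.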

Multimodal learning is well placed to scale, as underlying supporting technologies like Deep Neural Networks (DNNs), a giant leap forward over rules-based software, have already done so in unimodal applications such as image recognition in camera surveillance, or voice recognition and Natural Language Processing (NLP) in virtual assistants like Amazon’s Alexa.

At the same time, organisations are recognising the need for multimodal learning to manage and automate processes that span the entirety of their operations. Given these factors, ABI Research estimates that the total number of devices shipped with multimodal learning applications will grow from 3.9 million in 2017 to 514 million in 2023.
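As a rough check on what these forecasts imply, the sketch below computes the compound annual growth rate (CAGR) for the two ranges quoted in this article. The function name `cagr` is our own, and the calculation assumes smooth year-on-year growth, which the forecasts do not state.

```python
def cagr(start: float, end: float, years: int) -> float:
    """Compound annual growth rate between two values, as a fraction."""
    return (end / start) ** (1 / years) - 1


# Devices shipped with multimodal learning applications, 2017 to 2023.
multimodal = cagr(3.9e6, 514e6, 2023 - 2017)

# Total installed base of AI devices, 2019 to 2024.
installed = cagr(2.7e9, 4.5e9, 2024 - 2019)

print(f"Multimodal shipments: {multimodal:.1%} per year")
print(f"AI installed base:    {installed:.1%} per year")
```

Under that assumption, multimodal shipments would more than double each year (about 126 percent annual growth), compared with about 11 percent a year for the overall AI installed base.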

“There is impressive momentum driving multimodal applications into devices, with five key end-market verticals most aggressively adopting multimodal learning: automotive, robotics, consumer, healthcare, and media and entertainment,” Carlaw adds.

In the automotive space, multimodal learning is being introduced to Advanced Driver Assistance Systems (ADAS), In-Vehicle Human Machine Interface (HMI) assistants and Driver Monitoring Systems (DMSs) for real-time inferencing and prediction.

Robotics vendors are incorporating multimodal learning systems into robotics HMIs and movement automation to broaden consumer appeal and provide greater collaboration between workers and robots in the industrial space.

Consumer device companies, particularly those in the smartphone and smart home markets, are competing intensely to demonstrate the value of their products over competitors. New features and refined systems are critical to generating a marketing edge, making consumer electronics companies good candidates for adopting multimodal learning-enabled systems into their products. Growing use cases include security and payment authentication, recommendation and personalisation engines and personal assistants.

Medical companies and hospitals are still relatively early in their exploration of multimodal learning techniques, but there are already some promising emerging applications in medical imaging. The value of multimodal learning to patients and doctors will be difficult for health services to resist, even if adoption is initially slow.

Media and entertainment companies are already using multimodal learning to help with structuring their content into labelled metadata, so they can improve content recommendation systems, personalised advertising, and automated compliance marking. So far, deployments of metadata tagging systems have been limited, as the technology has only recently been made available to the industry.

“The most extensive application of multimodal learning today is for behaviour and language modelling in smartphones. Classification, decision-making, and HMI systems are going to play a significant role in driving adoption of multimodal learning, providing a catalyst to refine and standardise some of the technical approaches,” Carlaw says.
