Gartner: 40% of generative AI solutions to be multimodal by 2027
Gartner has forecast that by 2027, 40% of generative AI (GenAI) solutions will be multimodal, incorporating text, image, audio, and video, a significant rise from just 1% in 2023. This prediction was shared at the Gartner IT Symposium on the Gold Coast, Australia, where analysts explored the latest trends in artificial intelligence.
Erick Brethenoux, Distinguished VP Analyst at Gartner, noted, "As the GenAI market evolves towards models natively trained on more than one modality, this helps capture relationships between different data streams and has the potential to scale the benefits of GenAI across all data types and applications. It also allows AI to support humans in performing more tasks, regardless of the environment."
Multimodal GenAI and open-source large language models (LLMs), both identified as having high-impact potential in the 2024 Gartner Hype Cycle for Generative AI, are expected to deliver notable competitive advantages and time-to-market benefits within the next five years. Gartner also highlighted that, among the GenAI innovations expected to reach mainstream adoption within a decade, domain-specific GenAI models and autonomous agents hold the highest potential.
Arun Chandrasekaran, Distinguished VP Analyst at Gartner, stated, "Navigating the GenAI ecosystem will continue to be overwhelming for enterprises due to a chaotic and fast-moving ecosystem of technologies and vendors. GenAI is in the Trough of Disillusionment with the beginning of industry consolidation. Real benefits will emerge once the hype subsides, with advances in capabilities likely to come at a rapid pace over the next few years."
The shift to multimodal GenAI is anticipated to enhance enterprise applications by enabling the introduction of new features and functionalities. Many multimodal models today are limited to two or three modalities, but the range of modalities they support is expected to broaden in the coming years.
Brethenoux explained, "In the real world, people encounter and comprehend information through a combination of different modalities such as audio, visual and sensing. Multimodal GenAI is important because data is typically multimodal. When single modality models are combined or assembled to support multimodal GenAI applications, it often leads to latency and less accurate results, resulting in a lower quality experience."
Regarding open-source LLMs, Chandrasekaran said, "Open-source LLMs increase innovation potential through customisation, better control over privacy and security, model transparency, ability to leverage collaborative development, and potential to reduce vendor lock-in. Ultimately, they offer enterprises smaller models that are easier and less costly to train, and enable business applications and core business processes."
Domain-specific GenAI models, optimised for specific industries, business functions, or tasks, can improve use-case alignment within enterprises. These models also deliver enhanced accuracy, security, privacy, and contextualised answers while reducing the need for advanced prompt engineering.
Chandrasekaran added, "Domain-specific models can achieve faster time to value, improved performance and enhanced security for AI projects by providing a more advanced starting point for industry-specific tasks. This will encourage broader adoption of GenAI because organisations will be able to apply them to use cases where general-purpose models are not performant enough."
Autonomous agents, systems capable of achieving goals without human intervention, use AI techniques to identify patterns, make decisions, and generate outputs. Brethenoux commented, "Autonomous agents represent a significant shift in AI capabilities. Their independent operation and decision capabilities enable them to improve business operations, enhance customer experiences and enable new products and services. This will likely deliver cost savings, granting a competitive edge. It also poses an organisational workforce shift from delivery to supervision."