Databricks invests in Lakehouse, announces new generative AI tools

Fri, 30th Jun 2023


Databricks, the Data and AI company, has announced new Lakehouse AI innovations that allow customers to easily and efficiently develop generative AI applications, including large language models (LLMs), directly within the Databricks Lakehouse Platform.

Lakehouse AI offers a unique, data-centric approach to AI, with built-in capabilities for the entire AI lifecycle and underlying monitoring and governance. New features that will help customers more easily implement generative AI use cases include Vector Search, a curated collection of open source models, LLM-optimised Model Serving, MLflow 2.5 with LLM capabilities such as AI Gateway and Prompt Tools, and Lakehouse Monitoring.

Demand for generative AI is driving disruption across industries, creating urgency for technical teams to build generative AI models and LLMs on top of their data to differentiate their offerings.

However, data determines success with AI, and when the data platform is separate from the AI platform, it's difficult to enforce and maintain clean, high-quality data. Additionally, the process of getting a model from experimentation to production, and the related tuning, operationalising, and monitoring of the models, is complex and unreliable.

With Lakehouse AI, Databricks unifies the data and AI platform, so customers can develop their generative AI solutions faster and more successfully, from using foundational SaaS models to training their custom models securely with their enterprise data. Organisations can accelerate their generative AI journey by bringing together data, AI models, LLM operations (LLMOps), and monitoring and governance on the Databricks Lakehouse Platform.

"At JetBlue, we inspire humanity through our product, culture and customer service. We've embarked on an AI transformation over the past year because we believe AI, and in particular LLMs, can fuel increased productivity and better customer experience for our travellers," says Sai Ravuru, senior manager of data science and analytics at JetBlue.

"Databricks has been instrumental in our AI and ML transformation and has helped us build our own LLM, enabling our team to more effectively use the BlueSky platform to make decisions using real-time streams of weather, aircraft sensors, FAA data feeds and more. The deployment is significantly improving our onboarding time for new users. We're excited about all of Databricks' data-centric AI innovations, enabling customers like us to build LLMs in the Lakehouse and govern them from there."

Lakehouse AI unifies the AI lifecycle, from data collection and preparation to model development and LLMOps, to serving and monitoring.

Newly announced capabilities include Vector Search, fine-tuning in AutoML, and curated open source models, backed by optimised model serving for high performance.

Databricks Vector Search enables developers to improve the accuracy of their generative AI responses through embedding search. It will fully manage and automatically create vector embeddings from files in Unity Catalog and keep them updated automatically through seamless integrations with Databricks Model Serving. Additionally, developers can add query filters to provide even better user outcomes.
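
To make the idea of embedding search with query filters concrete, here is a minimal, self-contained sketch in plain Python. It is not the Databricks Vector Search API: the `Doc` class, the `search` function, the toy three-dimensional embeddings and the `team` metadata field are all hypothetical, chosen only for illustration. A real system would generate embeddings with a model and use an approximate nearest-neighbour index rather than a full scan.

```python
import math
from dataclasses import dataclass, field


@dataclass
class Doc:
    """A document with a precomputed embedding and filterable metadata (hypothetical)."""
    text: str
    embedding: list[float]
    metadata: dict[str, str] = field(default_factory=dict)


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between two vectors; 0.0 if either has zero length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def search(index, query_embedding, k=3, filters=None):
    """Return the k documents most similar to the query that match every filter."""
    filters = filters or {}
    # Apply the metadata filters first, then rank the survivors by similarity.
    candidates = [
        doc for doc in index
        if all(doc.metadata.get(key) == value for key, value in filters.items())
    ]
    ranked = sorted(
        candidates,
        key=lambda doc: cosine_similarity(doc.embedding, query_embedding),
        reverse=True,
    )
    return ranked[:k]


# Toy index: the embeddings are made up, not produced by a real model.
index = [
    Doc("Refund policy for delayed flights", [0.9, 0.1, 0.0], {"team": "support"}),
    Doc("Aircraft sensor maintenance guide", [0.1, 0.9, 0.1], {"team": "ops"}),
    Doc("Baggage fee schedule", [0.7, 0.2, 0.1], {"team": "support"}),
]

if __name__ == "__main__":
    for doc in search(index, [1.0, 0.0, 0.0], k=2, filters={"team": "support"}):
        print(doc.text)
```

In this sketch the filter narrows the candidate set before ranking, so a query restricted to one team never returns another team's documents, however similar their embeddings are.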

Databricks AutoML now brings a low-code approach to fine-tuning LLMs. Customers can securely fine-tune LLMs using their enterprise data and own the resulting model that AutoML produces without having to send data to a third party. Additionally, with MLflow, Unity Catalog and Model Serving integrations, the model can be easily shared within an organisation, governed for appropriate use, served for inference in production and monitored.

Databricks has published a curated list of open-source models available within Databricks Marketplace, including MPT-7B and Falcon-7B instruction-following and summarisation models and Stable Diffusion for image generation, making it easy to get started with generative AI across a variety of use cases. Lakehouse AI capabilities like Databricks Model Serving have been optimised for these models to ensure peak performance and cost optimisation.

Databricks also unveiled innovations in LLMOps with the announcement of MLflow 2.5, the latest release of the popular Linux Foundation open-source project MLflow. This is Databricks' latest contribution to one of the company's flagship open-source projects. MLflow is an open-source platform for the machine learning lifecycle that sees nearly 11 million monthly downloads.

MLflow 2.5 updates include MLflow AI Gateway and MLflow Prompt Tools.

MLflow AI Gateway enables organisations to centrally manage credentials for SaaS models or model APIs and provide access-controlled routes for querying. Organisations can then provide these routes to various teams to integrate into their workflows or projects. Developers can easily swap out the backend model anytime to improve cost and quality and switch across LLM providers. MLflow AI Gateway will also enable prediction caching to track repeated prompts and rate limiting to manage costs.
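
The gateway pattern described above can be sketched in a few lines of plain Python. This is an illustration of the general idea only, not the MLflow AI Gateway API: the `Gateway` class, `set_route`, `query`, `RateLimitExceeded` and the per-minute limit are all hypothetical names and assumptions, and the "models" are ordinary functions standing in for calls to an LLM provider.

```python
import time
from typing import Callable


class RateLimitExceeded(Exception):
    """Raised when a route has used up its call budget for the current window."""


class Gateway:
    """Hypothetical gateway: named routes, response caching and rate limiting."""

    WINDOW_SECONDS = 60

    def __init__(self, clock: Callable[[], float] = time.monotonic):
        self._routes = {}   # route name -> (backend callable, max calls per window)
        self._cache = {}    # (route name, prompt) -> cached response
        self._calls = {}    # route name -> timestamps of recent backend calls
        self._clock = clock

    def set_route(self, name, backend, max_calls_per_minute=60):
        """Create a route or swap its backend; callers keep using the same name."""
        self._routes[name] = (backend, max_calls_per_minute)
        self._calls.setdefault(name, [])
        # Responses cached from the previous backend are no longer valid.
        self._cache = {key: value for key, value in self._cache.items() if key[0] != name}

    def query(self, route, prompt):
        """Return a response for the prompt, using the cache for repeated prompts."""
        key = (route, prompt)
        if key in self._cache:
            return self._cache[key]
        backend, limit = self._routes[route]
        now = self._clock()
        # Keep only the calls that fall inside the current window.
        recent = [t for t in self._calls[route] if now - t < self.WINDOW_SECONDS]
        if len(recent) >= limit:
            raise RateLimitExceeded(f"route {route!r} exceeded {limit} calls per minute")
        recent.append(now)
        self._calls[route] = recent
        response = backend(prompt)
        self._cache[key] = response
        return response


if __name__ == "__main__":
    gateway = Gateway()
    gateway.set_route("chat", lambda prompt: "model A says: " + prompt)
    print(gateway.query("chat", "hello"))
    # Swap the backend without changing the route name that teams depend on.
    gateway.set_route("chat", lambda prompt: "model B says: " + prompt)
    print(gateway.query("chat", "hello"))
```

Because teams only ever reference the route name, the backend behind it can change without touching their code, and provider credentials would live in the gateway rather than in each project. In this sketch a repeated prompt is answered from the cache and does not count against the rate limit.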

MLflow Prompt Tools are new no-code visual tools that allow users to compare various models' output based on a set of prompts, which are automatically tracked within MLflow. With integration into Databricks Model Serving, customers can deploy the relevant model to production.

Additionally, following its release earlier this year, Databricks Model Serving has been optimised for LLM inference, delivering up to 10x lower latency and reduced costs. Fully managed by Databricks to offer frictionless infrastructure management, Model Serving now supports GPU-based inference. It automatically logs and monitors all requests and responses in Delta tables and ensures end-to-end lineage tracking through Unity Catalog. Finally, Model Serving quickly scales up from zero and back down as demand changes, reducing operational costs and ensuring customers pay only for the compute they use.

Databricks also expanded its data and AI monitoring capabilities by introducing Databricks Lakehouse Monitoring to better monitor and manage all data and AI assets within the Lakehouse. Databricks Lakehouse Monitoring provides end-to-end visibility into data pipelines to continuously monitor, tune and improve performance without additional tools and complexity. By taking advantage of Unity Catalog, Lakehouse Monitoring provides users with deep insight into the lineage of their data and AI assets to ensure high quality, accuracy and reliability. Proactive detection and reporting will make it easy to spot and diagnose pipeline errors, automatically perform root cause analysis and quickly find recommended solutions across the data lifecycle.

"We've reached an inflection point for organisations: leveraging AI is no longer aspirational, it is imperative for organisations to remain competitive," says Ali Ghodsi, co-founder and chief executive officer at Databricks. "Databricks has been on a mission to democratise data and AI for more than a decade and we're continuing to innovate as we make the Lakehouse the best place for building, owning and securing generative AI models."

Databricks continues to expand the Lakehouse Platform, recently announcing Lakehouse Apps and the general availability of Databricks Marketplace, LakehouseIQ, new governance capabilities, and Delta Lake 3.0.
