IT Brief Australia - Technology news for CIOs & IT decision-makers

Datadog enhances AWS monitoring with AI/ML integrations

Tue, 3rd Dec 2024

Datadog has announced continued investment in its Amazon Web Services (AWS) monitoring product portfolio, emphasising integration with AI/ML applications as well as serverless and containerised environments.

The company now offers over 100 unique AWS service integrations, with a focus on AI/ML services. Notable organisations such as AppFolio, Asana, Maersk, and andsafe utilise these integrations to monitor AWS services.

Yanbing Li, Chief Product Officer at Datadog, remarked, "We continue to see companies rely on Datadog for enterprise-scale observability at an accelerated rate. Trends like AI/ML, cloud migration, serverless, and containers — and the need to monitor and optimise resources for all these areas — have helped to accelerate this growth as companies search to better understand their LLM usage, infrastructure performance, and cloud costs."

Among the enhanced capabilities, Datadog's platform supports monitoring of AWS Trainium and AWS Inferentia ML chips. This feature helps customers optimise model performance, prevent service interruptions, and scale infrastructure as ML workloads grow.

Additional integrations include Amazon Bedrock, allowing teams to monitor foundation model (FM) usage, API performance, and error rates through runtime metrics and logs. The Amazon SageMaker integration enables data scientists and engineers to collect and visualise metrics, helping them resolve issues quickly and improve the performance of ML endpoints and jobs.

Kyle Triplett, VP of Product at AppFolio, said, "The Datadog LLM Observability solution helps our team understand, debug, and evaluate the usage and performance of our GenAI applications. With it, we are able to address real-world issues, including monitoring response quality to prevent negative interactions and performance degradations, while ensuring we are providing our end users with positive experiences."

James Adams, Machine Learning Engineering Manager at Cash App, commented, "We explored a bunch of different hosted solutions and found that SageMaker solved all the problems that we were encountering. And we did some stress testing with it and it held up to the traffic that we expected to be sending through the system. With Datadog, it has all these AI integrations—including SageMaker—that we're using heavily."

Marcel Drechsler, Senior Cloud Solutions Engineer at andsafe, stated, "andsafe has been all in on Amazon Web Services since day one and our infrastructure is based on microservices which are running on Amazon EKS. To monitor the resource consumption, we are utilising the container monitoring tools of Datadog. As a result, we were able to decrease the resource consumption and make the process much faster."
