Observability becomes the biggest challenge for ML and LLM applications

As enterprises put AI and machine learning to work, observability has become a challenge that can no longer be ignored. This article looks at the monitoring challenges ML models face in production and the strategies enterprises are adopting as they take up GenAI and LLMs.
Core content:
1. Observability challenges of ML models in production environments
2. Enterprise adoption trends of GenAI and LLM
3. The rise of MLOps, LLMOps, and GenAIOps and their impact on observability
Production ML models face real observability challenges, custom tooling dominates practitioners' roadmaps, and only 7% rank ML security among their top concerns. Enterprises are experimenting with GenAI and LLMs while predictive analytics and computer vision applications continue to grow. MLOps, LLMOps, and GenAIOps are on the rise, making LLM observability essential, and OpenAI, Azure AI, and Amazon Bedrock are among the most popular platforms.
Observability and monitoring are the most frequently cited challenges in putting ML models into production, according to a survey on the state of production ML that the Institute for Ethical AI & Machine Learning [2] conducted in Q4 2024 [3]. Another key conclusion: custom tools dominate users' roadmaps, since few vendor tools have gained significant traction.
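As an illustration of the kind of in-house monitoring check such custom tooling often amounts to, the sketch below computes a population stability index (PSI) to flag drift between a reference window (e.g. training data) and live production traffic. It is a minimal example, not something taken from the survey; the variable names and the 0.2 alert threshold are illustrative assumptions.

```python
# Minimal sketch of a custom drift check of the kind many teams build in-house.
# Names such as `reference_scores` and `production_scores` are illustrative.
import numpy as np

def population_stability_index(reference: np.ndarray, current: np.ndarray, bins: int = 10) -> float:
    """Compare the distribution of a feature or score between a reference
    window (e.g. training data) and a production window."""
    # Bin edges are derived from the reference distribution.
    edges = np.histogram_bin_edges(reference, bins=bins)
    ref_pct = np.histogram(reference, bins=edges)[0] / len(reference)
    cur_pct = np.histogram(current, bins=edges)[0] / len(current)
    # Clip to a small epsilon to avoid division by zero and log(0).
    ref_pct = np.clip(ref_pct, 1e-6, None)
    cur_pct = np.clip(cur_pct, 1e-6, None)
    return float(np.sum((cur_pct - ref_pct) * np.log(cur_pct / ref_pct)))

# Example: alert when drift exceeds a commonly used threshold (~0.2 is assumed here).
reference_scores = np.random.normal(0.0, 1.0, 10_000)
production_scores = np.random.normal(0.3, 1.1, 10_000)
psi = population_stability_index(reference_scores, production_scores)
if psi > 0.2:
    print(f"Drift alert: PSI={psi:.3f}")
```

In practice a check like this would run on a schedule against logged inference data and feed an alerting pipeline rather than printing to stdout.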
Overall, 44% of the 170 practitioners surveyed were machine learning engineers, with roughly the same share identifying as data scientists or MLOps engineers. Many of the respondents were subscribers to The ML Engineer newsletter [4].
Only 7% said ML security was among their top three challenges, and only 17% said the same about governance and domain risk. This finding differs markedly from other research, where security and AI governance were cited among the biggest barriers to broader adoption. We believe practitioners view ML security narrowly, as the risk of a model itself being hacked, while other IT decision makers worry more about broader access to company and personal data.
It seems as though every enterprise is at least experimenting with generative AI and AI agents that rely on large language models (LLMs) [5]. At the same time, applications for predictive analytics and computer vision continue to grow. As these applications scale, developers need data engineers, SREs, and others to handle Day 1 and Day 2 challenges [6]. To meet those challenges, MLOps [7] emerged as a discipline in its own right, followed by LLMOps and GenAIOps [8].
Regardless of the terminology used, LLM observability and monitoring [9] are issues that must be addressed.
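To make "LLM observability" concrete, the sketch below shows the per-request telemetry an LLMOps pipeline typically records: model name, latency, token usage, and errors. It is a minimal sketch, not a reference implementation: `call_llm` is a hypothetical stand-in for whichever provider SDK (OpenAI, Azure AI, Amazon Bedrock, etc.) an application actually uses, and plain logging stands in for a real metrics or tracing exporter such as OpenTelemetry.

```python
# Minimal sketch of per-request LLM telemetry: latency, token usage, model, errors.
# `call_llm` is a hypothetical stand-in for a real provider SDK.
import time
import logging
from dataclasses import dataclass

logging.basicConfig(level=logging.INFO)
logger = logging.getLogger("llm.observability")

@dataclass
class LLMCallRecord:
    model: str
    latency_s: float
    prompt_tokens: int
    completion_tokens: int
    error: str | None = None

def call_llm(model: str, prompt: str) -> tuple[str, int, int]:
    """Hypothetical provider call; returns (text, prompt_tokens, completion_tokens)."""
    return "stubbed completion", len(prompt.split()), 3

def observed_llm_call(model: str, prompt: str) -> str:
    start = time.perf_counter()
    try:
        text, prompt_tokens, completion_tokens = call_llm(model, prompt)
        record = LLMCallRecord(model, time.perf_counter() - start,
                               prompt_tokens, completion_tokens)
        return text
    except Exception as exc:
        record = LLMCallRecord(model, time.perf_counter() - start, 0, 0, error=str(exc))
        raise
    finally:
        # In production this record would be exported to a metrics/tracing backend;
        # here it is simply logged.
        logger.info("llm_call %s", record)

print(observed_llm_call("example-model", "Summarize the survey results."))
```

Records like these are what make it possible to answer the basic Day 2 questions for LLM applications: how slow, how expensive, and how often failing each model and prompt path is.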