As senior tech leaders, we’re often caught between the promise of emerging technologies and the realities of building and running systems that actually work.
At YLD, we recently brought together a group of CTOs and engineering leaders for an evening of open discussion on making better sense of GenAI in today’s world. We were joined by industry experts like Sam Lowe and Perry Krug, and I had the opportunity to facilitate a conversation with them and the audience on how to prioritise GenAI initiatives that deliver real impact. We explored a variety of topics, from the evolution of GenAI to the role of AI Agents and why Evals are becoming essential to measuring performance and trustworthiness.
The objective was an open and engaging discussion about how to use GenAI pragmatically: in ways that are reliable, scalable, and aligned with business needs.
From Queries to Context: How GenAI is transforming search
While chat interfaces have been the face of GenAI, the true value lies in enhancing search and retrieval capabilities. Traditionally, search engines relied on keyword matching, which often led to a disjointed user experience. However, the arrival of large language models (LLMs) has transformed search into a more intuitive and conversational process, enhancing the way users interact with information.
This evolution isn’t merely a technological upgrade but a strategic opportunity for businesses to provide users with contextually relevant and accurate information by integrating Retrieval-Augmented Generation (RAG) techniques. This approach is particularly beneficial in sectors like legal and customer service, where timely and precise information is crucial.
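As a rough illustration, the retrieve-then-generate flow behind RAG can be sketched in a few lines of Python. The keyword-overlap retriever and prompt template below are stand-ins for illustration only; a production system would use vector embeddings, a dedicated index, and an actual LLM call.

```python
# Minimal sketch of Retrieval-Augmented Generation (RAG).
# Retrieval here is naive keyword overlap; real systems use embeddings.
import re

def tokenise(text: str) -> set[str]:
    """Lowercase word tokens with punctuation stripped."""
    return set(re.findall(r"[a-z]+", text.lower()))

def retrieve(query: str, documents: list[str], top_k: int = 2) -> list[str]:
    """Rank documents by how many word tokens they share with the query."""
    query_tokens = tokenise(query)
    return sorted(
        documents,
        key=lambda doc: len(query_tokens & tokenise(doc)),
        reverse=True,
    )[:top_k]

def build_rag_prompt(query: str, documents: list[str]) -> str:
    """Ground the model's answer in the retrieved context."""
    context = retrieve(query, documents)
    return (
        "Answer using only the context below.\n\n"
        "Context:\n" + "\n".join(f"- {c}" for c in context)
        + f"\n\nQuestion: {query}"
    )

docs = [
    "Our refund policy allows returns within 30 days of purchase.",
    "Shipping takes 3-5 business days within the UK.",
    "Support is available Monday to Friday, 9am to 5pm.",
]
prompt = build_rag_prompt("What is the refund policy?", docs)
```

The point of the pattern is that the model never answers from memory alone: the retrieval step pins its response to your own, current documents, which is what makes RAG attractive in domains like legal and customer service.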
Voice interfaces are also becoming increasingly significant: voice-enabled AI systems offer a hands-free, efficient way to access information, which is especially valuable in environments where multitasking is common. This trend points towards more natural and accessible user interactions with technology.
The Rise of AI Agents
Powerful models like ChatGPT continue to do everything from answering questions to writing code. These large, general-purpose systems are known as monolithic models. As AI adoption matures, organisations are moving away from this single-prompt approach and towards decomposing complex tasks into smaller units that are assigned to AI Agents.
Where the traditional process is "context + prompt = output", AI Agents interact with tools to gather information and then perform the action: "tools + task + runtime = action".
For example, a flight-booking agent doesn’t just return flight options. It searches for flights, chooses the best one based on criteria, completes the reservation, and confirms success, embodying autonomy by coordinating with APIs and managing the full task lifecycle. This approach moves away from traditional, linear prompt-response interactions as Agents now decide how and when to act, and whether further steps are needed.
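The flight-booking example can be sketched as a simple runtime loop. Everything below is hypothetical: the tool functions stand in for real flight-search and reservation APIs, and the tool-selection policy is hard-coded, whereas a real agent would let the model decide which tool to call next and when to stop.

```python
# Hypothetical flight-booking agent illustrating "tools + task + runtime = action".

def search_flights(destination: str) -> list[dict]:
    # Stand-in for a flight-search API call.
    return [
        {"id": "FL123", "destination": destination, "price": 250},
        {"id": "FL456", "destination": destination, "price": 180},
    ]

def book_flight(flight_id: str) -> dict:
    # Stand-in for a reservation API call.
    return {"flight_id": flight_id, "status": "confirmed"}

def run_agent(task: dict) -> dict:
    """Runtime loop: search, choose by criteria, book, confirm success."""
    options = search_flights(task["destination"])
    cheapest = min(options, key=lambda f: f["price"])
    return book_flight(cheapest["id"])

result = run_agent({"destination": "Lisbon"})
# result -> {"flight_id": "FL456", "status": "confirmed"}
```

Even in this toy form, the shape is visible: the agent owns the full task lifecycle, from gathering options through to a confirmed outcome, rather than returning raw search results for the user to act on.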
In real-world applications, Agentic systems can also orchestrate multiple Agents, with one handling research and another execution, cooperating to achieve complex goals. This "multi-Agent orchestration" is a central theme of modern AI stacks and represents a major leap towards autonomous, scalable systems that are more cost-effective and responsive than a single large model.
However, smaller AI Agents introduce a new challenge: orchestration. Imagine that you're using five or ten different models behind the scenes and need a way to coordinate them smoothly. Orchestration tools help teams manage how different AI Agents communicate and complete their parts of a broader task, breaking the work apart and deciding when to move on to the next stage.
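A minimal sketch of the research-then-execute hand-off might look like the following, assuming plain functions in place of real model-backed agents; in practice each stage would wrap its own model, tools, and error handling.

```python
# Sketch of multi-Agent orchestration: a coordinator routes a goal through
# a pipeline of specialised agents, each consuming the previous output.

def research_agent(goal: str) -> str:
    # Stand-in for an agent that gathers and summarises information.
    return f"notes on {goal}"

def execution_agent(notes: str) -> str:
    # Stand-in for an agent that turns research into a deliverable.
    return f"report built from {notes}"

def orchestrate(goal: str) -> str:
    """Coordinator: run each agent in turn, passing state forward."""
    state = goal
    for agent in [research_agent, execution_agent]:
        state = agent(state)  # each agent's output feeds the next
    return state

print(orchestrate("market trends"))
# prints "report built from notes on market trends"
```

Real orchestration frameworks add the parts this sketch omits: retries, branching, deciding dynamically which agent runs next, and knowing when the overall task is done.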
The demanding part of building with GenAI isn't the model; it's everything around it.
You still need a robust data infrastructure, thoughtful user interfaces, and strong monitoring in place. The fundamentals of good software engineering haven’t changed. What has changed is that AI now fits more naturally into existing systems, as long as you know how to connect the dots.
Building with GenAI is becoming less about creating a single model and more about designing intelligent systems of multiple Agents working together.
Model Evaluation (Evals)
Integrating AI models into scalable, user-friendly applications is complex, and those applications must remain reliable. This is where Evals are critical. Evals in GenAI applications should not be treated as a post-development checkbox, but as an ongoing process that is instrumental throughout the AI model's lifecycle. Evals ensure that models perform as intended, adapt to new data, and continue to deliver value; they help engineers choose the best model and maintain optimal performance in production.
A key risk with AI models is quality drift: a gradual degradation of performance caused by changing data patterns or user behaviour. Without continuous Evals, that degradation may go unnoticed until it's too late, negatively impacting your business.
To mitigate this, embrace continuous Evals: regularly test model outputs against fresh data using relevant metrics. Additionally, cross-validation helps assess how well models generalise to real-world data, which is crucial in dynamic environments where user needs change quickly.
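A continuous-Evals drift check can be as simple as scoring each fresh batch and flagging when it falls too far below the baseline. The exact-match metric, the baseline value, and the tolerance below are all illustrative assumptions; real Evals would use task-appropriate metrics and statistically grounded thresholds.

```python
# Toy quality-drift check: score fresh outputs against references and
# flag when accuracy drops more than a tolerance below the baseline.

def accuracy(outputs: list[str], references: list[str]) -> float:
    """Fraction of outputs that exactly match their reference."""
    correct = sum(o == r for o, r in zip(outputs, references))
    return correct / len(references)

def check_drift(baseline: float, fresh_score: float, tolerance: float = 0.05) -> bool:
    """Return True when performance has drifted below the accepted band."""
    return (baseline - fresh_score) > tolerance

baseline = 0.92  # illustrative score from the last accepted eval run
fresh = accuracy(
    ["refund approved", "order shipped", "ticket closed"],
    ["refund approved", "order delayed", "ticket closed"],
)  # 2 of 3 match
drifted = check_drift(baseline, fresh)  # True -> trigger review or retraining
```

Running a check like this on every fresh batch, rather than once at launch, is what turns Evals from a checkbox into the ongoing process described above.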
Human-in-the-loop systems are also increasingly part of the Evals process: human reviewers assess AI outputs, especially when confidence scores are low. This hybrid approach improves accuracy, trustworthiness, and alignment with user expectations and ethical standards.
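The routing logic behind such a hybrid system is straightforward to sketch. The threshold value and the shape of the prediction record are assumptions for illustration; in practice the threshold would be tuned against review capacity and error cost.

```python
# Minimal human-in-the-loop routing: outputs below a confidence threshold
# are queued for human review instead of being returned automatically.

REVIEW_THRESHOLD = 0.8  # illustrative cut-off

def route(prediction: dict) -> str:
    """Send low-confidence outputs to a human reviewer."""
    if prediction["confidence"] < REVIEW_THRESHOLD:
        return "human_review"
    return "auto_approve"

assert route({"answer": "Refund granted", "confidence": 0.95}) == "auto_approve"
assert route({"answer": "Unclear request", "confidence": 0.55}) == "human_review"
```

The design choice here is deliberate: rather than trying to make the model never wrong, the system makes uncertainty visible and routes it to the people best placed to resolve it.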
Navigating governance in GenAI
Integrating GenAI across your business requires careful governance. While free AI tools may be easily accessible, they often come with risks around data privacy and usage.
Many AI providers use your data for model training, potentially exposing sensitive information. Combining multiple AI providers introduces further complexity, as each comes with its own terms regarding data usage.
In regulated industries, adhering to regulations like the EU’s AI Act is critical to avoid legal risks. Despite these challenges, organisations should encourage responsible AI experimentation, ensuring that innovation aligns with both business objectives and compliance standards.
The key is to make smart decisions that balance the benefits of AI with strong security, privacy, and ethics, ensuring good governance while unlocking AI’s value and protecting your organisation and its people.
–
If you’re exploring the potential of AI models or need support across AI engineering, MLOps, or data science, get in touch.