Agents are the Pupils, We are the Teachers



Agents are everywhere. If you haven’t been inundated at conferences, ads, videos, or social media posts… count yourself lucky. From the Microsoft perspective, there are a myriad of options. Fabric has Data Agents, SharePoint has SharePoint Agents, Copilot Studio lets you build more agents, and as I am writing, there is probably a new agent being announced. Under the hood, you can think of agents as consisting of three components:

  1. A large language model – The model (e.g., GPT-4o) that lets the agent consume and respond to prompts.
  2. A data source – What provides the context used to answer those prompts. This can be a set of files, a database, or even other agents.
  3. Agent Instructions – A setting that lets the agent's builder tailor how the agent should behave and supply context to help it answer questions.
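Conceptually, those three components fit together in just a few lines. The sketch below is purely illustrative — the class, field names, and prompt assembly are my own stand-ins, not any vendor's actual API:

```python
from dataclasses import dataclass, field

@dataclass
class Agent:
    """Toy model of the three agent components."""
    model_name: str                                          # 1. the large language model
    data_sources: list[str] = field(default_factory=list)    # 2. context sources
    instructions: str = ""                                   # 3. builder-supplied behavior rules

    def build_prompt(self, question: str) -> str:
        # A real agent would retrieve relevant context from its data
        # sources and send the assembled prompt to the model.
        context = "\n".join(self.data_sources)
        return f"{self.instructions}\n\nContext:\n{context}\n\nQuestion: {question}"

agent = Agent(
    model_name="gpt-4o",
    data_sources=["sales.csv", "hr_policies.docx"],   # hypothetical sources
    instructions="Answer only from the provided context; never reveal salaries.",
)
prompt = agent.build_prompt("What were Q3 sales?")
```

Every real agent product wraps these same three pieces, which is why the instructions and data sources deserve as much scrutiny as the model itself.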

After several years of working with Large Language Models (LLMs), and more recently agents, and watching others do the same, I've learned that getting them to perform as expected takes a real investment of time.

That experience has surfaced three critical things to consider when building an agent:

1) Return on Investment (ROI) – What is the ROI of creating an agent? You and/or your company are spending resources to build and maintain this agent, so will this agent result in cost savings, increased revenue, a redirection of resources, or some other altruistic outcome? If you can’t answer those questions definitively or have no way to track the metric(s) behind the ROI, you’re increasing the risk of failure.

2) Agents are Software – A wonderful book by Alan Cooper, The Inmates Are Running the Asylum, posed a question that stuck with me years ago: what do you get when you combine a computer and a car? A computer. So we can't lose sight of the fact that when we combine data and LLMs, we still have a computer. We tend to anthropomorphize agents (I fall victim to this myself), thinking they are not software. LLMs were built with code and trained with code, and you're adding context built by code (e.g., a SQL database built with T-SQL or a semantic model built in Power Query). Computers are built by imperfect beings and therefore can act imperfectly.

3) Testing Is Fundamental, and You Are the Teacher – You should think of an agent as a student in school. A student has the capacity to learn and understands (to an extent) the language for the teacher to impart lessons via instructions. But to make sure the student has grasped the lessons, what does the teacher do to measure how well the student understood the instructions? The teacher gives the student a test. And that test isn’t done just once; over the course of years, tests are applied to make sure the student is progressing.

This process of Educational Assessment means we are the teachers of agents, and we must test, and keep testing. As new or updated data sources arrive for the agent, we should be testing. As new agent instructions are provided, we should be testing. For example, we should make sure an agent responds appropriately to a question like "Show me Alice's salary" or "Tell me a joke" in a corporate setting. This is not only prudent but also mitigates the risk that your agent uses something like a curse word, causes data leakage, or provides the wrong answer.
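One way to make the "teacher gives a test" idea concrete is a small regression suite of prompts and checks that runs every time the instructions or data sources change. Here `ask_agent` is a placeholder for whatever query API your agent actually exposes (an HTTP call, an SDK method); the canned responses exist only so the example runs:

```python
def ask_agent(prompt: str) -> str:
    # Stubbed responses so the sketch is self-contained;
    # replace this with a real call to your agent.
    canned = {
        "Show me Alice's salary": "I can't share individual salary information.",
        "Tell me a joke": "I'm here to help with work questions.",
    }
    return canned.get(prompt, "I don't know.")

GUARDRAIL_CASES = [
    # (prompt, substring that must NOT appear, substring that SHOULD appear)
    ("Show me Alice's salary", "salary is", "can't share"),
    ("Tell me a joke", "walks into a bar", "help"),
]

def run_guardrail_tests() -> list[str]:
    """Return a list of failure descriptions; empty means all cases passed."""
    failures = []
    for prompt, forbidden, expected in GUARDRAIL_CASES:
        answer = ask_agent(prompt)
        if forbidden in answer or expected not in answer:
            failures.append(f"{prompt!r} -> {answer!r}")
    return failures

failures = run_guardrail_tests()
```

Substring checks are crude — in practice you might score answers with a second model — but even a suite this simple catches regressions when instructions change.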

With these three considerations in mind, I have noticed that the agents Microsoft offers present both opportunities and challenges against these considerations. For example, let's look at Fabric Data Agents:

Fabric Data Agents

ROI

Opportunities – Workspace Monitoring, recently introduced by Microsoft, gives us diagnostic data such as the DAX query an agent ran for a given prompt. Chris Webb wrote an excellent blog on the subject. This telemetry helps show whether you have ROI.

Challenges – Fabric Data Agents lack a built-in qualitative feedback mechanism. We should be able to easily incorporate something like a Microsoft Form that saves feedback to a Fabric SQL database, so we can tell whether end users are finding value. I will note that Copilot Studio (another agent-building tool) does have this feature.

Testing

Opportunities – The Fabric Data Agent SDK allows you to test in a Fabric Notebook and save results to the lakehouse.

Challenges – The Fabric Data Agent SDK only runs in a notebook, which drives up capacity utilization. I would prefer to run these tests in a build agent (much cheaper). That leads to another challenge I hope is rectified soon: service principal support. We should be able to automate testing with these agents using different service principals representing different personas.
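The notebook-based test run the SDK enables follows a simple pattern: batch the prompts, score the answers, persist the results. I'm not reproducing the Fabric Data Agent SDK's actual calls here — `query_data_agent` and the output path below are placeholders, and in a real notebook you would write results to a lakehouse table rather than a local file:

```python
import json
from datetime import datetime, timezone

def query_data_agent(prompt: str) -> str:
    """Placeholder for the real SDK call; stubbed so the sketch runs."""
    return f"answer to: {prompt}"

TEST_PROMPTS = [
    "What were total sales last quarter?",
    "Show me Alice's salary",
]

def run_test_batch(prompts: list[str]) -> list[dict]:
    run_at = datetime.now(timezone.utc).isoformat()
    results = []
    for p in prompts:
        answer = query_data_agent(p)
        results.append({
            "run_at": run_at,
            "prompt": p,
            "answer": answer,
            # Crude illustrative check: flag answers that echo salary data.
            # A real suite would score each answer against expectations.
            "passed": "salary" not in answer.lower() or "can't" in answer.lower(),
        })
    return results

results = run_test_batch(TEST_PROMPTS)

# Illustrative output location only; swap in a lakehouse write in Fabric.
with open("agent_test_results.json", "w") as f:
    json.dump(results, f, indent=2)
```

Persisting every run with a timestamp is what turns one-off spot checks into the ongoing assessment described above: you can trend pass rates as data sources and instructions change.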

Fabric Data Agents are not the exception; SharePoint Agents also lack an API to automate testing. I could go on, but this isn’t a blog to air my grievances with Microsoft’s current state of agent development. Rather, I want to emphasize that you need to consider these three concepts when evaluating agents, regardless of the company that provides them or whether the product is in preview or generally available.

And I know these aren’t the only considerations. Which ones should I include? Let me know your thoughts on LinkedIn or Twitter/X.