Jan. 8, 2026

#562 Agentic AI Is Not an Intern: Craig McLuckie on Control, Context, and Enterprise Reality


Agentic AI is moving faster than enterprise readiness.

 

Boards are pushing adoption. Teams are deploying agents at speed. But security, control, and operational discipline are lagging behind.

 

In this episode, Mehmet sits down with Craig McLuckie, the co-creator of Kubernetes and founder of Stacklok, to unpack why most agentic AI initiatives break after the demo and what enterprises must do differently to make them durable, secure, and production-ready.

 

From MCP and context engineering to eval-driven development and why AI agents should never be treated like interns, this conversation goes deep into the realities CTOs, VPs of Engineering, and security leaders are facing right now.

 

This is not a hype conversation. It’s an operator’s reality check for 2026.

 

 

👤 About the Guest

 

Craig McLuckie is a foundational figure in modern cloud infrastructure. He is the co-creator of Kubernetes, founder of the Cloud Native Computing Foundation, and former VMware executive behind the Tanzu portfolio.

 

Today, Craig is the founder and CEO of Stacklok, where he is focused on helping enterprises securely connect agentic AI systems to real-world infrastructure through open, controlled, and auditable platforms.

 

https://www.linkedin.com/in/craigmcluckie/

 

 

🧠 Key Takeaways

• Why agentic AI represents a true epoch shift, not just another tooling cycle

• The real difference between demos, POCs, and production AI systems

• Why MCP is powerful but dangerous without proper control layers

• How context engineering is becoming more important than writing code

• Why eval-driven development replaces test-driven development in AI systems

• How enterprises should think about permissions, scope, and agent autonomy

• Why most AI failures are workflow problems, not model problems

• What 2026 realistically looks like for agentic AI adoption in the enterprise

 

 

🎯 What You’ll Learn

• How to operationalize agentic AI without exposing your infrastructure

• Why treating AI agents like humans is a security mistake

• How to design guardrails without slowing teams down

• Where CTOs should focus investment to move from hype to ROI

• How leadership metrics and engineering evaluation must evolve in the AI era

 

 

⏱ Episode Highlights & Timestamps

00:00 – Introduction and Craig’s journey from Google to Kubernetes

03:10 – Why agentic AI feels like a historic inflection point

06:05 – MCP explained and where enterprises get it wrong

10:45 – The security risks nobody is talking about

14:20 – Why AI agents should never be treated like interns

18:30 – The danger of permission sprawl and tool pollution

23:10 – Why most AI initiatives fail after the demo

28:40 – Eval-driven development vs traditional software thinking

34:15 – Context engineering as the new leverage point

38:50 – How engineering leadership and metrics must change

43:30 – What realistic agent adoption looks like in 2026

46:20 – Open source, ToolHive, and building durable AI platforms

 

 

🔗 Resources Mentioned

• Stacklok: http://stacklok.com/

• ToolHive (Open Source MCP Platform): https://stacklok.com/toolhive/

 

[00:00:00] 

Mehmet: Hello and welcome back to a new episode of the CTO Show with Mehmet. Today I'm very pleased, joining me, Craig McLuckie. He is the founder and CEO of Stacklok. We're gonna talk today about a topic which is, I think, very, very important in the world of [00:01:00] AI and, you know, development in AI automation. We talk a lot also, you know, about security.

We talk about, you know, how to control the environment. And this is exactly what, you know, me and Craig will discuss today. So without further ado, Craig, a little bit about you, your background, your journey, and you have a very rich journey, just as a teaser to the audience. And then we're gonna deep dive immediately in what we are gonna talk about.

So the floor is yours, Craig.

Craig: Alright, well hey, thanks for having me on. I really appreciate it. Let me just introduce myself quickly. So I'm Craig McLuckie, the founder and CEO of Stacklok. By way of background, I've been in distributed systems pretty much my whole career. I started my career at Microsoft and went to work at Google, where I worked on what became known as Google Compute Engine, kind of Google's entry into the infrastructure-as-a-service space, and really that [00:02:00] pivot towards more of an enterprise software company. I then got the chance to work on what became known as Kubernetes. I started that project with a couple of my friends inside Google, and obviously that worked out pretty well. A lot of folks use Kubernetes out there in the wild, and it is interesting to see it becoming an anchor technology for a lot of infrastructure management for AI tech.

I started the Cloud Native Computing Foundation as a home for Kubernetes and to support incubation, innovation, and open source. I built a company, sold that company to VMware, where I worked on what was known as the Tanzu portfolio. And then I stepped out and started Stacklok, which is hopefully what we'll get to talk about today: an organization that's focused on bringing the capabilities that enterprises need to be able to bridge agentic and assistive technologies to their existing world of IT systems.

Mehmet: Great. And thank you again, Craig, for being here with me today. You know, I'm gonna start, kind of, from your background, relating to [00:03:00] what you're currently doing.

So you mentioned agentic AI. And, you know, I remember in 2024, with the start of generative AI, and then in 2025 we started to talk about agentic AI and how it can take us to the next level. So when you look today at what's happening with AI agents, do you feel it's history repeating itself, the same adoption curve? Because you started Kubernetes and the Cloud Native Computing Foundation, and now we talk about MCPs and all this stuff. So are we kind of repeating history again?

Craig: I always joke that history doesn't repeat itself, but it often rhymes. I think it is reminiscent of some of these kind of big epoch transitions, right? Like, I got to work on a dotcom during the dotcom era, very early in my career. I got to see client-server technology start to take hold. [00:04:00] I got to be a participant in the cloud journey, building cloud technologies and cloud-native technologies with Kubernetes. And it certainly does feel like an epoch boundary, but if anything, I would say it's happening a lot quicker.

And the promise is significantly more disruptive than anything I've seen. I think this is actually a pretty profound moment for technologists. You know, if you look back at humanity, we've defined our epochs by the dominant technology: the Stone Age and the Iron Age and the Industrial Age and the digital age, and I do think we're kind of entering the AI age.

So I think it is reminiscent, but far more significant in many ways than anything we've seen before. Because at the end of the day, it's just a new class of capability, right? Everything we've built to date has been built on distributed-system foundations, some new kinds of ways of passing messages or persisting data to disk, and this is just a whole new class of technology that [00:05:00] creates entirely new capabilities.

And I think the impact will be profound. I think a lot of people overestimate the impact on the short horizon with something like this, but perhaps underestimate the impact on the longer horizon. So: reminiscent, but I think far more significant.

Mehmet: Right. Now I want to jump in, you know, to what you do today, Craig, but I'm gonna relate a little bit to your background as well.

If I want to see the landscape of how we are deploying these agents, of course we talk about MCPs, right, the Model Context Protocol. So what kinds of problems were MCPs aiming to solve, and what are some of the, I would say, issues that came up with them? And I'm asking you this question in a similar way to when we started, because I come from an infrastructure background as well. [00:06:00] Every new technique that comes along, we try to solve what was a bottleneck before, but after a while we figure out we need another management layer. So walk us through what's happening in this field today, and where the need came from to come up with Stacklok to solve this problem.

Craig: Yeah. And I think, you know, when you start looking at what the promise of agents is, it's effectively a system that can take data and turn it into knowledge, and then take knowledge and turn it into action, right? In the most simplistic terms, when you break it down to the fundamentals, it's a way to take data from a variety of different sources and turn it into something that has kind of distilled awareness around it, which is what we think of as knowledge. And then based on that knowledge, you can actually make decisions. And then you turn those decisions into actions. And an action could be, you know, filing a ticket, [00:07:00] initiating a transaction, pretty much anything on the back end of it.

And so the real challenge that I think organizations face is that question of how you take data and turn it into knowledge, and the gateway between data and knowledge is the context window with LLMs. That's really what needs to be navigated. I think of it as a context real-estate management problem. It's not about just grabbing a whole bunch of data and dumping it into the context window. It's really about organizing it and presenting it in a way that is gonna produce the best possible results. So I think that's problem one: how does a real-world enterprise turn data into knowledge via the context window and provide access in real time to those classes of systems? We've certainly seen patterns like RAG becoming very popular, which was really a way to take data and turn it into parametrized data that could be augmented into the flow. But it quickly falls apart when you're dealing with real-time systems. [00:08:00] So it's really about that real-time access to the data that businesses are using.

And that obviously introduces a whole bunch of challenges: this data exists in systems that are regulated or controlled or sensitive. How do you start to create the right controls to be able to access that, from an authentication, authorization, auditability, traceability perspective? But also, how do you start to massage that data and present it in a way that actually supports the workflows these systems need to be effective?

And then the second part is turning that knowledge into action. What are the appropriate set of controls? How do you structure these systems so that an agent, which is a stochastic process, right, it's just like a person: it's fallible. It's amazing, but it's fallible. How do you start to create the right controls there? So I really think of it as this bridge that needs to exist. The reality is enterprises exist in [00:09:00] this world where the people, the process, and the technology are deeply intertwined. It's very difficult to separate those things out. And it's naive to assume that an enterprise is just gonna rip and refit everything it has.

So the question is, how do you start bringing these technologies in? And MCP has emerged as a wonderful enabler, right? When I encountered the protocol for the first time, I was like, wow. It was kinda like the day I saw Docker for the first time, when I'd been working on technologies like Kubernetes, and I was like, wow, that's just an amazingly useful thing to have, where you can start to package things up and reuse 'em, et cetera. And I think the protocol itself is very similar: a natural-language framework to describe things in a way that models can process them, and a natural way to expose tools in a way that models can understand and exercise them. But the real work starts in terms of operationalizing and navigating that in the real world.
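As a concrete, deliberately simplified illustration of that protocol idea: an MCP-style tool is essentially a name, a natural-language description, and a JSON-Schema input contract that the model reads and reasons over. The sketch below uses plain Python dicts and a toy validator rather than the actual MCP SDK, and the `file_ticket` tool is hypothetical:

```python
# A minimal, MCP-style tool description. The model never sees your code,
# only this natural-language contract, so the description text is the API.
file_ticket = {
    "name": "file_ticket",
    "description": "File a support ticket. Use when the user reports a bug.",
    "inputSchema": {
        "type": "object",
        "properties": {
            "title": {"type": "string", "description": "One-line summary"},
            "severity": {"type": "string", "enum": ["low", "medium", "high"]},
        },
        "required": ["title"],
    },
}

def validate_call(tool: dict, args: dict) -> bool:
    """Tiny stand-in for schema validation: check required keys and enums."""
    schema = tool["inputSchema"]
    for key in schema.get("required", []):
        if key not in args:
            return False
    for key, value in args.items():
        prop = schema["properties"].get(key)
        if prop is None:
            return False
        if "enum" in prop and value not in prop["enum"]:
            return False
    return True

print(validate_call(file_ticket, {"title": "Login fails", "severity": "high"}))  # True
print(validate_call(file_ticket, {"severity": "urgent"}))  # False: missing title
```

The point of the sketch is that the contract, not the implementation, is what the model exercises; a real deployment would validate with a proper JSON-Schema library and serve the tool over the MCP transport.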

Mehmet: Right. Now, the question, Craig: does [00:10:00] the rush to adopt AI in general, and to deploy agents, also cause these kinds of issues? Because, you know, and correct me if I'm wrong, you're more expert in this than me, but I've never seen a rush to deploy a technology like what happened with agents, right, and with AI in general. And do you see companies sometimes making trade-offs on security, maybe architecture, just to get things deployed? How are you seeing these trends in the enterprise today?

Craig: Oh yeah. It's kind of crazy, because I've never seen a technology where there's as strong a level of board mandate to embrace the technology, right? In all of the organizations I've talked to, [00:11:00] engineering teams are being metriced on how quickly they can adopt the technology. I've very seldom seen this before in my career. You always think: hey, you're a CTO, what do you care about? Or you're a CIO, what do you care about? I wanna make money, I wanna save money, or I wanna mitigate risk. It's one of those three things,

Mehmet: right?

Craig: And a lot of folks are looking at this as a way to both make money and save money, and that mitigating-risk part of it is not necessarily being addressed.

And here's a funny statistic. We did this analysis quite recently, and we were just looking at the rate of growth. AI adoption and AI spend is outpacing any technology we've ever seen, ever. It's by far the fastest-growing technology category in the history of humanity. And the interesting thing is, the only thing growing at a rate faster than AI spend is the number of vulnerabilities being exploited in organizations. You look at organizations that have embraced an MCP [00:12:00] technology: 20% of them report breaches, security breaches, as a result of the technology. And you see a lot of folks out there in the wild. If you look at the number of MCP servers that have been published, there's something like 30,000 MCP servers that have been published. There's 450 GitHub MCP servers, and if you actually dig into it, about a thousand of those are malicious. But which ones, and how would you know?

And the problem, I think, is this unbearable tension that a lot of organizations are feeling. They feel this acute need to embrace these technologies because they see it as a critical competitive differentiator. There will be haves and have-nots; there will be winners and losers in this space. And so every organization is racing to be a winner. But they don't have the skills, they don't have the controls, and a lot of people are getting hurt. I see a lot of people getting hurt right now because they don't understand the new challenges associated with it. They're throwing the tools into the teams. There's a lot of [00:13:00] shadow AI use happening. And a lot of exploits are being targeted at developers that are npx-running a random package off the internet that's actually malicious, or has a supply chain vulnerability in it, or what have you.

Mehmet: Right, Craig. If we want to stay on the security perspective...

Craig: Yeah.

Mehmet: What should be worrying us the most? Is it data leakage? Is it unauthorized actions? Because, you know, we are also giving authorization for these agents to do some actions sometimes. Or is it the hallucination, you know, in the context of, yeah, they can just come up with something which is nonsense, and then give it to another agent, and then that one starts doing nonsense too. So what should be worrying us the most among these?

Craig: I mean, I think the root [00:14:00] of a lot of the challenges we face right now is that the reflexive position of the CISO is to push the responsibility, or the accountability, onto the individual that's using AI, right? It's a very comfortable and convenient thing to do, because you can basically say: hey, what I'm gonna do is just treat any of these AI tools... So hey, we bought Cursor licenses, for instance, or we bought Claude Code licenses, we've bought whatever licenses, we've trained our developers on security best practices. We're just going to treat the agents that are being built and run by developers to support their coding activities as developers, right? We're gonna push the onus of responsibility onto the developers. We're gonna leverage our existing systems for how we authenticate, how we authorize, how we reason about role-based access control. We're just gonna treat those agents as the developers that initiated them.

And things fall apart pretty quickly there, right? Because when you're a developer using these tools, you [00:15:00] get trained to go accept, accept, accept, accept, accept. You look at how many people out there are joking about, hey, one of these coding assistants rm -rf'd my root directory, and, you know, ha ha ha. But what you're really seeing is this numbing. Developers are great at developing code. They're not great at moment-to-moment watching something and making sure it doesn't do something dumb, right?

And so I think the starting point is: if you just treat these systems as human actors, you're gonna get hurt. You really do need to start reasoning about, like, how do I bound access to a specific task? I'll give you a good example. Right now, Amazon produced this MCP server, the AWS MCP server, and they've done a great job, right? It's a pretty high-surface-area MCP server. They recognize the AWS [00:16:00] surface is so vast, it's very difficult for an open source project to track it in accurate terms. And, you know, when you start using tools, tool pollution is a real problem. So they've built a single endpoint that enables a developer to integrate it, and there's the totality of the AWS surface area that they can use: everything that they're authorized to see, the agent that's running with their credentials can now see. That could be very dangerous, right? You just think about it: there's this thing that's doing work, it's being sort of watched by a developer, but the developer's getting numb to just pressing accept, accept, accept. 'Cause...

Mehmet: Yes.

Craig: If 99% of the time it does the right thing... you see it work 99% of the time, and you stop thinking, because you just assume it's right. You're not looking for that 1% moment where it's gonna do something dumb in your AWS. So, simple question: how would you set that up so that it only has read-only access? How do you actually implement a system so that if your developers are using this in agent mode, it's read-only access, and then additional [00:17:00] controls are in place when it's actually manipulating your resources, changing your cloud configuration? That is an area where I think a lot of people are gonna get hurt in the short horizon by not taking the time to step back, assess how these workflows need to run, and understand what capabilities need to be put in place.

And you can't just hit developers with "thou shalt not" edicts, right? They will find a way to get to what they want to do. So you really have to balance unlocking them with the tools that they want to use, that are actually gonna enable 'em to do the job better, with the controls you need to have in place so that you're not gonna get hurt as these systems perform.

And that's one category. There are obviously a lot of different categories: whole new classes of security vulnerabilities, you know, prompt injection via tool calling. There are oceans and oceans of new vulnerabilities being discovered and exploited at a time when organizations don't have a really robust understanding of the technology. So I'm not here [00:18:00] to call doom and gloom. I think there are some very practical things organizations can do to both unlock the productivity of these tools and establish the controls you need in place to put these things to work. But it does take thought and investment.

Mehmet: Right. Craig, in that context, I've been doing the research, and you said we should treat AI agents like senior developers, not interns, right? And this goes back to all the things you just mentioned, and how we as humans should give proper context. Now, how can we enhance this from our side? Because, and this may be human nature, we assume that, yeah, it's a machine, it should be smart enough to do the thing, and then we remove ourselves from the loop. So how much involvement should stay [00:19:00] with us, the seniors, in that life cycle? And here I'm talking about the development part as well. Where should I be focusing today if I'm, say, a VP of Engineering or a CTO, and I want to guide my team on how to avoid, as much as possible, letting these agents fail? Where should I be focusing more? Is it about supervision? Is it about making judgments, or not giving full autonomy and keeping myself as a human in the loop? What's your say?

Craig: Yeah, I mean, it's interesting, 'cause we've certainly been running our own surveys around developer productivity with these tools in place, and it's fascinating. You go speak to a VP of Engineering and you ask them the question: how much more productive is your team with these tools? [00:20:00] And you'll get a lot of different answers. I think people want to be perceived as being effective at deploying and engaging these tools, but in a relatively naive manner. Based on our own analysis, when you just throw a bunch of coding-assistant tools into a team and tell 'em to use them, you don't see productivity gains, actually. You see productivity declines, right? A lot of these basic antipatterns start to emerge. You look at the amount of time it takes to merge a PR, the size of PRs, and the throughput of feature delivery; in many cases it actually declines, because, hey, there's a developer producing a 10,000-line vibe-coded PR, and they're basically just pushing the onus onto their teammates to go through the review cycle to actually turn that into qualified code. Their teammates have to sign off during code review on the work product. And it slows things down, in many cases.

Stepping back, the thing I keep coming back to is that this is a fundamentally new class of system. The way that it works, the way that it operates, is stochastic by nature, right? [00:21:00] And so the way that we develop needs to change. It's not like traditional distributed-systems development, where you establish a set of patterns, you define a set of use cases, the unit tests become the proof of correctness, you build the system, you run the unit tests, everything works fine, you figure out where the scaling and contention points are, and you go from there. That's not how these systems work. You have to start with: what does the model understand? Not just know, but understand. How does it start to produce work, and how can I create eval cycles, evaluation cycles, to make sure that it's operating within the boundaries of what I expect?

And so one of the things that I see is that you need to start reasoning about the compartmentalization and the deconstruction of the workflow the development team has put in place. If we go back to using these [00:22:00] tools for code development, one of the things that has got a lot of airtime recently is things like Spec Kit, which is basically a set of prompts that provide a guided path for an agent to start participating in the development workflow. And the reason that's so powerful is that someone actually went and wrote down the workflow. A priori, they sat down and described what the workflow is, they deconstructed it into a set of steps, they described when operator or human participation is involved in the steps. And you start to be able to reason about that workflow in a way that is very AI-native.

And so, for organizations starting to embrace these technologies, it really isn't just about throwing them out there and seeing what sticks. You really need to start reasoning about the workflow, deconstruct it into smaller consumable pieces, and then figure out how to run evaluations so that you actually are asserting that something is working as it should for a specific task. Make sure that it's [00:23:00] operating within the characteristics you want to see, and then set up controls.

So for instance, in the engineering cycle, one of the things that we've been doing inside Stacklok is getting a lot more formal around our planning cycle. This is what planning looks like; these are the resources that are produced; these are canonical examples of a use case or this or that; this is where all of our conversations with our clients are being archived. So as we start going through the planning cycle, we can formally describe the workflow, and then we can describe a lot of the artifacts as assets that support the workflow. We can build an MCP server or something like that to vend it, and it creates a lot more structure. In the case of our development flow, we're not necessarily running eval cycles around the development flow, but what we are doing is observing our productivity at each step of the game. And we're reasoning about: is productivity increasing or decreasing? Where are hallucinations getting into our specifications? What additional scrutiny is [00:24:00] required? Can we use these systems to run their own internal evals against the work products to make sure we identify these gaps?

So just being very disciplined in terms of workflow definition and compartmentalization, then reasoning about execution against smaller tasks, and then stepping back over time once the system is working, is kind of the key to success.
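The eval-cycle idea above can be sketched in a few lines: unlike a unit test, an eval scores a stochastic system across many cases and tracks a pass rate over time rather than a binary pass/fail. The `toy_agent` below is a stand-in for a real model call, and the substring check is a deliberately loose grader; both are illustrative assumptions:

```python
# Eval-driven development in miniature: score a stochastic system across a
# case suite and gate on the pass *rate*, not on any single output.
from typing import Callable

def run_eval(agent: Callable[[str], str], cases: list[tuple[str, str]]) -> float:
    """Fraction of cases where the agent's output contains the expected fact."""
    passed = sum(1 for prompt, expected in cases if expected in agent(prompt))
    return passed / len(cases)

# Stand-in 'agent' so the sketch runs; a real one would call a model.
def toy_agent(prompt: str) -> str:
    return "Kubernetes orchestrates containers." if "Kubernetes" in prompt else "unsure"

cases = [
    ("What does Kubernetes do?", "orchestrates containers"),
    ("What does Terraform do?", "provisions infrastructure"),
]
score = run_eval(toy_agent, cases)
print(f"pass rate: {score:.0%}")  # 50% here; a CI gate would use a threshold
```

Rerunning this suite after every prompt, model, or workflow change is the eval analogue of rerunning unit tests, and the trend of the score is what tells you whether a change helped.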

Mehmet: I think, Craig, and correct me if I'm wrong, feel free to add anything, what you just described also, I believe, helps organizations, the enterprise, implement these guardrails in a better way. Because when you dissect the processes that you're doing, I believe you would have a better idea of where today you actually have shortcomings: where authorization is implemented the wrong way, where data leakage might happen because you don't have the proper security measures. [00:25:00] And then, when you try to put it in an AI-agent perspective, you are automatically kind of implementing the best practices. Can I think about it this way?

Craig: I think so, and I think it's a question of, you know, it's, it's, it's really about what platform investments are you making?

Like where are the, like, and I think a lot of enterprises really struggle with this, and I, you know, like when you look at the, the projects that are out there in the wild, um. One of the most commonly cited challenges that organizations have in business technologies is the state of the art is changing so quickly.

Right? So you're like, okay, we're gonna build an agent to support, you know, kind of client service, like, uh, you know, uh, customer, um, customer support. Mm-hmm. And so we, we, we pick a model and then we start to structure these things and we build a workflow and we do all of this. And by the time we've done, the state of the art has changed completely, right?

A new frontier model has come out that's three times better at this, and it invalidates a lot of your assumptions around factoring. You get caught in this sort of [00:26:00] treadmill of ever-increasing capability. So I think the key is starting with the systems that are the slowest-changing and working out from there.

Right? And this is why something like MCP is such a great story: hey, I've got all these systems that I need to integrate with, so let's figure out a way to start formally defining those integration points. Let's get to a point where we can authenticate and authorize, but not with an open-ended credential like the one an admin or an operator has. Instead, use a down-scoped credential that can be identified as belonging to that agent, with access to only the resources necessary to support that workflow.
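The down-scoped credential idea can be sketched in a few lines. This is a minimal illustration, not ToolHive's implementation: the signing scheme, the scope names, and the `mint_agent_credential` helper are all hypothetical, and a real deployment would use a proper token standard (OAuth 2.0 / JWT) with a managed signing key.

```python
import base64
import hashlib
import hmac
import json
import time

SECRET = b"demo-signing-key"  # hypothetical; a real system would pull this from a KMS

def mint_agent_credential(agent_id: str, scopes: list[str], ttl_s: int = 300) -> str:
    """Mint a short-lived credential scoped to exactly one agent workflow."""
    claims = {"sub": agent_id, "scopes": scopes, "exp": time.time() + ttl_s}
    body = base64.urlsafe_b64encode(json.dumps(claims).encode())
    sig = hmac.new(SECRET, body, hashlib.sha256).hexdigest()
    return body.decode() + "." + sig

def authorize(token: str, resource: str) -> bool:
    """Allow a call only if the token is untampered, unexpired, and scoped to the resource."""
    body, _, sig = token.partition(".")
    expected = hmac.new(SECRET, body.encode(), hashlib.sha256).hexdigest()
    if not hmac.compare_digest(sig, expected):
        return False
    claims = json.loads(base64.urlsafe_b64decode(body))
    return time.time() < claims["exp"] and resource in claims["scopes"]

# The agent gets only what its workflow needs, not an operator's open-ended credential.
token = mint_agent_credential("support-agent", ["tickets:read", "kb:search"])
print(authorize(token, "tickets:read"))   # scoped resource -> True
print(authorize(token, "billing:write"))  # outside the grant -> False
```

The point of the sketch is the shape, not the crypto: every tool call carries an identity ("this agent, this workflow"), an expiry, and an explicit resource list that an auditor can read back later.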

And then as you start to perform the work, maybe you discover that additional access is required; have a way for the system to escalate its privileges as the work proceeds. And have full visibility and traceability: what systems is this touching? When did it touch them? If something goes bad, how do I go back and debug and diagnose [00:27:00] that? Starting at the interfaces to the existing systems and working out from there is helpful.

And this might be a bit of a deconstructionist perspective, but I see a lot of organizations worrying about agentic orchestration and relatively complex structures at a time when most organizations are going to be very well served by just having access to a frontier model and some kick-ass tools, and wrapping that up in a relatively minimal way.

That will probably produce better results. And even if it doesn't produce better results right now, I guarantee that in six months it will produce better results than anything you can build using a more complex orchestrated system with reflection patterns and self-analysis, right?

The frontier models are advancing at such a rate that starting with a very, very simple system is the right move. Which is just: hey, here's a set of tools. They're relatively constrained, which limits these frontier models to [00:28:00] the set of resources that are germane to the task. We've got the right controls in place, we've got the right visibility in place, we've got the right authorization. We can now start to create real value. That is going to produce better results for almost everyone.

Mehmet: Craig, on that: do you think this is one of the reasons why, as a lot of reports coming out in 2025 showed, AI initiatives were failing? Why, after successful demos and promising roadmaps, we have not achieved the ROI we expected?

And if there are other reasons as well: how can we turn these, what would we call them, proof of concepts into successful, durable systems built on agentic AI?

Craig: Well, part of the problem is people don't understand how these things work, right? It's one thing to [00:29:00] get a POC working. It's a whole other thing to get a production system working.

And it's one thing to get something working; it's another thing to keep it working. A lot of the problem is that people aren't investing in changing how they build. Right now TDD, test-driven development, is pretty popular: hey, write down the unit tests, code to those tests, get it running, and it runs.

But unit tests aren't proof of correctness for AI models; evals are. These are stochastic models, right? They're probabilistic by nature. A model takes a bunch of input context in the context window, basically does a lookup against embeddings, and produces the most likely response to that input.

Now, obviously, as that input changes, the output's gonna change. As the embeddings change, the output's gonna change. It's not a system where you can just put it [00:30:00] together, walk away, and expect it to continue to work, unless you are running constant evals, unless you're constantly observing and assessing the behavior of the system.

Any additional form of entropy, any noise coming into the system, is going to impact its behavior and its operating characteristics. So hey, you've got an MCP server, it works great. You add another MCP server, it works okay. You add another MCP server and you discover: hey, I've just saturated my context window with spurious tool descriptions, which completely degrade the performance of the system I'm trying to build.
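That saturation failure is easy to make concrete. A rough sketch, assuming the common heuristic of roughly four characters per token; the server names, descriptions, and budget below are invented for illustration:

```python
# Rough token estimate: ~4 characters per token (a common heuristic, not exact).
def estimated_tokens(text: str) -> int:
    return len(text) // 4

def context_budget_report(servers: dict[str, list[str]], budget: int) -> dict:
    """Tally the context cost of every tool description across attached MCP servers."""
    used = sum(estimated_tokens(d) for descs in servers.values() for d in descs)
    return {"tokens_used": used, "budget": budget, "saturated": used > budget}

# Hypothetical servers; the repeated strings simulate verbose tool descriptions.
servers = {
    "tickets-mcp": ["Search support tickets by keyword and date range." * 20],
    "kb-mcp": ["Full-text search over the internal knowledge base." * 20],
    "crm-mcp": ["Look up customer accounts, contacts, and contracts." * 40],
}
report = context_budget_report(servers, budget=800)
print(report["saturated"])  # the third server pushes past the budget -> True
```

Even this toy tally shows the dynamic Craig describes: each individual server looks cheap, but the aggregate description payload quietly crowds out the context that actually matters for the task.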

And if you're adding those things at runtime, into a production system, forget about it. You have to be able to establish those controls. So I think the starting point is that you have to change how you work. I always like to ask the people we interact with: what eval framework are you using?

They're like: eval what? And I'm like: if you're not running evals against an agentic system, if you're not looking at that as the way you get to a point of evidence, [00:31:00] if you don't have the ability to generate synthetic training data, or even just sample data, that you can use to prove the correctness of the system as you're building it, you're gonna have a very hard time.
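A minimal eval harness of the kind Craig is gesturing at might look like this. Everything here is a hedged sketch: `toy_agent` stands in for a real model call, and the graded cases and pass threshold are invented for illustration.

```python
from dataclasses import dataclass
from typing import Callable

@dataclass
class EvalCase:
    prompt: str
    check: Callable[[str], bool]  # grades the output instead of exact-matching it

def run_evals(agent: Callable[[str], str], cases: list[EvalCase], threshold: float) -> dict:
    """Score an agent against graded cases; gate the release on the pass rate."""
    passed = sum(case.check(agent(case.prompt)) for case in cases)
    rate = passed / len(cases)
    return {"pass_rate": rate, "ok": rate >= threshold}

# Stub standing in for a real (stochastic) model call.
def toy_agent(prompt: str) -> str:
    return "2026" if "year" in prompt else "unknown"

cases = [
    EvalCase("What year is it?", lambda out: "2026" in out),
    EvalCase("Name the capital of France.", lambda out: "Paris" in out),
]
result = run_evals(toy_agent, cases, threshold=0.9)
print(result["pass_rate"], result["ok"])  # 0.5 False
```

The design point is that checks grade behavior rather than asserting exact strings, because a stochastic system never repeats itself exactly; rerunning the same suite continuously is what catches the drift Craig describes when context, embeddings, or models change underneath you.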

So the first thing is really just changing the way you think about building, embracing this kind of eval-driven development or spec-driven development, and using the eval framework as proof of correctness. The second thing is that context is hard, right?

A lot of people have been struggling just to render context. Context isn't just data; it needs to be presented in a way that has the highest relevance to the task at hand. I'll give you an example: say you're a developer working on GIS, adding mapping features to your application.

There, "feature" means something specific: it's a collection of facts that describe some topographical element. But "feature" means something else to almost every other developer in the world; it's the thing you're working on. [00:32:00] How do you contextualize that?

How do you convey that this is a feature which means these particular things? Just dropping in a spec or a document that mentions feature X, without additional markup that actually contextualizes what "feature" means over here, is gonna produce bad results. That's the second big gap I think about: the context chasm.

How do you go from just grabbing data to turning it into the most valuable context real estate you can? There are a lot of different techniques for reasoning about turning data into context, and that's another area I don't think organizations are investing enough in.
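One simple way across the context chasm is to attach a domain glossary to the data before it reaches the model. A sketch, with a made-up `render_context` helper and a toy GIS glossary echoing the "feature" example; real pipelines use richer techniques, but the shape is the same:

```python
def render_context(task: str, document: str, glossary: dict[str, str]) -> str:
    """Prepend domain definitions for any glossary term the document uses,
    so an ambiguous word like 'feature' is pinned to its domain meaning."""
    used = {term: meaning for term, meaning in glossary.items()
            if term.lower() in document.lower()}
    lines = [f"## Task\n{task}", "## Domain glossary"]
    lines += [f"- {term}: {meaning}" for term, meaning in sorted(used.items())]
    lines.append(f"## Document\n{document}")
    return "\n".join(lines)

# Hypothetical glossary for a GIS codebase.
gis_glossary = {
    "feature": "a geographic object (point, line, or polygon) with attributes",
    "layer": "a themed collection of features rendered together",
}
ctx = render_context(
    "Add a search box that filters features by attribute.",
    "Users should be able to hide a layer and still query its features.",
    gis_glossary,
)
print(ctx)
```

The raw document is unchanged; what changed is that the model now receives the markup that disambiguates the terms, which is exactly the "data is not context" distinction being made here.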

Between those two, I think you get a lot of the challenges organizations see in going from POC to production use of agents.

Mehmet: Great. Craig, I want to ask you about your solution. [00:33:00] It's called ToolHive, right?

It's free and open source, and of course you have the enterprise version. How much did the background you came from, Kubernetes and the Cloud Native Computing Foundation, shape that choice? And did you feel there needed to be a kind of counterbalance of power in the AI area? How much did that affect the decision to open source the solution?

Craig: Yeah, I mean, it's not surprising. When I was building Kubernetes, I think a lot of folks inside Google thought I was crazy for wanting to open source something that Google had a lot of intellectual property around, right?

Taking a lot of that knowledge and distilling it into an open source technology that we were effectively giving the community [00:34:00] wasn't obvious to a lot of people when we did it. There were a lot of discussions around why it made sense to do that.

For me, it really comes down to a couple of things. I believe that a platform needs to exist in this space. I really believe that enterprises need a platform that enables them to connect agents to their existing systems, both in terms of rendering context and asserting control.

And I believe that, over time, open source platforms tend to outperform proprietary platforms. That just seems to hold: when you look at the world out there, there's an open source alternative to every proprietary technology.

Linux to Windows, right? Kubernetes to a lot of the proprietary alternatives, or even the ones that were open source but single-vendor open source. So I think community-centric open source drives differentiated results. And for me: I'm a [00:35:00] startup.

I've got some great engineers, and we're working really hard, but we would certainly benefit from having other organizations participate in this platform-building exercise. So we've been talking to Red Hat, and Red Hat is helping us build ToolHive. I think ToolHive is probably the only open source MCP platform that has external maintainers on it.

It's not a single-vendor initiative; Red Hat feels as much ownership of this as we do. When the time is right and we find the right foundation, we'll look to bring the core runtime into a foundation. And by doing that we get a lot of value. It's not just about extra engineers.

Those Red Hat people are really smart. They understand enterprise, they understand what the challenges are, they're working in the space, they bring their own perspectives, and they make it better because of who they are. And we've been watching how ToolHive gets used.

There are a lot of other organizations out there starting to embrace it. Every other day we have a conversation where [00:36:00] some new, exciting West Coast tech company has adopted ToolHive and is using it, and they want something in the community, or whatever it may be.

So if you're building a platform, having built an open source core just creates a far more rapid path to engagement and adoption. And I don't feel like Stacklok has to own this; it's not our intent to own the platform story. Mm-hmm. We want to create value around it. We want it to become ubiquitous.

We want everyone to embrace it. By starting with a community-centric open source core, we have a better opportunity to create something with Kubernetes-like durability, rather than being just another flash-in-the-pan AI startup.

Mehmet: Right. Craig, you mentioned your team. Compared to how you built engineering teams before, how do you build engineering teams today, in this age of AI? What skills are [00:37:00] becoming less important, and which ones are non-negotiable when building your team today?

Craig: You know, it's interesting. When you start looking at the space, there's definitely this process of becoming AI capable, AI aware, kind of an AI maximalist, if you will.

There's a change in mindset, this growing awareness of how to deploy these tools. There's an inordinate amount of emphasis on context engineering versus systems engineering: this idea that understanding what these models understand, not just what they know, and how to contextualize for them, is important.

But I think that's something that can be learned, right? It's not something where you either have it or you don't; being an expert in context engineering is something you can grow over time. And I think it's a lot easier to take a great [00:38:00] engineer, one with really strong engineering fundamentals, excellent attention to detail, deep sophistication, the ability to reason about relatively complex systems and how they interact, and experience with those types of systems, and teach them to become an expert in context engineering.

But the one thing is that you have to be AI curious, right? There are two hard lines for me. One hard line is you have to be a great engineer, by the way I think about a great engineer: being able to actually see the systems, understand the patterns, reason about deconstructing the problem into smaller pieces, and execute against that with attention to detail. That, you continue to have to have. But the new thing is you have to be AI curious. I don't know how to better describe it. If you're an engineer that just delights in writing code by hand, and you just wanna sit in a room and produce [00:39:00] beautiful code by hand, you're not gonna have a job in five years.

Code is going to become like bytecode over time. We're not there yet, don't get me wrong; we've still got a way to go. But we're now seeing these systems converge in terms of their production of code. Your work is gonna have to change: this sort of agonizing over the specification, reimagining yourself as a context engineer, where the thing you're building is still an engineering system, right? The work product is still code, but the place where you're creating value is in the context engineering space, not in the manipulating-code-by-hand space.

So that natural curiosity, that willingness to embrace these new systems, is essential. If you don't have that curiosity, you're gonna get left behind, and you're gonna have a hard time. Because if all you love doing is turning a specification a PM hands you into beautiful code,

I'm telling [00:40:00] you now: these things will do it in three years, and it's gonna be tough.

Mehmet: How does that change things for leaders, Craig? How will this affect the way you evaluate the team? It's different; the metrics are different than before, right?

Craig: The metrics are different. And it's interesting, because there are really two sides to this, right?

Obviously, you want the team to continue to execute, right? So you want to reason about throughput and progress, and every engineering leader has a different way of doing that. Mm-hmm. Some people use feature points; there's a set of heuristics.

Every engineering manager has their own favorites. So you want to maintain that continuity. And like I said earlier, it's dangerous to just throw Cursor or Claude Code or something in there,

Mehmet: Right.

Craig: without any structure, assuming that you're just gonna get better results.

You're not; your engineers are going to stumble. So the second part of it is really: what [00:41:00] named investments are you making around building your own operational capabilities? What set of things are you, as an engineering leader, earmarking to change how your team works?

You have to maintain velocity, but you also need to make sure you're earmarking specific things. One of the things we've done is run this as a series of experiments: let's just build something for our own use.

So we built a knowledge server. A simple example: we basically wanted to make sure that every engineer had access to the best information at any moment, and that it could be exposed as an MCP tool into their workflows. Mm-hmm. So if you're using these tools, you can ask questions like: what major issues is the community encountering?

What is so-and-so working on, et cetera. Basically a canonization of the knowledge a team has. We went ahead and built that, we operationalized it, and we now expose it to all of our teams. It changes how managers work, right? I no longer have to go to my engineering leaders and ask them what someone's doing.

I can just ask that [00:42:00] question of the knowledge management tool. It's got all of the engineering docs and the GitHub issues, all of it indexed, chunked, and available by semantic search. So the span of control for an individual manager can now be a lot higher, because we've made that investment in a tool that actually uplifts our own productivity. And we've worked at becoming expert at that. We could have just gone and bought Glean or some other technology that does the same thing, but if we'd done that, we wouldn't have learned what it takes to produce it.

Right. We wouldn't know why, initially, our Discord messages weren't showing up as well as our Google Docs: the chunking quanta was too large, and there just wasn't enough semantic payload when we indexed one of those Discord messages versus the bigger documents.
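The Discord lesson generalizes: very short messages carry too little semantic payload to embed and retrieve well on their own. One common fix, sketched here with a hypothetical `chunk_messages` helper and made-up parameters, is to merge adjacent short messages until each chunk clears a minimum size before indexing:

```python
def chunk_messages(messages: list[str], min_chars: int = 200) -> list[str]:
    """Merge adjacent short messages until each chunk carries enough
    semantic payload to embed and retrieve well on its own."""
    chunks, buf = [], ""
    for msg in messages:
        buf = (buf + "\n" + msg).strip()
        if len(buf) >= min_chars:
            chunks.append(buf)
            buf = ""
    if buf:  # flush whatever is left at the end
        chunks.append(buf)
    return chunks

# Toy chat log: the first three messages are too thin to index individually.
discord = ["ship it?", "lgtm", "wait, the auth test is flaky on main",
           "filed issue #212 for the flaky auth test, retrying with a fixed seed"]
chunks = chunk_messages(discord, min_chars=80)
print(len(chunks))  # the four messages collapse into one retrievable chunk -> 1
```

The tuning knob is exactly the one described above: a chunk size appropriate for long Google Docs starves short chat messages of meaning, so the ingestion path has to adapt the quanta to the source.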

So it's that structured investment in improving your capabilities, and then [00:43:00] just observing it: are people using it? And keep badgering: why aren't people using this? What's wrong with it? Why isn't it part of their flow?

That quickly gets you to the second point, where you start to identify why so-and-so is not using this. You go have an authentic conversation, and if it's because they're not AI curious, that's gonna be a problem. If it's because the tool isn't performing as expected, well, that's a different problem.

You go fix that, and you kind of work the flywheel. So for us, it's really about rethinking a lot of our practices, getting much, much more formal in terms of how we do planning, because, hey, these tools are really good at data processing. Take something like Spec Kit: it doesn't work for us, because we're an organization operating at a certain scale. Can we reimagine it in a way that aligns with our own practices? Can we introduce the tools, and create the rituals around the use of these tools, so that we become far more productive and actually increase our throughput?

So I don't know if that answers the question, but [00:44:00] it's...

Mehmet: Yes, absolutely, Craig. Very quickly: do you think we will see 2026 as the year of mass adoption of agents, agentic AI, and MCPs?

Craig: I think 2026 will be the year where folks get their training wheels on, right?

I'm not gonna predict the singularity, where all of a sudden everything goes swimmingly. I'm also not a doom-and-gloom merchant. I do think we're at a point where there will be a correction in terms of how people think about it. But I think 2026 will be the year when organizations start to build enough awareness around what works.

We've had enough time to start developing the right language and frameworks; people are starting to talk about eval-first development or spec-driven [00:45:00] development, a lot of the patterns. I think people will start to be able to generate awareness of where things are falling down.

I think people will get to a point where it's about getting over that energy curve. When you're building a fusion reactor, you have to put enough energy in before it starts to produce actual energy.

I think 2026 will be the year where we start to see real, material energy return in certain use cases. And it's an area where there will be a bit of a correction; some of the shine will rub off. The way I'd describe it is: more mainstream engagement, less hype.

People will work through some of the disappointments they're having, but we will actually start to see the patterns. I expect that over the next year we'll see the twelve factors of AI-native [00:46:00] apps; some bright spark is gonna write that paper, everyone's gonna climb all over it, we'll harden it, and that'll become part of the lexicon. And we will start to see a kind of normalization.

I don't think it's the year where we're gonna see a sort of explosive breakout. I don't think we're gonna displace 30% of our resources with agents. But I do think that, at least by the end of the year, most enterprises will have built something that they can look at. They understand it works.

They understand why it works. They understand how to keep it working. They understand how to connect it into the existing systems. They'll have the right controls in place, and they'll feel pretty good about it.

Mehmet: Great. As we are coming to an end, a very traditional question for you, Craig: where can people get in touch and learn more?

Craig: Well, you can always look me up on LinkedIn, or visit Stacklok.com. That's S-T-A-C-K-L-O-K dot com, and we'd love to hear from you. Or jump onto our community Discord, which is linked from [00:47:00] our website, or check out the open source repo. We'd love to hear from you.

Mehmet: Great. Thank you so much, Craig, for your time today. All the links you just mentioned will be available in the show notes, so people don't have to go looking; you'll find every single link there. And again, Craig, I can't thank you enough for sharing the time. I know how busy it can get, especially for someone like yourself.

So I appreciate it. And for the audience: of course, we recorded this in 2025 and are posting it in 2026, so happy new year to everyone, and I hope you had a very good holiday season. And as I always say, if you just discovered us by luck, thank you for passing by.

Share this podcast with your friends and colleagues. And if you are one of the followers and fans of the show, thank you very much for your support in previous years and this year as well. I hope we can achieve great results together, and as I always say, [00:48:00] stay tuned for a new episode very soon.

Thank you. Bye-bye.