I’ve been iterating on a project that started as a Hackweek win and has since evolved into something bigger: a domain-specific “review agent” powered by an LLM.
The idea? Could a tailored AI agent help with first-pass reviews of complex product specs? Manual domain reviews are high-effort, don't scale well, and depend on subject matter experts with deep institutional knowledge. Could an LLM catch common issues, raise thoughtful questions, and reduce the initial cognitive load on humans?
I think the answer is yes. With the right context, tools, and framing, it can help. It doesn't replace human judgment, but it offloads the rote parts of review so people can spend more time on the hard, nuanced edge-case thinking.
But making that possible as a Technical Program Manager meant pushing beyond my business-as-usual scope, exploring new tools, and surfacing opportunities to improve how we build with AI.
What I Built
I went through a few different AI tools, and with my latest prototype I created:
A custom “review agent” with a tailored system prompt and rules
A local memory that stores example reviews and supports feedback-based improvement
An architecture that supports multiple agent perspectives (e.g., one for engineering, one as a domain expert, one for business impact) so they can collaborate on assessments
The agent now parses product specs, flags potential issues based on historical patterns, and suggests areas for deeper exploration.
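For a sense of what that multi-perspective setup can look like, here's a minimal sketch in Python. The perspective prompts, the call_llm stub, and the merging step are illustrative placeholders, not the actual implementation:

```python
# Minimal sketch of a multi-perspective review loop.
# All names and prompts here are illustrative placeholders.
from dataclasses import dataclass


@dataclass
class Perspective:
    name: str
    system_prompt: str


PERSPECTIVES = [
    Perspective("engineering", "Review the spec for technical feasibility, gaps, and risk."),
    Perspective("domain_expert", "Review the spec against known domain pitfalls."),
    Perspective("business", "Review the spec for business impact and unanswered questions."),
]


def call_llm(system_prompt: str, user_prompt: str) -> str:
    """Stub for whatever LLM client or internal gateway you actually use."""
    return f"[model output for perspective: {system_prompt[:40]}...]"


def review_spec(spec_text: str) -> dict[str, str]:
    """Run each perspective over the spec and collect its first-pass findings."""
    findings = {}
    for p in PERSPECTIVES:
        findings[p.name] = call_llm(
            system_prompt=p.system_prompt,
            user_prompt=f"Spec under review:\n\n{spec_text}\n\nFlag issues and open questions.",
        )
    # A final pass could ask one agent to merge and deduplicate the findings.
    return findings
```

Each perspective sees the same spec but a different system prompt, which is what lets them collaborate on an assessment instead of producing one generic answer.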
I’ve trialed it with human reviewers and evaluated it against past reviews. Initial feedback has been encouraging – people said it helped them get oriented faster and allowed them to focus their effort where it mattered most. But don’t clap yet: there’s still much to be done, and I think that side of working with AI tools isn’t talked about enough. We all expect it to save us time and money, but there is an upfront cost to learning, experimenting, and applying that knowledge effectively.
Why Not Just Use ChatGPT?
I’ve seen demos of LLMs reviewing product specs by simply pasting in a prompt and asking, “What are the risks?” That’s a fine starting point – but in real operational settings, that alone doesn’t cut it. Here’s why I needed a more extensible setup:
Specs are long and inconsistent: not all LLMs handle 10+ page documents well without structured guidance
Surface-level suggestions aren’t helpful: “Be careful of bots” isn’t actionable; we need contextualized, system-aware recommendations
Institutional knowledge matters: knowing what we’ve suggested in past reviews is helpful, and that history lives in tickets, comments, and scattered documents – not in a single static prompt
That’s why I used tools that allow:
Persistent memory (a local library of real-world examples)
Customized rules and system prompts
Integration with MCP for internal data context
It wasn’t about proving that an LLM could read a doc. It was about helping the LLM reason like someone who’s done dozens of reviews before.
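As a rough illustration of the persistent-memory piece, here's a sketch of a local example library with naive keyword retrieval. The file name, schema, and scoring are all assumptions; the real setup could just as easily use embeddings or an MCP-backed store:

```python
# Sketch of a local "memory" of past review examples.
# File name and schema are assumptions for illustration.
import json
from pathlib import Path

MEMORY_PATH = Path("review_memory.json")  # hypothetical local library of past reviews


def load_examples() -> list[dict]:
    """Each example: {"spec_summary": ..., "feedback_given": ...}."""
    if MEMORY_PATH.exists():
        return json.loads(MEMORY_PATH.read_text())
    return []


def add_example(spec_summary: str, feedback_given: str) -> None:
    """Feedback-based improvement: append a reviewed example back into the library."""
    examples = load_examples()
    examples.append({"spec_summary": spec_summary, "feedback_given": feedback_given})
    MEMORY_PATH.write_text(json.dumps(examples, indent=2))


def most_relevant(spec_text: str, k: int = 3) -> list[dict]:
    """Naive keyword-overlap ranking of past examples against the new spec."""
    spec_words = set(spec_text.lower().split())
    scored = sorted(
        load_examples(),
        key=lambda ex: len(spec_words & set(ex["spec_summary"].lower().split())),
        reverse=True,
    )
    return scored[:k]
```

The top-ranked examples get folded into the agent's prompt alongside the rules, which is what nudges it toward recommendations consistent with past reviews rather than generic advice.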
Why TPMs Are Uniquely Positioned
I’m not an engineer (anymore). But I’m technical enough to navigate internal infra, and product-minded enough to design useful workflows.
This is why TPMs are in a uniquely powerful position right now:
We sit at the intersection of processes, people, and tools
We understand how work actually gets done – not just how it’s specced
We have just enough technical fluency to stitch things together
This combination of breadth, depth, and just enough tech is what makes TPMs well-suited to drive value from internal AI tools. We know which problems matter. We know who they affect. TPMs also excel at bridging the gap between the technical and the non-technical and are often described as the glue. But with LLMs, we can be more than glue – we can be builders and enablers.
What’s Been Challenging
While I’ve made good progress, here’s what I’ve had to work around:
To access internal tools and MCP servers, I had to follow engineering onboarding guides to set up my machine. As someone who hasn’t done any development work in years, this was a hurdle. For someone without an engineering background, it’d be even harder.
The prototype lives locally, which makes it hard to share – especially with non-technical teammates. I can easily put my artifacts in a repo, but for non-engineers the friction of onboarding to the new and still-developing internal AI tools that fit this use case is high.
Gaining visibility and buy-in took time. But the tech is moving quickly and I’ve fortunately found allies along the way.
I’ve had encouraging conversations with folks who are thinking seriously about enablement, tooling equity, and scalable access. Still, this experience also highlights a broader pattern I think many companies will encounter as they integrate AI deeper into how we work.
What Companies Should Be Thinking About
Here are some questions I think are worth reflecting on:
Have we made AI tooling accessible beyond engineering teams?
Have we educated people on what internal tools like MCP actually enable?
Are AI workflows shippable and shareable?
Are TPMs, designers, analysts, and reviewers empowered to build?
These are not critiques – they’re opportunities to scale responsibly. If you have TPMs who understand ML and have deep domain knowledge, lean into them. The fastest path to AI leverage may not come from AI specialists – it might come from your cross-functional builders who already know the systems.
What Comes Next
I’m continuing to refine the review agent, layering in more domain perspectives, expanding its capabilities, and seeking ways to make it more accessible.
This is where companies are heading: moving from chatbot novelty to operational leverage. Prompt design is not enough; domain expertise is what turns a prompt into a high-leverage system. These kinds of internal agents will become part of how we scale expert thinking across fast-moving product orgs.
If you’re working on similar problems – or wondering how to create effective AI systems – I’d love to connect.
This kind of work doesn’t just come from engineers anymore. And the sooner we realize that, the faster we’ll all move. Non-coders can build systems. With the right tools, prototyping agentic workflows is within reach for anyone willing to experiment.
This post was originally published on LinkedIn.