
A Brief Rant About the New Product Development Lifecycle

Caleb Grillo
Head of Product
HN Disclosure: WarpStream sells a drop-in replacement for Apache Kafka built directly on top of object storage.

I am a product person. I have always struggled to describe my job because the answer is literally “I don’t know, it depends.” I am not a software engineer, and while I’ve dabbled in the dark arts from time to time, I have never been paid to write code.

In the last few months that's changed.

Most recently, I implemented a new Chargebacks feature that exposes approximately how much of an invoice’s usage is attributable to a given Kafka topic, Tableflow table, or Schema Registry subject. I was going to spend a bunch of time explaining in detail how Chargeback calculations work, but if you want to know more about that, just go read the docs. We’ve got bigger fish to fry today.

How I Built This Is More Interesting Than What I Built

I built this feature myself (with a lot of help from your friendly neighborhood frontier models and Cursor). The code touched some pretty sensitive parts of the codebase. I had to be careful to not mess things up. I had to learn a lot about things that were unfamiliar to me. I had some ideas that were, in retrospect, not good. I asked questions (of people) and had multiple models and multiple people check my work. I explained some key details about how it works to other people. Then I deployed it incrementally in staging and tested it. Then I gave a couple of customers access, and asked them to test it. They found a couple of bugs. (Thanks Aly from PostHog!)

Finally, this morning, I deployed the feature flag that turned it on for everyone, otherwise known as “going GA.”

Much has been written about how AI is eating the world and rendering humans obsolete. All I can say is that things are possible now that were not possible six months ago. Does that mean that software engineering is obsolete and everyone is a PM because now it’s all about what gets built, not writing code by hand? Or does it mean that PMs are no longer needed because engineers can get all the “product sense” that they need from an infinite well of information served up by the latest model?

I think it’s a lot more nuanced than that. The old ways of defining roles are dead. Now, everyone exists on a spectrum. Maybe there are some broad categories that still make sense to divide people on (I am much more qualified to be a pretend software engineer than a pretend lawyer, and actual meat-space jobs like surgeon and astronaut are a no-go). But I’m skeptical of the idea that “product management” and “software engineering” are actually different roles at all. Individual people can exist on one extreme end of the PM/SWE spectrum or the other, but they’re on the same spectrum. Maybe they always were.

One thing is very obvious to me: some teams are much more successful at getting better and faster outcomes using LLM-generated code than others. I think I’m starting to understand why.

Systems Matter at Least as Much as Tools

A hammer is a tool that’s designed to be pretty good at driving nails into wood. Anyone can whack a nail into a board with a hammer, but if you’ve ever watched a professional builder whack nails into boards, you know that there’s a vast difference between an expert hammer user and a novice. LLMs are like air-powered nailguns. The act of swinging a hammer to drive nails has been automated. You can whack a lot more nails into a lot more boards much faster using a nailgun than a hammer, but that does not make you any better at actually building houses.

The tools have changed, and this change has enabled many of us to contribute an unprecedented amount of output. But output was never the goal. You are not a better builder of products just because you can now feed your PRD to an Agent and dump a pile of vibeslop on the heads of unsuspecting code reviewers a few minutes later. Subject matter expertise still matters. Taste matters. Understanding what you are contributing, how it works, what tradeoffs you have made, which alternatives you have rejected, is harder now than it’s ever been. But it’s never been more necessary.

Don't merge slop, please.

On the WarpStream team, one way that we ensure quality and craftsmanship is by requiring that every pull request adheres to a set of “affirmations.” You have to check the boxes before you can get a review. It feels bad if you ask someone to review something after falsely affirming to yourself that you understand what you are asking them to review. This accountability keeps us honest. If I can’t check these boxes with confidence, I feel shame, and the dopamine hit I get from submitting a +2343 line PR won’t make up for it.

In addition to the PR affirmations, we also have a document defining guiding principles and expectations for our SDLC that we all agree to adhere to. An excerpt:

We don’t care how you generate code. Write it by hand, or spawn 20 sub-agents to do it in parallel. This is your own business. However, we do expect that you’re willing to stand behind the code that you put forward. This means that at the end of your process, whatever that process may be, the generated code should be of similar or higher quality to what you would have generated had you written it all by hand.

We also have an extremely robust CI pipeline, and comprehensive test coverage. We have some rules that are codified in CI checks as well, such as “you cannot define a feature flag and also set that feature flag in the same PR” and “you cannot make both protobuf definition changes and other changes in the same PR”. Rules like these ensure that we don’t make changes that are known to be dangerous and difficult to roll back.
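
To make the second rule concrete, here is a minimal sketch of what a check like that could look like in Go. This is not our actual CI code. The function name `checkProtoIsolation`, the file paths, and the decision to identify protobuf changes purely by the `.proto` extension are all assumptions made for illustration. A real check would probably also need a policy for generated files, which this sketch simply treats as "other changes."

```go
package main

import (
	"fmt"
	"strings"
)

// checkProtoIsolation is a hypothetical helper: it returns an error when a
// change set contains both .proto files and any other kind of file.
// Assumption: a "protobuf definition change" is any path ending in ".proto".
func checkProtoIsolation(changedFiles []string) error {
	var protoCount, otherCount int
	for _, f := range changedFiles {
		if strings.HasSuffix(f, ".proto") {
			protoCount++
		} else {
			otherCount++
		}
	}
	if protoCount > 0 && otherCount > 0 {
		return fmt.Errorf(
			"PR mixes %d protobuf file(s) with %d other file(s); split it into separate PRs",
			protoCount, otherCount,
		)
	}
	return nil
}

func main() {
	// In CI, this list might come from `git diff --name-only` against the
	// base branch. These paths are made up for the example.
	changed := []string{"proto/chargebacks.proto", "pkg/billing/chargebacks.go"}
	if err := checkProtoIsolation(changed); err != nil {
		fmt.Println("FAIL:", err)
		return
	}
	fmt.Println("OK")
}
```

The point of a check like this is that it is dumb and mechanical. It needs no judgment to enforce, which is exactly why it belongs in CI and not in a reviewer's head.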

Our CI pipeline kicks off an AI code review on every push, so before a PR gets to a human reviewer, it’s been reviewed at least once, by a review bot that’s been given pretty specific instructions on how to make sure we don’t merge garbage. It usually does a pretty good job of finding obvious bugs and performance improvements. I also like to get an AI code review from multiple models before I even put my branch up for the CI bot to run against. (This is so easy to do with Cursor that I’m actually pretty convinced that none of the frontier model providers have any pricing power at all, which brings up many terrifying-but-academically-interesting questions about the economic viability of offering models as a service – but I digress.) 

If you can’t tell by now, our team is extremely AI-pilled. I am extremely AI-pilled. These things are really useful. But they are still just tools.

What Is Old Is New Again

Old truths about what makes an effective product builder remain the same, regardless of the tools that are available to us, or the titles that we give ourselves. 

Agency is, in my opinion, the single most important defining characteristic of “people who are good at building stuff.” I don’t think this is controversial. Agency means that you have a drive to do things. Others have referred to this as “bias for action” or “getting shit done”, but I think it’s actually more than that because those phrases and taglines emphasize output. Agency isn’t about output. People with agency are motivated by affecting outcomes, not just producing more widgets at the widget factory. 

Another defining trait of the Product Builder is their ability to exercise good judgment. Good judgment comes from the ability to think critically about a problem, and come to a conclusion independently based on the available information. People who have good judgment typically have strong convictions because they’ve thought somewhat deeply about a problem from first principles, and therefore have a point of view. Sometimes their points of view are uncomfortably contrarian. Sometimes they perfectly uphold the consensus. If you’ve thought critically about a problem, and you’re honest with yourself about what you do and don’t know, then you can confidently articulate what you think. Good judgment requires critical thinking.

Good judgment also requires that you understand your users and customers. You have to know who you’re building for, why they care, and what matters to them. On the WarpStream team, we put all of our engineers on customer calls all the time. This practice has built a huge amount of product sense on the team. It has improved every single thing about our product, and our GTM motion. You should do this, too.

Agency and exercising good judgment have always been assets for product-focused people. But these days, I think they’re more relevant than ever before. The barrier to entry is gone, and as a result it’s easier than ever to ship mediocrity. Once you open the door to mediocrity, it cannot be closed. To avoid this descent into despair and doom, you must create systems that fight entropy. You must also build a culture that emphasizes quality and conscientiousness over quantity and output. Again, this is not new. It’s just more important now than ever before.

You Are Not the Sum of the Tools That You Use

Coding agents are pretty cool. This Chargebacks feature would not exist if I didn’t have Opus 4.8 and GPT 5.5 working for me. I wouldn’t have prioritized it. It simply would not have made the cut line for a “real” engineer to work on, and I hadn’t learned enough Go to write it myself in a reasonable amount of time. 

But am I offering yet another breathless take about how software engineering is dead and human engineers won’t exist a year from now? No.

If I didn’t have humans to review my code, or teach me about dependency injection, or tell me that some of my proposed changes “seem insane” (sorry, no tweet link for that one), I might have shipped something, but it probably would have been bad. It could have caused major problems, up to and including bricking customers’ WarpStream clusters. Skill issue? YES. That’s the whole point.

Nearly all of our engineers spend substantial time working with coding agents. I don’t know the actual percent of our codebase that’s been AI-generated, but it’s large. If we didn’t have an extremely robust culture and set of tools specifically aimed at controlling the entropy increase that LLMs introduce, our product and codebase would likely be regressing into mediocrity despite everyone’s best intentions. And I can tell you first-hand, that’s not what is happening.

Your Reflexes Are Wrong

It’s tempting to think that once the models get good enough, we can simply replace all the human engineers with LLMs, just like it’s tempting to think that “Product Led Growth” means replacing all your salespeople with a signup page. This is the wrong reflex. It’s not even that “the models aren’t ready.” They’re never going to be “ready”, as long as they’re LLMs. Scientifically speaking, I don’t know what gives your brain the ability to express good taste and exercise good judgment, but I’m pretty sure that predicting which character is most likely to come next in a sentence ain’t it. More people can do more with LLMs, but LLMs can’t do everything.

I’ll leave you with a hot take about the advent of LLMs and where I think this is going spectacularly wrong right now. People with agency and good judgment do not need to be told which tools to use. If you have to put up a token leaderboard and tell everyone at your company at an all-hands meeting that they’re being left behind if they don’t use the LLMs to make more widgets, your problems are way, way deeper than whatever tools your people are using. 

Forcing people to use a specific tool or category of tool only exposes deficiencies in your culture, and by extension, your leadership. Think twice before you hand out nailguns to your hammer-wielding framers and tell them that they’re going to be judged based on how many nails they shoot into boards. You won’t get more or better houses that way. The output is not the point. Counting tokens, or even counting adopters, is completely the wrong thing to do. Stop doing it.

So a PM Vibecoded a Feature, Big Deal


Maybe none of this matters to you, and reading this was a big waste of time. Hopefully, though, we can all start to agree that the collective AI psychosis gripping our industry has gone a bit too far, and we need to reset our expectations about what these things are actually good for. 

A lot of people are going to read this and think, “Wow, they’re letting a PM contribute code to production? Good luck with that.” And that's probably a pretty good default position to take. If you don’t have all the systems, processes, and culture already set up to support people messing around in production, you will be in for a wild ride. But we do have those systems, processes, and culture, which is what enables me to do this at all. Getting hands-on with our codebase has made me better at my craft, just like putting our engineers on sales calls has improved their ability to build meaningfully valuable features and functionality for our customers. 

But I am not a replacement for any engineering headcount, and I probably never will be. I’m just able to do more without needing someone else’s help. Just like a builder who swaps out his hammer for a nailgun.
