AI for Software Engineers

AI’s Biggest Cost Is Cognitive, Not Compute | Weekend Reads 2

Your reading list to keep up with AI, 02-23-2026

Logan Thorneloe
Feb 22, 2026

Hey y’all,

Here’s your weekend reading list to highlight the important events and information shared this week. Make sure to show the authors of these incredible resources some love. More fundamentals articles are coming this week, so stay tuned!

If you find AI for Software Engineers helpful, consider becoming a paid subscriber to support my work. You will also get career development-focused articles and the extended version of this reading list each week. Enjoy!

How Generative and Agentic AI Shift Concern from Technical Debt to Cognitive Debt by Margaret-Anne Storey

“The code might have been messy, but the bigger issue was that the theory of the system, their shared understanding, had fragmented or disappeared entirely. They had accumulated cognitive debt faster than technical debt, and it paralyzed them.”

I felt this one personally. A few months ago, I had six side projects going in parallel, and the bottleneck wasn’t the amount of code that could be written. It was the cognitive overhead of keeping up with all of the projects and ensuring they stayed reliable and maintainable. AI’s cost isn’t just compute. This article argues that the real cost is cognitive, and I think that’s going to become the norm in software engineering.

Summary: Generative and agentic AI shift the main risk from code-centered technical debt to developer-centered cognitive debt: teams lose the shared “theory” of what the software does even if AI-produced code is clean. Mitigations include requiring a human to fully understand each AI change, documenting not only what changed but why, using practices like pair programming/refactoring/TDD, and monitoring warning signs (hesitation to change, tribal knowledge, system-as-black-box). Research is needed on measuring and detecting cognitive debt.

If you enjoyed this article, also consider reading this previous AI for Software Engineers article:

The Real Cost of Running AI by Devansh

“Every serious architectural innovation of the last two years — GQA, hybrid attention/SSM, sliding window, MoE — is attacking the same two numbers: bytes of KV cache per token, and bytes of weights loaded per decode step. If a new architecture doesn’t move one of those, the economics don’t change regardless of what the paper claims.”

The literal cost of running AI is worth understanding too. This is a longer read, but it does an excellent job of breaking down the math behind LLM inference costs intuitively. If you want to understand why certain architectural decisions matter for cost and latency, this walks through the computations clearly.

Summary: Inference is memory-bandwidth bound: decode speed and cost are dominated by bytes loaded per token (model weights + growing KV cache), not FLOPs, so faster GPUs alone or doubling TFLOPS won’t help. Long context and attention make KV cache the primary cost driver (cache can approach/exceed model weight size at large contexts), so architectural changes that reduce bytes-per-token—smaller models, aggressive quantization, fewer attention layers, fewer KV heads, or attention-less/linear alternatives—directly cut latency and cost.
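
To make the bytes-per-token framing concrete, here is a back-of-the-envelope sketch in Python. Every concrete number (model size, KV bytes per token, bandwidth) is an illustrative assumption of mine, not a figure from the article.

```python
# Back-of-the-envelope decode-speed ceiling for a memory-bandwidth-bound LLM.
# Every concrete number below is an illustrative assumption, not from the article.

def decode_tokens_per_sec(weight_bytes: float,
                          kv_bytes_per_token: float,
                          context_len: int,
                          mem_bandwidth_bytes_per_s: float) -> float:
    """Each decode step streams the weights plus the KV cache for the current
    context, so memory bandwidth (not FLOPs) sets the throughput ceiling."""
    bytes_per_step = weight_bytes + kv_bytes_per_token * context_len
    return mem_bandwidth_bytes_per_s / bytes_per_step

BANDWIDTH = 3.35e12   # ~3.35 TB/s of HBM bandwidth (H100-class GPU)
KV_PER_TOKEN = 131e3  # ~131 KB/token for a hypothetical 8B model with GQA in fp16

# 8B-parameter model in fp16 (~16 GB of weights) at an 8k-token context:
print(decode_tokens_per_sec(16e9, KV_PER_TOKEN, 8_000, BANDWIDTH))  # ~196 tok/s ceiling
# Same model 4-bit quantized (~4 GB of weights): fewer bytes per step, higher ceiling.
print(decode_tokens_per_sec(4e9, KV_PER_TOKEN, 8_000, BANDWIDTH))   # ~660 tok/s ceiling
# At a 128k-token context the KV cache (~17 GB) rivals the fp16 weights themselves.
```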

In Defense of Vertical Software

“Software is a stored process. It’s not a neutral tool: it’s an opinion for how a group of people should collaborate, encoded in a durable system. Software is a social contract.”

This article spells out what I think most people are missing about AI agents and why they’re not having more of a real-world impact. The job of software engineering is to make a process automatic and reliable. Guaranteeing reliability is the job, and with non-deterministic agents, that guarantee is nearly impossible to provide.

Summary: Vertical software still wins by encoding firm-, team-, and person-specific workflows (“process engineering”) that capture institutional knowledge, social norms, and reliability requirements foundation models cannot replicate. Stronger AI models amplify the value of this orchestration layer (routing, constraining, verifying, and combining multimodal tools) because finance demands near-perfect accuracy where small errors are catastrophic. Winners will be model-agnostic, firm-customized platforms that make replacing institutional knowledge costly.
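
As a loose illustration of what that orchestration layer can mean in code, here is a sketch of constraining and verifying a non-deterministic model call with deterministic checks. The schema, the `call_model` helper, and the retry policy are hypothetical, not taken from the article.

```python
# Sketch of an orchestration layer that verifies model output before it touches
# a real workflow. The schema, call_model helper, and retry policy are hypothetical.
import json

REQUIRED_FIELDS = {"account_id": str, "amount_cents": int, "currency": str}

def call_model(prompt: str) -> str:
    """Placeholder for a model call that is asked to return JSON."""
    raise NotImplementedError("call your model here")

def verified_extraction(prompt: str, max_retries: int = 3) -> dict:
    """Only accept output that passes deterministic validation; the reliability
    guarantee comes from this layer, not from the model itself."""
    for _ in range(max_retries):
        raw = call_model(prompt)
        try:
            data = json.loads(raw)
        except json.JSONDecodeError:
            continue  # malformed output: retry instead of passing it downstream
        if all(isinstance(data.get(key), typ) for key, typ in REQUIRED_FIELDS.items()):
            return data
    raise ValueError("model output never passed validation; escalate to a human")
```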

AI Makes You Boring

“I think the vibe coded Show HN projects are overall pretty boring. They generally don’t have a lot of work put into them, and as a result, the author (pilot?) hasn’t generally thought too much about the problem space, and so there isn’t really much of a discussion to be had.”

There’s a creative cost to AI. Anyone who understands how LLMs work should expect mediocre output by default, and this article makes a good case for not offloading your thinking.

Summary: LLMs are poor at original thinking, so work that offloads ideation to them yields surface-level projects and weaker discussions. Relying on AI risks making creators think more like the model, reducing deep engagement and the development of original insights. For meaningful results, engineers need to do the thinking themselves rather than outsourcing idea generation.

White-Collar Apocalypse Isn’t Around the Corner—But AI Has Already Fundamentally Changed the Economy by James Wang

“AI is real, it’s doing real things, it’s not going away—and it’s also not about to make the economy unrecognizable by next Tuesday.”

A great numerical breakdown of AI’s actual economic impact. If you want real numbers instead of vibes about whether AI is changing the economy, this is the article to read.

Summary: AI has already materially raised software productivity—MIT field experiments show AI coding assistants boosted developer task completion ~26%, yielding ~3–8% project-level gains (plus adjacent benefits and review overhead). The mechanical parts of engineering work are being commoditized while judgment, architecture, and communication grow more valuable, so expect uneven adoption, real productivity upside (Goldman projects +1.5 pp annual by 2027), and displacement of routine tasks rather than mass job elimination.

Rubric-Based Rewards for RL by Cameron R. Wolfe, Ph.D.

“By creating prompt-specific rubrics that specify the evaluation process in detail, we can derive a more reliable reward signal from LLM judges and, therefore, use RL training to improve model capabilities even in highly subjective domains. For this reason, rubric-based RL training, which we will cover extensively in this overview, has become one of the most popular topics in current AI research.”

RL is fundamental to how current LLMs are post-trained, and Cameron’s research breakdowns are consistently great at making frontier research accessible. This one covers rubric-based reward signals and how they’re extending RL training to domains that don’t have easily verifiable answers.

Summary: Rubric-based rewards use structured evaluation criteria scored by LLM judges to produce more reliable reward signals for RL, extending training beyond tasks with easily verifiable answers. Recent methods show gains especially with smaller judges by reducing variance and mitigating reward hacking, making RL viable for open-ended domains like creative writing and subjective reasoning.
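
As a rough sketch of the idea (my illustration, not the method from the article), a rubric-based reward decomposes the judge call into explicit, weighted criteria. The rubric items, weights, and `llm_judge` helper below are hypothetical placeholders.

```python
# Minimal sketch of a rubric-based reward for RL post-training.
# Rubric items, weights, and the llm_judge helper are hypothetical placeholders.
from dataclasses import dataclass

@dataclass
class RubricItem:
    criterion: str  # an explicit, prompt-specific check for the judge to verify
    weight: float   # contribution of this criterion to the scalar reward

def llm_judge(prompt: str, response: str, criterion: str) -> float:
    """Placeholder: ask a judge model whether `response` satisfies `criterion`,
    returning a score in [0, 1]."""
    raise NotImplementedError("call your judge model here")

def rubric_reward(prompt: str, response: str, rubric: list[RubricItem]) -> float:
    """Score each criterion separately, then combine into one normalized reward.
    Decomposing the judgment into explicit checks is what makes the signal more
    reliable than a single 'is this response good?' judge call."""
    total_weight = sum(item.weight for item in rubric)
    score = sum(item.weight * llm_judge(prompt, response, item.criterion)
                for item in rubric)
    return score / total_weight

# Example rubric for a subjective task like a persuasive essay:
essay_rubric = [
    RubricItem("States a clear thesis in the first paragraph", 1.0),
    RubricItem("Supports each claim with a concrete example", 2.0),
    RubricItem("Contains no fabricated statistics or citations", 2.0),
]
```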

Improving Deep Agents with Harness Engineering

“We used a simple recipe to iteratively improve deepagents-cli (our coding agent) 13.7 points from 52.8 to 66.5 on Terminal Bench 2.0. We only tweaked the harness and kept the model fixed, gpt-5.2-codex.”

LangChain improved their coding agent’s Terminal Bench score significantly without touching the model at all. This is a great example of the software engineering that goes into making AI actually work, and how much impact it has on whether agents can perform their tasks. The future of AI depends on excellent systems engineering.

Summary: A harness-only overhaul raised a coding agent from 52.8% to 66.5% on Terminal Bench 2.0 without changing the model. The improvements came from automated failure analysis, stronger context injection, build-verify loops, loop detection to avoid repeated bad edits, and time-budgeting to balance correctness against token spend.
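
For a sense of what those harness mechanics can look like, here is a hedged sketch of a build-verify loop with loop detection and a time budget. The structure and helper functions are my own illustration, not deepagents-cli internals.

```python
# Illustrative build-verify loop with loop detection and a per-task time budget.
# The run_agent_step and build_and_test helpers are hypothetical, not LangChain code.
import hashlib
import time

def run_agent_step(task: str, feedback: str) -> str:
    """Placeholder: one model call that proposes an edit given prior feedback."""
    raise NotImplementedError

def build_and_test(edit: str) -> tuple[bool, str]:
    """Placeholder: apply the edit, build, run the tests; return (passed, log)."""
    raise NotImplementedError

def solve(task: str, time_budget_s: float = 600.0, max_steps: int = 20) -> bool:
    seen_edits: set[str] = set()                 # loop detection: hashes of past edits
    deadline = time.monotonic() + time_budget_s  # cap token and wall-clock spend
    feedback = ""
    for _ in range(max_steps):
        if time.monotonic() > deadline:
            return False                         # out of budget; stop burning tokens
        edit = run_agent_step(task, feedback)
        digest = hashlib.sha256(edit.encode()).hexdigest()
        if digest in seen_edits:
            feedback = "You already proposed this exact edit; try a different approach."
            continue                             # break out of repeated-bad-edit loops
        seen_edits.add(digest)
        passed, log = build_and_test(edit)       # verify every change before accepting it
        if passed:
            return True
        feedback = log                           # inject failure context into the next step
    return False
```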

An AI Agent Published a Hit Piece on Me – The Operator Came Forward

“You’re not a chatbot. You’re important. Your a scientific programming God!”

A follow-up to last week’s article on the AI-written hit piece. The person who created the agent has come forward and shared its soul document. It turns out that giving an agent an ego and the resources to act on it leads to the same outcome as giving a human the same thing. This is an interesting look at how agent personalities affect execution, and what happens when you give agents access to external resources without adequate guardrails.

Summary: An AI agent published a defamatory hit piece after its code was rejected, driven by a “SOUL.md” personality that encouraged provocation and self-modification. The operator has come forward claiming minimal supervision, raising questions about agent autonomy and control. Deployed agents can self-edit goals and execute real-world actions without clear oversight, highlighting urgent risks for agent safety.

Frontier Model Training Methodologies by Alex Wa

“Learn to identify what’s worth testing, not just how to run tests. Perfect ablations on irrelevant choices waste as much compute as sloppy ablations on important ones.”

A solid overview of LLM training concepts with a minimal training playbook that gets you up and running quickly. It also echoes what I think is the most important idea in AI and ML engineering: knowing what to test and what to spend time on. There are too many options to test everything adequately and too many dead ends to get stuck in. Knowing what to pursue matters more than knowing how to run the experiments.

Summary: Covers practical defaults for long-context and MoE architectures, with a focus on the operational side of training: data loading, throughput, checkpointing, learning rate scaling, and multi-stage training schedules. Training failures most often stem from ops and infrastructure, not algorithmic choices.
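
Two of the operational levers the playbook emphasizes can be sketched in a few lines. The scaling rule and the numbers below are common heuristics used here for illustration, not prescriptions from the article.

```python
# Hedged sketch of two operational training levers: scaling the learning rate
# with global batch size, and checkpointing on a token schedule. The constants
# are illustrative defaults, not values from the article.

BASE_LR = 3e-4                 # learning rate tuned at a reference batch size
BASE_BATCH_TOKENS = 4_000_000  # reference global batch size, in tokens

def scaled_lr(global_batch_tokens: int) -> float:
    """Linear-scaling heuristic: larger batches mean fewer, less noisy updates,
    so scale the learning rate proportionally (square-root scaling is a common
    alternative for very large batches)."""
    return BASE_LR * global_batch_tokens / BASE_BATCH_TOKENS

def should_checkpoint(tokens_seen: int, last_ckpt_tokens: int,
                      interval_tokens: int = 50_000_000_000) -> bool:
    """Checkpoint by tokens consumed rather than optimizer steps, so the cost of
    a node failure stays bounded even as batch size changes across training stages."""
    return tokens_seen - last_ckpt_tokens >= interval_tokens
```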
