Better Agents Mean Better Surveillance | Weekend Reads 3
Your reading list to keep up with AI 03-01-2026
Enjoy this weekend’s reading list! A few topics were especially prevalent this week: the dangers of a surveillance state, the importance of evals, and agentic engineering practices and resources.
Statement from Dario Amodei on our discussions with the Department of War
“Powerful AI makes it possible to assemble this scattered, individually innocuous data into a comprehensive picture of any person’s life—automatically and at massive scale.”
This is the biggest ethical issue AI is facing right now. US citizens (and, I’m certain, citizens of other countries) have always been scared of a surveillance state (search ‘Birds Aren’t Real’). AI provides not only the means to do this but also more of a motive: surveillance creates opportunities for more data collection, which in turn creates more powerful AI.
Proper AI use is vital to the technology’s future and the impact it can make. Just because it can be used for a purpose doesn’t mean it should be. The public’s trust in the technology is paramount. Anthropic’s statement is a must-read: a clear stand for responsible AI use against one of the most powerful entities on the planet.
It’s worth calling out that the US Department of War’s response to Anthropic was to label them a threat to the US. I won’t comment on this as I don’t feel knowledgeable enough on the subject to understand the nuance.
Summary: Anthropic says it has actively deployed its AI to U.S. national security customers but refuses government demands to remove two safeguards: bans on AI-driven mass domestic surveillance and on providing models for fully autonomous weapons. They argue those uses threaten democratic values and are unsafe with current models, and warn that forced removal of safeguards would be unacceptable even if it risks losing contracts.
Lessons from Building Claude Code: Seeing like an Agent
“As model capabilities increase, the tools that your models once needed might now be constraining them. It’s important to constantly revisit previous assumptions on what tools are needed. This is also why it’s useful to stick to a small set of models to support that have a fairly similar capabilities profile.”
If you’re building an agent, the lessons here are directly transferable to your own work. The Claude Code team walks through their iteration on planning, tool design, and how model changes unexpectedly affected agent output. It’s a great example of why evals matter: so many factors influence agent behavior that without proper checks, you end up with unintended results.
One of the more interesting takeaways is that search seems to be the most important agent capability. If an agent can search for information on demand, it can actively manage its own context and avoid context rot.
Summary: The article describes iterating on Claude Code’s agent action space to match model abilities: designing tools for eliciting user input, tracking work, and letting the model build its own context through search and progressive disclosure rather than preloading everything. Failed output-format attempts, improved results from a callable question tool, replacing rigid todos with shareable Tasks, and better context discovery via nested search all demonstrate that the right tools reduce friction and enable more capable behavior as models improve.
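The point about search-driven context management is concrete enough to sketch. Below is a minimal, hypothetical search tool in Python (an illustration of the idea, not Claude Code’s actual implementation): instead of preloading whole files into the context window, it returns compact (path, line number, snippet) hits so the agent can decide which files are worth opening next.

```python
import re
from pathlib import Path

def search_repo(pattern: str, root: str = ".", max_results: int = 20):
    """Return (path, line_no, snippet) matches for a regex.

    Progressive disclosure: the agent sees only small, pointed hits
    and requests full file contents separately, keeping its context
    window focused on what it actually needs.
    """
    rx = re.compile(pattern)
    hits = []
    for path in sorted(Path(root).rglob("*.py")):
        try:
            for i, line in enumerate(path.read_text().splitlines(), 1):
                if rx.search(line):
                    hits.append((str(path), i, line.strip()))
                    if len(hits) >= max_results:
                        return hits
        except (UnicodeDecodeError, OSError):
            # Skip binary or unreadable files rather than failing.
            continue
    return hits
```

Exposed as a tool, this lets the model pull in only the context it needs per step, which is exactly the behavior the article credits for avoiding context rot.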
Does AGENTS.md Actually Help Coding Agents?
“The headline finding is that LLM-generated context files reduce task success rates compared to providing no repository context at all, while increasing inference cost by over 20%.”
Human-written context files outperform AI-generated ones. LLM-generated context made agents perform worse than having no context at all. Importantly, this isn’t something we would have known without having the capability to measure it.
I see a lot of “use AI for this” online without any support for why or how it should be used. It’s important to remember that just because AI can do something doesn’t mean it does it better than another method. In production, the ability to measure is key, and verifying improvements is a necessity.
Summary: A new benchmark study shows repository-level context files only help when they add non-redundant, repo-specific info: human-written files that capture tooling quirks and non-obvious conventions raise success rates around 4%, while LLM-generated files that restate existing docs reduce success and increase compute by over 20%. Agents faithfully follow whatever instructions they’re given, so redundant or verbose guidance drives extra, unhelpful exploration. Keep context files minimal and focused on gaps the codebase doesn’t already document.
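The study’s takeaway suggests what a useful context file looks like in practice. As a hedged illustration (the entries below are invented, not taken from the benchmark), a minimal AGENTS.md records only things an agent can’t discover from the code itself:

```markdown
# AGENTS.md (hypothetical example)

- Run tests with `make test`, not `pytest` directly; the Makefile sets
  required environment variables.
- The `legacy/` directory is frozen: fix bugs there, never refactor.
- Database migrations are generated, not hand-written; edit the schema
  files and run `make migrate`.
```

Each line adds repo-specific information the agent couldn’t infer, and nothing restates what the code or README already says.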
How We Hire Engineers When AI Writes Our Code
“Removing algorithmic questions is only one half of the battle, though. We still need to design an interview loop that tests practical skills! This has historically been a tough needle to thread. I want to see how a candidate tackles a problem with real-world scope, but my time with a candidate is short. An interview shouldn’t be a proxy for an engineer’s typing speed.”
I was pro Leetcode-style interviews when they were the best we had, but those interviews no longer capture the right signal for what makes a good candidate.
Tolan agrees and has made their hiring process more similar to on-the-job coding. By letting candidates use AI, they can pose a problem that previously would have been too large for an interview’s time constraints. Afterward, they talk with the candidate about their solution and where they would take it in production.
While most companies are shying away from letting candidates use AI in interviews, it’s becoming more important to allow it.
Summary: The article argues that interviews should mirror day-to-day engineering where AI accelerates coding: candidates get a short spec, may use LLMs, and must demonstrate design, judgment, trade-off reasoning, and ownership of AI-generated code. Implementation is easier now, so hiring should prioritize clarity, maintainability, communication, and the ability to know when work isn’t production-ready.
Inference Engineering by Baseten
“While the potential and impact of inference are becoming clear, the space is young. There are relatively few people working on inference, and newcomers can become experts quickly. There are opportunities to solve novel, interesting, and deeply technical problems at all levels of the stack.”
ML infrastructure is one of the best entry points for software engineers getting into AI. It’s an excellent mixture of software engineering and AI, which makes it a great place for curious engineers to start having an impact in the space. It’s also a space where many optimizations are needed and we’re still in the early days.
I suggest grabbing a free copy of this book by Philip Kiely from Baseten on inference engineering.
Summary: The piece argues that inference engineering (optimizing model serving across hardware, software, and tooling) is the most valuable and underdeveloped area in AI. It maps the full stack (models, GPUs, runtimes, and deployment), highlights practical optimization techniques, and backs this with four years of hands-on experience, team interviews, and customer conversations.
A Dream of Spring for Open-Weight LLMs: 10 Architectures from Jan-Feb 2026 by Sebastian Raschka, PhD
“OpenRouter is a platform and API that lets developers access and route requests across many different LLMs from various providers. Note that while its usage statistics are a good indicator of open-weight model popularity, it’s heavily biased towards open-weight models (versus proprietary models), since most users use proprietary models through the official platform directly.”
Sebastian is one of my favorite writers and one of the best resources for keeping up with LLM advancements. I highly recommend him if you don’t want to read a bunch of different sources yourself; he does an excellent job of synthesizing information and making it much easier to understand.
Summary: Ten open-weight LLMs released in Jan-Feb 2026 converge on hybrid/efficient attention and MoE scaling. Several teams shipped models that match or approach proprietary performance by combining sliding-window, sparse/linear hybrids, and mixture-of-experts at scales from 3B to 1T parameters. Benchmarking shows smaller-efficient models often match or exceed older, larger baselines.
What you should know about AI speculation by Logan Thorneloe
“However, the implausibility of their scenario becomes apparent if you know a few things about the current state of AI and agents in production. There’s a consistent gap between perceived AI capabilities and production reality, and that gap explains most of the doomerism we see online.”
The more you understand about the current state of AI, the better you can evaluate speculation for yourself. I wrote this in response to a ‘research’ article that caused many to fear for the future of their careers. Understanding what AI looks like in production helps you separate signal from noise.
Summary: The piece argues that viral doomsday scenarios about AI replacing engineers are speculative and overstated because real-world AI is mediocre, gravitates toward average outputs, and often fails in production reliability and context sensitivity. Engineers should keep learning core skills and start building and using AI agents themselves to see firsthand where they help and where they break.
Writing about Agentic Engineering Patterns by Simon Willison
“Agentic Engineering represents the other end of the scale: professional software engineers using coding agents to improve and accelerate their work by amplifying their existing expertise.”
This is going to be an excellent resource for working with coding agents. One of the most exciting parts of software engineering right now is how new everything feels. We’re finding new ways to program with agents every day, and the entire online AI community is contributing to the findings. In my opinion, Simon Willison is the right person to catalog these patterns.
Summary: Simon Willison is assembling “Agentic Engineering Patterns”: a living collection of practical patterns for software engineers using coding agents. He argues the big shift is that producing initial working code is now cheap, so teams must rethink workflows. He’ll publish chapter-shaped, updateable guides on his blog.
You can support AI for Software Engineers for just $5/mo. You’ll get more research articles and the extended reading list each week (see below!).
In case you missed it, here’s last week’s reading list: