The skeptic’s guide to generative AI assisted coding
Rob Patro February 15, 2026
OK; it’s not really a guide, but I needed a good title. This is more a story about how, and why, my perspective on AI-assisted coding has evolved.
Since the AI-assisted coding hype began, I have been an outspoken skeptic of these technologies. In large part, this was due to my own experience with the early models. While I am a techie, I am very deliberate in how I adopt and use technology. I love useful tech that makes my life easier and lets me accomplish more of my goals. However, I hate bad tech: technology that wastes my time, demonstrates bad design, or diminishes rather than enriches enjoyment (yes, I know, I was quite active on Twitter for a while … maybe that’s a post for another day).
When I first tried generative coding models, they simply did not produce good code. They would sometimes produce functional code (but often not), and their solutions were almost always both messier and less efficient than what I would have done by hand. I tasked them with implementing the projects in the classes I teach, and they either failed, or succeeded with implementations that I would hope an undergraduate at UMD would find embarrassing. One early success that I filed away in the back of my mind, however, was having ChatGPT help me write some highly vectorized AVX-256 code for longest common prefix computation that ended up being useful in our CAPS-SA work. Nonetheless, I was incredibly skeptical of people talking up (and using) these tools, and was quite outspoken about the types of mistakes they made and the limitations they had.
A small series of successes
You can try to ignore the world, but you do so at your own peril. Increasingly, colleagues and experts whose opinions I truly respect (some of whom had previously been very outspoken AI skeptics) began to express some very positive opinions. To be clear, I think that in the tech world, there is still a massive overstatement of the case for, and future of, some of these technologies. However, even some of the calmest and most judicious tech folks I follow, who had previously expressed skepticism similar to mine, began to express much more optimistic, albeit nuanced, opinions. In particular, Jon Gjengset, whose technical opinions I have come to value immensely, and whose overall judgments (and taste) I find align highly with my own, has made some very keen observations about the use of these tools, and demonstrated their utility in a way that resonated with me. So, I decided to give it another go.
Very quickly, I learned that, while I believe my opinion of the early models was correct (they were useful for helping those with little to no coding experience cook up somewhat functional, but often ill-designed, things, and were often of very little use, or even a net negative, for seasoned developers), the current state of things is entirely different. In rapid succession, I was able to obtain a series of quick and, critically, practically useful successes with the newer class of models. I mention a few below just to give the flavor of how I was using these tools.
mim
We recently pre-printed a method (and associated tool) called mim, which is a semantically aware auxiliary index for gzip compressed (and block gzip compressed) FASTA and FASTQ files. We demonstrate that with a very small auxiliary index, one can massively speed up the parallel decompression and parsing of these files, eliminating a bottleneck in many tasks in high throughput sequencing analysis. These indices are semantically aware (i.e. know about the structure of the read records) and so can e.g. properly synchronize between paired-end files during decompression. mim is built, at a basic level, around the ideas of zran, and the initial implementation of mim was written in a mix of C and C++, with part of the indexer as a direct modification of the zran code. While this was useful and functional as a way to show the benefits of mim, I have come to greatly value keeping my software ecosystem in Rust as much as possible. It is the language that I prefer to write, and to maintain.
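To make the idea a little more concrete, here is a minimal sketch, in Rust, of what a zran-style, record-aware checkpoint index might look like. The type and field names are purely illustrative (this is not mim’s actual format): each checkpoint pins a compressed offset to an uncompressed offset that falls on a record boundary, which is what lets independent threads seek, decompress, and parse their own slices of the file in parallel, and lets paired-end files stay in sync.

```rust
// Hypothetical sketch of a zran-style, semantically aware checkpoint index.
// The names and fields here are illustrative; they are not mim's actual format.

/// A single random-access point into a (b)gzip-compressed FASTA/FASTQ file.
struct Checkpoint {
    /// Byte offset into the compressed stream where decompression can resume.
    compressed_offset: u64,
    /// Corresponding byte offset in the uncompressed stream.
    uncompressed_offset: u64,
    /// Index of the first complete read record at (or after) this point.
    /// Aligning every checkpoint to a record boundary is the "semantic" part.
    first_record: u64,
}

/// The auxiliary index: a sparse, sorted list of checkpoints
/// (assumed to contain at least one checkpoint at the start of the file).
struct RecordIndex {
    checkpoints: Vec<Checkpoint>,
}

impl RecordIndex {
    /// Find the checkpoint from which to start decompressing in order to
    /// reach `record`: the last checkpoint whose first_record <= record.
    fn seek_point(&self, record: u64) -> &Checkpoint {
        let pos = self
            .checkpoints
            .partition_point(|c| c.first_record <= record);
        &self.checkpoints[pos.saturating_sub(1)]
    }
}
```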
So, I tasked Claude (Sonnet, on the free plan!) with porting the core indexing functionality of mim from C++ to Rust. This was scoped to be a relatively straightforward translation, but nonetheless is something that would probably have taken me a few days to do by hand. More critically, it would have taken me large, uninterrupted, meeting-free blocks of time. Something of which I, as a professor, have precious little.
With the C++ source code (uploaded as an attachment) and a few hours’ worth of conversation, I was able to obtain a fully functional Rust implementation of the mim indexing functionality. It was not a “one shot” solution; there were some hiccups in how multi-part archives were handled. But each time I ran into a problem, so long as I was able to explain to Claude the technical details of the issue I was encountering, we were able to overcome it. As a result, we ended up with a mim implementation in Rust in short order, which has since become the core reference implementation of the idea. The current code has been polished (and parts of it rewritten by 1337 coders, like Ragnar), but getting to a functional and respectable Rust implementation, in a day, via the free tier of Sonnet, more than piqued my interest.
In-memory tile collation in Cuttlefish 1
Our Cuttlefish (version 1) tool still acts as the basis for building the core information that is indexed by our (SSHash-based) piscem tool — the mapper for single-cell RNA-seq and single-cell ATAC-seq data upstream of alevin-fry.
Cuttlefish 1 builds a compacted colored de Bruijn graph very efficiently on reference sequences. It outputs both a set of maximal unitigs and, critically, a tiling that describes precisely how each reference is spelled out by an ordered sequence of unitigs (each in a specific orientation), possibly with gaps of N nucleotides. When we build a piscem index, we build an SSHash index over the unitig sequences, and a packed, inverted index over the tiling information. In this inverted index, we store, for each unitig, the sequence of references, positions, and orientations in which it occurs.
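For intuition, here is a rough sketch of those two views of the tiling in Rust. The types below are hypothetical and deliberately naive (the real Cuttlefish/piscem structures are packed and bit-compressed), but they capture what is being stored: a forward view that spells each reference as oriented unitigs with possible N-gaps, and an inverted view that lists, for each unitig, every (reference, position, orientation) at which it occurs.

```rust
// Illustrative types only; not the actual Cuttlefish/piscem data layout.

/// One step in a reference's tiling: a unitig in a particular orientation,
/// or a run of ambiguous (N) nucleotides between unitigs.
enum TileStep {
    Unitig { id: u64, forward: bool },
    Gap { len: u32 },
}

/// Forward view: how one reference is spelled out by unitigs.
struct ReferenceTiling {
    ref_id: u32,
    steps: Vec<TileStep>,
}

/// One occurrence of a unitig within some reference.
struct Occurrence {
    ref_id: u32,
    position: u64,
    forward: bool,
}

/// Inverted view: for each unitig id (the index into the outer Vec),
/// the list of places it occurs across all references.
struct InvertedTilingIndex {
    occs: Vec<Vec<Occurrence>>,
}
```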
However, when we (primarily Jamshed) wrote Cuttlefish 1, we had in mind the construction over massive collections of large references (e.g. large bacterial pangenomes, plant genomes, or human genomes / pangenomes). Specifically, we had in mind reference collections where, on average, each reference sequence was long (e.g. at least millions to tens of millions of nucleotides long).
This informed the parallelization strategy we adopted. Each input reference was cut into a number of bins equal to the thread count, and each thread wrote out the tiling information for its assigned interval to an intermediate file. Then, these intermediate tilings were stitched together to yield the tiling information for the entire reference sequence before we moved on to the next reference sequence (e.g. the next chromosome). This worked brilliantly for genomes, but quite horribly for transcriptomes. When each reference sequence is short (1-2 kilobases), each thread ends up being assigned a small interval of sequence (a few dozen to a few hundred bases long, and typically tens of unitigs), writing tiny intermediate tilings to file, forcing the flushing of tiny buffers, and then reading those all in again to stitch together the overall tiling.
All of these tiny I/Os made this step horrifically slower than it needed to be, and this was especially a problem on networked file systems, where millions or billions of tiny flushed I/Os absolutely tanked performance. An index build that should have taken tens of minutes would end up taking many hours or, sometimes, even tens of hours. There were quite a few GitHub issues about this, and even a friendly how-to guide written by one of our users.
The solution was clear: avoid the intermediate files altogether. Because the references are short, the intermediate files simply aren’t needed. Likewise, it’s silly to partition a reference into many bins across threads when each thread ends up writing only a few symbols of the tiling. Instead, it makes much more sense to parallelize across reference sequences, letting each thread pull the next available reference sequence and create its tiling information in memory.
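A minimal sketch of that strategy, with hypothetical types and a stubbed-out tiling function (illustrative Rust, not the Cuttlefish implementation): workers claim the next available reference via a shared atomic counter and build its tiling entirely in memory, so no per-interval intermediate files are ever written.

```rust
// Illustrative sketch of per-reference parallelism (not the Cuttlefish code):
// each worker repeatedly claims the next unprocessed reference and builds its
// tiling in memory, eliminating the tiny intermediate files described above.
use std::sync::atomic::{AtomicUsize, Ordering};
use std::sync::Mutex;
use std::thread;

struct Reference {
    name: String,
    seq: Vec<u8>,
}

struct Tiling {
    name: String,
    // (unitig id, forward orientation) pairs spelling out the reference.
    steps: Vec<(u64, bool)>,
}

fn tile_reference(r: &Reference) -> Tiling {
    // Stand-in for the real unitig-tiling computation over r.seq.
    Tiling {
        name: r.name.clone(),
        steps: Vec::with_capacity(r.seq.len() / 64),
    }
}

fn tile_all(refs: &[Reference], n_threads: usize) -> Vec<Tiling> {
    let next = AtomicUsize::new(0);
    let results = Mutex::new(Vec::with_capacity(refs.len()));
    thread::scope(|s| {
        for _ in 0..n_threads {
            s.spawn(|| loop {
                // Claiming a reference costs one atomic op, not a file flush.
                let i = next.fetch_add(1, Ordering::Relaxed);
                if i >= refs.len() {
                    break;
                }
                let t = tile_reference(&refs[i]);
                results.lock().unwrap().push((i, t));
            });
        }
    });
    // Restore the original reference order before writing the final output.
    let mut v = results.into_inner().unwrap();
    v.sort_by_key(|&(i, _)| i);
    v.into_iter().map(|(_, t)| t).collect()
}
```

With short references numbering in the hundreds of thousands (a transcriptome), a scheme like this keeps every thread busy on whole records and touches the filesystem only once, at output time.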
Conceptually, all of this was clear, but it was buried in a quite large C++ code base, and this specific code hadn’t been touched in several years. Jamshed and I discussed the fix, and it was on his radar, but he always had more important things to tackle, and then he graduated and moved on to his postdoc at Northeastern (where he’s doing awesome stuff, btw).
So, again, and based on my earlier success, I decided to give an AI coding agent a shot. This time, I didn’t really want to have to paste bits and pieces of code into the Claude website. While I was curious about these agents, I still wasn’t curious enough to commit to paying real, hard money for one.
Luckily, as part of our academic-affiliated organization (COMBINE-lab) on GitHub, we get the cheap tier of co-pilot (the $10 / month version). So, I hopped on the co-pilot interface on GitHub and started chatting with it to create the feature. There was a bit of back-and-forth conversation, but over the course of a few hours, co-pilot and I worked on this PR that implemented the feature. This made the bottleneck step of Cuttlefish (on transcriptomes) a lot faster: about 5 times faster, even when running on a reasonably fast local disk, and much more than that when running on a cluster with a networked file system (though it still makes sense to do a build and write the output to local scratch and then move it to a shared location). This experience was a huge success. The AI models (mostly Claude Sonnet and Opus) “understood” the C++ codebase, my instructions, and our goal. The agent went off and implemented the feature piece by piece, coming back to me when expert input was required.
Not everything was perfect, however. For example, when implementing the inter-reference parallelization strategy, we reached a point where the agent introduced a concurrency bug. I manually investigated the code and immediately noticed the source of the error and why it was happening. I pointed out where the error was and let co-pilot take a shot at fixing it. It did come up with a solution, but a sub-optimal one. I explained my solution, and it agreed the approach I proposed was better.
Nonetheless, this was a minor hiccup in what was otherwise a big win. Even when interacting with the agent in this way, a key ability it had was to critically inspect the feedback I provided and reason about how to address the relevant issues. At one point, when we encountered a segmentation fault, I popped open the executable in lldb, ran it, and provided the backtrace to co-pilot. Based on the backtrace, it reasoned about, diagnosed, and fixed the bug.
This experience led me to attempt a bigger lift.
Two major translations in < 1 week (actually just about 4 days)
SSHash
SSHash is an incredibly efficient sequence dictionary for k-mer lookup in the context of genomics. The original data structure is described in this paper by Giulio Ermanno Pibiri, the reading of which, actually, initially sparked my collaboration with Giulio (so thank goodness for this paper!). Giulio and I recently pre-printed a set of data structural and algorithmic updates to SSHash aimed at theoretically and practically improving the caching behavior.
I’ve wanted a version of sshash in Rust for a long time now. In fact, I asked my PhD student Jason Fan (now at Fulcrum Genomics) to try to implement it about 2 years ago! Jason made some great progress, but it was a heavy lift, and not directly on the path of his dissertation, so the project was never finished. Further, sshash is on the core path of our lightweight read mapping tool, piscem, which is, itself, used upstream of alevin-fry for single-cell RNA-seq and single-cell ATAC-seq processing, so it was a perfect candidate for a “large scale” project to convert from C++ to Rust.
I didn’t want to attempt this through the co-pilot web interface, so I downloaded the co-pilot CLI. I was ready to go! And the co-pilot CLI … immediately segfaulted (another story for another post). So, I threw up my hands and decided to work through the VSCode co-pilot plugin. It works well, but I don’t use VSCode; I’m a neovim guy. Anyway, I wasn’t going to be doing most of the coding (just the chatting and guiding, with some minimal manual fixes), so I went with it.
What happened next was a transformative experience for me. Converting an approximately single-file dependency for mim was very useful, and directly modifying our Cuttlefish C++ codebase to add a long-awaited feature was impressive, but what happened with sshash massively updated my prior about these tools.
I started by describing what SSHash is, giving the agent access to the full C++ SSHash codebase, and describing the design requirements for the Rust version. Claude derived a detailed implementation plan, broken down into major phases, with sub-phases and steps within each phase. It described the requirements for each phase, how correctness was to be assessed, and when it would be OK to move on to the next step.
Over the next approximately 2 days, in a few focused sessions, and sometimes in the background (between meetings, etc.), I guided Claude, through co-pilot, to a complete, functional, and approximately performance-equivalent version of sshash written in Rust. Now, certainly, there was a lot of great (non-AI-created) infrastructure that we could rely on. SSHash is largely about succinct data structures, and we were able to build directly on sux-rs for bit-packed integer vectors and Elias Fano vectors, and on bsuccinct for the incredible PHast minimal perfect hash function (it was a real win that so many of the leaders in the succinct data structure space are “Rust forward”).
Nonetheless, this implementation feat absolutely blew me away. We went from 0 to a fully functional and performance-equivalent version of this very sophisticated index over 2 days; a goal that had eluded me for 2 years. Granted, if I were in a position to set aside a month or so for nothing but crack coding sessions, I think I could have accomplished this, but I don’t have a month or two to set aside like that. The fact that, in such a short span, I now had access to SSHash in Rust was simply mind-blowing.
Again, as with the Cuttlefish feature, not everything was perfect the first time. In one step, Claude replaced an efficiently encoded Elias Fano sequence (with O(1) select support) with a plain Vec<u64> queried by binary search. This was a silly and unnecessary mistake from both the size and speed perspectives, and I have no idea why it did this. But, because I was watching this process in amazement, I noticed it, pointed out the issue, and it was promptly fixed.
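To see why that substitution is a step backward, the sketch below shows what the replacement amounts to (illustrative code and names, not the sshash-rs implementation): a full 64-bit word per offset, and a logarithmic binary search for the kind of bucket-location query that an Elias Fano sequence with select support answers in essentially constant time, while occupying only roughly 2 + log2(u/n) bits per element (for n monotone values drawn from a universe of size u).

```rust
// Illustrative only: a flat vector of bucket start offsets queried by binary
// search, i.e. what the substituted structure amounted to. Each offset costs a
// full 64 bits and each search costs O(log n) probes (and cache misses).
struct BucketOffsets {
    // Monotonically non-decreasing start offsets, one per bucket, plus a sentinel.
    offsets: Vec<u64>,
}

impl BucketOffsets {
    /// Start and end of bucket `b` (O(1); this part is cheap either way).
    fn bucket_range(&self, b: usize) -> (u64, u64) {
        (self.offsets[b], self.offsets[b + 1])
    }

    /// Find which bucket contains global position `pos`: an O(log n) binary
    /// search here, versus a (near) constant-time select-based lookup on a
    /// properly encoded Elias Fano sequence.
    fn bucket_of(&self, pos: u64) -> usize {
        self.offsets.partition_point(|&o| o <= pos).saturating_sub(1)
    }
}
```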
Also, very interestingly, despite being given explicit and repeated instructions to directly follow the set of data structures used in the C++ implementation, Claude initially came up with a novel encoding scheme for offsets into light, medium, and heavy minimizer buckets, deriving an approach that was distinct from (but in practice competitive with) what was done in the C++ code. For the time being, I’ve left that design on a branch of the sshash-rs repository, as I think it was a quite interesting development.
Yet again, with some cajoling and oversight, Claude produced an index semantically equivalent to the one that exists in C++. This index now lives in the sshash-rs repository, and it was critical to my most recent project (yes, there was one after sshash-rs; it was started, and its initial version completed, this week). I hope that having SSHash available in Rust also proves useful to other folks working in this space. It’s also astounding to me that all of this (modulo some subsequent optimizations for which I used Claude Code) was accomplished on a $10 / month co-pilot plan!
piscem-rs, the magnum Opus (nerd pun intended)
With the sshash Rust implementation in place, I finally had enough evidence to justify paying for an actual Claude Max subscription. I booted it up and started an ~1.5 day session that would ultimately result in the complete translation of our piscem-cpp mapping tool from C++ (with a submodule dependency on the C++ SSHash) to Rust (with a proper crate dependency on sshash-rs).
Like the sshash translation, this is a complex and interrelated codebase, implementing many features. Piscem supports mapping bulk RNA-seq data, single-cell (and single-nucleus) RNA-seq data, and single-cell ATAC-seq data, with an array of different algorithms and variants. It supports output in a custom binary format (the RAD format), consumed downstream by our alevin-fry tool for single-cell quantification. It implements necessary supporting features, like parsing fragment geometries using a custom parsing expression grammar (PEG). It has a host of different detailed optimizations, like a concurrent shared cache to retain lookup information about the k-mers bridging the ends of unitigs and the unitigs that most frequently follow them in the observed reads (the unitig end cache from our paper “Alevin-fry-atac enables rapid and memory frugal mapping of single-cell ATAC-seq data using virtual colors for accurate genomic pseudoalignment”).
All of this was created, in about a day and a half, by me driving Claude Code (with Opus 4.6). The result is astonishing. The current version supports all of the features of the C++ codebase, is slightly faster than the corresponding C++ code, is more cleanly organized, and produces semantically-equivalent RAD format output (e.g. the order of things can be different because of multithreading). On non-trivial test data, across all assay types, there is 100% concordance between the results of the C++ and Rust versions of piscem.
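As a side note, of the features above, the unitig end cache is perhaps the easiest to sketch. The toy version below uses hypothetical names and a plain RwLock-wrapped HashMap rather than whatever concurrent structure piscem actually employs; the point is only the shape of the idea: mapping threads consult a shared map keyed by the k-mer at a unitig’s end, and fall back to the full index lookup only on a miss.

```rust
// Hypothetical sketch of a shared "unitig end cache"; the type names and the
// RwLock<HashMap<..>> representation are illustrative, not piscem's actual code.
use std::collections::HashMap;
use std::sync::RwLock;

type Kmer = u64; // 2-bit packed k-mer at the end of a unitig
type UnitigId = u32;

struct UnitigEndCache {
    map: RwLock<HashMap<Kmer, UnitigId>>,
}

impl UnitigEndCache {
    fn new() -> Self {
        Self { map: RwLock::new(HashMap::new()) }
    }

    /// Cheap, concurrent read path: many mapping threads can query at once.
    fn lookup(&self, end_kmer: Kmer) -> Option<UnitigId> {
        self.map.read().unwrap().get(&end_kmer).copied()
    }

    /// On a cache miss, the caller does the full index lookup and records
    /// which unitig actually followed this end k-mer in the observed reads.
    fn record(&self, end_kmer: Kmer, next: UnitigId) {
        self.map.write().unwrap().insert(end_kmer, next);
    }
}
```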
What’s even more amazing, perhaps, and something that truly sets the experience of using Claude Code apart from something like the web interfaces or even co-pilot in VSCode, is Claude Code’s tremendous ability to interact with the system, plan what it is doing, and introspect on the results of the code it is generating. During this process, when there were performance pitfalls or subtle differences in the algorithm, Claude Code instrumented the code with debug information and inspected the output to understand what was going wrong. Claude Code wrote the test harness to compare the binary RAD format files for semantic equivalence between the C++ and Rust versions of piscem. Claude Code wrote the unit tests to ensure that each abstraction worked in isolation and that implementation changes didn’t affect the output. Claude Code instrumented the index build on an ATAC-seq index (which is larger than that used for transcriptome mapping, because it indexes the whole genome) when I complained that it took too long, discovered the performance issue arising from a pathologically cache-incoherent memory access pattern, fixed the pattern in the dependent codebase, validated the performance and correctness of this fix, and then asked me if I wanted to commit and push the change (I did).
In the course of translating the C++ code to Rust, we got down to a case where one particular sequencing read was being mapped by the C++ program but not by the Rust program. Claude suggested this was a very minor difference, but I insisted we fix it. Claude isolated the read, tracked down the difference, and explained why the read mapped in the C++ code and not the Rust code. It turned out to be the result of a (very rare) integer overflow case in masking the bitpacked representation of a length 31 k-mer in a 64-bit integer. Claude pointed out what this bug in the C++ code was, why it happened, and offered to reproduce the buggy behavior in Rust to achieve perfect parity. Of course, I chose to simply fix the C++ code instead, but just going on that journey with Claude Code, and seeing it track down a rare corner case and reason about the algorithmic flow, was eye-opening. A huge part of what makes Claude Code so powerful, in my opinion, is the way that it can (and often expertly does) close the loop on developing, debugging, and optimizing.
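The exact offending expression isn’t shown above, so the snippet below is only a hedged illustration of the general class of bug: building the 2k-bit mask for a k-mer packed two bits per base into a u64. In C++, a literal of the wrong width or a shift that reaches the full word width silently produces the wrong mask; the Rust version below makes those edge cases explicit (the function names are mine, for illustration, not piscem’s).

```rust
// Illustration of the general pitfall only; the actual C++ bug is not shown in
// this post. A k = 31 k-mer occupies 62 bits of a u64 at 2 bits per base.
fn kmer_mask(k: u32) -> u64 {
    assert!((1..=32).contains(&k), "a 2-bit packed k-mer must fit in a u64");
    if k == 32 {
        // 2 * 32 = 64: shifting a u64 by its full width is not allowed,
        // so this edge case must be handled explicitly.
        u64::MAX
    } else {
        (1u64 << (2 * k)) - 1
    }
}

/// Keep only the low 2k bits that actually encode the k-mer.
fn mask_kmer(packed: u64, k: u32) -> u64 {
    packed & kmer_mask(k)
}

fn main() {
    // For k = 31 the mask covers 62 bits; the top two bits must be cleared.
    let k = 31;
    let packed: u64 = 0xFFFF_FFFF_FFFF_FFFF;
    assert_eq!(mask_kmer(packed, k), (1u64 << 62) - 1);
    println!("mask for k = {k}: {:#018x}", kmer_mask(k));
}
```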
As a result of this development push, in about 2 days, we now have a complete implementation of piscem in Rust. This will be shortly replacing our C++ implementation, making it easier to add new features, making it easier for my students to contribute, and making maintenance and deployment much simpler!
In the future, I hope to apply these tools both to novel development and to some further translations to Rust that I’ve long desired (I’m looking at you, salmon). Also, along with a team of interested individuals, we have a super-secret (well, maybe not so secret if you follow me on BlueSky) Claude-assisted rewrite planned, and I’m super psyched about it.
Non-technical caveats
I’ve substantially revised my perspective on the capabilities and utility of these AI coding models. I think that they are, and will continue to be, incredibly powerful tools, and I am excited to continue to explore how they can help achieve technical (and specifically software) goals. Nonetheless, there are several very real, and very serious, caveats to these AI models in general that I find quite concerning, and for which I don’t currently have a solution. I mention a few below. However, I do not think that simply refusing to engage with these tools is a productive way to address these caveats. I do not think that, even if we wanted to, the genie could be put back in the bottle.
Here’s a super-short (and non-exhaustive) list of issues with these tools, how they have come to be, and the effects they will have that keep me up at night. I will likely continue to update this, or even turn it into its own post.
- Training by violating copyright and replicating intellectual property with zero traceability or citation. Despite the legal rulings on fair use exemptions, it seems to me quite clear that these models, in their training, have massively violated relevant copyright and licensing terms. They have consumed, with almost complete disregard for any protection, the intellectual property of countless individuals. Further, by virtue of the way these models work, at least as far as I am aware, when they recapitulate or even completely reproduce this subsumed IP, they provide no citation or attribution. I do not know how to hold the companies who have built these models accountable for engaging in practices where they certainly know better, or even whether that is possible. But I do hope that, going forward, we can develop both social and technological solutions to this serious problem.
- Brainrot: My experience with these models has given me a bit of a “black mirror” moment. What the models do, the capabilities they have, and the quality of the content they produce are, to a large extent, a reflection of the user. They can be an incredible tool for a serious and seasoned developer building software, for a mathematician working through a complex open conjecture, or even for an ambitious and dedicated teenager looking to master a new skill. However, they can also, absolutely, act as an “easy” button to generate zero-to-little-effort content; output that may be passable, but is of no real value. It seems clear that we are well into the regime where a student can choose to put in almost no effort, have one of these models complete a (e.g. programming) assignment for them, and completely miss out on the educational goal. While there is a lot of thought going into how to incorporate AI tools into curricula, and while that may be an important line of inquiry, I think it is incomplete. The real problem that we need to tackle, one that predates the challenges raised by AI but is newly magnified by their capabilities, is the very human and social one. How do we motivate actual learning? How do we convince people that learning, itself, is the goal, and to put in real effort when there may be easy ways to game the evaluation system (i.e. grades)? Motivated and truly curious people will learn with or without AI tools, but these tools make it much easier for folks not so inclined to move through the educational system without learning or exercising critical thinking skills, perhaps even fooling themselves into thinking that they are.
- Widening the gap: The more I experiment with these tools and the more I see how capable they are, the more I also see how savant-like they can be. Their capabilities are incredible, but their mistakes are often absolutely silly (to a seasoned expert). Building on the “black mirror” comment above, I think it may be the case that, rather than being a “rising tide” that “raises all ships”, these tools may further widen the divide between the most effective, capable, and knowledgeable experts and everyone else. It is the seasoned greybeard, who has spent months of cumulative time tracking down and squashing subtle heisenbugs, who can immediately see and correct the silly but obvious (to them) mistake that Claude just made. It is the accomplished algorithm engineer who knows exactly how a specific data structure needs to be laid out to maximize cache efficiency and therefore performance. It is that algorithm engineer, and the knowledge they bring to the tool, that lets them guide the model to the right solution, and not just a solution. Ultimately, I fear that if we are not able to effectively teach and instill that knowledge and experience in those who are now undergoing that critical stage of their development, we may be creating, in some ways, an expertise cliff. For those who have that expertise today, it has often come through hard-earned knowledge, manual construction of sophisticated systems from first principles, and a lot of persistence and banging their heads against the wall. There may be ways to create that level of expertise and knowledge without all of the associated “manual” exercise, but those ways are, as yet, unclear to me.