COMBINE-lab

Research

COMBINE-lab develops algorithms, data structures, statistical methods, and production-quality software for high-throughput genomics. Much of our work is motivated by a simple constraint: modern biological data is growing faster than the capacity of the tools used to analyze it.

We focus on methods that are accurate enough for scientific inference, efficient enough for large public datasets, and reliable enough to become part of everyday computational biology workflows.

Research Areas

Transcriptomics and RNA-seq

We design methods for transcript quantification, bias modeling, transcriptome analysis, and downstream inference from RNA-seq data, covering bulk and single-cell experiments sequenced with short or long reads.

Related tools: salmon, sailfish, terminus, grouper

Single-cell and multimodal assays

We build scalable tools for processing single-cell sequencing experiments, including workflows that handle transcript counting, feature barcoding, sparse data, and emerging assay designs.

Related tools: alevin-fry, simpleaf, pyroe

Sequence indexing and search

We develop compact indexes for genomes, transcriptomes, and large collections of sequencing experiments, with an emphasis on exactness, memory efficiency, and practical query performance.

Related tools: pufferfish, piscem, mantis

Succinct data structures

We study compact representations of biological sequences and graph-derived objects, especially de Bruijn graph structures and probabilistic or approximate representations.

Related tools: cuttlefish, rainbowfish, squeakr, deBGR

How We Work

Algorithm engineering

The lab emphasizes careful implementation, benchmarking, profiling, and reproducibility. A method that is theoretically appealing but brittle in practice is not finished.

Open software

Most lab projects are developed in the open, released on GitHub, and designed for use by computational biologists beyond our immediate collaborators.

Scalable inference

We often work at the boundary between data structures and statistical inference, using better representations to make larger and more accurate analyses possible.

Publications

We are still migrating publication metadata into a maintainable format for this website. For now, the most complete publication list is available on Rob Patro’s Google Scholar profile.

As the local publication data is cleaned up, this page can grow into a searchable archive that links each paper to its associated software and datasets.