Curoverse

Bioinformatics

Genomics

Curoverse built the open-source data infrastructure that the genomics industry needed to scale: the storage, computation, and workflow platform that turned petabytes of sequencing data into a manageable, reproducible, shareable resource.

Story

In the early 2010s, the cost of sequencing a human genome was falling faster than Moore's Law. What had cost $3 billion in 2003 was, by 2013, on its way to $1,000. The science had won. The infrastructure had not.

Every research lab, every clinical sequencing operation, every precision medicine initiative was running into the same wall: a single human genome is roughly 200 GB of raw data. A modern sequencing project produces tens or hundreds of thousands of them. The pipelines to process that data were fragile, irreproducible, and stitched together with shell scripts and shared drives. Researchers couldn't reliably reproduce each other's results. IT teams couldn't manage the cost. Clinical labs couldn't pass CAP/CLIA audits. The promise of cheap sequencing was running headlong into the limits of the computing stack underneath it.

The data was finally affordable. The infrastructure to actually use it wasn't.

The Opportunity: A Computing Fabric for Precision Medicine

The Personal Genome Project at Harvard Medical School — George Church's effort to sequence and openly share the genomes of 100,000 volunteers — had hit this wall earlier and harder than anyone else. To make the project work, Alexander "Sasha" Wait Zaranek and a team of scientists and engineers in the Church Lab built an open-source platform from scratch: content-addressable storage that handled petabytes natively, a containerized workflow engine that made pipelines reproducible across any environment, and a federation model that let different institutions share workflows without moving the underlying data.

They called it Arvados. It was open source, it was production-grade, and it solved the right problem.

The future of precision medicine didn't need another sequencer. It needed a computing fabric that could keep up with the ones we already had.

Arvados, Commercialized

Curoverse was the commercial company built around the Arvados platform. Spun out of the Church Lab in 2013 (originally as Clinical Future), Curoverse productized the open-source stack and sold the enterprise capabilities that institutions needed: managed deployment, on-premises and cloud-hosted clusters, CAP/CLIA-compliant configurations, and professional services. The platform did three critical things:

Managed the data at genomic scale. Content-addressable distributed storage handled terabytes to petabytes natively, with the federation model letting different repositories stay in the hands of the institutions that generated them while remaining accessible to the broader research community.
Made pipelines reproducible. A containerized workflow engine paired with Curoverse's foundational contributions to the Common Workflow Language (CWL) — which became the industry standard for genomic data processing — meant a pipeline written at Sanger ran identically at Hopkins, at Harvard Medical School, in a clinical lab, or in the cloud.
Met regulated lab requirements. CAP/CLIA-compliant deployments, fine-grained access controls, complete auditability, and the option to keep data fully on premises made Arvados deployable inside the clinical sequencing labs and regulated environments that the open-source community alone couldn't reach.

Curoverse's customer roster was a who's who of genomics. The Wellcome Trust Sanger Institute ran production workloads on Arvados. Johns Hopkins, Harvard Medical School, and the Personal Genome Project were early adopters. Pharma R&D, clinical sequencing labs, and academic medical centers ran Arvados clusters on AWS, on premises, and in hybrid configurations. The platform was on a trajectory to handle 10+ petabytes of genomic data per year by the time of acquisition.

The Exit

In August 2017, Veritas Genetics, the genome-sequencing company also co-founded by George Church, acquired Curoverse. The strategic rationale was clean: Veritas was building a platform to sequence and interpret hundreds of thousands, eventually millions, of human genomes per year, and that ambition was only possible with the data infrastructure to match. As Veritas CEO Mirza Cifric framed it at the time, sequencing at that volume required AI and machine learning, which in turn required data that was produced, stored, and managed in a standardized way. Curoverse had spent four years building exactly that capability.

Why We Invested

We backed Curoverse on three convictions, and they all played out:

Sequencing was about to outrun its infrastructure. The cost curve made genomic data abundant. The data engineering to make it useful would be the bottleneck, and the bottleneck would be worth a company.
Open source was the right go-to-market. Bioinformatics ran on open-source tools. A proprietary platform would never win the trust of the institutions that mattered. Arvados — built in an academic lab, contributed to by a global community, hardened by the Personal Genome Project — was the only kind of platform that could.
The team came from the right place. A founding team out of the Church Lab (Sasha Wait Zaranek as scientific lead, Ward Vandewege on engineering, and Adam Berrey as CEO, alongside Jonathan Sheffi, Zen Chu, and their colleagues) had the domain depth and the operational discipline to translate research-grade software into infrastructure that ran the genomics industry.

Curoverse was an early bet on what the industry now calls "biocompute infrastructure." A decade later, every serious genomic data platform, from Veritas to the national biobanks to the precision medicine arms of large pharma, is built on the architectural ideas Curoverse helped define. Arvados is still maintained and still in production at institutions around the world.

The science of reading the genome was the easy part. Curoverse built the layer that made it usable.
