In late 2011, I gave cheek swabs to National Geographic to trace my genetic genealogy. The samples looked at markers from my Y chromosome and mitochondrial DNA: the first was passed down from my father from his father from his father up through the generations, and the second from my mother from her mother from her mother etc. The markers would help me learn about those ancestral lines. It took a couple months for the results to come back, and this puzzled me.
A few years prior, I was at a center that was sequencing DNA at surprisingly fast rates, so I couldn’t understand why my sequences would take so long. I took follow-up tests with Family Tree DNA to better understand my genetic genealogy, but these were just as slow.
Meanwhile, the fast sequencing that I had observed was being commercialized by companies like Illumina, which provide “high throughput” sequencing technology. One well-known technique involves reading fragments of DNA and reassembling it into the original genome in a manner akin to solving a jigsaw puzzle. A couple nights ago, I was having dinner with a friend who worked at one of these high throughput sequencing companies, and I asked how “high throughput” these jigsaw-puzzle solvers are.
“The throughput is roughly one genome per day,” he responded. The entire genome? Yup. Even eukaryotes? Yup. And they handle multiple chromosomes? Yup. Wouldn’t my tests only take a tiny fraction of that time? Yup. Were the companies I sent this to using high throughput sequencing technology? Yup. Then what was going on?
My tests amounted to genotype matching, which my friend confirmed was a simpler process and would take less time; however, the technology requires samples to be prepped according to the type of test. While the prep work itself may only take a few hours, running a test on my sample simply isn’t cost-effective by itself. In order to achieve economies of scale, it would be best to batch my test in with similar ones.
“Why not just sequence the entire genome and do post-processing in the cloud?” This, it turned out, was the holy grail, and companies were working on new ways to improve the speed.
Were they simply building faster jigsaw-puzzle solvers? No. The algorithms for assembling the genomes were fairly mature, so the real gains have come from finding ways to set up the prep so that it can provide side information in the later assembly. In one such example, Illumina had acquired the start-up Moleculo, which had found a clever way to barcode the preps for faster assembly. Moleculo consisted of a mix of molecular biologists and computer scientists, with the innovations happening at the interface between the fields.