Table 1 Pipelines and bioinformatics tools used for genomic resources in national biobanks

From: Lessons from national biobank projects utilizing whole-genome sequencing for population-scale genomics

 

UK Biobank

NPBBD-Korea

PRECISE

BBJ

All of Us

Variant calling

GATK

DRAGEN (FPGA-accelerated)

GATK

GATK

GATK

DRAGEN

GATK

DeepVariant (deep learning-based precision)

Multi-sample VCF

GATK (GenotypeGVCFs)

DRAGEN (DRAGEN Iterative gVCF Genotyper for scalability)

GraphTyper

GATK (GenotypeGVCFs)

GATK (GenotypeGVCFs)

GATK (GenotypeGVCFs)

GraphTyper

Genomic Variant Store (GATK based)

GLnexus

Data representation & storage

BAM/CRAM; Sparse VCF

BAM; gVCF

BAM/CRAM; Sparse VCF

BAM/CRAM; Dense VCF

BAM/CRAM; Sparse VCF (Hail matrix, VDS)

Computing environment

Cloud-based RAP with DNAnexus and AWS

KISTI National Supercomputing Center (https://www.ksc.re.kr/eng/index/main)

RAPTOR (Research Assets Provisioning and Tracking Online Repository)

Local HPC for server-based analysis

Cloud-based workbench (Google Cloud Platform for large-scale analysis)

Data management system

“Category-field”-based data structure

DRC and RDR-CDR system

-

-

GIMS

Data access system

Tier system, paid for all tiers

Tier system, free for all tiers

Tier system, free for all tiers

Tier system, free for all tiers

Tier system, free for all tiers