How Population Biobanks Shed Light on Disease

by Jason Chien

2 June 2026

Illustrated by Chris Cao

Edited by Cady Jacobson

Imagine yourself as a researcher. Perhaps your work requires biological samples from a rare disease, but finding and collecting these samples from patients is difficult. Or maybe your research depends on comparing a person’s current cell or tissue data to what it was ten years ago, but you cannot afford to wait a decade. Thankfully, there are biobanks: specialised facilities that store biological samples and other relevant medical information from donors, while also distributing samples to researchers regardless of where they are based. Biobanks also act as data custodians, de-identifying and anonymising patient information. In addition, they work with Institutional Review Boards (essentially, university or agency ethics boards) to review the merits of each access request before deciding whether to deliver samples to applicants (1). For some biobanks and sample types, samples can even be returned after being used for research (2).

Some biobanks store non-human samples, such as the Svalbard Global Seed Vault or Australia’s Victorian Conservation Seedbank, which store viable seeds of many plant species and their various strains (3). Biomedical biobanks themselves vary in the types of samples they store and, by extension, in their intended and fulfilled functions, as well as in why they were built (4). Disease-specific biobanks store samples relevant to particular diseases; for example, a cardiovascular biobank may specialise in storing tissue immediately following a patient’s death so that it can be used for physiological studies (2). This article, however, focuses on population biobanks, a subset of biomedical biobanks, in the context of disease investigation. Ethical issues surrounding biobanking, such as how donors consent to the use of their samples in research, will not be a major focus (4).

Population biobanks store tissue samples, plus health and personal information of donors. Large sites have sample counts ranging from hundreds of thousands to millions (1). Population biobanks aim to hold enough samples to represent the huge variation within a region or country’s population, and they record many parameters for each donor, such as health and lifestyle data (e.g. cardiovascular disease or smoking status) and omics data (5). Biomedical scientists study many aspects of what goes on in human cells and tissues, down to the molecular level, because of their relevance to our understanding of disease. These aspects include our genetic makeup, the regulation and expression of genes, and the effects of environmental factors and metabolism (6). In omics approaches, our existing knowledge of the complete set of genes in species such as humans allows information from an individual’s own genome to be generated and then compared or combined with data from other individuals. This gives rise to fields such as genomics (concerning genes), proteomics (concerning proteins) and metabolomics (concerning molecules involved in metabolism). Large subsets of this data from individuals, as well as population-level variations associated with specific diseases, can be used for research. For example, the number of genes involved in cancer alone can easily run into the thousands (7).

Large sample quantities allow researchers pursuing many different investigations to identify factors correlated with resistance or susceptibility to many diseases (1). This complements methods that investigate disease mechanisms involving specific genes and molecules, because it helps researchers identify candidate genes and molecules for further study. For this reason, many samples in population biobanks actually come from healthy donors rather than hospital patients (5). Beyond providing samples, some of these large biobanks are also direct providers of omics data. While biobanks are not necessary for generating omics data from one or a few individuals, they enable the collection of data from enough samples to represent the diversity of the population and capture variants as rare as those in 0.1% of the population (6). Because rigorous sample processing and quality control procedures are standardised across biobanks, researchers can also more easily combine omics data generated from samples collected by different biobanks to yield insights (1). Furthermore, as new higher-resolution biotechnology develops, it can be used to investigate samples collected in the past (1).
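To see why cohort size matters for rare variants, a back-of-envelope calculation helps. The sketch below assumes donors are sampled independently and at random from the population, which real recruitment only approximates, and the function names are purely illustrative. It uses the 0.1% carrier fraction mentioned above and compares small cohorts with one of 500,000 donors.

```python
def expected_carriers(cohort_size: int, carrier_fraction: float) -> float:
    """Average number of carriers expected in a cohort of the given size."""
    return cohort_size * carrier_fraction


def prob_at_least_one(cohort_size: int, carrier_fraction: float) -> float:
    """Chance that the cohort contains at least one carrier.

    Assumes each donor independently carries the variant with the
    same probability (a simplification of real recruitment).
    """
    return 1.0 - (1.0 - carrier_fraction) ** cohort_size


if __name__ == "__main__":
    carrier_fraction = 0.001  # a variant found in 0.1% of the population
    for cohort_size in (100, 1_000, 500_000):
        mean = expected_carriers(cohort_size, carrier_fraction)
        chance = prob_at_least_one(cohort_size, carrier_fraction)
        print(f"{cohort_size:>7} donors: about {mean:.1f} carriers expected, "
              f"P(at least one) = {chance:.3f}")
```

Under these assumptions, a study of 100 donors would usually contain no carriers at all, whereas a cohort of 500,000 would be expected to include around 500, enough to start looking for associations with disease.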

For example, the UK Biobank began recruiting its first cohort of 500,000 participants in 2006, collecting blood, urine, and saliva samples, in addition to substantial lifestyle data and physical measurements for each participant (5). Though no flashy silver-bullet discovery like penicillin has resulted directly from the biobank, it has greatly increased our knowledge of which genes and proteins to target when designing new drugs, along with which genetic variants increase our predisposition to a range of diseases, such as cancer and cardiovascular diseases (8).

Population biobanks are huge, long-term investments. For example, funding from the UK government and various non-profits for the UK Biobank exceeded £90 million between its inception and 2014 (1). Reasons for their construction include understanding the mechanisms of disease, translating research into interventions, improving health outcomes, and promoting biotechnology (1). Population biobanks that are able to effectively engage sample donors can generate population data to support research into the diseases that most heavily affect a country (1). The majority of the world’s biobank datasets are still composed primarily of individuals with white Northern European ancestry (9), and data from one population can be significantly less effective for identifying risk factors in another. This limitation is a driver for developing countries to build their own population biobanks (9).

Part of the reason most biobanks are built is to serve as a national public good, with data made accessible both to academic researchers, at affordable rates, and to industry researchers (1, 10). Large population biobanks are usually run as public entities or as public-private partnerships with a large proportion of public funding (10). In fact, even with cost recovery measures that charge users for accessing biobank samples, often at higher rates for industry researchers, revenues still fall below operating costs (1). That said, biobanks create benefits beyond their own countries, and multinational collaborations have expanded their scale and reach (1), allowing researchers to access omics data and request samples from biobanks overseas. With biobanking infrastructure in place, data sharing between research institutions in different countries is often organised through consortia: formal collaborations between participating institutions that establish common research goals and reduce competition over the use of specialised facilities (11). In other words, consortia seek to minimise situations in which multiple research groups inadvertently work on the exact same project, which would be an inefficient allocation of resources.

One example is the International Cancer Genome Consortium (ICGC). The consortium provides a collaboration framework for data exchange across around 200 large-scale cancer research projects, with participating biobanks from Europe, China, Australia, the USA, and other countries. Participating biobanks include large population biobanks, as well as other types such as disease-specific biobanks (12).

In conclusion, biobanks are not simply places where biological samples are stored. They are dynamic entities that can be scaled up and down, places where samples are sent in and out, and they face financial pressures as national research priorities change. They are places where innovation occurs in a wide range of areas, from cryostorage to management of digital information. The power of population biobanks and their research potential lie in their cohort sizes. Hopefully, biobanks will continue to generate valuable new discoveries as newly established cohorts around the world begin to mature.

References

1. Chalmers D, Nicol D, Kaye J, et al. Has the biobank bubble burst? Withstanding the challenges for sustainable biobanking in the digital era. BMC Med Ethics. 2016;17(1):39. doi:10.1186/s12910-016-0124-2. PMID: 27405974; PMCID: PMC4941036.
2. Yamada KA, Patel AY, Ewald GA, et al. How to Build an Integrated Biobank: The Washington University Translational Cardiovascular Biobank & Repository Experience. Clinical and Translational Science. 2013;6(3):226–231. doi:10.1111/cts.12032
3. Seed deposit at Doomsday Vault ensures Australia’s plant future. ABC News. 2018 Mar 1. https://www.abc.net.au/news/2018-03-01/australia-makes-deposit-in-to-doomsday-vault-to-ensure-survival/9496308
4. De Souza YG, Greenspan JS. Biobanking past, present and future: responsibilities and benefits. AIDS. 2013;27(3):303–312. doi:10.1097/QAD.0b013e32835c1244
5. Busby H, Martin P. Biobanks, national identity and imagined communities: The case of UK biobank. Science as Culture. 2006 Sep;15(3):237–51. doi:10.1080/09505430600890693
6. Murtagh MJ, Demir I, Harris JR, et al. Realizing the promise of population biobanks: a new model for translation. Hum Genet. 2011;130:333–345. doi:10.1007/s00439-011-1036-3
7. Ferolito BR, Dashti H, Giambartolomei C, Peloso GM, Golden DJ, Gravel-Pucillo K, et al. Leveraging large-scale biobanks for therapeutic target discovery. Human Genetics and Genomics Advances. 2026 Jan;7(1):100556. doi:10.1016/j.xhgg.2025.100556
8. Szustakowski JD, Balasubramanian S, Kvikstad E, Khalid S, Bronson PG, Sasson A, et al. Advancing human genetics research and drug discovery through exome sequencing of the UK Biobank. Nat Genet. 2021 Jul;53(7):942–8. doi:10.1038/s41588-021-00885-0
9. Rudan I, Marušić A, Campbell H. Developing biobanks in developing countries. J Glob Health. 2011 Jun;1(1):2–4. PMID: 23198094; PMCID: PMC3484738.
10. Caulfield T, Burningham S, Joly Y, Master Z, Shabani M, Borry P, et al. A review of the key issues associated with the commercialization of biobanks. Journal of Law and the Biosciences. 2014 Mar 1;1(1):94–110. doi:10.1093/jlb/lst004
11. Nature Index. How to be part of a research consortium. 2021. Available from: https://www.nature.com/nature-index/news/how-to-be-part-of-a-research-consortium
12. Hudson TJ, Anderson W, Aretz A, Barker AD, Bell C, Bernabé RR, et al. International network of cancer genome projects. Nature. 2010 Apr;464(7291):993–8. doi:10.1038/nature08987
