HLA Completion

This tool probabilistically "completes" missing or low-resolution HLA-I (loci A, B, C) typing data based on ethnicity. In the process, it infers ethnicity if it is unknown.

This tool takes as input HLA class I (loci A, B, C) typing data, possibly specified at multiple resolutions (see input file format for details), and probabilistically resolves the typing ambiguities (i.e., probabilistically “completes” the data to 4-digit resolution). Both phased and unphased outputs are provided, each at 4-digit resolution. All HLA input data are assumed to be defined at the molecular level (i.e., not serological). You must either choose a single ethnicity or provide an assumed distribution over ethnicities; this choice determines which model(s) are used for completion.
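As a rough illustration of how an assumed distribution over ethnicities could combine with separately trained per-ethnicity models, the following Python sketch applies Bayes' rule to a mixture of models. It is not the tool's actual implementation: the function name, the dictionary layout, and every number in the example are hypothetical and chosen only to show the arithmetic.

```python
from collections import defaultdict


def complete_with_ethnicity_prior(prior, likelihoods):
    """Combine per-ethnicity models under an ethnicity prior (illustrative only).

    prior:       {ethnicity: P(ethnicity)}
    likelihoods: {ethnicity: {completion: P(completion, observed typing | ethnicity)}}

    Returns (posterior over ethnicities, posterior over completions),
    both conditioned on the observed typing.
    """
    joint_by_completion = defaultdict(float)  # P(completion, observed)
    joint_by_ethnicity = {}                   # P(ethnicity, observed)

    for ethnicity, p_ethnicity in prior.items():
        per_completion = likelihoods.get(ethnicity, {})
        joint_by_ethnicity[ethnicity] = p_ethnicity * sum(per_completion.values())
        for completion, p in per_completion.items():
            joint_by_completion[completion] += p_ethnicity * p

    evidence = sum(joint_by_ethnicity.values())  # P(observed)
    if evidence == 0:
        raise ValueError("observed typing has zero probability under every model")

    ethnicity_posterior = {e: v / evidence for e, v in joint_by_ethnicity.items()}
    completion_posterior = {c: v / evidence for c, v in joint_by_completion.items()}
    return ethnicity_posterior, completion_posterior


# Hypothetical input: the allele labels and probabilities below are invented.
prior = {"European": 0.5, "Asian": 0.5}
likelihoods = {
    "European": {"A*0201": 0.03, "A*0205": 0.01},
    "Asian": {"A*0201": 0.01, "A*0207": 0.03},
}
ethnicity_posterior, completion_posterior = complete_with_ethnicity_prior(prior, likelihoods)
```

Choosing a single ethnicity corresponds to a prior that puts probability 1 on that ethnicity, in which case only that model contributes. With the invented numbers above, the two ethnicities come out equally likely, and the three candidate completions receive posterior probabilities of about 0.5, 0.125, and 0.375.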

If your input data contain considerable ambiguity, the computation can take a long time, so please be patient. If any input line contains too much ambiguity, the original line is returned without completion. By default, only high-resolution completions with >1% posterior probability are output. If the total output would exceed 100,000 lines, an error message is reported and the computation is aborted. To get around these computational limitations, you may wish to download the executables.
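The output rules described above (a minimum posterior probability and a cap on total output lines) can be sketched in Python as follows. This is a minimal sketch under stated assumptions, not the tool's code: the function name, the `(completion, probability)` tuple layout, and the sorting of results are hypothetical; only the 1% default and the 100,000-line cap come from the description.

```python
def filter_completions(completions, min_probability=0.01, max_lines=100_000):
    """Keep completions whose posterior probability exceeds the threshold.

    completions: iterable of (completion, posterior_probability) pairs
                 covering all candidate output lines (hypothetical layout).
    Raises RuntimeError if more than max_lines rows would be output,
    mirroring the abort behaviour described for the tool.
    """
    kept = [(c, p) for c, p in completions if p > min_probability]
    if len(kept) > max_lines:
        raise RuntimeError(
            f"output would contain {len(kept)} lines, exceeding the limit of {max_lines}"
        )
    # Sorting by descending probability is a presentation choice of this sketch.
    return sorted(kept, key=lambda pair: pair[1], reverse=True)
```

Raising the minimum output probability is one way to shrink the output when the input is highly ambiguous, at the cost of discarding low-probability completions.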

An example use of this tool can be found in:

Jonathan M. Carlson*#, Chanson J. Brumme*, Eric Martin*, Jennifer Listgarten, Mark A. Brockman, Anh Q. Le, Celia Chui, Laura A. Cotton, David J.H.F. Knapp, Sharon A. Riddler, Richard Haubrich, George Nelson, Nico Pfeifer, Charles E. DeZiel, David Heckerman, Richard Apps, Mary Carrington, Simon Mallal, P. Richard Harrigan, Mina John, Zabrina L. Brumme# and the International HIV Adaptation Collaborative
Journal of Virology, 86(24):13187-13201, December 2012.

Details of the training data & acknowledgements

The training data used for our model are an aggregate of two main data sets: i) data typed in the laboratory of Mary Carrington (see specific cohort acknowledgements below), and ii) data provided to us by the National Marrow Donor Program (NMDP), as described in Maiers et al., "High-resolution HLA alleles and haplotypes in the U.S. population," Hum Immunol. 2007 Sep; 68(9):779-88. The NMDP European data were not used because they are transplant biased. In total, there were 6057 African-descent, 256 Amerindian, 3088 Asian-descent, 8067 European-descent, and 2860 Hispanic-descent data points. A separate model was trained for each ethnicity.

The following cohorts and investigators generously allowed us to use their data, typed in the laboratory of Mary Carrington at NCI: the International HIV Controllers Study, the Multicenter AIDS Cohort Study, the Multicenter Hemophilia Cohort Study, the Washington and New York Men’s Cohort Study, the San Francisco City Clinic Cohort, the AIDS Linked to Intravenous Experience, the Swiss HIV Cohort, the Urban Health Study, the NIH Focal Segmental Glomerulosclerosis Genetic Study, Hepatitis C Antiviral Long-term Treatment against Cirrhosis, National Cancer Institute Surveillance Epidemiology and End Results Non-Hodgkin Lymphoma Case-Control Study, Woman Interagency Health Study, Classic Kaposi Sarcoma Case-Control Study I and II, Genetic Modifiers Study, Nairobi CTL Cohort, Grace John-Stewart, Stephen O’Brien, and Thomas O’Brien. Acquisition of these data has been funded in whole or in part with federal funds from the National Cancer Institute, National Institutes of Health, under contract N01-CO-12400. The content of this publication does not necessarily reflect the views or policies of the Department of Health and Human Services, nor does mention of trade names, commercial products, or organizations imply endorsement by the US Government. This research was supported in part by the Intramural Research Program of NIH, National Cancer Institute, Center for Cancer Research.

For technical details, and to cite this tool, please see:
Jennifer Listgarten#, Zabrina Brumme, Carl Kadie, Bruce Walker, Mary Carrington, Philip Goulder and David Heckerman#
PLoS Computational Biology, 4:e1000016, February 2008.

By using this tool you confirm you have consent from subjects to submit their data.

