Transmission Index

The transmission index of an HIV sequence estimates its relative fitness for transmission. This estimate is predominantly derived from the similarity of protein sequences to those circulating in the population.
Details 

The transmission index was derived in the context of studying the HIV transmission bottleneck, in which the majority of transmission events are established by a single virus. In this context, we reported that, on a per-site basis, the probability that the dominant amino acid observed in the donor's quasispecies would be transmitted could be modeled using a generalized linear mixed model (GLMM) in which the fixed effects include various generic features of the amino acid. The most prominent of these was the frequency (and its square) of the amino acid in the circulating population. Other features included the number of covarying sites, the inferred structural energy cost of a mutation on the protein structure, and various subject-level features, such as sex and overall risk of infection.
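As a rough sketch of the form such a model could take, the per-site relationship might be written as below. The logit link, the symbol names, and the grouping of the random effect are assumptions made for illustration; the fitted specification is the one reported in Table 2 of the original paper.

```latex
\operatorname{logit} P(T_{ij} = 1)
  = \beta_0 + \beta_1 f_{ij} + \beta_2 f_{ij}^{2} + \beta_3 c_{j}
  + \beta_4 \Delta E_{ij} + \boldsymbol{\gamma}^{\top} \mathbf{s}_{i} + u_{i}
```

Here T_{ij} indicates that the dominant amino acid of donor i at site j is transmitted, f_{ij} is the frequency of that amino acid in the circulating population, c_{j} is the number of covarying sites, \Delta E_{ij} is the inferred structural energy cost, \mathbf{s}_{i} collects the subject-level features, and u_{i} is a random effect.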

In the original paper, we fitted a GLMM to linked transmission pairs (Table 2 in the paper). Taken literally, this model estimates the probability that a donor virus will be transmitted. Averaging this log-probability over the sites of a sequence gives a sequence-level estimate, which we call the "transmission index" (TI). In that paper, the transmission index was associated with the recipient founder virus (higher TI relative to the average donor virus) and predicted whether a potential donor would transmit to his or her partner. In follow-up work, we have shown that the transmission index tends to be higher among individuals with lower biological risk, as predicted by the model (see below for an example).
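The averaging step can be illustrated with a minimal Python sketch that uses only the amino acid frequency terms. The coefficients, the logistic form, and the function names below are hypothetical placeholders, not the fitted values from Table 2, and the tool itself may compute the index differently.

```python
import math
from collections import Counter

# Hypothetical coefficients for illustration only; NOT the values from Table 2.
INTERCEPT, B_FREQ, B_FREQ_SQ = -1.0, 6.0, -3.0

def column_frequencies(reference):
    """Per-column amino acid frequencies from equal-length aligned sequences."""
    n = len(reference)
    return [{aa: c / n for aa, c in Counter(col).items()} for col in zip(*reference)]

def site_log_prob(freq):
    """Log of a logistic function of the frequency and its square."""
    eta = INTERCEPT + B_FREQ * freq + B_FREQ_SQ * freq ** 2
    return -math.log1p(math.exp(-eta))  # equals log(sigmoid(eta))

def transmission_index(query, freqs):
    """Average per-site log-probability over the non-gap positions of the query."""
    scores = [site_log_prob(f.get(aa, 0.0)) for aa, f in zip(query, freqs) if aa != "-"]
    return sum(scores) / len(scores)

# Toy reference alignment (made-up sequences).
reference = ["MGARA", "MGARA", "MGSRA", "MGARV"]
freqs = column_frequencies(reference)
ti_common = transmission_index("MGARA", freqs)  # matches the common residues
ti_rare = transmission_index("MGSRV", freqs)    # carries two rarer residues
print(ti_common > ti_rare)
```

With these placeholder coefficients the score increases with frequency, so the sequence that matches the common residues receives the higher index.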

The best way to run this tool is with a covariation model, which requires that your sequences be aligned to one of our reference data sets. This allows the use of both covariation data and sub-protein-specific offsets (e.g., p17, p24, gp41), as indicated in Table 2 of the original manuscript. If you choose "Unspecified", your alignments do not have to match ours, and we will use only the amino acid frequency components of the GLMM.

Note that the transmission index is computed with respect to a reference alignment. You can either use one of ours or supply one yourself. In our experiments, the choice doesn't matter much, provided your alignment is large enough (sequences from hundreds of subjects) to allow a reasonable estimate of the underlying AA frequencies in the population. The query alignment must match whatever reference alignment you use, so that position 10 means the same thing in both alignments.
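Before submitting, a quick local check of the two alignments can catch obvious mismatches. The sketch below is a hypothetical helper, not part of the tool; the threshold of 100 reference sequences is an assumed stand-in for "hundreds of subjects". Equal lengths are necessary but not sufficient: they do not prove that the columns share the same coordinates.

```python
def check_alignments(reference, query, min_reference=100):
    """Return a list of problems found; an empty list means the basic checks passed."""
    problems = []
    lengths = {len(seq) for seq in reference}
    if len(lengths) != 1:
        problems.append("reference sequences differ in length")
    elif any(len(q) != next(iter(lengths)) for q in query):
        problems.append("query length does not match reference length")
    if len(reference) < min_reference:
        # min_reference is an assumed threshold, not a documented requirement.
        problems.append(f"only {len(reference)} reference sequences")
    return problems

# Toy usage with made-up sequences.
print(check_alignments(["MGARA"] * 200, ["MGSRA", "MGARV"]))
print(check_alignments(["MGARA"] * 200, ["MGSR"]))
```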

Warning

Note that bulk sequences represent the consensus of a quasispecies population. As such, they will overestimate the transmission index of any single virus: specifically, the transmission index of the bulk (consensus) sequence will be greater than the mean transmission index over clones from that population. This is because the one or two point mutations in each clone will be averaged out in the consensus. This effect increases as the diversity of the quasispecies increases. It is therefore not appropriate to compare the transmission index of a bulk sequence from a chronically infected source partner with that of their acutely infected recipient partner.
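The consensus effect can be reproduced on toy data. In the sketch below, the scoring function is a simplified stand-in (mean log population frequency) rather than the GLMM, and the frequencies and clone sequences are made up. Each clone carries one private mutation to a rarer residue, which the consensus averages out.

```python
import math
from collections import Counter

def consensus(seqs):
    """Most common residue in each column of equal-length aligned sequences."""
    return "".join(Counter(col).most_common(1)[0][0] for col in zip(*seqs))

def toy_index(seq, pop_freqs):
    """Mean log population frequency; a stand-in for the transmission index."""
    return sum(math.log(pop_freqs[i].get(aa, 1e-3)) for i, aa in enumerate(seq)) / len(seq)

# Made-up population frequencies for five sites.
pop_freqs = [
    {"M": 0.9, "L": 0.1},
    {"G": 0.9, "A": 0.1},
    {"A": 0.9, "S": 0.1},
    {"R": 0.9, "K": 0.1},
    {"A": 0.9, "V": 0.1},
]
# Five clones, each with a single private mutation at a different site.
clones = ["LGARA", "MAARA", "MGSRA", "MGAKA", "MGARV"]

bulk = consensus(clones)
bulk_index = toy_index(bulk, pop_freqs)
mean_clone_index = sum(toy_index(c, pop_freqs) for c in clones) / len(clones)
print(bulk, bulk_index > mean_clone_index)
```

In this toy setting the bulk sequence scores higher than the average clone because none of the private mutations survive into the consensus.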

A more appropriate use is to compare the transmission indexes of high versus low risk populations, as was done here:

Damien C. Tully, Colin B. Ogilvie, Rebecca E. Batorsky, David J. Bean, Karen A. Power, Musie Ghebremichael, Hunter E. Bedard, Adrianne D. Gladden, Aaron M. Seese, Molly A. Amero, Kimberly Lane, Graham McGrath, Suzane B. Bazner, Jake Tinsley, Niall J. Lennon, Matthew R. Henn, Zabrina L. Brumme, Philip J. Norris, Eric S. Rosenberg, Kenneth H. Mayer, Heiko Jessen, Sergei L. Kosakovsky Pond, Bruce D. Walker, Marcus Altfeld, Jonathan M. Carlson and Todd M. Allen
PLoS Pathogens, 12(5):e1005619, doi:10.1371/journal.ppat.1005619, May 2016.

For technical details and to cite, please see:
Jonathan M. Carlson*#, Malinda Schaefer*, Daniela C. Monaco, Rebecca Batorsky, Daniel T. Claiborne, Jessica Prince, Martin J. Deymier, Zachary S. Ende, Nichole R. Klatt, Charles E. DeZiel, Tien-Ho Lin, Jian Peng, Aaron M. Seese, Roger Shapiro, John Frater, Thumbi Ndung'u, Jianming Tang, Paul Goepfert, Jill Gilmour, Matt A. Price, William Kilembe, David Heckerman, Philip J. R. Goulder, Todd M. Allen, Susan Allen and Eric Hunter#
Science, 345(6193):1254031, July 2014.

By using this tool you confirm you have consent from subjects to submit their data.


 


