The Sequence Overrepresentation (SOR) index measures whether a cluster of identical sequences within a population is larger than expected by chance, given the overall diversity of the population. Briefly, the SOR index (described in Patro et al., PNAS 2019) evaluates the probability of finding N identical sequence pairs in a set of sequence pairs whose pairwise differences follow a Poisson distribution, with a mean derived from the average p-distance of the supplied sequence set. The sequence set supplied to the SOR webpage should be prealigned and free of any sequences that would produce artificially high genetic distances (e.g., hypermutants and outgroup consensus sequences). The SOR webpage outputs a bar graph showing the distribution of pairwise distances within the supplied dataset (if requested) and a table of p-values with the size and ID of each associated rake (cluster of identical sequences). The SOR index and webpage were developed by postbaccalaureate fellow Michael Bale (HIV DRP) and investigator Brian Luke (Advanced Biomedical Computing Center, Leidos Biomedical Research, Inc.) in consultation with investigators John M. Coffin (Tufts University) and Mary F. Kearney (HIV DRP). An R script version of the web app and the supporting web app code are available at https://github.com/michaelbale.
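The exact SOR computation is implemented in the web app and the R script; the Python sketch below is only one illustrative reading of the description above, and every function name in it is hypothetical. It assumes that (1) the p-distance of two aligned sequences is the fraction of compared sites at which they differ, skipping gaps and ambiguous bases ("-", "N", "?"); (2) the number of differences between a random pair is Poisson with mean lambda = (average p-distance) x (alignment length), so a pair is identical by chance with probability e^(-lambda); and (3) the count of identical pairs among all n(n-1)/2 pairs is approximately Poisson, so a rake of k identical sequences (k(k-1)/2 identical pairs) can be scored by an upper-tail probability.

```python
import math
from itertools import combinations

# Assumption: gap and ambiguity symbols are excluded from the comparison.
SKIP_CHARS = frozenset("-N?")


def p_distance(a: str, b: str) -> float:
    """Fraction of compared aligned sites at which two sequences differ."""
    if len(a) != len(b):
        raise ValueError("sequences must be prealigned to the same length")
    compared = differing = 0
    for x, y in zip(a.upper(), b.upper()):
        if x in SKIP_CHARS or y in SKIP_CHARS:
            continue  # skip columns with a gap or ambiguous base in either sequence
        compared += 1
        differing += x != y
    return differing / compared if compared else 0.0


def mean_p_distance(seqs: list[str]) -> float:
    """Average p-distance over all unordered pairs of sequences."""
    pairs = list(combinations(seqs, 2))
    if not pairs:
        raise ValueError("need at least two sequences")
    return sum(p_distance(a, b) for a, b in pairs) / len(pairs)


def poisson_upper_tail(k: int, mu: float) -> float:
    """P(X >= k) for X ~ Poisson(mu)."""
    if k <= 0:
        return 1.0
    if mu <= 0:
        return 0.0

    def pmf(i: int) -> float:
        return math.exp(-mu + i * math.log(mu) - math.lgamma(i + 1))

    if k <= mu:
        # The tail is large here, so 1 - P(X < k) loses little precision.
        return max(0.0, 1.0 - sum(pmf(i) for i in range(k)))
    # For k > mu the terms shrink monotonically; sum them directly so that
    # very small p-values are not lost to cancellation.
    total, i, term = 0.0, k, pmf(k)
    while term > 0.0:
        total += term
        i += 1
        term *= mu / i
        if term < total * 1e-17:
            break
    return min(total, 1.0)


def rake_p_value(rake_size: int, n_seqs: int, mean_p: float, length: int) -> float:
    """Hypothetical SOR-style p-value for one rake (NOT the published formula)."""
    lam = mean_p * length                 # assumed mean differences per random pair
    p_identical = math.exp(-lam)          # Poisson probability of 0 differences
    n_pairs = n_seqs * (n_seqs - 1) // 2  # all unordered pairs in the dataset
    mu = n_pairs * p_identical            # expected identical pairs by chance
    observed = rake_size * (rake_size - 1) // 2  # identical pairs inside the rake
    return poisson_upper_tail(observed, mu)


if __name__ == "__main__":
    aligned = ["ACGTACGTAC", "ACGTACGTAC", "ACGTACGTAC", "ACGAACGTTC", "TCGTACGAAC"]
    d = mean_p_distance(aligned)
    p = rake_p_value(rake_size=3, n_seqs=len(aligned), mean_p=d, length=len(aligned[0]))
    print(f"mean p-distance = {d:.3f}; p-value for a rake of 3 = {p:.3g}")
```

Both Poisson steps are simplifications (for example, pairs that share a sequence are not independent), so this sketch only illustrates why larger rakes in a low-diversity population yield smaller p-values; actual analyses should use the SOR webpage or the R script.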
Despite the success of antiretroviral therapy (ART), HIV-1 persists in reservoirs and viremia rebounds if treatment is interrupted. To facilitate understanding of the genetic structure and dynamics of the HIV-1 reservoir, we developed a public database, the Provirus Sequence Database (PSD), for the storage and meta-analysis of near-full-length (NFL) HIV-1 genomic RNA and proviral sequences that persist in donors on ART or that rebound after ART is interrupted. This relational database contains information about host characteristics, treatment, and HIV-1 sequences, and it provides tools for annotating sequence features. PSD was developed by bioinformatics analysts Wei Shao and Jigui Shan (Advanced Biomedical Computing Center, Leidos Biomedical Research, Inc.) in consultation with investigators John M. Coffin (Tufts University); Mary F. Kearney and Wei-Shau Hu (HIV DRP); and John W. Mellors (University of Pittsburgh). PSD can be accessed at the website https://psd.cancer.gov/.
A database on retrovirus integration sites is now available for use by intramural and extramural investigators. The NCI Retrovirus Integration Database (RID) was developed by bioinformatics analysts Wei Shao and Jigui Shan (Advanced Biomedical Computing Center, Leidos Biomedical Research, Inc.) in consultation with John M. Coffin (Tufts University) and HIV DRP investigators Stephen H. Hughes, Frank Maldarelli, and Mary F. Kearney. RID can be accessed at the website https://rid.ncifcrf.gov.
Resources at the National Cancer Institute at Frederick
In addition to the NCI Retrovirus Integration Database, developed and hosted by the HIV DRP, a large variety of resources are available on the NCI at Frederick campus, including:
Retroviruses Book Online
The text of the book Retroviruses (edited by John M. Coffin, Stephen H. Hughes, and Harold E. Varmus, 1997, Cold Spring Harbor Laboratory Press) is available online at the National Center for Biotechnology Information website. Figures, tables, and retrotrivia features from the book are also available at this website.
(Permission to depict the book's cover here was kindly granted by the publisher.)
Last modified: 2 December 2019