Methods for diversity and overlap analysis in T-cell receptor populations

GA Rempala, M Seweryn - Journal of Mathematical Biology, 2013 - Springer
Abstract
The paper presents several novel approaches to the empirical analysis of diversity and similarity (overlap) in biological or ecological systems. The analysis is motivated by molecular studies of highly diverse mammalian T-cell receptor (TCR) populations and is related to the classical statistical problem of analyzing two-way contingency tables with missing cells and low cell counts. New measures of diversity and overlap are proposed, based on information-theoretic as well as geometric considerations, with the capacity to naturally up-weight or down-weight rare and abundant species in a population. Consistent estimates are derived by applying the Good–Turing sample-coverage correction. In particular, novel consistent estimates of the Shannon entropy function and the Morisita–Horn index are provided. Data from TCR populations in mice are used to illustrate the empirical performance of the proposed methods vis-à-vis existing alternatives.
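To make the Good–Turing coverage idea concrete, the sketch below implements textbook estimators, not the estimators proposed in the paper: the Good–Turing sample coverage C = 1 - f1/n (f1 = number of species observed exactly once, n = sample size), the Chao–Shen (2003) coverage-adjusted Shannon entropy, and the plain plug-in Morisita–Horn overlap index. All function names are hypothetical, and the input is assumed to be clonotype (species) counts from a single sample or a pair of samples.

```python
import math
from typing import Mapping, Sequence


def plugin_shannon(counts: Sequence[int]) -> float:
    """Naive (maximum-likelihood) Shannon entropy in nats; biased low when undersampled."""
    n = sum(counts)
    return -sum((c / n) * math.log(c / n) for c in counts if c > 0)


def good_turing_coverage(counts: Sequence[int]) -> float:
    """Good-Turing sample coverage C = 1 - f1/n, with f1 = number of singleton species."""
    n = sum(counts)
    f1 = sum(1 for c in counts if c == 1)
    if f1 == n:  # all singletons would give C = 0; Chao-Shen replace f1 with n - 1
        f1 = n - 1
    return 1.0 - f1 / n


def coverage_adjusted_shannon(counts: Sequence[int]) -> float:
    """Chao-Shen estimator: coverage-adjusted relative abundances combined with a
    Horvitz-Thompson weight for the chance that a species is seen at all."""
    counts = [c for c in counts if c > 0]
    n = sum(counts)
    cov = good_turing_coverage(counts)
    h = 0.0
    for c in counts:
        pa = cov * c / n                  # coverage-adjusted relative abundance
        incl = 1.0 - (1.0 - pa) ** n      # probability the species appears in n draws
        h -= pa * math.log(pa) / incl
    return h


def morisita_horn(x: Mapping[str, int], y: Mapping[str, int]) -> float:
    """Plug-in Morisita-Horn overlap 2*sum(p_i*q_i) / (sum p_i^2 + sum q_i^2).
    Equals 1 for identical relative abundances and 0 when no species are shared."""
    nx, ny = sum(x.values()), sum(y.values())
    p = {k: v / nx for k, v in x.items()}
    q = {k: v / ny for k, v in y.items()}
    cross = sum(pk * q.get(k, 0.0) for k, pk in p.items())
    return 2.0 * cross / (sum(v * v for v in p.values()) + sum(v * v for v in q.values()))


if __name__ == "__main__":
    sample = [1, 1, 2, 3]  # hypothetical clonotype counts, not data from the paper
    print(good_turing_coverage(sample), plugin_shannon(sample), coverage_adjusted_shannon(sample))
    print(morisita_horn({"a": 2, "b": 2}, {"a": 1, "b": 1}))
```

The coverage-adjusted estimate is always at least as large as the plug-in one, reflecting the mass of unseen rare clonotypes; how the paper's own estimators weight rare versus abundant species is not reproduced here.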