DATE AND TIME: Monday, March 6, 4:10 p.m.

        PLACE: 1652 Gilman

        SPEAKER:
        James Lyons-Weiler
        The Pennsylvania State University

        TITLE:
        Data Exploration and Hypothesis Testing in Molecular Phylogenetics,
        Molecular Evolution, and Beyond

        ABSTRACT:

        Evolutionary genomics draws upon phylogenetics, molecular evolution, and functional and structural genomics.  Both phylogenetics and molecular evolution are improved by Tree-Independent Data Exploration, and molecular evolution can be improved by a more recently devised Monte Carlo Test of Purifying
        Selection.  A matrix regression model is used in tree-independent data exploration, allowing researchers to find noisy genes (measure signal), perform optimal outgroup analysis, detect long branches, evaluate taxon sampling, and perform noise reduction.  An example where such data exploration has led to markedly improved phylogenetic estimates is the recent study by Culligan et al. (2000), who concluded that the eukaryotic postreplication mismatch repair 'mutS homolog' multigene family (MSH2-6) represents a monophyletic gene family derived from a mutS copy present in the protomitochondrial endosymbiont.  In a
        similar manner, hypothesis testing in molecular evolution can be improved by a novel, computationally intensive test and measure of purifying selection. Classical, distribution-dependent tests measuring rates of synonymous and nonsynonymous substitutions have low power.  The new test employs a comparison
        of the observed amino acid divergence to a null distribution of amino acid divergence predicted by a neutral substitution model and neutral rates of nucleotide divergence.  This Monte Carlo test is shown to have remarkably higher power to detect purifying selection, leading to a dramatic difference in the interpretation of the importance of selection during molecular evolution.  This also provides an example where caution is warranted in the biological interpretation of negative statistical results.  Because most attempts to study
        the importance of natural selection have been based on low power tests, the importance of natural selection as a driving force behind molecular evolution has been, and is likely to continue to be, underestimated.  Evolutionary genomics will be much improved by the careful construction and application of powerful statistical approaches to hypothesis testing that focus on the responses of relationships among measurable variables in addition to those that focus primarily on simple parameter estimation.
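
        The comparison described above (observed amino acid divergence against a null distribution generated under neutrality) can be illustrated with a small simulation.  What follows is only a rough sketch and not the speaker's actual test: the abstract does not specify the neutral substitution model, so this version assumes the simplest possible one, in which the observed number of nucleotide differences is scattered uniformly over sites and replacement bases.  The function names, the toy sequences, and the one-sided p-value convention are all assumptions made for illustration.

```python
import random

BASES = "TCAG"
# Standard genetic code, codons ordered TTT, TTC, TTA, TTG, TCT, ...
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def translate(seq):
    """Translate a coding sequence (length a multiple of 3) to amino acids."""
    return "".join(CODON_TABLE[seq[i:i + 3]] for i in range(0, len(seq), 3))


def count_diffs(a, b):
    """Number of positions at which two equal-length strings differ."""
    return sum(x != y for x, y in zip(a, b))


def monte_carlo_purifying_test(seq_a, seq_b, n_reps=2000, seed=0):
    """Compare observed amino acid divergence to a simulated neutral null.

    Assumed null model (hypothetical, for illustration): the observed number
    of nucleotide differences falls on distinct sites chosen uniformly at
    random, each changing to one of the three other bases with equal chance.
    Simulated stop codons simply count as amino acid differences.
    """
    rng = random.Random(seed)
    n_nuc = count_diffs(seq_a, seq_b)
    prot_a = translate(seq_a)
    obs_aa = count_diffs(prot_a, translate(seq_b))

    null = []
    for _ in range(n_reps):
        sim = list(seq_a)
        for pos in rng.sample(range(len(sim)), n_nuc):
            sim[pos] = rng.choice([b for b in BASES if b != sim[pos]])
        null.append(count_diffs(prot_a, translate("".join(sim))))

    # One-sided p-value: how often neutrality gives as few (or fewer)
    # amino acid changes as were observed.
    p_value = (1 + sum(d <= obs_aa for d in null)) / (n_reps + 1)
    return obs_aa, null, p_value


# Toy example: 20 nucleotide differences, all synonymous (GCT and GCC are Ala).
seq_a = "GCT" * 30
seq_b = "GCC" * 20 + "GCT" * 10
obs_aa, null, p_value = monte_carlo_purifying_test(seq_a, seq_b)
print(obs_aa, min(null), p_value)
```

        In the toy example every observed difference is synonymous, so the observed amino acid divergence sits far below what random placement of the same number of substitutions produces, and the simulated p-value is small.  A realistic null would also need to account for multiple hits, transition/transversion bias, and codon usage.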
         

        When marker classes at a locus are coded 1, 0, -1 for MM, Mm, and mm, respectively, the multiple regression for data from large F2 populations has some elegant properties:

        1. The vector of regression coefficients b = (X’X)^(-1) X’Y may be considered a product of marker relationship information contained in X’X and simple linear regression estimates for each marker locus, X’Y.

        2. As sample size, n, gets large, 2X’X/n approaches the correlation matrix among markers, R.

        3. The inverse of R has been derived for the no-interference case by Wright and Mowers (1994).

        4. With evenly spaced markers on a chromosome, R has the same form as an error variance matrix for a first-order autoregressive process.

        5. Marker-pair regressions using linked markers give reasonable estimates of positions and effects of additive genetic factors located between markers.

        6. Less promising is the result that the variances of the multiple regression estimates are strongly affected by how close the nearest distal markers are to the flanking markers.
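
        Properties 2 and 4 can be checked numerically with a small simulation.  The sketch below is illustrative only and rests on stated assumptions: an F2 population from a cross of two inbred lines, evenly spaced markers with the same recombination fraction r between neighbors, and no crossover interference.  Under those assumptions the correlation between markers i and j is (1 - 2r)^|i - j|, which is the first-order autoregressive form.  All function names and parameter values here are hypothetical.

```python
import random


def simulate_gamete(n_markers, r, rng):
    """One gamete from an F1 parent: +1 for allele M, -1 for allele m."""
    allele = rng.choice((1, -1))
    gamete = [allele]
    for _ in range(n_markers - 1):
        # A crossover in the interval (probability r) switches the allele.
        if rng.random() < r:
            allele = -allele
        gamete.append(allele)
    return gamete


def simulate_f2(n, n_markers, r, seed=0):
    """Marker matrix X for n F2 individuals, coded 1, 0, -1 (MM, Mm, mm)."""
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        g1 = simulate_gamete(n_markers, r, rng)
        g2 = simulate_gamete(n_markers, r, rng)
        rows.append([(a + b) // 2 for a, b in zip(g1, g2)])
    return rows


def scaled_cross_product(X):
    """Return 2 X'X / n as a list of lists."""
    n, m = len(X), len(X[0])
    return [
        [2.0 * sum(row[i] * row[j] for row in X) / n for j in range(m)]
        for i in range(m)
    ]


def ar1_matrix(m, rho):
    """AR(1)-type correlation matrix with entries rho ** |i - j|."""
    return [[rho ** abs(i - j) for j in range(m)] for i in range(m)]


r = 0.1  # assumed recombination fraction between adjacent markers
X = simulate_f2(n=20000, n_markers=4, r=r, seed=1)
S = scaled_cross_product(X)
R = ar1_matrix(4, 1 - 2 * r)
max_gap = max(abs(S[i][j] - R[i][j]) for i in range(4) for j in range(4))
print("largest |2X'X/n - R| entry:", round(max_gap, 4))
```

        With a large simulated population the scaled cross-product matrix should sit close to the AR(1) matrix, with the gap shrinking roughly as 1/sqrt(n).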

        A practical example of the use of marker-pair and multiple regressions is given for gray leaf spot tolerance in maize.  The magnitudes of effects and the locations of possible genetic factors are estimated from the regressions.
         
         

        COFFEE: 3:45 p.m., 104 Snedecor Hall