<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0">
    <channel>
        <title>Statistical Applications in Genetics and Molecular Biology via MedWorm.com</title>
        <description>MedWorm.com provides a medical RSS filtering service. Over 6000 RSS medical sources are combined and output via different filters. This feed contains the latest items from the 'Statistical Applications in Genetics and Molecular Biology' source.</description>
        <link><![CDATA[http://www.medworm.com/rss/search.php?qu=Statistical+Applications+in+Genetics+and+Molecular+Biology&t=Statistical+Applications+in+Genetics+and+Molecular+Biology&s=Search&f=source]]></link>
        <lastBuildDate>Thu, 09 Feb 2012 09:43:48 +0100</lastBuildDate>
        <item>
            <title>The Inheritance Procedure: Multiple Testing of Tree-structured Hypotheses</title>
            <link>http://www.medworm.com/index.php?rid=5623742&amp;cid=s_36498_50_f&amp;fid=36498&amp;url=http%3A%2F%2Fwww.bepress.com%2Fsagmb%2Fvol11%2Fiss1%2Fart11</link>
            <description>We present the inheritance procedure, a method of familywise error control for hypotheses structured in a tree. The method starts testing at the top of the tree, following up on those branches in which it finds significant results, and following up on leaf nodes in the neighborhood of those leaves. The method is a uniform improvement over a recently proposed method by Meinshausen. The inheritance procedure has been implemented in the globaltest package which is available on www.bioconductor.org. (Source: Statistical Applications in Genetics and Molecular Biology)</description>
            <author>Statistical Applications in Genetics and Molecular Biology</author>
            <type>journals</type>
        <comments>http://www.medworm.com/rss/comments.php?id=5623742</comments>
            <pubDate>Sun, 22 Jan 2012 03:49:38 +0100</pubDate>
            <guid isPermaLink="false">5623742</guid>        </item>
        <item>
            <title>Optimality Criteria for the Design of 2-Color Microarray Studies</title>
            <link>http://www.medworm.com/index.php?rid=5590952&amp;cid=s_36498_50_f&amp;fid=36498&amp;url=http%3A%2F%2Fwww.bepress.com%2Fsagmb%2Fvol11%2Fiss1%2Fart10</link>
            <description>We discuss the definition and application of design criteria for evaluating the efficiency of 2-color microarray designs. First, we point out that design optimality criteria are defined differently for the regression and block design settings. This has caused some confusion in the literature and warrants clarification. Linear models for microarray data analysis have equivalent formulations as ANOVA or regression models. However, this equivalence does not extend to design criteria. We discuss optimality criteria, and argue against applying regression-style D-optimality to the microarray design problem. We further disfavor E- and D-optimality (as defined in block design) because they are not attuned to scientific questions of interest. (Source: Statistical Applications in Genetics and Mole...</description>
            <author>Statistical Applications in Genetics and Molecular Biology</author>
            <type>journals</type>
        <comments>http://www.medworm.com/rss/comments.php?id=5590952</comments>
            <pubDate>Sat, 14 Jan 2012 02:15:22 +0100</pubDate>
            <guid isPermaLink="false">5590952</guid>        </item>
        <item>
            <title>Improving Pedigree-based Linkage Analysis by Estimating Coancestry Among Families</title>
            <link>http://www.medworm.com/index.php?rid=5575549&amp;cid=s_36498_50_f&amp;fid=36498&amp;url=http%3A%2F%2Fwww.bepress.com%2Fsagmb%2Fvol11%2Fiss2%2Fart11</link>
            <description>We report results from analyses of three sets of simulated marker data on two different pedigrees. We show that when families share a gene for a trait due to shared ancestry on the order of tens of generations, our method can detect a linkage signal when independent analyses of the families do not. (Source: Statistical Applications in Genetics and Molecular Biology)</description>
            <author>Statistical Applications in Genetics and Molecular Biology</author>
            <type>journals</type>
        <comments>http://www.medworm.com/rss/comments.php?id=5575549</comments>
            <pubDate>Fri, 06 Jan 2012 20:39:23 +0100</pubDate>
            <guid isPermaLink="false">5575549</guid>        </item>
        <item>
            <title>Candidate Pathway Based Analysis for Cleft Lip with or without Cleft Palate</title>
            <link>http://www.medworm.com/index.php?rid=5575550&amp;cid=s_36498_50_f&amp;fid=36498&amp;url=http%3A%2F%2Fwww.bepress.com%2Fsagmb%2Fvol11%2Fiss2%2Fart10</link>
            <description>The objective of this research was to identify potential biological pathways associated with non-syndromic cleft lip with or without cleft palate (NSCL/P), and to explore the potential biological mechanisms underlying these associated pathways on risk of NSCL/P. This project was based on the dataset of a previously published genome-wide association (GWA) study on NSCL/P (Beaty et al. 2010). Case-parent trios used here originated from an international consortium (The Gene, Environment Association Studies consortium, GENEVA) formed in 2007. A total of 5,742 individuals from 1,908 CL/P case-parent trios (1,591 complete trios and 317 incomplete trios where one parent was missing) were collected and genotyped using the Illumina Human610-Quad array. Candidate pathways were selected using a list...</description>
            <author>Statistical Applications in Genetics and Molecular Biology</author>
            <type>journals</type>
        <comments>http://www.medworm.com/rss/comments.php?id=5575550</comments>
            <pubDate>Fri, 06 Jan 2012 20:39:21 +0100</pubDate>
            <guid isPermaLink="false">5575550</guid>        </item>
        <item>
            <title>A Model-Based Analysis to Infer the Functional Content of a Gene List</title>
            <link>http://www.medworm.com/index.php?rid=5575551&amp;cid=s_36498_50_f&amp;fid=36498&amp;url=http%3A%2F%2Fwww.bepress.com%2Fsagmb%2Fvol11%2Fiss2%2Fart9</link>
            <description>An important challenge in statistical genomics concerns integrating experimental data with exogenous information about gene function. A number of statistical methods are available to address this challenge, but most do not accommodate complexities in the functional record. To infer activity of a functional category (e.g., a gene ontology term), most methods use gene-level data on that category, but do not use other functional properties of the same genes. Not doing so creates undue errors in inference. Recent developments in model-based category analysis aim to overcome this difficulty, but in attempting to do so they are faced with serious computational problems. This paper investigates statistical properties and the structure of posterior computation in one such model for the analysis of...</description>
            <author>Statistical Applications in Genetics and Molecular Biology</author>
            <type>journals</type>
        <comments>http://www.medworm.com/rss/comments.php?id=5575551</comments>
            <pubDate>Fri, 06 Jan 2012 20:39:18 +0100</pubDate>
            <guid isPermaLink="false">5575551</guid>        </item>
        <item>
            <title>Querying Genomic Databases: Refining the Connectivity Map</title>
            <link>http://www.medworm.com/index.php?rid=5575552&amp;cid=s_36498_50_f&amp;fid=36498&amp;url=http%3A%2F%2Fwww.bepress.com%2Fsagmb%2Fvol11%2Fiss2%2Fart8</link>
            <description>The advent of high-throughput biotechnologies, which can efficiently measure gene expression on a global basis, has led to the creation and population of correspondingly rich databases and compendia. Such repositories have the potential to add enormous scientific value beyond that provided by individual studies which, due largely to cost considerations, are typified by small sample sizes. Accordingly, substantial effort has been invested in devising analysis schemes for utilizing gene-expression repositories. Here, we focus on one such scheme, the Connectivity Map (cmap), that was developed with the express purpose of identifying drugs with putative efficacy against a given disease, where the disease in question is characterized by a (differential) gene-expression signature. Initial claims...</description>
            <author>Statistical Applications in Genetics and Molecular Biology</author>
            <type>journals</type>
        <comments>http://www.medworm.com/rss/comments.php?id=5575552</comments>
            <pubDate>Fri, 06 Jan 2012 20:39:15 +0100</pubDate>
            <guid isPermaLink="false">5575552</guid>        </item>
        <item>
            <title>Adjusting for Spurious Gene-by-Environment Interaction Using Case-Parent Triads</title>
            <link>http://www.medworm.com/index.php?rid=5575553&amp;cid=s_36498_50_f&amp;fid=36498&amp;url=http%3A%2F%2Fwww.bepress.com%2Fsagmb%2Fvol11%2Fiss2%2Fart7</link>
            <description>In the case-parent trio design, unrelated children affected with a disease are genotyped along with their parents. Information may also be collected on environmental factors in the children. The design permits estimation and testing of genetic effects and gene-by-environment interaction. Recently, it has been demonstrated that when genotypes are measured at a non-causal test locus, population stratification can create spurious interaction. That is, the environmental factor can appear to modify the disease risk associated with genotypes at the test locus without modifying the disease risk of genotypes at the causal locus. One design-based approach that is robust to spurious interaction requires the environmental factor to also be available on an unaffected sibling of the affected child. We ...</description>
            <author>Statistical Applications in Genetics and Molecular Biology</author>
            <type>journals</type>
        <comments>http://www.medworm.com/rss/comments.php?id=5575553</comments>
            <pubDate>Fri, 06 Jan 2012 20:39:11 +0100</pubDate>
            <guid isPermaLink="false">5575553</guid>        </item>
        <item>
            <title>A Family-Based Probabilistic Method for Capturing De Novo Mutations from High-Throughput Short-Read Sequencing Data</title>
            <link>http://www.medworm.com/index.php?rid=5575554&amp;cid=s_36498_50_f&amp;fid=36498&amp;url=http%3A%2F%2Fwww.bepress.com%2Fsagmb%2Fvol11%2Fiss2%2Fart6</link>
            <description>Recent advances in high-throughput DNA sequencing technologies and associated statistical analyses have enabled in-depth analysis of whole-genome sequences. As this technology is applied to a growing number of individual human genomes, entire families are now being sequenced. Information contained within the pedigree of a sequenced family can be leveraged when inferring the donors' genotypes. The presence of a de novo mutation within the pedigree is indicated by a violation of Mendelian inheritance laws. Here, we present a method for probabilistically inferring genotypes across a pedigree using high-throughput sequencing data and producing the posterior probability of de novo mutation at each genomic site examined. This framework can be used to disentangle the effects of germline and somat...</description>
            <author>Statistical Applications in Genetics and Molecular Biology</author>
            <type>journals</type>
        <comments>http://www.medworm.com/rss/comments.php?id=5575554</comments>
            <pubDate>Fri, 06 Jan 2012 20:39:07 +0100</pubDate>
            <guid isPermaLink="false">5575554</guid>        </item>
        <item>
            <title>Bayesian Sparsity-Path-Analysis of Genetic Association Signal using Generalized t Priors</title>
            <link>http://www.medworm.com/index.php?rid=5575555&amp;cid=s_36498_50_f&amp;fid=36498&amp;url=http%3A%2F%2Fwww.bepress.com%2Fsagmb%2Fvol11%2Fiss2%2Fart5</link>
            <description>We explore the use of generalized t priors on regression coefficients to help understand the nature of association signal within “hit regions” of genome-wide association studies. The particular generalized t distribution we adopt is a Student distribution on the absolute value of its argument. For low degrees of freedom, we show that the generalized t exhibits “sparsity-prior” properties with some attractive features over other common forms of sparse priors and includes the well known double-exponential distribution as the degrees of freedom tends to infinity. We pay particular attention to graphical representations of posterior statistics obtained from sparsity-path-analysis (SPA) where we sweep over the setting of the scale (shrinkage/precision) parameter in the prior to explore ...</description>
            <author>Statistical Applications in Genetics and Molecular Biology</author>
            <type>journals</type>
        <comments>http://www.medworm.com/rss/comments.php?id=5575555</comments>
            <pubDate>Fri, 06 Jan 2012 20:39:04 +0100</pubDate>
            <guid isPermaLink="false">5575555</guid>        </item>
        <item>
            <title>Principal Components of Heritability for High Dimension Quantitative Traits and General Pedigrees</title>
            <link>http://www.medworm.com/index.php?rid=5575556&amp;cid=s_36498_50_f&amp;fid=36498&amp;url=http%3A%2F%2Fwww.bepress.com%2Fsagmb%2Fvol11%2Fiss2%2Fart4</link>
            <description>The objectives of this paper are the following: i) to review some standard strategies available in the literature to estimate variance components for unbalanced data in mixed models; ii) to propose an ANOVA method for a genetic random effect model to estimate the variance components, which can be applied to general pedigrees and high dimensional family data within the PCH framework; iii) to elucidate the connection between PCH analysis and Linear Discriminant Analysis. We use computer simulations to show that the proposed method has asymptotic properties similar to those of Lange's method when the number of traits is small, and we study the efficiency of our method when the number of traits is large. A data analysis involving schizophrenia and bipolar quantitative traits is finally presented to ill...</description>
            <author>Statistical Applications in Genetics and Molecular Biology</author>
            <type>journals</type>
        <comments>http://www.medworm.com/rss/comments.php?id=5575556</comments>
            <pubDate>Fri, 06 Jan 2012 20:38:59 +0100</pubDate>
            <guid isPermaLink="false">5575556</guid>        </item>
        <item>
            <title>Gene Filtering in the Analysis of Illumina Microarray Experiments</title>
            <link>http://www.medworm.com/index.php?rid=5575557&amp;cid=s_36498_50_f&amp;fid=36498&amp;url=http%3A%2F%2Fwww.bepress.com%2Fsagmb%2Fvol11%2Fiss2%2Fart3</link>
            <description>Illumina bead arrays are microarrays that contain a random number of technical replicates (beads) for every probe (bead type) within the same array. Typically around 30 beads are placed at random positions on the array surface, which opens unique opportunities for quality control. Most preprocessing methods for Illumina bead arrays are ported from the Affymetrix microarray platform and ignore the availability of the technical replicates. The large number of beads for a particular bead type on the same array, however, should be highly correlated, otherwise they just measure noise and can be removed from the downstream analysis. Hence, filtering bead types can be considered as an important step of the preprocessing procedure for Illumina platform. This paper proposes a filtering method for I...</description>
            <author>Statistical Applications in Genetics and Molecular Biology</author>
            <type>journals</type>
        <comments>http://www.medworm.com/rss/comments.php?id=5575557</comments>
            <pubDate>Fri, 06 Jan 2012 20:17:33 +0100</pubDate>
            <guid isPermaLink="false">5575557</guid>        </item>
        <item>
            <title>A Generalized Hidden Markov Model for Determining Sequence-based Predictors of Nucleosome Positioning</title>
            <link>http://www.medworm.com/index.php?rid=5575558&amp;cid=s_36498_50_f&amp;fid=36498&amp;url=http%3A%2F%2Fwww.bepress.com%2Fsagmb%2Fvol11%2Fiss2%2Fart2</link>
            <description>Chromatin structure, in terms of positioning of nucleosomes and nucleosome-free regions in the DNA, has been found to have an immense impact on various cell functions and processes, ranging from transcriptional regulation to growth and development. In spite of numerous experimental and computational approaches being developed in the past few years to determine the intrinsic relationship between chromatin structure (nucleosome positioning) and DNA sequence features, there is yet no universally accurate approach to predict nucleosome positioning from the underlying DNA sequence alone. We here propose an alternative approach to predicting nucleosome positioning from sequence, making use of characteristic sequence differences, and inherent dependencies in overlapping sequence features. Our nuc...</description>
            <author>Statistical Applications in Genetics and Molecular Biology</author>
            <type>journals</type>
        <comments>http://www.medworm.com/rss/comments.php?id=5575558</comments>
            <pubDate>Fri, 06 Jan 2012 20:17:30 +0100</pubDate>
            <guid isPermaLink="false">5575558</guid>        </item>
        <item>
            <title>Special Issue on Computational Statistical Methods for Genomics and Systems Biology</title>
            <link>http://www.medworm.com/index.php?rid=5575559&amp;cid=s_36498_50_f&amp;fid=36498&amp;url=http%3A%2F%2Fwww.bepress.com%2Fsagmb%2Fvol11%2Fiss2%2Fart1</link>
            <description>We provide a brief editorial introduction to a special issue of Statistical Applications in Genetics and Molecular Biology dedicated to the workshop on &quot;Computational Statistical Methods for Genomics and Systems Biology&quot;, held at the Centre de recherches mathématiques in Montreal in April 2011. (Source: Statistical Applications in Genetics and Molecular Biology)</description>
            <author>Statistical Applications in Genetics and Molecular Biology</author>
            <type>journals</type>
        <comments>http://www.medworm.com/rss/comments.php?id=5575559</comments>
            <pubDate>Fri, 06 Jan 2012 20:17:28 +0100</pubDate>
            <guid isPermaLink="false">5575559</guid>        </item>
        <item>
            <title>Stopping-Time Resampling and Population Genetic Inference under Coalescent Models</title>
            <link>http://www.medworm.com/index.php?rid=5575560&amp;cid=s_36498_50_f&amp;fid=36498&amp;url=http%3A%2F%2Fwww.bepress.com%2Fsagmb%2Fvol11%2Fiss1%2Fart9</link>
            <description>To extract full information from samples of DNA sequence data, it is necessary to use sophisticated model-based techniques such as importance sampling under the coalescent. However, these are limited in the size of datasets they can handle efficiently. Chen and Liu (2000) introduced the idea of stopping-time resampling and showed that it can dramatically improve the efficiency of importance sampling methods under a finite-alleles coalescent model. In this paper, a new framework is developed for designing stopping-time resampling schemes under more general models. It is implemented on data both from infinite sites and stepwise models of mutation, and extended to incorporate crossover recombination. A simulation study shows that this new framework offers a substantial improvement in the accu...</description>
            <author>Statistical Applications in Genetics and Molecular Biology</author>
            <type>journals</type>
        <comments>http://www.medworm.com/rss/comments.php?id=5575560</comments>
            <pubDate>Fri, 06 Jan 2012 20:00:27 +0100</pubDate>
            <guid isPermaLink="false">5575560</guid>        </item>
        <item>
            <title>A Mixture-Model Approach for Parallel Testing for Unequal Variances</title>
            <link>http://www.medworm.com/index.php?rid=5575561&amp;cid=s_36498_50_f&amp;fid=36498&amp;url=http%3A%2F%2Fwww.bepress.com%2Fsagmb%2Fvol11%2Fiss1%2Fart8</link>
            <description>Testing for unequal variances is usually performed in order to check the validity of the assumptions that underlie standard tests for differences between means (the t-test and ANOVA). However, existing methods for testing for unequal variances (Levene's test and Bartlett's test) are notoriously non-robust to normality assumptions, especially for small sample sizes. Moreover, although these methods were designed to deal with one hypothesis at a time, modern applications (such as to microarrays and fMRI experiments) often involve parallel testing over a large number of levels (genes or voxels). Moreover, in these settings a shift in variance may be biologically relevant, perhaps even more so than a change in the mean. This paper proposes a parsimonious model for parallel testing of the equal...</description>
            <author>Statistical Applications in Genetics and Molecular Biology</author>
            <type>journals</type>
        <comments>http://www.medworm.com/rss/comments.php?id=5575561</comments>
            <pubDate>Fri, 06 Jan 2012 20:00:23 +0100</pubDate>
            <guid isPermaLink="false">5575561</guid>        </item>
        <item>
            <title>Fast Identification of Biological Pathways Associated with a Quantitative Trait Using Group Lasso with Overlaps</title>
            <link>http://www.medworm.com/index.php?rid=5575562&amp;cid=s_36498_50_f&amp;fid=36498&amp;url=http%3A%2F%2Fwww.bepress.com%2Fsagmb%2Fvol11%2Fiss1%2Fart7</link>
            <description>Where causal SNPs (single nucleotide polymorphisms) tend to accumulate within biological pathways, the incorporation of prior pathways information into a statistical model is expected to increase the power to detect true associations in a genetic association study. Most existing pathways-based methods rely on marginal SNP statistics and do not fully exploit the dependence patterns among SNPs within pathways.
We use a sparse regression model, with SNPs grouped into pathways, to identify causal pathways associated with a quantitative trait. Notable features of our “pathways group lasso with adaptive weights” (P-GLAW) algorithm include the incorporation of all pathways in a single regression model, an adaptive pathway weighting procedure that accounts for factors biasing pathway selection...</description>
            <author>Statistical Applications in Genetics and Molecular Biology</author>
            <type>journals</type>
        <comments>http://www.medworm.com/rss/comments.php?id=5575562</comments>
            <pubDate>Fri, 06 Jan 2012 20:00:19 +0100</pubDate>
            <guid isPermaLink="false">5575562</guid>        </item>
        <item>
            <title>MicroRNA Transcription Start Site Prediction with Multi-objective Feature Selection</title>
            <link>http://www.medworm.com/index.php?rid=5575563&amp;cid=s_36498_50_f&amp;fid=36498&amp;url=http%3A%2F%2Fwww.bepress.com%2Fsagmb%2Fvol11%2Fiss1%2Fart6</link>
            <description>MicroRNAs (miRNAs) are non-coding, short (21-23nt) regulators of protein-coding genes that are generally transcribed first into primary miRNA (pri-miR), followed by the generation of precursor miRNA (pre-miR). This finally leads to the production of the mature miRNA. A large amount of information is available on the pre- and mature miRNAs. However, very little is known about the pri-miRs, due to a lack of knowledge about their transcription start sites (TSSs). Based on the genomic loci, miRNAs can be categorized into two types: intragenic (intra-miR) and intergenic (inter-miR). While it is already an established fact that intra-miRs are commonly transcribed in conjunction with their host genes, the transcription machinery of inter-miRs is poorly understood. Although it is assumed that mi...</description>
            <author>Statistical Applications in Genetics and Molecular Biology</author>
            <type>journals</type>
        <comments>http://www.medworm.com/rss/comments.php?id=5575563</comments>
            <pubDate>Fri, 06 Jan 2012 20:00:13 +0100</pubDate>
            <guid isPermaLink="false">5575563</guid>        </item>
        <item>
            <title>A Context Dependent Pair Hidden Markov Model for Statistical Alignment</title>
            <link>http://www.medworm.com/index.php?rid=5575564&amp;cid=s_36498_50_f&amp;fid=36498&amp;url=http%3A%2F%2Fwww.bepress.com%2Fsagmb%2Fvol11%2Fiss1%2Fart5</link>
            <description>This article proposes a novel approach to statistical alignment of nucleotide sequences by introducing a context dependent structure on the substitution process in the underlying evolutionary model. We propose to estimate alignments and context dependent mutation rates relying on the observation of two homologous sequences. The procedure is based on a generalized pair-hidden Markov structure, where conditional on the alignment path, the nucleotide sequences follow a Markov distribution. We use a stochastic approximation expectation maximization (saem) algorithm to give accurate estimators of parameters and alignments. We provide results both on simulated data and vertebrate genomes, which are known to have a high mutation rate from CG dinucleotide. In particular, we establish that the meth...</description>
            <author>Statistical Applications in Genetics and Molecular Biology</author>
            <type>journals</type>
        <comments>http://www.medworm.com/rss/comments.php?id=5575564</comments>
            <pubDate>Fri, 06 Jan 2012 20:00:07 +0100</pubDate>
            <guid isPermaLink="false">5575564</guid>        </item>
        <item>
            <title>Fast Wavelet Based Functional Models for Transcriptome Analysis with Tiling Arrays</title>
            <link>http://www.medworm.com/index.php?rid=5575565&amp;cid=s_36498_50_f&amp;fid=36498&amp;url=http%3A%2F%2Fwww.bepress.com%2Fsagmb%2Fvol11%2Fiss1%2Fart4</link>
            <description>For a better understanding of the biology of an organism, a complete description is needed of all regions of the genome that are actively transcribed. Tiling arrays are used for this purpose. They allow for the discovery of novel transcripts and the assessment of differential expression between two or more experimental conditions such as genotype, treatment, tissue, etc. In tiling array literature, many efforts are devoted to transcript discovery, whereas more recent developments also focus on differential expression. To our knowledge, however, no methods for tiling arrays have been described that can simultaneously assess transcript discovery and identify differentially expressed transcripts. In this paper, we adapt wavelet based functional models to the context of tiling arrays. The high...</description>
            <author>Statistical Applications in Genetics and Molecular Biology</author>
            <type>journals</type>
        <comments>http://www.medworm.com/rss/comments.php?id=5575565</comments>
            <pubDate>Fri, 06 Jan 2012 19:59:55 +0100</pubDate>
            <guid isPermaLink="false">5575565</guid>        </item>
        <item>
            <title>Transcriptional Network Inference from Functional Similarity and Expression Data: A Global Supervised Approach</title>
            <link>http://www.medworm.com/index.php?rid=5575566&amp;cid=s_36498_50_f&amp;fid=36498&amp;url=http%3A%2F%2Fwww.bepress.com%2Fsagmb%2Fvol11%2Fiss1%2Fart2</link>
            <description>An important challenge in systems biology is the inference of biological networks from postgenomic data. Among these biological networks, a gene transcriptional regulatory network focuses on interactions existing between transcription factors (TFs) and their corresponding target genes. A large number of reverse engineering algorithms were proposed to infer such networks from gene expression profiles, but most current methods have relatively low predictive performances. In this paper, we introduce the novel TNIFSED method (Transcriptional Network Inference from Functional Similarity and Expression Data), that infers a transcriptional network from the integration of correlations and partial correlations of gene expression profiles and gene functional similarities through a supervised clas...</description>
            <author>Statistical Applications in Genetics and Molecular Biology</author>
            <type>journals</type>
        <comments>http://www.medworm.com/rss/comments.php?id=5575566</comments>
            <pubDate>Fri, 06 Jan 2012 19:45:31 +0100</pubDate>
            <guid isPermaLink="false">5575566</guid>        </item>
        <item>
            <title>Improving Hidden Markov Models for Classification of Human Immunodeficiency Virus-1 Subtypes through Linear Classifier Learning</title>
            <link>http://www.medworm.com/index.php?rid=5575567&amp;cid=s_36498_50_f&amp;fid=36498&amp;url=http%3A%2F%2Fwww.bepress.com%2Fsagmb%2Fvol11%2Fiss1%2Fart1</link>
            <description>Profile Hidden Markov Models (pHMMs) are widely used to model nucleotide or protein sequence families. In many applications, a sequence family classified into several subfamilies is given and each subfamily is modeled separately by one pHMM. A major drawback of this approach is the difficulty of coping with subfamilies composed of very few sequences.
Correct subtyping of human immunodeficiency virus-1 (HIV-1) sequences is one of the most crucial bioinformatic tasks affected by this problem of small subfamilies, i.e., HIV-1 subtypes with a small number of known sequences. To deal with small samples for particular subfamilies of HIV-1, we employ a machine learning approach. More precisely, we make use of an existing HMM architecture and its associated inference engine, while replacing the un...</description>
            <author>Statistical Applications in Genetics and Molecular Biology</author>
            <type>journals</type>
        <comments>http://www.medworm.com/rss/comments.php?id=5575567</comments>
            <pubDate>Fri, 06 Jan 2012 19:45:26 +0100</pubDate>
            <guid isPermaLink="false">5575567</guid>        </item>
        <item>
            <title>False Discovery Rate Estimation for Stability Selection: Application to Genome-Wide Association Studies</title>
            <link>http://www.medworm.com/index.php?rid=5453128&amp;cid=s_36498_50_f&amp;fid=36498&amp;url=http%3A%2F%2Fwww.bepress.com%2Fsagmb%2Fvol10%2Fiss1%2Fart55</link>
            <description>Stability Selection, which combines penalized regression with subsampling, is a promising algorithm to perform variable selection in ultra high dimension. This work is motivated by its evaluation in the context of genome-wide association studies (GWAS). One critical aspect for its use lies in the choice of a decision rule that accounts for the massive number of comparisons realised. The current decision rule relies on the control of the Family Wise Error Rate (FWER) by means of an upper bound derived theoretically. Alternatively, we propose to set the detection threshold according to the more liberal false discovery rate (FDR) criterion. The procedure we propose for its estimation relies on permutations. This procedure is evaluated by simulations according to several scenarios mimicking va...</description>
            <author>Statistical Applications in Genetics and Molecular Biology</author>
            <type>journals</type>
        <comments>http://www.medworm.com/rss/comments.php?id=5453128</comments>
            <pubDate>Mon, 28 Nov 2011 17:39:03 +0100</pubDate>
            <guid isPermaLink="false">5453128</guid>        </item>
        <item>
            <title>A Calibrated Multiclass Extension of AdaBoost</title>
            <link>http://www.medworm.com/index.php?rid=5436186&amp;cid=s_36498_50_f&amp;fid=36498&amp;url=http%3A%2F%2Fwww.bepress.com%2Fsagmb%2Fvol10%2Fiss1%2Fart54</link>
            <description>AdaBoost is a popular and successful data mining technique for binary classification. However, there is no universally agreed upon extension of the method for problems with more than two classes. Most multiclass generalizations simply reduce the problem to a series of binary classification problems. The statistical interpretation of AdaBoost is that it operates through loss-based estimation: by using an exponential loss function as a surrogate for misclassification loss, it sequentially minimizes empirical risk through fitting a base classifier to iteratively reweighted training data. While there are several extensions using loss-based estimation with multiclass base classifiers, these use multiclass versions of the exponential loss that are not classification calibrated: unless restrictio...</description>
            <author>Statistical Applications in Genetics and Molecular Biology</author>
            <type>journals</type>
        <comments>http://www.medworm.com/rss/comments.php?id=5436186</comments>
            <pubDate>Sun, 20 Nov 2011 23:35:09 +0100</pubDate>
            <guid isPermaLink="false">5436186</guid>        </item>
        <item>
            <title>Multiscale Characterization of Signaling Network Dynamics through Features</title>
            <link>http://www.medworm.com/index.php?rid=5436187&amp;cid=s_36498_50_f&amp;fid=36498&amp;url=http%3A%2F%2Fwww.bepress.com%2Fsagmb%2Fvol10%2Fiss1%2Fart53</link>
            <description>We present a multiscale stochastic approach to deal with protein interactions involved in a well-known signaling network, and show that based on some topological network features, it is possible to identify timescales (or resolutions) that characterize complex pathways. (Source: Statistical Applications in Genetics and Molecular Biology)</description>
            <author>Statistical Applications in Genetics and Molecular Biology</author>
            <type>journals</type>
        <comments>http://www.medworm.com/rss/comments.php?id=5436187</comments>
            <pubDate>Sun, 20 Nov 2011 23:35:02 +0100</pubDate>
            <guid isPermaLink="false">5436187</guid>        </item>
        <item>
            <title>Modeling Read Counts for CNV Detection in Exome Sequencing Data</title>
            <link>http://www.medworm.com/index.php?rid=5394554&amp;cid=s_36498_50_f&amp;fid=36498&amp;url=http%3A%2F%2Fwww.bepress.com%2Fsagmb%2Fvol10%2Fiss1%2Fart52</link>
            <description>We present a hidden Markov model for detecting CNVs from raw read count data, using background read depth from a control set as well as other positional covariates such as GC-content. The model, exomeCopy, is applied to a large chromosome X exome sequencing project identifying a list of large unique CNVs. CNVs predicted by the model and experimentally validated are then recovered using a cross-platform control set from publicly available exome sequencing data. Simulations show high sensitivity for detecting heterozygous and homozygous CNVs, outperforming normalization and state-of-the-art segmentation methods. (Source: Statistical Applications in Genetics and Molecular Biology)</description>
            <author>Statistical Applications in Genetics and Molecular Biology</author>
            <type>journals</type>
        <comments>http://www.medworm.com/rss/comments.php?id=5394554</comments>
            <pubDate>Tue, 08 Nov 2011 19:56:45 +0100</pubDate>
            <guid isPermaLink="false">5394554</guid>        </item>
        <item>
            <title>Multiple Testing in Candidate Gene Situations: A Comparison of Classical, Discrete, and Resampling-Based Procedures</title>
            <link>http://www.medworm.com/index.php?rid=5372244&amp;cid=s_36498_50_f&amp;fid=36498&amp;url=http%3A%2F%2Fwww.bepress.com%2Fsagmb%2Fvol10%2Fiss1%2Fart51</link>
            <description>In candidate gene association studies, usually several elementary hypotheses are tested simultaneously using one particular set of data. The data normally consist of partly correlated SNP information. Every SNP can be tested for association with the disease, e.g., using the Cochran-Armitage test for trend. To account for the multiplicity of the test situation, different types of multiple testing procedures have been proposed. The question arises whether procedures taking into account the discreteness of the situation show a benefit especially in case of correlated data. We empirically evaluate several different multiple testing procedures via simulation studies using simulated correlated SNP data. We analyze FDR and FWER controlling procedures, special procedures for discrete situations, a...</description>
            <author>Statistical Applications in Genetics and Molecular Biology</author>
            <type>journals</type>
        <comments>http://www.medworm.com/rss/comments.php?id=5372244</comments>
            <pubDate>Thu, 03 Nov 2011 06:47:27 +0100</pubDate>
            <guid isPermaLink="false">5372244</guid>        </item>
        <item>
            <title>Unsupervised Classification for Tiling Arrays: ChIP-chip and Transcriptome</title>
            <link>http://www.medworm.com/index.php?rid=5372245&amp;cid=s_36498_50_f&amp;fid=36498&amp;url=http%3A%2F%2Fwww.bepress.com%2Fsagmb%2Fvol10%2Fiss1%2Fart50</link>
            <description>Tiling arrays make possible a large-scale exploration of the genome thanks to probes which cover the whole genome with very high density, up to 2,000,000 probes. Biological questions usually addressed are either the expression difference between two conditions or the detection of transcribed regions. In this work, we propose to consider both questions simultaneously as an unsupervised classification problem by modeling the joint distribution of the two conditions. In contrast to previous methods, we account for all available information on the probes as well as biological knowledge such as annotation and spatial dependence between probes. Since probes are not biologically relevant units, we propose a classification rule for non-connected regions covered by several probes. Applications to t...</description>
            <author>Statistical Applications in Genetics and Molecular Biology</author>
            <type>journals</type>
        <comments>http://www.medworm.com/rss/comments.php?id=5372245</comments>
            <pubDate>Wed, 02 Nov 2011 00:50:07 +0100</pubDate>
            <guid isPermaLink="false">5372245</guid>        </item>
        <item>
            <title>Bayesian Learning from Marginal Data in Bionetwork Models</title>
            <link>http://www.medworm.com/index.php?rid=5372246&amp;cid=s_36498_50_f&amp;fid=36498&amp;url=http%3A%2F%2Fwww.bepress.com%2Fsagmb%2Fvol10%2Fiss1%2Fart49</link>
            <description>We present a Bayesian computational strategy coupled with a novel approach to summarizing and numerically characterizing biological phenotypes that are represented in terms of the resulting sample distributions of cellular markers. We build on Bayesian simulation methods and mixture modeling to define the approach to linking mechanistic mathematical models of network dynamics to snapshot data, using a toggle switch example integrating simulated and real data as context. (Source: Statistical Applications in Genetics and Molecular Biology)</description>
            <author>Statistical Applications in Genetics and Molecular Biology</author>
            <type>journals</type>
        <comments>http://www.medworm.com/rss/comments.php?id=5372246</comments>
            <pubDate>Thu, 27 Oct 2011 23:28:28 +0100</pubDate>
            <guid isPermaLink="false">5372246</guid>        </item>
        <item>
            <title>Adaptive Elastic-Net Sparse Principal Component Analysis for Pathway Association Testing</title>
            <link>http://www.medworm.com/index.php?rid=5353747&amp;cid=s_36498_50_f&amp;fid=36498&amp;url=http%3A%2F%2Fwww.bepress.com%2Fsagmb%2Fvol10%2Fiss1%2Fart48</link>
            <description>Pathway or gene set analysis has become an increasingly popular approach for analyzing high-throughput biological experiments such as microarray gene expression studies. The purpose of pathway analysis is to identify differentially expressed pathways associated with outcomes. Important challenges in pathway analysis are selecting a subset of genes contributing most to association with clinical phenotypes and conducting statistical tests of association for the pathways efficiently. We propose a two-stage analysis strategy: (1) extract latent variables representing activities within each pathway using a dimension reduction approach based on adaptive elastic-net sparse principal component analysis; (2) integrate the latent variables with the regression modeling framework to analyze studies wi...</description>
            <author>Statistical Applications in Genetics and Molecular Biology</author>
            <type>journals</type>
        <comments>http://www.medworm.com/rss/comments.php?id=5353747</comments>
            <pubDate>Tue, 25 Oct 2011 00:23:08 +0100</pubDate>
            <guid isPermaLink="false">5353747</guid>        </item>
        <item>
            <title>Fitting Boolean Networks from Steady State Perturbation Data</title>
            <link>http://www.medworm.com/index.php?rid=5291485&amp;cid=s_36498_50_f&amp;fid=36498&amp;url=http%3A%2F%2Fwww.bepress.com%2Fsagmb%2Fvol10%2Fiss1%2Fart47</link>
            <description>Gene perturbation experiments are commonly used for the reconstruction of gene regulatory networks. Typical experimental methodology imposes persistent changes on the network. The resulting data must therefore be interpreted as a steady state from an altered gene regulatory network, rather than a direct observation of the original network. In this article an implicit modeling methodology is proposed in which the unperturbed network of interest is scored by first modeling the persistent perturbation, then predicting the steady state, which may then be compared to the observed data. This results in a many-to-one inverse problem, so a computational Bayesian approach is used to assess model uncertainty.
The methodology is first demonstrated on a number of synthetic networks. It is shown that t...</description>
            <author>Statistical Applications in Genetics and Molecular Biology</author>
            <type>journals</type>
        <comments>http://www.medworm.com/rss/comments.php?id=5291485</comments>
            <pubDate>Thu, 06 Oct 2011 01:21:51 +0100</pubDate>
            <guid isPermaLink="false">5291485</guid>        </item>
        <item>
            <title>Genetic Linkage Analysis in the Presence of Germline Mosaicism</title>
            <link>http://www.medworm.com/index.php?rid=5291486&amp;cid=s_36498_50_f&amp;fid=36498&amp;url=http%3A%2F%2Fwww.bepress.com%2Fsagmb%2Fvol10%2Fiss1%2Fart46</link>
            <description>Germline mosaicism is a genetic condition in which some germ cells of an individual contain a mutation. This condition violates the assumptions underlying classic genetic analysis and may lead to failure of such analysis. In this work we extend the statistical model used for genetic linkage analysis in order to incorporate germline mosaicism. We develop a likelihood ratio test for detecting whether a genetic trait has been introduced into a pedigree by germline mosaicism. We analyze the statistical properties of this test and evaluate its performance via computer simulations. We demonstrate that genetic linkage analysis has high power to identify linkage in the presence of germline mosaicism when our extended model is used. We further use this extended model to provide solid statistical ev...</description>
            <author>Statistical Applications in Genetics and Molecular Biology</author>
            <type>journals</type>
        <comments>http://www.medworm.com/rss/comments.php?id=5291486</comments>
            <pubDate>Wed, 05 Oct 2011 01:14:13 +0100</pubDate>
            <guid isPermaLink="false">5291486</guid>        </item>
        <item>
            <title>Choice of Summary Statistic Weights in Approximate Bayesian Computation</title>
            <link>http://www.medworm.com/index.php?rid=5266919&amp;cid=s_36498_50_f&amp;fid=36498&amp;url=http%3A%2F%2Fwww.bepress.com%2Fsagmb%2Fvol10%2Fiss1%2Fart45</link>
            <description>In this paper, we develop a Genetic Algorithm that can address the fundamental problem of how one should weight the summary statistics included in an approximate Bayesian computation analysis built around an accept/reject algorithm, and how one might choose the tolerance for that analysis. We then demonstrate that using weighted statistics, and a well-chosen tolerance, in such an approximate Bayesian computation approach can result in improved performance, when compared to unweighted analyses, using one example drawn purely from statistics and two drawn from the estimation of population genetics parameters. (Source: Statistical Applications in Genetics and Molecular Biology)</description>
            <author>Statistical Applications in Genetics and Molecular Biology</author>
            <type>journals</type>
        <comments>http://www.medworm.com/rss/comments.php?id=5266919</comments>
            <pubDate>Tue, 27 Sep 2011 18:06:09 +0100</pubDate>
            <guid isPermaLink="false">5266919</guid>        </item>
        <item>
            <title>Assessing Modularity Using a Random Matrix Theory Approach</title>
            <link>http://www.medworm.com/index.php?rid=5256177&amp;cid=s_36498_50_f&amp;fid=36498&amp;url=http%3A%2F%2Fwww.bepress.com%2Fsagmb%2Fvol10%2Fiss1%2Fart44</link>
            <description>Random matrix theory (RMT) is well suited to describing the emergent properties of systems with complex interactions amongst their constituents through their eigenvalue spectrums. Some RMT results are applied to the problem of clustering high dimensional biological data with complex dependence structure amongst the variables. It will be shown that a gene relevance or correlation network can be constructed by choosing a correlation threshold in a principled way, such that it corresponds to a block diagonal structure in the correlation matrix, if such a structure exists. The structure is then found using community detection algorithms, but with parameter choice guided by RMT predictions. The resulting clustering is compared to a variety of hierarchical clustering outputs and is found to the ...</description>
            <author>Statistical Applications in Genetics and Molecular Biology</author>
            <type>journals</type>
        <comments>http://www.medworm.com/rss/comments.php?id=5256177</comments>
            <pubDate>Mon, 26 Sep 2011 18:15:49 +0100</pubDate>
            <guid isPermaLink="false">5256177</guid>        </item>
        <item>
            <title>Determining Coding CpG Islands by Identifying Regions Significant for Pattern Statistics on Markov Chains</title>
            <link>http://www.medworm.com/index.php?rid=5256178&amp;cid=s_36498_50_f&amp;fid=36498&amp;url=http%3A%2F%2Fwww.bepress.com%2Fsagmb%2Fvol10%2Fiss1%2Fart43</link>
            <description>We present a method for finding CCGIs that showcases a novel approach we have developed for identifying regions of interest that are significant (with respect to a Markov chain) for the counts of any pattern. Our method begins with the exact computation of tail probabilities for the number of CpGs in all regions contained in coding exons, and then applies a greedy algorithm for selecting islands from among the regions. We show that the greedy algorithm provably optimizes a biologically motivated criterion for selecting islands while controlling the false discovery rate.
We applied this approach to the human genome (hg18) and annotated CpG islands in coding exons. The statistical criterion we apply to evaluating islands reduces the number of false positives in existing annotations, while ...</description>
            <author>Statistical Applications in Genetics and Molecular Biology</author>
            <type>journals</type>
        <comments>http://www.medworm.com/rss/comments.php?id=5256178</comments>
            <pubDate>Fri, 23 Sep 2011 16:25:31 +0100</pubDate>
            <guid isPermaLink="false">5256178</guid>        </item>
        <item>
            <title>Fully Moderated T-statistic for Small Sample Size Gene Expression Arrays</title>
            <link>http://www.medworm.com/index.php?rid=5231366&amp;cid=s_36498_50_f&amp;fid=36498&amp;url=http%3A%2F%2Fwww.bepress.com%2Fsagmb%2Fvol10%2Fiss1%2Fart42</link>
            <description>Gene expression microarray experiments with few replications lead to great variability in estimates of gene variances. Several Bayesian methods have been developed to reduce this variability and to increase power. Thus far, moderated t methods assumed a constant coefficient of variation (CV) for the gene variances. We provide evidence against this assumption, and extend the method by allowing the CV to vary with gene expression. Our CV varying method, which we refer to as the fully moderated t-statistic, was compared to three other methods (ordinary t, and two moderated t predecessors). A simulation study and a familiar spike-in data set were used to assess the performance of the testing methods. The results showed that our CV varying method had higher power than the other three methods, i...</description>
            <author>Statistical Applications in Genetics and Molecular Biology</author>
            <type>journals</type>
        <comments>http://www.medworm.com/rss/comments.php?id=5231366</comments>
            <pubDate>Thu, 15 Sep 2011 16:13:59 +0100</pubDate>
            <guid isPermaLink="false">5231366</guid>        </item>
        <item>
            <title>A Modified Maximum Contrast Method for Unequal Sample Sizes in Pharmacogenomic Studies</title>
            <link>http://www.medworm.com/index.php?rid=5202251&amp;cid=s_36498_50_f&amp;fid=36498&amp;url=http%3A%2F%2Fwww.bepress.com%2Fsagmb%2Fvol10%2Fiss1%2Fart41</link>
            <description>In pharmacogenomic studies, biomedical researchers commonly analyze the association between genotype and biological response by using the Kruskal-Wallis test or one-way analysis of variance (ANOVA) after logarithmic transformation of the obtained data. However, because these methods detect unexpected biological response patterns, the power for detecting the expected pattern is reduced. Previously, we proposed a combination of the maximum contrast method and the permuted modified maximum contrast method for unequal sample size in pharmacogenomic studies. However, we noted that the distribution of the permuted modified maximum contrast statistic depends on nuisance parameter σ², which is the population variance. In this paper, we propose a modified maximum contrast method with a statistic t...</description>
            <author>Statistical Applications in Genetics and Molecular Biology</author>
            <type>journals</type>
        <comments>http://www.medworm.com/rss/comments.php?id=5202251</comments>
            <pubDate>Wed, 07 Sep 2011 13:58:43 +0100</pubDate>
            <guid isPermaLink="false">5202251</guid>        </item>
        <item>
            <title>MA-SNP — A New Genotype Calling Method for Oligonucleotide SNP Arrays Modeling the Batch Effect with a Normal Mixture Model</title>
            <link>http://www.medworm.com/index.php?rid=5178661&amp;cid=s_36498_50_f&amp;fid=36498&amp;url=http%3A%2F%2Fwww.bepress.com%2Fsagmb%2Fvol10%2Fiss1%2Fart40</link>
            <description>In this study, we develop a SNP-specific genotype calling algorithm based on the probe intensity composite representation (PICR) model, while using a normal mixture model to account for the variability of batch effect on the genotype calls. We demonstrate our method with SNP array data in a few studies, including the HapMap project, the coronary heart disease and the UK Blood Service Control studies by the Wellcome Trust Case-Control Consortium, and a methylation profiling study. Our single array based approach outperforms PICR and is comparable to the best multi-array genotype calling methods. (Source: Statistical Applications in Genetics and Molecular Biology)</description>
            <author>Statistical Applications in Genetics and Molecular Biology</author>
            <type>journals</type>
        <comments>http://www.medworm.com/rss/comments.php?id=5178661</comments>
            <pubDate>Tue, 30 Aug 2011 19:43:35 +0100</pubDate>
            <guid isPermaLink="false">5178661</guid>        </item>
        <item>
            <title>Weighted Lasso with Data Integration</title>
            <link>http://www.medworm.com/index.php?rid=5173434&amp;cid=s_36498_50_f&amp;fid=36498&amp;url=http%3A%2F%2Fwww.bepress.com%2Fsagmb%2Fvol10%2Fiss1%2Fart39</link>
            <description>The lasso is one of the most commonly used methods for high-dimensional regression, but can be unstable and lacks satisfactory asymptotic properties for variable selection. We propose to use weighted lasso with integrated relevant external information on the covariates to guide the selection towards more stable results. Weighting the penalties with external information gives each regression coefficient a covariate specific amount of penalization and can improve upon standard methods that do not use such information by borrowing knowledge from the external material. The method is applied to two cancer data sets, with gene expressions as covariates. We find interesting gene signatures, which we are able to validate. We discuss various ideas on how the weights should be defined and illustrate...</description>
            <author>Statistical Applications in Genetics and Molecular Biology</author>
            <type>journals</type>
        <comments>http://www.medworm.com/rss/comments.php?id=5173434</comments>
            <pubDate>Mon, 29 Aug 2011 16:20:23 +0100</pubDate>
            <guid isPermaLink="false">5173434</guid>        </item>
        <item>
            <title>Entropy Based Genetic Association Tests and Gene-Gene Interaction Tests</title>
            <link>http://www.medworm.com/index.php?rid=5153470&amp;cid=s_36498_50_f&amp;fid=36498&amp;url=http%3A%2F%2Fwww.bepress.com%2Fsagmb%2Fvol10%2Fiss1%2Fart38</link>
            <description>In the past few years, several entropy-based tests have been proposed for testing either single SNP association or gene-gene interaction. These tests are mainly based on Shannon entropy and have higher statistical power when compared to standard χ² tests. In this paper, we extend some of these tests using a more generalized entropy definition, Rényi entropy, where Shannon entropy is a special case of order 1. The order λ (&gt;0) of Rényi entropy weights the events (genotype/haplotype) according to their probabilities (frequencies). Higher λ places more emphasis on higher probability events while smaller λ (close to 0) tends to assign weights more equally. Thus, by properly choosing the λ, one can potentially increase the power of the tests or the p-value level of significance. We condu...</description>
            <author>Statistical Applications in Genetics and Molecular Biology</author>
            <type>journals</type>
        <comments>http://www.medworm.com/rss/comments.php?id=5153470</comments>
            <pubDate>Tue, 23 Aug 2011 00:40:54 +0100</pubDate>
            <guid isPermaLink="false">5153470</guid>        </item>
        <item>
            <title>Smoothing Gene Expression Data with Network Information Improves Consistency of Regulated Genes</title>
            <link>http://www.medworm.com/index.php?rid=5115646&amp;cid=s_36498_50_f&amp;fid=36498&amp;url=http%3A%2F%2Fwww.bepress.com%2Fsagmb%2Fvol10%2Fiss1%2Fart37</link>
            <description>Gene set analysis methods have become a widely used tool for including prior biological knowledge in the statistical analysis of gene expression data. Advantages of these methods include increased sensitivity, easier interpretation and more conformity in the results. However, gene set methods do not employ all the available information about gene relations. Genes are arranged in complex networks where the network distances contain detailed information about inter-gene dependencies. We propose a method that uses gene networks to smooth gene expression data with the aim of reducing the number of false positives and identifying important subnetworks. Gene dependencies are extracted from the network topology and are used to smooth genewise test statistics. To find the optimal degree of smoothing,...</description>
            <author>Statistical Applications in Genetics and Molecular Biology</author>
            <type>journals</type>
        <comments>http://www.medworm.com/rss/comments.php?id=5115646</comments>
            <pubDate>Tue, 09 Aug 2011 15:46:17 +0100</pubDate>
            <guid isPermaLink="false">5115646</guid>        </item>
        <item>
            <title>Surveying the Manifold Divergence of an Entire Protein Class for Statistical Clues to Underlying Biochemical Mechanisms</title>
            <link>http://www.medworm.com/index.php?rid=5101792&amp;cid=s_36498_50_f&amp;fid=36498&amp;url=http%3A%2F%2Fwww.bepress.com%2Fsagmb%2Fvol10%2Fiss1%2Fart36</link>
            <description>Certain residues have no known function yet are co-conserved across distantly related protein families and diverse organisms, suggesting that they perform critical roles associated with as-yet-unidentified molecular properties and mechanisms. This raises the question of how to obtain additional clues regarding these mysterious biochemical phenomena with a view to formulating experimentally testable hypotheses. One approach is to access the implicit biochemical information encoded within the vast amount of genomic sequence data now becoming available. Here, a new Gibbs sampling strategy is formulated and implemented that can partition hundreds of thousands of sequences within a major protein class into multiple, functionally-divergent categories based on those pattern residues that best dis...</description>
            <author>Statistical Applications in Genetics and Molecular Biology</author>
            <type>journals</type>
        <comments>http://www.medworm.com/rss/comments.php?id=5101792</comments>
            <pubDate>Thu, 04 Aug 2011 21:44:25 +0100</pubDate>
            <guid isPermaLink="false">5101792</guid>        </item>
        <item>
            <title>Measurement of Evidence and Evidence of Measurement</title>
            <link>http://www.medworm.com/index.php?rid=5046616&amp;cid=s_36498_50_f&amp;fid=36498&amp;url=http%3A%2F%2Fwww.bepress.com%2Fsagmb%2Fvol10%2Fiss1%2Fart35</link>
            <description>One important use of statistical methods in application to biological data is measurement of evidence, or assessment of the degree to which data support one or another hypothesis. While there is a small literature on this topic, it seems safe to say that consensus has not yet been reached regarding how best, or most accurately, to measure statistical evidence. Here, we propose considering the problem as a measurement problem, rather than as a statistical problem per se, and we explore the consequences of this shift in perspective. Our arguments here are part of an ongoing research program focused on exploiting deep parallelisms between foundations of thermodynamics and foundations of “evidentialism,” in order to derive an absolute scale for the measurement of evidence, a general framew...</description>
            <author>Statistical Applications in Genetics and Molecular Biology</author>
            <type>journals</type>
        <comments>http://www.medworm.com/rss/comments.php?id=5046616</comments>
            <pubDate>Wed, 20 Jul 2011 22:16:26 +0100</pubDate>
            <guid isPermaLink="false">5046616</guid>        </item>
        <item>
            <title>High-Dimensional Regression and Variable Selection Using CAR Scores</title>
            <link>http://www.medworm.com/index.php?rid=5046617&amp;cid=s_36498_50_f&amp;fid=36498&amp;url=http%3A%2F%2Fwww.bepress.com%2Fsagmb%2Fvol10%2Fiss1%2Fart34</link>
            <description>Variable selection is a difficult problem that is particularly challenging in the analysis of high-dimensional genomic data. Here, we introduce the CAR score, a novel and highly effective criterion for variable ranking in linear regression based on Mahalanobis-decorrelation of the explanatory variables. The CAR score provides a canonical ordering that encourages grouping of correlated predictors and down-weights antagonistic variables. It decomposes the proportion of variance explained and it is an intermediate between marginal correlation and the standardized regression coefficient. As a population quantity, any preferred inference scheme can be applied for its estimation. Using simulations, we demonstrate that variable selection by CAR scores is very effective and yields prediction error...</description>
            <author>Statistical Applications in Genetics and Molecular Biology</author>
            <type>journals</type>
        <comments>http://www.medworm.com/rss/comments.php?id=5046617</comments>
            <pubDate>Tue, 19 Jul 2011 02:08:08 +0100</pubDate>
            <guid isPermaLink="false">5046617</guid>        </item>
        <item>
            <title>Deviance Information Criteria for Model Selection in Approximate Bayesian Computation</title>
            <link>http://www.medworm.com/index.php?rid=5025763&amp;cid=s_36498_50_f&amp;fid=36498&amp;url=http%3A%2F%2Fwww.bepress.com%2Fsagmb%2Fvol10%2Fiss1%2Fart33</link>
            <description>Approximate Bayesian computation (ABC) is a class of algorithmic methods in Bayesian inference using statistical summaries and computer simulations. ABC has become popular in evolutionary genetics and in other branches of biology. However, model selection under ABC algorithms has been a subject of intense debate during the recent years. Here, we propose novel approaches to model selection based on posterior predictive distributions and approximations of the deviance. We argue that this framework can settle some contradictions between the computation of model probabilities and posterior predictive checks using ABC posterior distributions. A simulation study and an analysis of a resequencing data set of human DNA show that the deviance criteria lead to sensible results in a number of model c...</description>
            <author>Statistical Applications in Genetics and Molecular Biology</author>
            <type>journals</type>
        <comments>http://www.medworm.com/rss/comments.php?id=5025763</comments>
            <pubDate>Wed, 13 Jul 2011 00:11:02 +0100</pubDate>
            <guid isPermaLink="false">5025763</guid>        </item>
        <item>
            <title>Random Forests for Genetic Association Studies</title>
            <link>http://www.medworm.com/index.php?rid=5025764&amp;cid=s_36498_50_f&amp;fid=36498&amp;url=http%3A%2F%2Fwww.bepress.com%2Fsagmb%2Fvol10%2Fiss1%2Fart32</link>
            <description>The Random Forests (RF) algorithm has become a commonly used machine learning algorithm for genetic association studies. It is well suited for genetic applications since it is both computationally efficient and models genetic causal mechanisms well. With its growing ubiquity, there has been inconsistent and less than optimal use of RF in the literature. The purpose of this review is to break down the theoretical and statistical basis of RF so that practitioners are able to apply it in their work. An emphasis is placed on showing how the various components contribute to bias and variance, as well as discussing variable importance measures. Applications specific to genetic studies are highlighted. To provide context, RF is compared to other commonly used machine learning algorithms. (Source: ...</description>
            <author>Statistical Applications in Genetics and Molecular Biology</author>
            <type>journals</type>
        <comments>http://www.medworm.com/rss/comments.php?id=5025764</comments>
            <pubDate>Wed, 13 Jul 2011 00:10:54 +0100</pubDate>
            <guid isPermaLink="false">5025764</guid>        </item>
        <item>
            <title>Comparison of Clinical Subgroup aCGH Profiles through Pseudolikelihood Ratio Tests</title>
            <link>http://www.medworm.com/index.php?rid=5004740&amp;cid=s_36498_50_f&amp;fid=36498&amp;url=http%3A%2F%2Fwww.bepress.com%2Fsagmb%2Fvol10%2Fiss1%2Fart31</link>
            <description>Array-based Comparative Genomic Hybridization (aCGH) is a microarray-based technology that assists in identification of DNA sequence copy number changes across the genome. Examination of differences in instability phenotype, or pattern of copy number alterations, between cancer subtypes can aid in classification of cancers and lead to better understanding of the underlying cytogenic mechanism. Instability phenotypes are composed of a variety of copy number alteration features including height or magnitude of copy number alteration level, frequency of transition between copy number states such as gain and loss, and total number of altered clones or probes. That is, instability phenotype is multivariate in nature. Current methods of instability phenotype assessment, however, are limited to u...</description>
            <author>Statistical Applications in Genetics and Molecular Biology</author>
            <type>journals</type>
        <comments>http://www.medworm.com/rss/comments.php?id=5004740</comments>
            <pubDate>Thu, 07 Jul 2011 16:46:06 +0100</pubDate>
            <guid isPermaLink="false">5004740</guid>        </item>
        <item>
            <title>Sparse Canonical Covariance Analysis for High-throughput Data</title>
            <link>http://www.medworm.com/index.php?rid=5004741&amp;cid=s_36498_50_f&amp;fid=36498&amp;url=http%3A%2F%2Fwww.bepress.com%2Fsagmb%2Fvol10%2Fiss1%2Fart30</link>
            <description>Canonical covariance analysis (CCA) has gained popularity as a method for the analysis of two sets of high-dimensional genomic data. However, it is often difficult to interpret the results because canonical vectors are linear combinations of all variables, and the coefficients are typically nonzero. Several sparse CCA methods have recently been proposed for reducing the number of nonzero coefficients, but these existing methods are not satisfactory because they still give too many nonzero coefficients. In this paper, we propose a new random-effect model approach for sparse CCA; the proposed algorithm can adapt arbitrary penalty functions to CCA without much computational demand. Through simulation studies, we compare various penalty functions in terms of the performance of correct model i...</description>
            <author>Statistical Applications in Genetics and Molecular Biology</author>
            <type>journals</type>
        <comments>http://www.medworm.com/rss/comments.php?id=5004741</comments>
            <pubDate>Wed, 06 Jul 2011 16:30:26 +0100</pubDate>
            <guid isPermaLink="false">5004741</guid>        </item>
        <item>
            <title>Multiple Imputation of Missing Phenotype Data for QTL Mapping</title>
            <link>http://www.medworm.com/index.php?rid=4966389&amp;cid=s_36498_50_f&amp;fid=36498&amp;url=http%3A%2F%2Fwww.bepress.com%2Fsagmb%2Fvol10%2Fiss1%2Fart29</link>
            <description>Missing phenotype data can be a major hurdle to mapping quantitative trait loci (QTL). Though in many cases experiments may be designed to minimize the occurrence of missing data, it is often unavoidable in practice; thus, statistical methods to account for missing data are needed. In this paper we describe an approach for conjoining multiple imputation and QTL mapping. Methods are applied to map genes associated with increased breathing effort in mice after lung inflammation due to allergen challenge in developing lines of the Collaborative Cross, a new mouse genetics resource. Missing data poses a particular challenge in this study because the desired phenotype summary to be mapped is a function of incompletely observed dose-response curves. Comparison of the multiple imputation approach...</description>
            <author>Statistical Applications in Genetics and Molecular Biology</author>
            <type>journals</type>
        <comments>http://www.medworm.com/rss/comments.php?id=4966389</comments>
            <pubDate>Fri, 24 Jun 2011 02:52:14 +0100</pubDate>
            <guid isPermaLink="false">4966389</guid>        </item>
        <item>
            <title>The Joint Null Criterion for Multiple Hypothesis Tests</title>
            <link>http://www.medworm.com/index.php?rid=4890366&amp;cid=s_36498_50_f&amp;fid=36498&amp;url=http%3A%2F%2Fwww.bepress.com%2Fsagmb%2Fvol10%2Fiss1%2Fart28</link>
            <description>Simultaneously performing many hypothesis tests is a problem commonly encountered in high-dimensional biology. In this setting, a large set of p-values is calculated from many related features measured simultaneously. Classical statistics provides a criterion for defining what a “correct” p-value is when performing a single hypothesis test. We show here that even when each p-value is marginally correct under this single hypothesis criterion, it may be the case that the joint behavior of the entire set of p-values is problematic. On the other hand, there are cases where each p-value is marginally incorrect, yet the joint distribution of the set of p-values is satisfactory. Here, we propose a criterion defining a well behaved set of simultaneously calculated p-values that provides precis...</description>
            <author>Statistical Applications in Genetics and Molecular Biology</author>
            <type>journals</type>
        <comments>http://www.medworm.com/rss/comments.php?id=4890366</comments>
            <pubDate>Wed, 01 Jun 2011 10:00:33 +0100</pubDate>
            <guid isPermaLink="false">4890366</guid>        </item>
        <item>
            <title>Quantifying the Relative Contribution of the Heterozygous Class to QTL Detection Power</title>
            <link>http://www.medworm.com/index.php?rid=4838167&amp;cid=s_36498_50_f&amp;fid=36498&amp;url=http%3A%2F%2Fwww.bepress.com%2Fsagmb%2Fvol10%2Fiss1%2Fart27</link>
            <description>Basic statistical theory implies that genotypic class cardinalities play a strong role in determining power to detect QTL, but the classes do not contribute equal information to the model. For example, while it is generally accepted that homozygotes contribute more to the detection of additive effects, heterozygotes are necessary to detect dominance effects. The literature on QTL detection often mentions the importance of genotypic class sizes in passing (Belknap (1998); Belknap et al. (1996); Jin et al. (2004); Kliebenstein (2007); Kao (2006); Martinez et al. (2002)), but no rigorous study of their relative values appears to exist. The purpose of this paper is to quantify the relative contribution of the heterozygous class. Researchers can use these results in evaluating the tradeoff betw...</description>
            <author>Statistical Applications in Genetics and Molecular Biology</author>
            <type>journals</type>
        <comments>http://www.medworm.com/rss/comments.php?id=4838167</comments>
            <pubDate>Wed, 18 May 2011 16:06:34 +0100</pubDate>
            <guid isPermaLink="false">4838167</guid>        </item>
        <item>
            <title>A Two-Stage Poisson Model for Testing RNA-Seq Data</title>
            <link>http://www.medworm.com/index.php?rid=4838168&amp;cid=s_36498_50_f&amp;fid=36498&amp;url=http%3A%2F%2Fwww.bepress.com%2Fsagmb%2Fvol10%2Fiss1%2Fart26</link>
            <description>RNA sequencing technology is providing data of unprecedented throughput, resolution, and accuracy. Although there are many different computational tools for processing these data, there are a limited number of statistical methods for analyzing them, and even fewer that acknowledge the unique nature of individual gene transcription. We introduce a simple and powerful statistical approach, based on a two-stage Poisson model, for modeling RNA sequencing data and testing for biologically important changes in gene expression. The advantages of this approach are demonstrated through simulations and real data applications. (Source: Statistical Applications in Genetics and Molecular Biology)</description>
            <author>Statistical Applications in Genetics and Molecular Biology</author>
            <type>journals</type>
        <comments>http://www.medworm.com/rss/comments.php?id=4838168</comments>
            <pubDate>Mon, 16 May 2011 18:58:42 +0100</pubDate>
            <guid isPermaLink="false">4838168</guid>        </item>
        <item>
            <title>Inferring Gene Networks using Robust Statistical Techniques</title>
            <link>http://www.medworm.com/index.php?rid=4826982&amp;cid=s_36498_50_f&amp;fid=36498&amp;url=http%3A%2F%2Fwww.bepress.com%2Fsagmb%2Fvol10%2Fiss1%2Fart25</link>
            <description>Inference of gene networks is an important step in understanding cellular dynamics. In this work, a novel algorithm is proposed for inferring gene networks from gene expression data using linear ordinary differential equations. Under the proposed method, a combination of known statistical tools including partial least squares (PLS), leave-one-out jackknifing, and the Akaike information criterion (AIC) is used for robust estimation of the gene connectivity matrix. The proposed approach is tested and validated using a computer simulated gene network model and experimental data on a nine-gene network in Escherichia coli. (Source: Statistical Applications in Genetics and Molecular Biology)</description>
            <author>Statistical Applications in Genetics and Molecular Biology</author>
            <type>journals</type>
        <comments>http://www.medworm.com/rss/comments.php?id=4826982</comments>
            <pubDate>Fri, 13 May 2011 21:21:04 +0100</pubDate>
            <guid isPermaLink="false">4826982</guid>        </item>
        <item>
            <title>The NBP Negative Binomial Model for Assessing Differential Gene Expression from RNA-Seq</title>
            <link>http://www.medworm.com/index.php?rid=4818705&amp;cid=s_36498_50_f&amp;fid=36498&amp;url=http%3A%2F%2Fwww.bepress.com%2Fsagmb%2Fvol10%2Fiss1%2Fart24</link>
            <description>We propose a new statistical test for assessing differential gene expression using RNA sequencing (RNA-Seq) data. Commonly used probability distributions, such as binomial or Poisson, cannot appropriately model the count variability in RNA-Seq data due to overdispersion. The small sample size that is typical in this type of data also prevents the uncritical use of tools derived from large-sample asymptotic theory. The test we propose is based on the NBP parameterization of the negative binomial distribution. It extends an exact test proposed by Robinson and Smyth (2007, 2008). In one version of Robinson and Smyth’s test, a constant dispersion parameter is used to model the count variability between biological replicates. We introduce an additional parameter to allow the dispersion parame...</description>
            <author>Statistical Applications in Genetics and Molecular Biology</author>
            <type>journals</type>
        <comments>http://www.medworm.com/rss/comments.php?id=4818705</comments>
            <pubDate>Thu, 12 May 2011 23:19:47 +0100</pubDate>
            <guid isPermaLink="false">4818705</guid>        </item>
        <item>
            <title>Analyzing Time-Course Microarray Data Using Functional Data Analysis - A Review</title>
            <link>http://www.medworm.com/index.php?rid=4779069&amp;cid=s_36498_50_f&amp;fid=36498&amp;url=http%3A%2F%2Fwww.bepress.com%2Fsagmb%2Fvol10%2Fiss1%2Fart23</link>
            <description>Gene expression over time can be viewed as a continuous process and therefore represented as a continuous curve or function. Functional data analysis (FDA) is a statistical methodology used to analyze functional data that has become increasingly popular in the analysis of time-course gene expression data. Several FDA techniques have been applied to gene expression profiles including functional regression analysis (to describe the relationship between expression profiles and other covariate(s)), functional discriminant analysis (to discriminate and classify groups of genes) and functional principal components analysis (for dimension reduction and clustering). This paper reviews the use of FDA and its associated methods to analyze time-course microarray gene expression data. (Source: Statist...</description>
            <author>Statistical Applications in Genetics and Molecular Biology</author>
            <type>journals</type>
        <comments>http://www.medworm.com/rss/comments.php?id=4779069</comments>
            <pubDate>Wed, 04 May 2011 00:04:49 +0100</pubDate>
            <guid isPermaLink="false">4779069</guid>        </item>
        <item>
            <title>Disequilibrium Coefficient: A Bayesian Perspective</title>
            <link>http://www.medworm.com/index.php?rid=4779070&amp;cid=s_36498_50_f&amp;fid=36498&amp;url=http%3A%2F%2Fwww.bepress.com%2Fsagmb%2Fvol10%2Fiss1%2Fart22</link>
            <description>Hardy-Weinberg Equilibrium (HWE) is an important genetic property that populations should have whenever they are not subject to adverse situations such as complete lack of panmixia, excess of mutations, excess of selection pressure, etc. HWE has been evaluated for decades; both frequentist and Bayesian methods are in use today. While historically the HWE formula was developed to examine the transmission of alleles in a population from one generation to the next, use of HWE concepts has expanded in human disease studies to detect genotyping error and disease susceptibility (association); Ryckman and Williams (2008). Most analyses focus on trying to answer the question of whether a population is in HWE. They do not try to quantify how far from the equilibrium the population is. In this paper, we ...</description>
            <author>Statistical Applications in Genetics and Molecular Biology</author>
            <type>journals</type>
        <comments>http://www.medworm.com/rss/comments.php?id=4779070</comments>
            <pubDate>Tue, 03 May 2011 17:22:45 +0100</pubDate>
            <guid isPermaLink="false">4779070</guid>        </item>
        <item>
            <title>Performance of Matrix Representation with Parsimony for Inferring Species from Gene Trees</title>
            <link>http://www.medworm.com/index.php?rid=4774383&amp;cid=s_36498_50_f&amp;fid=36498&amp;url=http%3A%2F%2Fwww.bepress.com%2Fsagmb%2Fvol10%2Fiss1%2Fart21</link>
            <description>Phylogenomic datasets often contain sequence alignments on different subsets of taxa for different genes. A major goal of phylogenetics is often to combine estimated gene trees from many loci into an overall estimate of a species tree. When data are missing for some combinations of genes and taxa, supertree methods can be used to combine gene trees on different subsets of taxa into an overall tree. However, studies of the performance of supertree methods when gene tree conflict is due to incomplete lineage sorting are needed to understand their statistical properties in this setting.
We find that Matrix Representation with Parsimony (MRP), the most commonly used supertree method, can in many cases infer the species tree in spite of high levels of conflict in the input gene trees. However,...</description>
            <author>Statistical Applications in Genetics and Molecular Biology</author>
            <type>journals</type>
        <comments>http://www.medworm.com/rss/comments.php?id=4774383</comments>
            <pubDate>Tue, 03 May 2011 00:46:32 +0100</pubDate>
            <guid isPermaLink="false">4774383</guid>        </item>
        <item>
            <title>A Non-Parametric Method for Detecting Specificity Determining Sites in Protein Sequence Alignments</title>
            <link>http://www.medworm.com/index.php?rid=4752134&amp;cid=s_36498_50_f&amp;fid=36498&amp;url=http%3A%2F%2Fwww.bepress.com%2Fsagmb%2Fvol10%2Fiss1%2Fart20</link>
            <description>Specificity determining sites (SDSs) in alignments of protein sequences are sites at which subfamilies of the aligned sequences have been under differential selective pressure. Identifying SDSs is important because they are key in understanding the functional specificity of each subfamily. Differential selection at an SDS will result in differences between subfamilies in the distribution of amino-acids at that site. However, statistical analysis of such differences is complicated by phylogenetic relationships within each subfamily, which profoundly influence these differences. We develop a non-parametric approach to evaluating purely statistical SDS evidence in a sequence alignment, taking account of phylogeny through a novel tree-respecting randomisation based on the principle of parsimon...</description>
            <author>Statistical Applications in Genetics and Molecular Biology</author>
            <type>journals</type>
        <comments>http://www.medworm.com/rss/comments.php?id=4752134</comments>
            <pubDate>Tue, 26 Apr 2011 20:37:32 +0100</pubDate>
            <guid isPermaLink="false">4752134</guid>        </item>
        <item>
            <title>Meta-Analysis of Family-Based and Case-Control Genetic Association Studies that Use the Same Cases</title>
            <link>http://www.medworm.com/index.php?rid=4740620&amp;cid=s_36498_50_f&amp;fid=36498&amp;url=http%3A%2F%2Fwww.bepress.com%2Fsagmb%2Fvol10%2Fiss1%2Fart19</link>
            <description>In many cases in genetic epidemiology, the investigators, in an effort to control for different sources of confounding and simultaneously to increase the power, perform a family-based and a population-based case-control study within the same population, using the same, or largely overlapping, set of cases. Various methods have been proposed for performing a combined analysis, but they all require access to individual data that are difficult to gather in a meta-analysis. Here, we propose a simple and efficient summary-based method for performing the meta-analysis. The key point, contrary to the methods presented earlier that need individual data, is the calculation of the covariance between the study estimates (log-Odds Ratios), using only data derived from the literature in the form of a 2x2 ...</description>
            <author>Statistical Applications in Genetics and Molecular Biology</author>
            <type>journals</type>
        <comments>http://www.medworm.com/rss/comments.php?id=4740620</comments>
            <pubDate>Thu, 21 Apr 2011 12:18:49 +0100</pubDate>
            <guid isPermaLink="false">4740620</guid>        </item>
        <item>
            <title>On the Statistical Properties of SGoF Multitesting Method</title>
            <link>http://www.medworm.com/index.php?rid=4740621&amp;cid=s_36498_50_f&amp;fid=36498&amp;url=http%3A%2F%2Fwww.bepress.com%2Fsagmb%2Fvol10%2Fiss1%2Fart18</link>
            <description>In this paper we establish the statistical properties of the SGoF multitesting method under a mixture model. It is assumed that the available set of p-values is statistically independent. Special attention is paid to the huge dimension problem in which the number of tests goes to infinity. Formulae for the power and the rate of false discoveries/non-discoveries of SGoF are given, so the role of the gamma-parameter of SGoF is understood. The existing connection between SGoF and a test of significance for the proportion of non-true nulls below gamma is explored. This connection suggests a possible modification of SGoF which may improve the power of the method. Simulation studies and a real data illustration are included. (Source: Statistical Applications in Genetics and Molecular Biology)</description>
            <author>Statistical Applications in Genetics and Molecular Biology</author>
            <type>journals</type>
        <comments>http://www.medworm.com/rss/comments.php?id=4740621</comments>
            <pubDate>Thu, 21 Apr 2011 12:18:44 +0100</pubDate>
            <guid isPermaLink="false">4740621</guid>        </item>
        <item>
            <title>Imputation Estimators Partially Correct for Model Misspecification</title>
            <link>http://www.medworm.com/index.php?rid=4730849&amp;cid=s_36498_50_f&amp;fid=36498&amp;url=http%3A%2F%2Fwww.bepress.com%2Fsagmb%2Fvol10%2Fiss1%2Fart17</link>
            <description>Inference problems with incomplete observations often aim at estimating population properties of unobserved quantities. One simple way to accomplish this estimation is to impute the unobserved quantities of interest at the individual level and then take an empirical average of the imputed values. We show that this simple imputation estimator can provide partial protection against model misspecification. We illustrate imputation estimators’ robustness to model specification on three examples: mixture model-based clustering, estimation of genotype frequencies in population genetics, and estimation of Markovian evolutionary distances. In the final example, using a representative model misspecification, we demonstrate that in non-degenerate cases, the imputation estimator dominates the plug-...</description>
            <author>Statistical Applications in Genetics and Molecular Biology</author>
            <type>journals</type>
        <comments>http://www.medworm.com/rss/comments.php?id=4730849</comments>
            <pubDate>Tue, 19 Apr 2011 15:26:11 +0100</pubDate>
            <guid isPermaLink="false">4730849</guid>        </item>
        <item>
            <title>A Variance-Components Model for Distance-Matrix Phylogenetic Reconstruction</title>
            <link>http://www.medworm.com/index.php?rid=4657326&amp;cid=s_36498_50_f&amp;fid=36498&amp;url=http%3A%2F%2Fwww.bepress.com%2Fsagmb%2Fvol10%2Fiss1%2Fart16</link>
            <description>Phylogenetic trees describe evolutionary relationships between related organisms (taxa). One approach to estimating phylogenetic trees supposes that a matrix of estimated evolutionary distances between taxa is available. Agglomerative methods have been proposed in which closely related taxon-pairs are successively combined to form ancestral taxa. Several of these computationally efficient agglomerative algorithms involve steps to reduce the variance in estimated distances. We propose an agglomerative phylogenetic method which focuses on statistical modeling of variance components in distance estimates. We consider how these variance components evolve during the agglomerative process. Our method simultaneously produces two topologically identical rooted trees, one tree having branch lengths...</description>
            <author>Statistical Applications in Genetics and Molecular Biology</author>
            <type>journals</type>
        <comments>http://www.medworm.com/rss/comments.php?id=4657326</comments>
            <pubDate>Wed, 30 Mar 2011 19:20:57 +0100</pubDate>
            <guid isPermaLink="false">4657326</guid>        </item>
        <item>
            <title>Application of the Lasso to Expression Quantitative Trait Loci Mapping</title>
            <link>http://www.medworm.com/index.php?rid=4589196&amp;cid=s_36498_50_f&amp;fid=36498&amp;url=http%3A%2F%2Fwww.bepress.com%2Fsagmb%2Fvol10%2Fiss1%2Fart15</link>
            <description>Univariate methods have frequently been used to discover Quantitative Trait Loci for gene expression measurements, often with much success. However, correlations caused by Linkage Disequilibrium as well as chance correlations, which are functions of the large number of markers typically used in such studies, mean that causative regions can often cause multiple signals. Traditional investigations into the number of QTL for a given phenotype, such as visual inspection of likelihood plots, are not feasible when considering thousands of phenotypes. Stepwise methods have been suggested to counter this, but these are known to produce unstable models and there are difficulties in deriving significance estimates. The Lasso is a shrinkage method which has often been employed to discover true signal...</description>
            <author>Statistical Applications in Genetics and Molecular Biology</author>
            <type>journals</type>
        <comments>http://www.medworm.com/rss/comments.php?id=4589196</comments>
            <pubDate>Mon, 14 Mar 2011 20:17:34 +0100</pubDate>
            <guid isPermaLink="false">4589196</guid>        </item>
        <item>
            <title>Exploratory Analysis of Multiple Omics Datasets Using the Adjusted RV Coefficient</title>
            <link>http://www.medworm.com/index.php?rid=4539600&amp;cid=s_36498_50_f&amp;fid=36498&amp;url=http%3A%2F%2Fwww.bepress.com%2Fsagmb%2Fvol10%2Fiss1%2Fart14</link>
            <description>The integration of multiple high-dimensional data sets (omics data) has been a very active but challenging area of bioinformatics research in recent years. Various adaptations of non-standard multivariate statistical tools have been suggested that allow such data sets to be analyzed and visualized simultaneously. However, these methods typically can deal with two data sets only, whereas systems biology experiments often generate larger numbers of high-dimensional data sets. For this reason, we suggest an explorative analysis of similarity between data sets as an initial analysis step. This analysis is based on the RV coefficient, a matrix correlation that can be interpreted as a generalization of the squared correlation from two single variables to two sets of variables. It has been shown bef...</description>
            <author>Statistical Applications in Genetics and Molecular Biology</author>
            <type>journals</type>
        <comments>http://www.medworm.com/rss/comments.php?id=4539600</comments>
            <pubDate>Thu, 03 Mar 2011 02:13:27 +0100</pubDate>
            <guid isPermaLink="false">4539600</guid>        </item>
        <item>
            <title>Linear Combination Test for Hierarchical Gene Set Analysis</title>
            <link>http://www.medworm.com/index.php?rid=4599310&amp;cid=s_36498_50_f&amp;fid=36498&amp;url=http%3A%2F%2Fwww.bepress.com%2Fsagmb%2Fvol10%2Fiss1%2Fart13</link>
            <description>Gene-set analysis (GSA) aims to identify sets of differentially expressed genes by a phenotype in DNA microarray studies. Challenges occur due to the salient characteristics of the data: (1) the number of genes is far larger than the number of observations; (2) gene expression measurements, especially within each gene set, can be highly correlated; and (3) the number of gene sets that can be examined is large and increasing rapidly. These challenges call for gene-set testing procedures that have both efficiency in computation for large GSAs and high power in the presence of the high correlation.
We propose a new GSA approach called Linear Combination Test (LCT), incorporating the covariance matrix estimator of gene expression into the test statistic. The proposed LCT and two other GSA meth...</description>
            <author>Statistical Applications in Genetics and Molecular Biology</author>
            <type>journals</type>
        <comments>http://www.medworm.com/rss/comments.php?id=4599310</comments>
            <pubDate>Tue, 01 Mar 2011 18:38:53 +0100</pubDate>
            <guid isPermaLink="false">4599310</guid>        </item>
        <item>
            <title>Information Metrics in Genetic Epidemiology</title>
            <link>http://www.medworm.com/index.php?rid=4511047&amp;cid=s_36498_50_f&amp;fid=36498&amp;url=http%3A%2F%2Fwww.bepress.com%2Fsagmb%2Fvol10%2Fiss1%2Fart12</link>
            <description>Information-theoretic metrics have been proposed for studying gene-gene and gene-environment interactions in genetic epidemiology. Although these metrics have proven very promising, they are typically interpreted in the context of communications and information transmission, diminishing their tangibility for epidemiologists and statisticians. In this paper, we clarify the interpretation of information-theoretic metrics. In particular, we develop the methods so that their relation to the global properties of probability models is made clear and contrast them with log-linear models for multinomial data. Hopefully, a better understanding of their properties and probabilistic implications will promote their acceptance and correct usage in genetic epidemiology. Our novel development also sugges...</description>
            <author>Statistical Applications in Genetics and Molecular Biology</author>
            <type>journals</type>
        <comments>http://www.medworm.com/rss/comments.php?id=4511047</comments>
            <pubDate>Wed, 23 Feb 2011 20:17:32 +0100</pubDate>
            <guid isPermaLink="false">4511047</guid>        </item>
        <item>
            <title>Interval Estimation of Familial Correlations from Pedigrees</title>
            <link>http://www.medworm.com/index.php?rid=4458753&amp;cid=s_36498_50_f&amp;fid=36498&amp;url=http%3A%2F%2Fwww.bepress.com%2Fsagmb%2Fvol10%2Fiss1%2Fart11</link>
            <description>Conclusions regarding the adequacy of the method in terms of bias, absolute bias, variance, and confidence interval coverage probabilities are presented on the basis of results from simulation studies. We determine under what circumstances the nominal 95 percent confidence intervals have excellent average coverage of the true values even for samples of small size and under what circumstances the results must be viewed with caution. We then describe a procedure by which, for both small family and large family structures, we find that the estimates we recommend provide accurate results. (Source: Statistical Applications in Genetics and Molecular Biology)</description>
            <author>Statistical Applications in Genetics and Molecular Biology</author>
            <type>journals</type>
        <comments>http://www.medworm.com/rss/comments.php?id=4458753</comments>
            <pubDate>Thu, 10 Feb 2011 19:59:34 +0100</pubDate>
            <guid isPermaLink="false">4458753</guid>        </item>
        <item>
            <title>Large Sample Approximations of Probabilities of Correct Evolutionary Tree Estimation and Biases of Maximum Likelihood Estimation</title>
            <link>http://www.medworm.com/index.php?rid=4449249&amp;cid=s_36498_50_f&amp;fid=36498&amp;url=http%3A%2F%2Fwww.bepress.com%2Fsagmb%2Fvol10%2Fiss1%2Fart10</link>
            <description>Simulation studies have been the main way in which properties of maximum likelihood estimation of evolutionary trees from aligned sequence data have been studied. Because trees are unusual parameters and because fitting is computationally intensive, such studies have a heavy computational cost. We develop an asymptotic framework that can be used to obtain probabilities of correct topological reconstruction and study other properties of likelihood methods when a single split is poorly resolved. Simulations suggest that while approximations to log likelihood differences are better for less well-resolved topologies, approximations to probabilities of correct reconstruction are generally good. We used the approximations to investigate biases in estimation and found that maximum likelihood esti...</description>
            <author>Statistical Applications in Genetics and Molecular Biology</author>
            <type>journals</type>
        <comments>http://www.medworm.com/rss/comments.php?id=4449249</comments>
            <pubDate>Tue, 08 Feb 2011 01:40:00 +0100</pubDate>
            <guid isPermaLink="false">4449249</guid>        </item>
        <item>
            <title>A Robust Statistical Method to Detect Null Alleles in Microsatellite and SNP Datasets in Both Panmictic and Inbred Populations</title>
            <link>http://www.medworm.com/index.php?rid=4435793&amp;cid=s_36498_50_f&amp;fid=36498&amp;url=http%3A%2F%2Fwww.bepress.com%2Fsagmb%2Fvol10%2Fiss1%2Fart9</link>
            <description>Null alleles are common technical artifacts in genetic-based analysis. Powerful methods enabling their detection in either panmictic or inbred populations have been proposed. However, none of these methods appears unbiased in both types of mating systems, necessitating a priori knowledge of the inbreeding level of the population under study. To counter this problem, I propose to use the software FDist2 to detect the atypical fixation indices that characterize markers with null alleles. The rationale behind this approach and the parameter settings are explained. The power of the method for various sample sizes, degrees of inbreeding and null allele frequencies is evaluated using simulated microsatellite and SNP datasets and then compared to two other null allele detection methods. The result...</description>
            <author>Statistical Applications in Genetics and Molecular Biology</author>
            <type>journals</type>
        <comments>http://www.medworm.com/rss/comments.php?id=4435793</comments>
            <pubDate>Fri, 04 Feb 2011 17:39:07 +0100</pubDate>
            <guid isPermaLink="false">4435793</guid>        </item>
        <item>
            <title>Log-Linear Modelling of Protein Dipeptide Structure Reveals Interesting Patterns of Side-Chain–Backbone Interactions</title>
            <link>http://www.medworm.com/index.php?rid=4599311&amp;cid=s_36498_50_f&amp;fid=36498&amp;url=http%3A%2F%2Fwww.bepress.com%2Fsagmb%2Fvol10%2Fiss1%2Fart8</link>
            <description>It has long been known that the amino-acid sequence of a protein determines its 3-dimensional structure, but accurate ab initio prediction of structure from sequence remains elusive. We gain insight into local protein structure conformation by studying the relationship of dihedral angles in pairs of residues in protein sequences (dipeptides). We adopt a contingency table approach, exploring a targeted set of hypotheses through log-linear modelling to detect patterns of association of these dihedral angles in all dipeptides considered. Our models indicate a substantial association of the side-chain conformation of the first residue with the backbone conformation of the second residue (side-to-back interaction) as well as a weaker but still significant association of the backbone conformatio...</description>
            <author>Statistical Applications in Genetics and Molecular Biology</author>
            <type>journals</type>
        <comments>http://www.medworm.com/rss/comments.php?id=4599311</comments>
            <pubDate>Tue, 25 Jan 2011 17:39:52 +0100</pubDate>
            <guid isPermaLink="false">4599311</guid>        </item>
        <item>
            <title>Log-Linear Modelling of Protein Dipeptide Structure Reveals Interesting Patterns of Side-Chain–Backbone Interactions</title>
            <link>http://www.medworm.com/index.php?rid=4398210&amp;cid=s_36498_50_f&amp;fid=36498&amp;url=http%3A%2F%2Fwww.bepress.com%2Fsagmb%2Fvol10%2Fiss1%2Fart8</link>
            <description>It has long been known that the amino-acid sequence of a protein determines its 3-dimensional structure, but accurate ab initio prediction of structure from sequence remains elusive. We gain insight into local protein structure conformation by studying the relationship of dihedral angles in pairs of residues in protein sequences (dipeptides). We adopt a contingency table approach, exploring a targeted set of hypotheses through log-linear modelling to detect patterns of association of these dihedral angles in all dipeptides considered. Our models indicate a substantial association of the side-chain conformation of the first residue with the backbone conformation of the second residue (side-to-back interaction) as well as a weaker but still significant association of the backbone conformatio...</description>
            <author>Statistical Applications in Genetics and Molecular Biology</author>
            <type>journals</type>
        <comments>http://www.medworm.com/rss/comments.php?id=4398210</comments>
            <pubDate>Tue, 25 Jan 2011 17:39:52 +0100</pubDate>
            <guid isPermaLink="false">4398210</guid>        </item>
        <item>
            <title>A Three Component Latent Class Model for Robust Semiparametric Gene Discovery</title>
            <link>http://www.medworm.com/index.php?rid=4381450&amp;cid=s_36498_50_f&amp;fid=36498&amp;url=http%3A%2F%2Fwww.bepress.com%2Fsagmb%2Fvol10%2Fiss1%2Fart7</link>
            <description>We propose a robust model for discovering differentially expressed genes which directly incorporates biological significance, i.e., effect dimension. Using the so-called c-fold rule, we transform the expressions into a nominal observed random variable with three categories: below a fixed lower threshold, above a fixed upper threshold or within the two thresholds. Gene expression data is then transformed into a nominal variable with three levels possibly originated by three different distributions corresponding to under expressed, not differential, and over expressed genes. This leads to a statistical model for a 3-component mixture of trinomial distributions with suitable constraints on the parameter space. In order to obtain the MLE estimates, we show how to implement a constrained EM alg...</description>
            <author>Statistical Applications in Genetics and Molecular Biology</author>
            <type>journals</type>
        <comments>http://www.medworm.com/rss/comments.php?id=4381450</comments>
            <pubDate>Fri, 21 Jan 2011 17:40:51 +0100</pubDate>
            <guid isPermaLink="false">4381450</guid>        </item>
        <item>
            <title>Learning from Past Treatments and Their Outcome Improves Prediction of In Vivo Response to Anti-HIV Therapy</title>
            <link>http://www.medworm.com/index.php?rid=4348599&amp;cid=s_36498_50_f&amp;fid=36498&amp;url=http%3A%2F%2Fwww.bepress.com%2Fsagmb%2Fvol10%2Fiss1%2Fart6</link>
            <description>Infections with the human immunodeficiency virus type 1 (HIV-1) are treated with combinations of drugs. Unfortunately, HIV responds to the treatment by developing resistance mutations. Consequently, the genome of the viral target proteins is sequenced and inspected for resistance mutations as part of routine diagnostic procedures for ensuring an effective treatment. For predicting response to a combination therapy, currently available computer-based methods rely on the genotype of the virus and the composition of the regimen as input. However, no available tool takes full advantage of the knowledge about the order of and the response to previously prescribed regimens. The resulting high-dimensional feature space makes existing methods difficult to apply in a straightforward fashion. The ma...</description>
            <author>Statistical Applications in Genetics and Molecular Biology</author>
            <type>journals</type>
        <comments>http://www.medworm.com/rss/comments.php?id=4348599</comments>
            <pubDate>Fri, 14 Jan 2011 18:24:36 +0100</pubDate>
            <guid isPermaLink="false">4348599</guid>        </item>
        <item>
            <title>Accuracy and Computational Efficiency of a Graphical Modeling Approach to Linkage Disequilibrium Estimation</title>
            <link>http://www.medworm.com/index.php?rid=4317232&amp;cid=s_36498_50_f&amp;fid=36498&amp;url=http%3A%2F%2Fwww.bepress.com%2Fsagmb%2Fvol10%2Fiss1%2Fart5</link>
            <description>We develop recent work on using graphical models for linkage disequilibrium to provide efficient programs for model fitting, phasing, and imputation of missing data in large data sets. Two important features contribute to the computational efficiency: the separation of the model fitting and phasing-imputation processes into different programs, and holding in memory only the data within a moving window of loci during model fitting. Optimal parameter values were chosen by cross-validation to maximize the probability of correctly imputing masked genotypes. The best accuracy obtained is slightly below that from the Beagle program of Browning and Browning, and our fitting program is slower. However, for large data sets, it uses less storage. For a reference set of n individuals genotyped a...</description>
            <author>Statistical Applications in Genetics and Molecular Biology</author>
            <type>journals</type>
        <comments>http://www.medworm.com/rss/comments.php?id=4317232</comments>
            <pubDate>Thu, 06 Jan 2011 18:41:37 +0100</pubDate>
            <guid isPermaLink="false">4317232</guid>        </item>
        <item>
            <title>A Comparison of Multifactor Dimensionality Reduction and L1-Penalized Regression to Identify Gene-Gene Interactions in Genetic Association Studies</title>
            <link>http://www.medworm.com/index.php?rid=4317233&amp;cid=s_36498_50_f&amp;fid=36498&amp;url=http%3A%2F%2Fwww.bepress.com%2Fsagmb%2Fvol10%2Fiss1%2Fart4</link>
            <description>In this study, we compare the performance of MDR, the traditional lasso with L1 penalty (TL1), and the group lasso for categorical data with group-wise L1 penalty (GL1) to detect gene-gene interactions through a broad range of simulations.
We find that each method has both advantages and disadvantages, and relative performance is context dependent. TL1 frequently over-fits, identifying false positive as well as true positive loci. MDR has higher power for epistatic models that exhibit independent main effects; for both Lasso methods, main effects tend to dominate. For purely epistatic models, GL1 has the best performance for lower minor allele frequencies, but MDR performs best for higher frequencies. These results provide guidance on when each approach might be best suited for detecting a...</description>
            <author>Statistical Applications in Genetics and Molecular Biology</author>
            <type>journals</type>
        <comments>http://www.medworm.com/rss/comments.php?id=4317233</comments>
            <pubDate>Thu, 06 Jan 2011 18:41:34 +0100</pubDate>
            <guid isPermaLink="false">4317233</guid>        </item>
        <item>
            <title>Learning Monotonic Genotype-Phenotype Maps</title>
            <link>http://www.medworm.com/index.php?rid=4317234&amp;cid=s_36498_50_f&amp;fid=36498&amp;url=http%3A%2F%2Fwww.bepress.com%2Fsagmb%2Fvol10%2Fiss1%2Fart3</link>
            <description>We present efficient algorithms for parameter estimation and model selection. The model is validated using simulated data and applied to HIV drug resistance data. We find that the effect of many resistance mutations is non-linear and depends on the genetic background in which they occur. (Source: Statistical Applications in Genetics and Molecular Biology)</description>
            <author>Statistical Applications in Genetics and Molecular Biology</author>
            <type>journals</type>
        <comments>http://www.medworm.com/rss/comments.php?id=4317234</comments>
            <pubDate>Thu, 06 Jan 2011 18:41:31 +0100</pubDate>
            <guid isPermaLink="false">4317234</guid>        </item>
        <item>
            <title>Repeated Measures Semiparametric Regression Using Targeted Maximum Likelihood Methodology with Application to Transcription Factor Activity Discovery</title>
            <link>http://www.medworm.com/index.php?rid=4317235&amp;cid=s_36498_50_f&amp;fid=36498&amp;url=http%3A%2F%2Fwww.bepress.com%2Fsagmb%2Fvol10%2Fiss1%2Fart2</link>
            <description>In longitudinal and repeated measures data analysis, often the goal is to determine the effect of a treatment or aspect on a particular outcome (e.g., disease progression). We consider a semiparametric repeated measures regression model, where the parametric component models the effect of the variable of interest and any modification by other covariates. The expectation of this parametric component over the other covariates is a measure of variable importance. Here, we present a targeted maximum likelihood estimator of the finite dimensional regression parameter, which is easily estimated using standard software for generalized estimating equations. The targeted maximum likelihood method provides double robust and locally efficient estimates of the variable importance parameters and inference b...</description>
            <author>Statistical Applications in Genetics and Molecular Biology</author>
            <type>journals</type>
        <comments>http://www.medworm.com/rss/comments.php?id=4317235</comments>
            <pubDate>Thu, 06 Jan 2011 18:41:28 +0100</pubDate>
            <guid isPermaLink="false">4317235</guid>        </item>
        <item>
            <title>A Markov-Chain Model for the Analysis of High-Resolution Enzymatically 18O-Labeled Mass Spectra</title>
            <link>http://www.medworm.com/index.php?rid=4317236&amp;cid=s_36498_50_f&amp;fid=36498&amp;url=http%3A%2F%2Fwww.bepress.com%2Fsagmb%2Fvol10%2Fiss1%2Fart1</link>
            <description>The enzymatic 18O-labeling is a useful quantification technique to account for between-spectrum variability of the results of mass spectrometry experiments. One of the important issues related to the use of the technique is the problem of incomplete labeling of peptide molecules, which may result in biased estimates of the relative peptide abundance. In this manuscript, we propose a Markov-chain model, which takes into account the possibility of incomplete labeling in the estimation of the relative abundance from the observed data. This allows for the use of less precise but faster labeling strategies, which should better fit in the high-throughput proteomic framework. Our method does not require extra experimental steps, as proposed in the approaches developed by Mirgorodskaya et al. (200...</description>
            <author>Statistical Applications in Genetics and Molecular Biology</author>
            <type>journals</type>
        <comments>http://www.medworm.com/rss/comments.php?id=4317236</comments>
            <pubDate>Thu, 06 Jan 2011 18:41:25 +0100</pubDate>
            <guid isPermaLink="false">4317236</guid>        </item>
        <item>
            <title>Including Probe-Level Measurement Error in Robust Mixture Clustering of Replicated Microarray Gene Expression</title>
            <link>http://www.medworm.com/index.php?rid=4248080&amp;cid=s_36498_50_f&amp;fid=36498&amp;url=http%3A%2F%2Fwww.bepress.com%2Fsagmb%2Fvol9%2Fiss1%2Fart42</link>
            <description>Probabilistic mixture models provide a popular approach to cluster noisy gene expression data for exploring gene function. Since gene expression data obtained from microarray experiments are often associated with significant sources of technical and biological noise, replicated experiments are typically used to deal with data variability, and internal replication (e.g. from multiple probes per gene in an experiment) provides valuable information about technical sources of noise. However, current implementations of mixture models either do not consider the correlation between the replicated measurements for the same experimental condition, or ignore the probe-level measurement error, and thus overlook the rich information about technical noise. Moreover, most current methods use non-robust ...</description>
            <author>Statistical Applications in Genetics and Molecular Biology</author>
            <type>journals</type>
        <comments>http://www.medworm.com/rss/comments.php?id=4248080</comments>
            <pubDate>Fri, 10 Dec 2010 07:09:06 +0100</pubDate>
            <guid isPermaLink="false">4248080</guid>        </item>
        <item>
            <title>Predicting Patient Survival from Longitudinal Gene Expression</title>
            <link>http://www.medworm.com/index.php?rid=4188769&amp;cid=s_36498_50_f&amp;fid=36498&amp;url=http%3A%2F%2Fwww.bepress.com%2Fsagmb%2Fvol9%2Fiss1%2Fart41</link>
            <description>Characterizing dynamic gene expression pattern and predicting patient outcome is now significant and will be of more interest in the future with large scale clinical investigation of microarrays. However, there is currently no method that has been developed for prediction of patient outcome using longitudinal gene expression, where gene expression of patients is being monitored across time. Here, we propose a novel prediction approach for patient survival time that makes use of time course structure of gene expression. This method is applied to a burn study. The genes involved in the final predictors are enriched in the inflammatory response and immune system related pathways. Moreover, our method is consistently better than prediction methods using individual time point gene expression or...</description>
            <author>Statistical Applications in Genetics and Molecular Biology</author>
            <type>journals</type>
        <comments>http://www.medworm.com/rss/comments.php?id=4188769</comments>
            <pubDate>Mon, 22 Nov 2010 16:21:34 +0100</pubDate>
            <guid isPermaLink="false">4188769</guid>        </item>
        <item>
            <title>Spatial Clustering of Array CGH Features in Combination with Hierarchical Multiple Testing</title>
            <link>http://www.medworm.com/index.php?rid=4174421&amp;cid=s_36498_50_f&amp;fid=36498&amp;url=http%3A%2F%2Fwww.bepress.com%2Fsagmb%2Fvol9%2Fiss1%2Fart40</link>
            <description>We propose a new approach for clustering DNA features using array CGH data from multiple tumor samples. We distinguish data-collapsing (joining contiguous DNA clones or probes with extremely similar data into regions) from clustering (joining contiguous, correlated regions based on a maximum likelihood principle). The model-based clustering algorithm accounts for the apparent spatial patterns in the data. We evaluate the randomness of the clustering result by a cluster stability score in combination with cross-validation. Moreover, we argue that the clustering really captures spatial genomic dependency by showing that coincidental clustering of independent regions is very unlikely. Using the region and cluster information, we combine testing of these for association with a clinical variable...</description>
            <author>Statistical Applications in Genetics and Molecular Biology</author>
            <type>journals</type>
        <comments>http://www.medworm.com/rss/comments.php?id=4174421</comments>
            <pubDate>Tue, 16 Nov 2010 18:27:46 +0100</pubDate>
            <guid isPermaLink="false">4174421</guid>        </item>
        <item>
            <title>Permutation P-values Should Never Be Zero: Calculating Exact P-values When Permutations Are Randomly Drawn</title>
            <link>http://www.medworm.com/index.php?rid=4123762&amp;cid=s_36498_50_f&amp;fid=36498&amp;url=http%3A%2F%2Fwww.bepress.com%2Fsagmb%2Fvol9%2Fiss1%2Fart39</link>
            <description>Permutation tests are amongst the most commonly used statistical tools in modern genomic research, a process by which p-values are attached to a test statistic by randomly permuting the sample or gene labels. Yet permutation p-values published in the genomic literature are often computed incorrectly, understated by about 1/m, where m is the number of permutations. The same is often true in the more general situation when Monte Carlo simulation is used to assign p-values. Although the p-value understatement is usually small in absolute terms, the implications can be serious in a multiple testing context. The understatement arises from the intuitive but mistaken idea of using permutation to estimate the tail probability of the test statistic. We argue instead that permutation should be viewe...</description>
            <author>Statistical Applications in Genetics and Molecular Biology</author>
            <type>journals</type>
        <comments>http://www.medworm.com/rss/comments.php?id=4123762</comments>
            <pubDate>Mon, 01 Nov 2010 04:20:21 +0100</pubDate>
            <guid isPermaLink="false">4123762</guid>        </item>
        <item>
            <title>Regression-Based Multi-Trait QTL Mapping Using a Structural Equation Model</title>
            <link>http://www.medworm.com/index.php?rid=4080866&amp;cid=s_36498_50_f&amp;fid=36498&amp;url=http%3A%2F%2Fwww.bepress.com%2Fsagmb%2Fvol9%2Fiss1%2Fart38</link>
            <description>Quantitative trait loci (QTL) mapping often results in data on a number of traits that have well-established causal relationships. Many multi-trait QTL mapping methods that account for the correlation among multiple traits have been developed to improve the statistical power and the precision of QTL parameter estimation. However, none of these methods are capable of incorporating the causal structure among the traits. Consequently, genetic functions of the QTL may not be fully understood. Structural equation modeling (SEM) allows researchers to explicitly characterize the causal structure among the variables and to decompose effects into direct, indirect, and total effects. In this paper, we developed a multi-trait SEM method of QTL mapping that takes into account the causal relationships ...</description>
            <author>Statistical Applications in Genetics and Molecular Biology</author>
            <type>journals</type>
        <comments>http://www.medworm.com/rss/comments.php?id=4080866</comments>
            <pubDate>Tue, 19 Oct 2010 16:28:13 +0100</pubDate>
            <guid isPermaLink="false">4080866</guid>        </item>
        <item>
            <title>The Detection of Blur in Affymetrix GeneChips</title>
            <link>http://www.medworm.com/index.php?rid=4080867&amp;cid=s_36498_50_f&amp;fid=36498&amp;url=http%3A%2F%2Fwww.bepress.com%2Fsagmb%2Fvol9%2Fiss1%2Fart37</link>
            <description>High correlations were obtained between probes in seemingly unrelated probe sets, following an examination of the data from thousands of Affymetrix GeneChips. Investigation revealed that these unexpected correlations were between probes that were adjacent to high-valued probes. Using carefully selected probes, together with simple linear models, the extent of blur has been measured for each CEL file. The cause is shown to be attributable to poorly performing scanners. Blur can result in the doubling of the values of thousands of probes. This in turn can lead to the doubling of the expression level for hundreds of probe sets. (Source: Statistical Applications in Genetics and Molecular Biology)</description>
            <author>Statistical Applications in Genetics and Molecular Biology</author>
            <type>journals</type>
        <comments>http://www.medworm.com/rss/comments.php?id=4080867</comments>
            <pubDate>Mon, 18 Oct 2010 16:14:32 +0100</pubDate>
            <guid isPermaLink="false">4080867</guid>        </item>
        <item>
            <title>Optimal Tests Shrinking Both Means and Variances Applicable to Microarray Data Analysis</title>
            <link>http://www.medworm.com/index.php?rid=4024939&amp;cid=s_36498_50_f&amp;fid=36498&amp;url=http%3A%2F%2Fwww.bepress.com%2Fsagmb%2Fvol9%2Fiss1%2Fart36</link>
            <description>As a consequence of the “large p small n” characteristic for microarray data, hypothesis tests based on individual genes often result in low average power. There are several proposed tests that attempt to improve power. Among these, the FS test that was developed using the concept of James-Stein shrinkage to estimate the variances showed a striking average power improvement. In this paper, we establish a framework in which we model the key parameters with a distribution to find an optimal Bayes test which we call the MAP test (where MAP stands for Maximum Average Power). Under this framework, the FS test can be derived as an empirical Bayes test approximating the MAP test corresponding to modeling the variances. By modeling both the means and the variances with a distribution, a ...</description>
            <author>Statistical Applications in Genetics and Molecular Biology</author>
            <type>journals</type>
        <comments>http://www.medworm.com/rss/comments.php?id=4024939</comments>
            <pubDate>Sat, 02 Oct 2010 10:09:17 +0100</pubDate>
            <guid isPermaLink="false">4024939</guid>        </item>
        <item>
            <title>Assessment of LD Matrix Measures for the Analysis of Biological Pathway Association</title>
            <link>http://www.medworm.com/index.php?rid=4024940&amp;cid=s_36498_50_f&amp;fid=36498&amp;url=http%3A%2F%2Fwww.bepress.com%2Fsagmb%2Fvol9%2Fiss1%2Fart35</link>
            <description>Complex diseases will have multiple functional sites, and it will be invaluable to understand the cross-locus interaction in terms of linkage disequilibrium (LD) between those sites (epistasis) in addition to the haplotype-LD effects. We investigated the statistical properties of a class of matrix-based statistics to assess this epistasis. These statistical methods include two LD contrast tests (Zaykin et al., 2006) and partial least squares regression (Wang et al., 2008). To estimate Type 1 error rates and power, we simulated multiple two-variant disease models using the SIMLA software package. SIMLA allows for the joint action of up to two disease genes in the simulated data with all possible multiplicative interaction effects between them. Our goal was to detect an interaction between m...</description>
            <author>Statistical Applications in Genetics and Molecular Biology</author>
            <type>journals</type>
        <comments>http://www.medworm.com/rss/comments.php?id=4024940</comments>
            <pubDate>Sat, 02 Oct 2010 10:09:13 +0100</pubDate>
            <guid isPermaLink="false">4024940</guid>        </item>
        <item>
            <title>On Optimal Selection of Summary Statistics for Approximate Bayesian Computation</title>
            <link>http://www.medworm.com/index.php?rid=3937823&amp;cid=s_36498_50_f&amp;fid=36498&amp;url=http%3A%2F%2Fwww.bepress.com%2Fsagmb%2Fvol9%2Fiss1%2Fart34</link>
            <description>How best to summarize large and complex datasets is a problem that arises in many areas of science. We approach it from the point of view of seeking data summaries that minimize the average squared error of the posterior distribution for a parameter of interest under approximate Bayesian computation (ABC). In ABC, simulation under the model replaces computation of the likelihood, which is convenient for many complex models. Simulated and observed datasets are usually compared using summary statistics, typically in practice chosen on the basis of the investigator's intuition and established practice in the field. We propose two algorithms for automated choice of efficient data summaries. Firstly, we motivate minimisation of the estimated entropy of the posterior approximation as a heuristic...</description>
            <author>Statistical Applications in Genetics and Molecular Biology</author>
            <type>journals</type>
        <comments>http://www.medworm.com/rss/comments.php?id=3937823</comments>
            <pubDate>Mon, 06 Sep 2010 21:24:37 +0100</pubDate>
            <guid isPermaLink="false">3937823</guid>        </item>
        <item>
            <title>On the Optimal Design of Genetic Variant Discovery Studies</title>
            <link>http://www.medworm.com/index.php?rid=3910442&amp;cid=s_36498_50_f&amp;fid=36498&amp;url=http%3A%2F%2Fwww.bepress.com%2Fsagmb%2Fvol9%2Fiss1%2Fart33</link>
            <description>The recent emergence of massively parallel sequencing technologies has enabled an increasing number of human genome re-sequencing studies, notable among them being the 1000 Genomes Project. The main aim of these studies is to identify the yet unknown genetic variants in a genomic region, mostly low frequency variants (frequency less than 5%). We propose here a set of statistical tools that address how to optimally design such studies in order to increase the number of genetic variants we expect to discover. Within this framework, the tradeoff between lower coverage for more individuals and higher coverage for fewer individuals can be naturally solved. The methods here are also useful for estimating the number of genetic variants missed in a discovery study performed at low coverage. We sho...</description>
            <author>Statistical Applications in Genetics and Molecular Biology</author>
            <type>journals</type>
        <comments>http://www.medworm.com/rss/comments.php?id=3910442</comments>
            <pubDate>Fri, 27 Aug 2010 16:15:43 +0100</pubDate>
            <guid isPermaLink="false">3910442</guid>        </item>
        <item>
            <title>Mapping Quantitative Trait Loci in a Non-Equilibrium Population</title>
            <link>http://www.medworm.com/index.php?rid=3884233&amp;cid=s_36498_50_f&amp;fid=36498&amp;url=http%3A%2F%2Fwww.bepress.com%2Fsagmb%2Fvol9%2Fiss1%2Fart32</link>
            <description>The genetic control of a complex trait can be studied by testing and mapping the genotypes of the underlying quantitative trait loci (QTLs) through their associations with observable marker genotypes. All existing statistical methods for QTL mapping assume an equilibrium population, allowing marker-QTL associations to be simply described at the gametic level. However, many mapping populations in practice may deviate from equilibrium; thus, gametic associations cannot reflect marker-QTL associations at the genotype level. We develop a robust model for mapping QTLs in a non-equilibrium natural population in which individuals are not necessarily randomly mating due to various evolutionary forces. Without use of Hardy-Weinberg equilibrium, the new model founds marker-QTL associations directly ...</description>
            <author>Statistical Applications in Genetics and Molecular Biology</author>
            <type>journals</type>
        <comments>http://www.medworm.com/rss/comments.php?id=3884233</comments>
            <pubDate>Thu, 19 Aug 2010 23:14:55 +0100</pubDate>
            <guid isPermaLink="false">3884233</guid>        </item>
        <item>
            <title>Granger Causality Analysis of Human Cell-Cycle Gene Expression Profiles</title>
            <link>http://www.medworm.com/index.php?rid=3864334&amp;cid=s_36498_50_f&amp;fid=36498&amp;url=http%3A%2F%2Fwww.bepress.com%2Fsagmb%2Fvol9%2Fiss1%2Fart31</link>
            <description>Granger causality (GC) tests are ideally suited to investigate time series data generated by bivariate vector autoregressive (VAR) processes. Recent studies have applied GC analysis and its extensions for modeling functional relationships and network structure from temporal gene expression profiles. The present study investigates GC analysis of human cell-cycle gene expression profiles that can be modeled as a first-order bivariate VAR. Analytical results presented establish the contribution of the VAR process parameters, including auto-regulatory feedback and noise variance to the mean-squared forecast error, as a critical component in identifying statistically significant GC relationships. These results in turn discourage blind inference of functional relationship between a given pair of...</description>
            <author>Statistical Applications in Genetics and Molecular Biology</author>
            <type>journals</type>
        <comments>http://www.medworm.com/rss/comments.php?id=3864334</comments>
            <pubDate>Sat, 14 Aug 2010 00:56:48 +0100</pubDate>
            <guid isPermaLink="false">3864334</guid>        </item>
        <item>
            <title>Lasso Logistic Regression, GSoft and the Cyclic Coordinate Descent Algorithm: Application to Gene Expression Data</title>
            <link>http://www.medworm.com/index.php?rid=3860917&amp;cid=s_36498_50_f&amp;fid=36498&amp;url=http%3A%2F%2Fwww.bepress.com%2Fsagmb%2Fvol9%2Fiss1%2Fart30</link>
            <description>Statistical methods generating sparse models are of great value in the gene expression field, where the number of covariates (genes) under study moves about the thousands while the sample sizes seldom reach a hundred individuals. For phenotype classification, we propose different lasso logistic regression approaches with specific penalizations for each gene. These methods are based on a generalized soft-threshold (GSoft) estimator. We also show that a recent algorithm for convex optimization, namely, the cyclic coordinate descent (CCD) algorithm, provides a way to solve the optimization problem significantly faster than with other competing methods. Viewing GSoft as an iterative thresholding procedure allows us to get the asymptotic properties of the resulting estimates in a straig...</description>
            <author>Statistical Applications in Genetics and Molecular Biology</author>
            <type>journals</type>
        <comments>http://www.medworm.com/rss/comments.php?id=3860917</comments>
            <pubDate>Thu, 12 Aug 2010 16:33:34 +0100</pubDate>
            <guid isPermaLink="false">3860917</guid>        </item>
        <item>
            <title>Generalizing Moving Averages for Tiling Arrays Using Combined P-Value Statistics</title>
            <link>http://www.medworm.com/index.php?rid=3830429&amp;cid=s_36498_50_f&amp;fid=36498&amp;url=http%3A%2F%2Fwww.bepress.com%2Fsagmb%2Fvol9%2Fiss1%2Fart29</link>
            <description>High density tiling arrays are an effective strategy for genome-wide identification of transcription factor binding regions. Sliding window methods that calculate moving averages of log ratios or t-statistics have been useful for the analysis of tiling array data. Here, we present a method that generalizes the moving average approach to evaluate sliding windows of p-values by using combined p-value statistics. In particular, the combined p-value framework can be useful in situations when taking averages of the corresponding test-statistic for the hypothesis may not be appropriate or when it is difficult to assess the significance of these averages. We exhibit the strengths of the combined p-values methods on Drosophila tiling array data and assess their ability to predict genomic regions e...</description>
            <author>Statistical Applications in Genetics and Molecular Biology</author>
            <type>journals</type>
        <comments>http://www.medworm.com/rss/comments.php?id=3830429</comments>
            <pubDate>Fri, 06 Aug 2010 16:58:41 +0100</pubDate>
            <guid isPermaLink="false">3830429</guid>        </item>
        <item>
            <title>Confidently Estimating the Number of DNA Replication Origins</title>
            <link>http://www.medworm.com/index.php?rid=3721045&amp;cid=s_36498_50_f&amp;fid=36498&amp;url=http%3A%2F%2Fwww.bepress.com%2Fsagmb%2Fvol9%2Fiss1%2Fart28</link>
            <description>We present a method for estimating and providing a confidence interval for the number of DNA replication origins in the genome of the yeast Kluyveromyces lactis. The method requires an initial set of verified sites from which a position specific frequency matrix (PSFM) can be constructed. We further assume that we have access to a sparingly used experimental procedure which can verify the functionality of a few, but not all, computationally predicted sites. While our motivation comes from estimating the number of autonomously replicating sequences (ARSs), our method can also be applied to estimating the genome-wide number of &quot;functional&quot; transcription factor binding sites, where functionality is determined by experimental verification of the transcription factor binding event usi...</description>
            <author>Statistical Applications in Genetics and Molecular Biology</author>
            <type>journals</type>
        <comments>http://www.medworm.com/rss/comments.php?id=3721045</comments>
            <pubDate>Fri, 02 Jul 2010 16:47:21 +0100</pubDate>
            <guid isPermaLink="false">3721045</guid>        </item>
        <item>
            <title>Classification of Genomic Sequences via Wavelet Variance and a Self-Organizing Map with an Application to Mitochondrial DNA</title>
            <link>http://www.medworm.com/index.php?rid=3721046&amp;cid=s_36498_50_f&amp;fid=36498&amp;url=http%3A%2F%2Fwww.bepress.com%2Fsagmb%2Fvol9%2Fiss1%2Fart27</link>
            <description>We present a new methodology for discriminating genomic symbolic sequences, which combines wavelet analysis and a self-organizing map algorithm. Wavelets are used to extract variation across various scales in the oligonucleotide patterns of a sequence. The variation is quantified by the estimated wavelet variance, which yields a feature vector. Feature vectors obtained from many genomic sequences, possibly of different lengths, are then classified with a nonparametric self-organizing map scheme. When applied to nearly 200 entire mitochondrial DNA sequences, or their fragments, the method predicts species taxonomic group membership very well, and allows the results to be visualized. When only thousands of nucleotides are available, wavelet-based feature vectors of short oligonucleotide patt...</description>
            <author>Statistical Applications in Genetics and Molecular Biology</author>
            <type>journals</type>
        <comments>http://www.medworm.com/rss/comments.php?id=3721046</comments>
            <pubDate>Fri, 02 Jul 2010 16:47:19 +0100</pubDate>
            <guid isPermaLink="false">3721046</guid>        </item>
        <item>
            <title>Locating Multiple Interacting Quantitative Trait Loci with the Zero-Inflated Generalized Poisson Regression</title>
            <link>http://www.medworm.com/index.php?rid=3686090&amp;cid=s_36498_50_f&amp;fid=36498&amp;url=http%3A%2F%2Fwww.bepress.com%2Fsagmb%2Fvol9%2Fiss1%2Fart26</link>
            <description>We consider the problem of locating multiple interacting quantitative trait loci (QTL) influencing traits measured in counts. In many applications the distribution of the count variable has a spike at zero. Zero-inflated generalized Poisson regression (ZIGPR) allows for an additional probability mass at zero and hence an improvement in the detection of significant loci. Classical model selection criteria often overestimate the QTL number. Therefore, modified versions of the Bayesian Information Criterion (mBIC and EBIC) were successfully used for QTL mapping. We apply these criteria based on ZIGPR as well as simpler models. An extensive simulation study shows their good power detecting QTL while controlling the false discovery rate. We illustrate how the inability of the Poisson distributi...</description>
            <author>Statistical Applications in Genetics and Molecular Biology</author>
            <type>journals</type>
        <comments>http://www.medworm.com/rss/comments.php?id=3686090</comments>
            <pubDate>Tue, 22 Jun 2010 17:18:09 +0100</pubDate>
            <guid isPermaLink="false">3686090</guid>        </item>
        <item>
            <title>A Random Coefficients Model for Regional Co-Expression Associated with DNA Copy Number</title>
            <link>http://www.medworm.com/index.php?rid=3686091&amp;cid=s_36498_50_f&amp;fid=36498&amp;url=http%3A%2F%2Fwww.bepress.com%2Fsagmb%2Fvol9%2Fiss1%2Fart25</link>
            <description>Regional co-expression refers to the phenomenon of contiguous genes exhibiting similar expression patterns. Among others, DNA copy number aberrations may be causally involved in regional co-expression. We propose a random coefficients model to explain regional co-expression from DNA copy number information, while modeling residual co-expression due to other causes by a correlated error structure. We show how the model parameters may be estimated (computationally efficient and consistently) from high-dimensional data, and suggest several robustifications of the estimation procedure. From the model we are able to assess whether there is a shared effect on expression levels due to the DNA copy number aberrations, but also whether this effect is homogeneous across genes. In two examples we use...</description>
            <author>Statistical Applications in Genetics and Molecular Biology</author>
            <type>journals</type>
        <comments>http://www.medworm.com/rss/comments.php?id=3686091</comments>
            <pubDate>Tue, 22 Jun 2010 17:18:07 +0100</pubDate>
            <guid isPermaLink="false">3686091</guid>        </item>
        <item>
            <title>Buckley-James Boosting for Survival Analysis with High-Dimensional Biomarker Data</title>
            <link>http://www.medworm.com/index.php?rid=3643703&amp;cid=s_36498_50_f&amp;fid=36498&amp;url=http%3A%2F%2Fwww.bepress.com%2Fsagmb%2Fvol9%2Fiss1%2Fart24</link>
            <description>There has been increasing interest in predicting patients' survival after therapy by investigating gene expression microarray data. In the regression and classification models with high-dimensional genomic data, boosting has been successfully applied to build accurate predictive models and conduct variable selection simultaneously. We propose the Buckley-James boosting for the semiparametric accelerated failure time models with right censored survival data, which can be used to predict survival of future patients using the high-dimensional genomic data. In the spirit of adaptive LASSO, twin boosting is also incorporated to fit more sparse models. The proposed methods have a unified approach to fit linear models, non-linear effects models with possible interactions. The methods can perform...</description>
            <author>Statistical Applications in Genetics and Molecular Biology</author>
            <type>journals</type>
        <comments>http://www.medworm.com/rss/comments.php?id=3643703</comments>
            <pubDate>Tue, 08 Jun 2010 20:39:00 +0100</pubDate>
            <guid isPermaLink="false">3643703</guid>        </item>
        <item>
            <title>Shrinkage Estimation of Effect Sizes as an Alternative to Hypothesis Testing Followed by Estimation in High-Dimensional Biology: Applications to Differential Gene Expression</title>
            <link>http://www.medworm.com/index.php?rid=3643704&amp;cid=s_36498_50_f&amp;fid=36498&amp;url=http%3A%2F%2Fwww.bepress.com%2Fsagmb%2Fvol9%2Fiss1%2Fart23</link>
            <description>Research on analyzing microarray data has focused on the problem of identifying differentially expressed genes to the neglect of the problem of how to integrate evidence that a gene is differentially expressed with information on the extent of its differential expression. Consequently, researchers currently prioritize genes for further study either on the basis of volcano plots or, more commonly, according to simple estimates of the fold change after filtering the genes with an arbitrary statistical significance threshold. While the subjective and informal nature of the former practice precludes quantification of its reliability, the latter practice is equivalent to using a hard-threshold estimator of the expression ratio that is not known to perform well in terms of mean-squared error, th...</description>
            <author>Statistical Applications in Genetics and Molecular Biology</author>
            <type>journals</type>
        <comments>http://www.medworm.com/rss/comments.php?id=3643704</comments>
            <pubDate>Tue, 08 Jun 2010 20:38:57 +0100</pubDate>
            <guid isPermaLink="false">3643704</guid>        </item>
        <item>
            <title>Network Enrichment Analysis in Complex Experiments</title>
            <link>http://www.medworm.com/index.php?rid=3590002&amp;cid=s_36498_50_f&amp;fid=36498&amp;url=http%3A%2F%2Fwww.bepress.com%2Fsagmb%2Fvol9%2Fiss1%2Fart22</link>
            <description>Cellular functions of living organisms are carried out through complex systems of interacting components. Including such interactions in the analysis, and considering sub-systems defined by biological pathways instead of individual components (e.g. genes), can lead to new findings about complex biological mechanisms. Networks are often used to capture such interactions and can be incorporated in models to improve the efficiency in estimation and inference. In this paper, we propose a model for incorporating external information about interactions among genes (proteins/metabolites) in differential analysis of gene sets. We exploit the framework of mixed linear models and propose a flexible inference procedure for analysis of changes in biological pathways. The proposed method facilitates th...</description>
            <author>Statistical Applications in Genetics and Molecular Biology</author>
            <type>journals</type>
        <comments>http://www.medworm.com/rss/comments.php?id=3590002</comments>
            <pubDate>Sun, 23 May 2010 03:57:40 +0100</pubDate>
            <guid isPermaLink="false">3590002</guid>        </item>
        <item>
            <title>The Generalized Odds Ratio as a Measure of Genetic Risk Effect in the Analysis and Meta-Analysis of Association Studies</title>
            <link>http://www.medworm.com/index.php?rid=3559122&amp;cid=s_36498_50_f&amp;fid=36498&amp;url=http%3A%2F%2Fwww.bepress.com%2Fsagmb%2Fvol9%2Fiss1%2Fart21</link>
            <description>The significance of risk effects in genetic association studies is assessed using the odds ratio for various genetic models (dominant, recessive and co-dominant) by merging genotypes. These models are not independent and there is no a priori biological justification for their choice. Consequently, the interpretation of their results can be problematic, especially when multiallelic variants and disease progression are investigated. The introduction of the generalized odds ratio (ORG) may be a remedy. The ORG utilizes the complete genotype distribution and it provides an estimate of the magnitude of the association, given that the mutational load and/or the phenotype are treated as a graded exposure and/or outcome. The performance of the ORG was tested in 13 meta-analyses with binary outcome...</description>
            <author>Statistical Applications in Genetics and Molecular Biology</author>
            <type>journals</type>
        <comments>http://www.medworm.com/rss/comments.php?id=3559122</comments>
            <pubDate>Wed, 12 May 2010 16:33:11 +0100</pubDate>
            <guid isPermaLink="false">3559122</guid>        </item>
        <item>
            <title>Space Oriented Rank-Based Data Integration</title>
            <link>http://www.medworm.com/index.php?rid=3456086&amp;cid=s_36498_50_f&amp;fid=36498&amp;url=http%3A%2F%2Fwww.bepress.com%2Fsagmb%2Fvol9%2Fiss1%2Fart20</link>
            <description>Integration of data from multiple omics platforms has become a major challenge in studying complex systems and traits. For integrating data from multiple platforms, the underlying spaces from which the top ranked elements come from are likely to be different. Thus, taking the underlying spaces into consideration explicitly is important, as failure to do so would lead to inefficient use of data and might render biases and/or sub-optimal results. We propose two space oriented classes of heuristic algorithms for integrating ranked lists from omic scale data. These algorithms are either Borda inspired or Markov chain based that take the underlying spaces of the individual ranked lists into account explicitly. We applied this set of algorithms to a number of problems, including one that aims at...</description>
            <author>Statistical Applications in Genetics and Molecular Biology</author>
            <type>journals</type>
        <comments>http://www.medworm.com/rss/comments.php?id=3456086</comments>
            <pubDate>Sat, 10 Apr 2010 03:29:09 +0100</pubDate>
            <guid isPermaLink="false">3456086</guid>        </item>
        <item>
            <title>Sub-Modular Resolution Analysis by Network Mixture Models</title>
            <link>http://www.medworm.com/index.php?rid=3456087&amp;cid=s_36498_50_f&amp;fid=36498&amp;url=http%3A%2F%2Fwww.bepress.com%2Fsagmb%2Fvol9%2Fiss1%2Fart19</link>
            <description>Inferring the structure of networks usually involves the attempt of retrieving their modular organization and knowing its possible interpretation, while quantifying the involved computational complexity through the methods and algorithms to be used. In protein interactomics, it is assumed that even the most recently created interactomes are known only up to a certain degree of coverage and accuracy, due to both experimental and computational limitations. Therefore, we need to infer from the measured interactomes about real interactomes as much as we infer from samples relative to a reference population. In order to exploit additional information sources, it is common to integrate multiple omic data and pursue method fusion. Particularly after the advent of high-throughput technologies, the...</description>
            <author>Statistical Applications in Genetics and Molecular Biology</author>
            <type>journals</type>
        <comments>http://www.medworm.com/rss/comments.php?id=3456087</comments>
            <pubDate>Sat, 10 Apr 2010 03:29:06 +0100</pubDate>
            <guid isPermaLink="false">3456087</guid>        </item>
        <item>
            <title>Reconstructability Analysis as a Tool for Identifying Gene-Gene Interactions in Studies of Human Diseases</title>
            <link>http://www.medworm.com/index.php?rid=3330360&amp;cid=s_36498_50_f&amp;fid=36498&amp;url=http%3A%2F%2Fwww.bepress.com%2Fsagmb%2Fvol9%2Fiss1%2Fart18</link>
            <description>There are a number of common human diseases for which the genetic component may include an epistatic interaction of multiple genes. Detecting these interactions with standard statistical tools is difficult because there may be an interaction effect, but minimal or no main effect. Reconstructability analysis (RA) uses Shannon's information theory to detect relationships between variables in categorical datasets. We applied RA to simulated data for five different models of gene-gene interaction, and find that even with heritability levels as low as 0.008, and with the inclusion of 50 non-associated genes in the dataset, we can identify the interacting gene pairs with an accuracy of ≥80%. We applied RA to a real dataset of type 2 non-insulin-dependent diabetes (NIDDM) cases and controls, an...</description>
            <author>Statistical Applications in Genetics and Molecular Biology</author>
            <type>journals</type>
        <comments>http://www.medworm.com/rss/comments.php?id=3330360</comments>
            <pubDate>Thu, 04 Mar 2010 01:53:44 +0100</pubDate>
            <guid isPermaLink="false">3330360</guid>        </item>
        <item>
            <title>Sparse Partial Least Squares Classification for High Dimensional Data</title>
            <link>http://www.medworm.com/index.php?rid=3330361&amp;cid=s_36498_50_f&amp;fid=36498&amp;url=http%3A%2F%2Fwww.bepress.com%2Fsagmb%2Fvol9%2Fiss1%2Fart17</link>
            <description>Partial least squares (PLS) is a well known dimension reduction method which has been recently adapted for high dimensional classification problems in genome biology. We develop sparse versions of the recently proposed two PLS-based classification methods using sparse partial least squares (SPLS). These sparse versions aim to achieve variable selection and dimension reduction simultaneously. We consider both binary and multicategory classification. We provide analytical and simulation-based insights about the variable selection properties of these approaches and benchmark them on well known publicly available datasets that involve tumor classification with high dimensional gene expression data. We show that incorporation of SPLS into a generalized linear model (GLM) framework provides high...</description>
            <author>Statistical Applications in Genetics and Molecular Biology</author>
            <type>journals</type>
        <comments>http://www.medworm.com/rss/comments.php?id=3330361</comments>
            <pubDate>Thu, 04 Mar 2010 01:53:41 +0100</pubDate>
            <guid isPermaLink="false">3330361</guid>        </item>
        <item>
            <title>Trilocus Disequilibrium Analysis of Multiallelic Markers in Outcrossing Populations</title>
            <link>http://www.medworm.com/index.php?rid=3261797&amp;cid=s_36498_50_f&amp;fid=36498&amp;url=http%3A%2F%2Fwww.bepress.com%2Fsagmb%2Fvol9%2Fiss1%2Fart16</link>
            <description>We present a closed-form EM algorithm framework to estimate trigenic linkage disequilibria coefficients of three multiallelic markers and present joint and separate statistical hypothesis tests of different linkage disequilibria. Linkage disequilibria analysis with three multiallelic markers is shown to be considerably more powerful than a two marker analysis or a three marker analysis that treats the multiallelic markers as biallelic markers. A three multiallelic marker model was used to analyze marker data from Lycoris longituba, a tulip-like ornamental plant in China, where each marker consisted of two to four distinct alleles. This algorithm will be useful for studying the pattern of genetic variation for outcrossing populations. (Source: Statistical Applications in Genetics and Molecu...</description>
            <author>Statistical Applications in Genetics and Molecular Biology</author>
            <type>journals</type>
        <comments>http://www.medworm.com/rss/comments.php?id=3261797</comments>
            <pubDate>Wed, 10 Feb 2010 17:46:58 +0100</pubDate>
            <guid isPermaLink="false">3261797</guid>        </item>
        <item>
            <title>Weighted-LASSO for Structured Network Inference from Time Course Data</title>
            <link>http://www.medworm.com/index.php?rid=3230458&amp;cid=s_36498_50_f&amp;fid=36498&amp;url=http%3A%2F%2Fwww.bepress.com%2Fsagmb%2Fvol9%2Fiss1%2Fart15</link>
            <description>We present a weighted-LASSO method to infer the parameters of a first-order vector auto-regressive model that describes time course expression data generated by directed gene-to-gene regulation networks. These networks are assumed to own prior internal structures of connectivity which drive the inference method. This prior structure can be either derived from prior biological knowledge or inferred by the method itself. We illustrate the performance of this structure-based penalization both on synthetic data and on two canonical regulatory networks (the yeast cell cycle regulation network and the E. coli S.O.S. DNA repair network). (Source: Statistical Applications in Genetics and Molecular Biology)</description>
            <author>Statistical Applications in Genetics and Molecular Biology</author>
            <type>journals</type>
        <comments>http://www.medworm.com/rss/comments.php?id=3230458</comments>
            <pubDate>Mon, 01 Feb 2010 17:46:52 +0100</pubDate>
            <guid isPermaLink="false">3230458</guid>        </item>
        <item>
            <title>An Internal Calibration Method for Protein-Array Studies</title>
            <link>http://www.medworm.com/index.php?rid=3215423&amp;cid=s_36498_50_f&amp;fid=36498&amp;url=http%3A%2F%2Fwww.bepress.com%2Fsagmb%2Fvol9%2Fiss1%2Fart14</link>
            <description>Nuisance factors in a protein-array study add obfuscating variation to spot intensity measurements, diminishing the accuracy and precision of protein concentration predictions. The effects of nuisance factors may be reduced by design of experiments, and by estimating and then subtracting nuisance effects. Estimated nuisance effects also inform about the quality of the study and suggest refinements for future studies. We demonstrate a method to reduce nuisance effects by incorporating a non-interfering internal calibration in the study design and its complemental analysis of variance. We illustrate this method by applying a chip-level internal calibration in a biomarker discovery study. The variability of sample intensity estimates was reduced 16% to 92% with a median of 58%; confidence inte...</description>
            <author>Statistical Applications in Genetics and Molecular Biology</author>
            <type>journals</type>
        <comments>http://www.medworm.com/rss/comments.php?id=3215423</comments>
            <pubDate>Thu, 28 Jan 2010 04:42:28 +0100</pubDate>
            <guid isPermaLink="false">3215423</guid>        </item>
        <item>
            <title>Comparing Spatial Maps of Human Population-Genetic Variation Using Procrustes Analysis</title>
            <link>http://www.medworm.com/index.php?rid=3215424&amp;cid=s_36498_50_f&amp;fid=36498&amp;url=http%3A%2F%2Fwww.bepress.com%2Fsagmb%2Fvol9%2Fiss1%2Fart13</link>
            <description>Recent applications of principal components analysis (PCA) and multidimensional scaling (MDS) in human population genetics have found that &quot;statistical maps&quot; based on the genotypes in population-genetic samples often resemble geographic maps of the underlying sampling locations. To provide formal tests of these qualitative observations, we describe a Procrustes analysis approach for quantitatively assessing the similarity of population-genetic and geographic maps. We confirm in two scenarios, one using single-nucleotide polymorphism (SNP) data from Europe and one using SNP data worldwide, that a measurably high level of concordance exists between statistical maps of population-genetic variation and geographic maps of sampling locations. Two other examples illustrate the versatility of the ...</description>
            <author>Statistical Applications in Genetics and Molecular Biology</author>
            <type>journals</type>
        <comments>http://www.medworm.com/rss/comments.php?id=3215424</comments>
            <pubDate>Thu, 28 Jan 2010 04:42:23 +0100</pubDate>
            <guid isPermaLink="false">3215424</guid>        </item>
        <item>
            <title>An Alternative Model of Type A Dependence in a Gene Set of Correlated Genes</title>
            <link>http://www.medworm.com/index.php?rid=3211127&amp;cid=s_36498_50_f&amp;fid=36498&amp;url=http%3A%2F%2Fwww.bepress.com%2Fsagmb%2Fvol9%2Fiss1%2Fart12</link>
            <description>Klebanov et al. (2006) proposed a new type of stochastic dependence, Type A dependence, between gene expression levels. They estimated the abundance of Type A pairs by testing the correlation coefficients of gene pairs. We propose a new model, hidden regulator dependence, as an alternative to Type A dependence. We show that the correlation based procedure proposed by Klebanov et al. (2006) fails to differentiate hidden regulator dependence from Type A dependence, although their probabilistic structures are quite different. (Source: Statistical Applications in Genetics and Molecular Biology)</description>
            <author>Statistical Applications in Genetics and Molecular Biology</author>
            <type>journals</type>
        <comments>http://www.medworm.com/rss/comments.php?id=3211127</comments>
            <pubDate>Tue, 26 Jan 2010 17:41:27 +0100</pubDate>
            <guid isPermaLink="false">3211127</guid>        </item>
        <item>
            <title>Asymptotic Distribution of the &quot;Orthogonal&quot; Quantitative Transmission Disequilibrium Test in a Structured Population: Exact Formula</title>
            <link>http://www.medworm.com/index.php?rid=3211128&amp;cid=s_36498_50_f&amp;fid=36498&amp;url=http%3A%2F%2Fwww.bepress.com%2Fsagmb%2Fvol9%2Fiss1%2Fart11</link>
            <description>Population structure is a recurrent problem for the detection of associations between a marker and a trait, because it can lead to an excess of false positives of the association tests. One popular way of circumventing this problem is the use of family based tests, which consider the transmission of the genotype from the parents to the offspring. Here we focus on quantitative traits and study the Abecasis &quot;orthogonal&quot; quantitative transmission disequilibrium test, which is commonly used in family based association studies. We derive the probability distribution of this test under a general model of structured population. Our derivations show that this test leads to a small excess of false positives due to population structure. They also illustrate and quantify how the heterogeneity in ge...</description>
            <author>Statistical Applications in Genetics and Molecular Biology</author>
            <type>journals</type>
        <comments>http://www.medworm.com/rss/comments.php?id=3211128</comments>
            <pubDate>Tue, 26 Jan 2010 17:41:25 +0100</pubDate>
            <guid isPermaLink="false">3211128</guid>        </item>
        <item>
            <title>Parameter Estimation in Multiple-Hidden I.I.D. Models from Biological Multiple Alignment</title>
            <link>http://www.medworm.com/index.php?rid=3211129&amp;cid=s_36498_50_f&amp;fid=36498&amp;url=http%3A%2F%2Fwww.bepress.com%2Fsagmb%2Fvol9%2Fiss1%2Fart10</link>
            <description>In this work we deal with parameter estimation in a latent variable model, namely the multiple-hidden i.i.d. model, which is derived from multiple alignment algorithms. We first provide a rigorous formalism for the homology structure of k sequences related by a star-shaped phylogenetic tree in the context of multiple alignment based on indel evolution models. We discuss possible definitions of likelihoods and compare them to the criterion used in multiple alignment algorithms. Existence of two different Information divergence rates is established and a divergence property is shown under additional assumptions. This would yield consistency for the parameter in parametrization schemes for which the divergence property holds. We finally extend the definition of the multiple-hidden i.i.d. mode...</description>
            <author>Statistical Applications in Genetics and Molecular Biology</author>
            <type>journals</type>
        <comments>http://www.medworm.com/rss/comments.php?id=3211129</comments>
            <pubDate>Tue, 26 Jan 2010 17:41:22 +0100</pubDate>
            <guid isPermaLink="false">3211129</guid>        </item>
        <item>
            <title>An Empirical Bayesian Method for Estimating Biological Networks from Temporal Microarray Data</title>
            <link>http://www.medworm.com/index.php?rid=3178084&amp;cid=s_36498_50_f&amp;fid=36498&amp;url=http%3A%2F%2Fwww.bepress.com%2Fsagmb%2Fvol9%2Fiss1%2Fart9</link>
            <description>Gene regulatory networks refer to the interactions that occur among genes and other cellular products. The topology of these networks can be inferred from measurements of changes in gene expression over time. However, because the measurement device (i.e., microarrays) typically yields information on thousands of genes over few biological replicates, these systems are quite difficult to elucidate. An approach with proven effectiveness for inferring networks is the Dynamic Bayesian Network. We have developed an iterative empirical Bayesian procedure with a Kalman filter that estimates the posterior distributions of network parameters. We compare our method to similar existing methods on simulated data and real microarray time series data. We find that the proposed method performs comparably ...</description>
            <author>Statistical Applications in Genetics and Molecular Biology</author>
            <type>journals</type>
        <comments>http://www.medworm.com/rss/comments.php?id=3178084</comments>
            <pubDate>Fri, 15 Jan 2010 21:03:48 +0100</pubDate>
            <guid isPermaLink="false">3178084</guid>        </item>
        <item>
            <title>Dealing with Heterogeneity between Cohorts in Genomewide SNP Association Studies</title>
            <link>http://www.medworm.com/index.php?rid=3170431&amp;cid=s_36498_50_f&amp;fid=36498&amp;url=http%3A%2F%2Fwww.bepress.com%2Fsagmb%2Fvol9%2Fiss1%2Fart8</link>
            <description>In genomewide association (GWA) studies investigating thousands of SNPs, large sample sizes are needed to obtain a reasonable power after correction for multiple testing. To obtain the necessary sample sizes, data from different populations/cohorts are combined. The problem of pooling evidence across cohorts bears some resemblance to meta-analysis of clinical trials, and in fact classical meta-analytic methodologies from that field are typically used in GWAs. However, in genetics, it can be expected that the cohorts show some amount of heterogeneity in the association measures that are used for significance testing. In this paper, we demonstrate how it is possible to exploit this heterogeneity to improve our ability to detect influential genetic variants. We also discuss how pathway anal...</description>
            <author>Statistical Applications in Genetics and Molecular Biology</author>
            <type>journals</type>
        <comments>http://www.medworm.com/rss/comments.php?id=3170431</comments>
            <pubDate>Wed, 13 Jan 2010 22:19:23 +0100</pubDate>
            <guid isPermaLink="false">3170431</guid>        </item>
        <item>
            <title>The Apportionment of Total Genetic Variation by Categorical Analysis of Variance</title>
            <link>http://www.medworm.com/index.php?rid=3170432&amp;cid=s_36498_50_f&amp;fid=36498&amp;url=http%3A%2F%2Fwww.bepress.com%2Fsagmb%2Fvol9%2Fiss1%2Fart7</link>
            <description>We wish to suggest the categorical analysis of variance as a means of quantifying the proportion of total genetic variation attributed to different sources of variation. This method potentially challenges researchers to rethink conclusions derived from the well-known analysis of molecular variance (AMOVA). The CATANOVA framework allows explicit definition, and estimation, of two measures of genetic differentiation. These parameters form the subject of interest in many research programmes, but are often confused with the correlation measures defined in AMOVA, which cannot be interpreted as relative contributions of particular sources of variation. Through a simulation approach, we show that under certain conditions, researchers who use AMOVA to estimate these measures of g...</description>
            <author>Statistical Applications in Genetics and Molecular Biology</author>
            <type>journals</type>
        <comments>http://www.medworm.com/rss/comments.php?id=3170432</comments>
            <pubDate>Wed, 13 Jan 2010 22:19:19 +0100</pubDate>
            <guid isPermaLink="false">3170432</guid>        </item>
        <item>
            <title>Optimisation of HMM Topologies Enhances DNA and Protein Sequence Modelling</title>
            <link>http://www.medworm.com/index.php?rid=3148273&amp;cid=s_36498_50_f&amp;fid=36498&amp;url=http%3A%2F%2Fwww.bepress.com%2Fsagmb%2Fvol9%2Fiss1%2Fart6</link>
            <description>Hidden Markov models (HMMs) play a major role in applications to unravel biomolecular functionality. Though HMMs are technically mature and widely applied in computational biology, there is potential for methodical optimisation concerning their modelling of biological data sources with varying sequence lengths.

Single building blocks of these models, the states, are associated with a certain holding time, being the link to the length distribution of represented sequence motifs. An adaptation of regular HMM topologies to bell-shaped sequence lengths is achieved by a serial chain-linking of hidden states, while residing in the class of conventional hidden Markov models. The factor of the repetition of states (r) and the parameter for state-specific duration of stay (p) are determined by fitt...</description>
            <author>Statistical Applications in Genetics and Molecular Biology</author>
            <type>journals</type>
        <comments>http://www.medworm.com/rss/comments.php?id=3148273</comments>
            <pubDate>Wed, 06 Jan 2010 23:07:18 +0100</pubDate>
            <guid isPermaLink="false">3148273</guid>        </item>
        <item>
            <title>Detecting Genotyping Error Using Measures of Degree of Hardy-Weinberg Disequilibrium</title>
            <link>http://www.medworm.com/index.php?rid=3148274&amp;cid=s_36498_50_f&amp;fid=36498&amp;url=http%3A%2F%2Fwww.bepress.com%2Fsagmb%2Fvol9%2Fiss1%2Fart5</link>
            <description>Tests for Hardy-Weinberg equilibrium (HWE) have been used to detect genotyping error, but those tests have low power unless the sample size is very large. We assessed the performance of measures of departure from HWE as an alternative way of screening for genotyping error. Three measures of the degree of disequilibrium (α, D, and F) were tested for their ability to detect genotyping error of 5% or more using simulations and a real dataset of 184 children with leukemia genotyped at 28 single nucleotide polymorphisms. The simulations indicate that all three disequilibrium coefficients can usefully detect genotyping error as judged by the area under the Receiver Operating Characteristic (ROC) curve. Their discriminative ability increases as the error rate increases, and is greater if the gen...</description>
            <author>Statistical Applications in Genetics and Molecular Biology</author>
            <type>journals</type>
        <comments>http://www.medworm.com/rss/comments.php?id=3148274</comments>
            <pubDate>Wed, 06 Jan 2010 23:07:13 +0100</pubDate>
            <guid isPermaLink="false">3148274</guid>        </item>
        <item>
            <title>Informative or Noninformative Calls for Gene Expression: A Latent Variable Approach</title>
            <link>http://www.medworm.com/index.php?rid=3148275&amp;cid=s_36498_50_f&amp;fid=36498&amp;url=http%3A%2F%2Fwww.bepress.com%2Fsagmb%2Fvol9%2Fiss1%2Fart4</link>
            <description>The strength and weakness of microarray technology can be attributed to the enormous amount of information it is generating. To fully enhance the benefit of microarray technology for testing differentially expressed genes and classification, there is a need to minimize the amount of irrelevant genes present in microarray data. A major interest is to use probe-level data to call genes informative or noninformative based on the trade-off between the array-to-array variability and the measurement error. Existing works in this direction include filtering likely uninformative sets of hybridization (FLUSH; Calza et al., 2007) and I/NI calls for the exclusion of noninformative genes using FARMS (I/NI calls; Talloen et al., 2007; Hochreiter et al., 2006). In this paper, we propose a linear mixed m...</description>
            <author>Statistical Applications in Genetics and Molecular Biology</author>
            <type>journals</type>
        <comments>http://www.medworm.com/rss/comments.php?id=3148275</comments>
            <pubDate>Wed, 06 Jan 2010 23:07:08 +0100</pubDate>
            <guid isPermaLink="false">3148275</guid>        </item>
        <item>
            <title>A Bayesian Hierarchical Model for Quantitative Real-Time PCR Data</title>
            <link>http://www.medworm.com/index.php?rid=3148276&amp;cid=s_36498_50_f&amp;fid=36498&amp;url=http%3A%2F%2Fwww.bepress.com%2Fsagmb%2Fvol9%2Fiss1%2Fart3</link>
            <description>We present a Bayesian hierarchical model for quantitative real-time polymerase chain reaction (PCR) data, aiming at relative quantification of DNA copy number in different biological samples. The model is specified in terms of a hidden Markov model for fluorescence intensities measured at successive cycles of the polymerase chain reaction. The efficiency of the reaction is assumed to depend on the abundance of the target DNA through fluorescence intensities, and the relationship is specified based on the kinetics of the reaction. The model incorporates the intrinsic random nature of the process as well as measurement error. Taking a Bayesian inferential approach, marginal posterior distributions of the quantities of interest are estimated using Markov chain Monte Carlo. The method is appli...</description>
            <author>Statistical Applications in Genetics and Molecular Biology</author>
            <type>journals</type>
        <comments>http://www.medworm.com/rss/comments.php?id=3148276</comments>
            <pubDate>Wed, 06 Jan 2010 23:07:04 +0100</pubDate>
            <guid isPermaLink="false">3148276</guid>        </item>
        <item>
            <title>Testing for Gene-Gene Interaction with AMMI Models</title>
            <link>http://www.medworm.com/index.php?rid=3148277&amp;cid=s_36498_50_f&amp;fid=36498&amp;url=http%3A%2F%2Fwww.bepress.com%2Fsagmb%2Fvol9%2Fiss1%2Fart2</link>
            <description>We describe the use of the biplot to display the structure of the interaction and evaluate the performance of the AMMI and the special cases of the AMMI previously described by Tukey and Mandel with simulated data sets. Our simulated study showed that the AMMI model is as powerful as general linear models when the interaction is not modeled in the presence of marginal effects. However, in the presence of pure epistasis, i.e. in the absence of marginal effects, the AMMI method was not found to be superior to other tested regression methods. (Source: Statistical Applications in Genetics and Molecular Biology)</description>
            <author>Statistical Applications in Genetics and Molecular Biology</author>
            <type>journals</type>
        <comments>http://www.medworm.com/rss/comments.php?id=3148277</comments>
            <pubDate>Wed, 06 Jan 2010 23:07:00 +0100</pubDate>
            <guid isPermaLink="false">3148277</guid>        </item>
        <item>
            <title>Epistatic Interactions</title>
            <link>http://www.medworm.com/index.php?rid=3148278&amp;cid=s_36498_50_f&amp;fid=36498&amp;url=http%3A%2F%2Fwww.bepress.com%2Fsagmb%2Fvol9%2Fiss1%2Fart1</link>
            <description>The term &quot;epistasis&quot; is sometimes used to describe some form of statistical interaction between genetic factors and is alternatively sometimes used to describe instances in which the effect of a particular genetic variant is masked by a variant at another locus. In general statistical tests for interaction are of limited use in detecting &quot;epistasis&quot; in the sense of masking. It is, however, shown that there are relations between empirical data patterns and epistasis that have not been previously noted. These relations can sometimes be exploited to empirically test for &quot;epistatic interactions&quot; in the sense of the masking of the effect of a particular genetic variant by a variant at another locus. (Source: Statistical Applications in Genetics and Molecular Biology)</description>
            <author>Statistical Applications in Genetics and Molecular Biology</author>
            <type>journals</type>
        <comments>http://www.medworm.com/rss/comments.php?id=3148278</comments>
            <pubDate>Wed, 06 Jan 2010 22:52:59 +0100</pubDate>
            <guid isPermaLink="false">3148278</guid>        </item>
        <item>
            <title>A Unified Mixed Effects Model for Gene Set Analysis of Time Course Microarray Experiments</title>
            <link>http://www.medworm.com/index.php?rid=2971751&amp;cid=s_36498_50_f&amp;fid=36498&amp;url=http%3A%2F%2Fwww.bepress.com%2Fsagmb%2Fvol8%2Fiss1%2Fart47</link>
            <description>We describe simulation studies using gene expression data with &quot;real life&quot; correlations and we demonstrate the proposed random coefficient model using a mouse colon development time course dataset. The agreement between results of the proposed random coefficient model and the previous reports for this proof-of-concept trial further validates this methodology, which provides a unified statistical model for systems analysis of microarray experiments with complex experimental designs when re-sampling based methods are difficult to apply. (Source: Statistical Applications in Genetics and Molecular Biology)</description>
            <author>Statistical Applications in Genetics and Molecular Biology</author>
            <type>journals</type>
        <comments>http://www.medworm.com/rss/comments.php?id=2971751</comments>
            <pubDate>Sun, 08 Nov 2009 00:16:48 +0100</pubDate>
            <guid isPermaLink="false">2971751</guid>        </item>
        <item>
            <title>Statistical Screening Method for Genetic Factors Influencing Susceptibility to Common Diseases in a Two-Stage Genome-Wide Association Study</title>
            <link>http://www.medworm.com/index.php?rid=2962238&amp;cid=s_36498_50_f&amp;fid=36498&amp;url=http%3A%2F%2Fwww.bepress.com%2Fsagmb%2Fvol8%2Fiss1%2Fart46</link>
            <description>A genome-wide association study (GWAS) is a standard strategy for detecting disease susceptibility genes, despite unsettled controversies on many aspects, including optimal study design and statistical analysis. As for study design, a two-stage design has been applied to maximize cost-effectiveness. However, there has been little consensus on appropriate statistical analysis for two-stage design, leaving researchers uncertain as to which statistical measures should be applied at the first stage, and how to determine the significance level of the differences at the second stage. Here, using simulation studies, we compared statistical operating characteristics of the screening in a two-stage GWAS by taking into consideration the proper balance of false-positive and false-negative error....</description>
            <author>Statistical Applications in Genetics and Molecular Biology</author>
            <type>journals</type>
        <comments>http://www.medworm.com/rss/comments.php?id=2962238</comments>
            <pubDate>Wed, 04 Nov 2009 18:52:42 +0100</pubDate>
            <guid isPermaLink="false">2962238</guid>        </item>
        <item>
            <title>A Regularized Regression Approach for Dissecting Genetic Conflicts that Increase Disease Risk in Pregnancy</title>
            <link>http://www.medworm.com/index.php?rid=2922638&amp;cid=s_36498_50_f&amp;fid=36498&amp;url=http%3A%2F%2Fwww.bepress.com%2Fsagmb%2Fvol8%2Fiss1%2Fart45</link>
            <description>Human diseases developed during pregnancy could be caused by the direct effects of both maternal and fetal genes, and/or by the indirect effects caused by genetic conflicts. Genetic conflicts exist when the effects of fetal genes are opposed by the effects of maternal genes, or when there is a conflict between the maternal and paternal genes within the fetal genome. The two types of genetic conflicts involve the functions of different genes in different genomes and are genetically distinct. Differentiating and further dissecting the two sets of genetic conflict effects that increase disease risk during pregnancy present statistical challenges, and have been traditionally pursued as two separate endeavors. In this article, we develop a unified framework to model and test the two sets of gen...</description>
            <author>Statistical Applications in Genetics and Molecular Biology</author>
            <type>journals</type>
        <comments>http://www.medworm.com/rss/comments.php?id=2922638</comments>
            <pubDate>Fri, 23 Oct 2009 20:42:53 +0100</pubDate>
            <guid isPermaLink="false">2922638</guid>        </item>
        <item>
            <title>Transmission Disequilibrium Test Power and Sample Size in the Presence of Locus Heterogeneity</title>
            <link>http://www.medworm.com/index.php?rid=2874812&amp;cid=s_36498_50_f&amp;fid=36498&amp;url=http%3A%2F%2Fwww.bepress.com%2Fsagmb%2Fvol8%2Fiss1%2Fart44</link>
            <description>Locus heterogeneity is one of the most important issues in gene mapping and can cause significant reductions in statistical power for gene mapping, yet no research to date has provided power and sample size calculations for family-based association methods in the presence of locus heterogeneity. The purpose of this research is three-fold: (i) to provide an analytic solution to the incorporation of locus heterogeneity into power and sample size calculations for the TDT statistic; (ii) to verify our analytic solution with simulations; and (iii) to study how different factors affect sample size requirement for the TDT in the presence of locus heterogeneity. The detection of association in the presence of locus heterogeneity requires a greater sample size than in its absence. This increase is ...</description>
            <author>Statistical Applications in Genetics and Molecular Biology</author>
            <type>journals</type>
        <comments>http://www.medworm.com/rss/comments.php?id=2874812</comments>
            <pubDate>Fri, 09 Oct 2009 02:54:47 +0100</pubDate>
            <guid isPermaLink="false">2874812</guid>        </item>
        <item>
            <title>Characterizing the D2 Statistic: Word Matches in Biological Sequences</title>
            <link>http://www.medworm.com/index.php?rid=2874813&amp;cid=s_36498_50_f&amp;fid=36498&amp;url=http%3A%2F%2Fwww.bepress.com%2Fsagmb%2Fvol8%2Fiss1%2Fart43</link>
            <description>Word matches are often used in sequence comparison methods, either as a measure of sequence similarity or in the first search steps of algorithms such as BLAST or BLAT. The D2 statistic is the number of matches of words of k letters between two sequences. Recent advances have been made in the characterization of this statistic and in the approximation of its distribution. Here, these results are extended to the case of approximate word matches. We compute the exact value of the variance of the D2 statistic for the case of a uniform letter distribution, and introduce a method to provide accurate approximations of the variance in the remaining cases. This enables the distribution of D2 to be approximated for typical situations arising in biological research. We apply these results to the iden...</description>
            <author>Statistical Applications in Genetics and Molecular Biology</author>
            <type>journals</type>
        <comments>http://www.medworm.com/rss/comments.php?id=2874813</comments>
            <pubDate>Thu, 08 Oct 2009 20:24:50 +0100</pubDate>
            <guid isPermaLink="false">2874813</guid>        </item>
        <item>
            <title>MC-Normalization: A Novel Method for Dye-Normalization of Two-Channel Microarray Data</title>
            <link>http://www.medworm.com/index.php?rid=2854466&amp;cid=s_36498_50_f&amp;fid=36498&amp;url=http%3A%2F%2Fwww.bepress.com%2Fsagmb%2Fvol8%2Fiss1%2Fart42</link>
            <description>Pre-processing plays a vital role in two-color microarray data analysis. An analysis is characterized by its ability to identify differentially expressed genes (its sensitivity) and its ability to provide unbiased estimators of the true regulation (its bias). It has been shown that microarray experiments regularly underestimate the true regulation of differentially expressed genes. We introduce the MC-normalization, where C stands for channel-wise normalization, with considerably lower bias than the commonly used standard methods. The idea behind the MC-normalization is that the channels' individual intensities determine the correction, rather than the average intensity which is the case for the widely used MA-normalization. The two methods were evaluated using spike-in data from an in-hou...</description>
            <author>Statistical Applications in Genetics and Molecular Biology</author>
            <type>journals</type>
        <comments>http://www.medworm.com/rss/comments.php?id=2854466</comments>
            <pubDate>Fri, 02 Oct 2009 00:02:53 +0100</pubDate>
            <guid isPermaLink="false">2854466</guid>        </item>
        <item>
            <title>M-quantile Regression Analysis of Temporal Gene Expression Data</title>
            <link>http://www.medworm.com/index.php?rid=2818517&amp;cid=s_36498_50_f&amp;fid=36498&amp;url=http%3A%2F%2Fwww.bepress.com%2Fsagmb%2Fvol8%2Fiss1%2Fart41</link>
            <description>We present a new method to approach this problem. Firstly, the temporal profiles of the genes are modelled by a parametric M-quantile regression model. This model is particularly appealing to small-sample gene expression data, as it is very robust against outliers and it does not make any assumption on the error distribution. Secondly, we further increase the robustness of the method by summarising the M-quantile regression models for a large range of quantile values into an M-quantile coefficient. Finally, we fit a polynomial M-quantile regression model to the M-quantile coefficients over time and employ a Hotelling T2-test to detect significant differences of the temporal M-quantile coefficients profiles across conditions. Extensive simulations show the increased power and robustness of ...</description>
            <author>Statistical Applications in Genetics and Molecular Biology</author>
            <type>journals</type>
        <comments>http://www.medworm.com/rss/comments.php?id=2818517</comments>
            <pubDate>Tue, 22 Sep 2009 16:43:02 +0100</pubDate>
            <guid isPermaLink="false">2818517</guid>        </item>
        <item>
            <title>Modeling Dependence in Methylation Patterns with Application to Ovarian Carcinomas</title>
            <link>http://www.medworm.com/index.php?rid=2818518&amp;cid=s_36498_50_f&amp;fid=36498&amp;url=http%3A%2F%2Fwww.bepress.com%2Fsagmb%2Fvol8%2Fiss1%2Fart40</link>
            <description>Changes in cytosine methylation at CpG nucleotides are observed in many cancers and offer great potential for translational research. Diseases such as ovarian cancer that are especially challenging to diagnose and treat are of particular interest, and abnormal methylation in the tandem repeats Sat2 and NBL2 has been observed in a collection of ovarian carcinomas. In earlier analyses of double-stranded methylation patterns in 0.2 kb regions of Sat2 and NBL2, we detected clusters of identically methylated sites in close proximity. These clusters could not be explained by random variation, and our findings suggested a high degree of site-to-site dependence. However, previously developed stochastic models for methylation change have either treated CpG sites independently or employed a context ...</description>
            <author>Statistical Applications in Genetics and Molecular Biology</author>
            <type>journals</type>
        <comments>http://www.medworm.com/rss/comments.php?id=2818518</comments>
            <pubDate>Tue, 22 Sep 2009 16:42:58 +0100</pubDate>
            <guid isPermaLink="false">2818518</guid>        </item>
        <item>
            <title>Calculating Asymptotic Significance Levels of the Constrained Likelihood Ratio Test with Application to Multivariate Genetic Linkage Analysis</title>
            <link>http://www.medworm.com/index.php?rid=2806552&amp;cid=s_36498_50_f&amp;fid=36498&amp;url=http%3A%2F%2Fwww.bepress.com%2Fsagmb%2Fvol8%2Fiss1%2Fart39</link>
            <description>The asymptotic distribution of the multivariate variance component linkage analysis likelihood ratio test has provoked some contradictory accounts in the literature. In this paper we confirm that some previous results are not correct by deriving the asymptotic distribution in one special case. It is shown that this special case is a good approximation to the distribution in many situations. We also introduce a new approach to simulating from the asymptotic distribution of the likelihood ratio test statistic in constrained testing problems. It is shown that this method is very efficient for small p-values, and is applicable even when the constraints are not convex. The method is related to a multivariate integration problem. We illustrate how the approach can be applied to multivariate link...</description>
            <author>Statistical Applications in Genetics and Molecular Biology</author>
            <type>journals</type>
        <comments>http://www.medworm.com/rss/comments.php?id=2806552</comments>
            <pubDate>Fri, 18 Sep 2009 02:14:42 +0100</pubDate>
            <guid isPermaLink="false">2806552</guid>        </item>
        <item>
            <title>A Statistical Model for Genetic Mapping of Viral Infection by Integrating Epidemiological Behavior</title>
            <link>http://www.medworm.com/index.php?rid=2781027&amp;cid=s_36498_50_f&amp;fid=36498&amp;url=http%3A%2F%2Fwww.bepress.com%2Fsagmb%2Fvol8%2Fiss1%2Fart38</link>
            <description>Large-scale studies of genetic variation may be helpful for understanding the genetic control mechanisms of viral infection and, ultimately, predicting and eliminating infectious disease outbreaks. We propose a new statistical model for detecting specific DNA sequence variants that are responsible for viral infection. This model considers additive, dominance and epistatic effects of haplotypes from three different genomes, recipient, transmitter and virus, through an epidemiological process. The model is constructed within the maximum likelihood framework and implemented with the EM algorithm. A number of hypothesis tests about population genetic structure and diversity and the pattern of genetic control are formulated. A series of closed forms for the EM algorithm to estimate haplotype fr...</description>
            <author>Statistical Applications in Genetics and Molecular Biology</author>
            <type>journals</type>
        <comments>http://www.medworm.com/rss/comments.php?id=2781027</comments>
            <pubDate>Wed, 09 Sep 2009 20:56:14 +0100</pubDate>
            <guid isPermaLink="false">2781027</guid>        </item>
        <item>
            <title>Identifying Individuals in a Complex Mixture of DNA with Unknown Ancestry</title>
            <link>http://www.medworm.com/index.php?rid=2781028&amp;cid=s_36498_50_f&amp;fid=36498&amp;url=http%3A%2F%2Fwww.bepress.com%2Fsagmb%2Fvol8%2Fiss1%2Fart37</link>
            <description>A new test was recently developed that could use a high-density set of single nucleotide polymorphisms (SNPs) to determine whether a specific individual contributed to a mixture of DNA. The test statistic compared the genotype for the individual to the allele frequencies in the mixture and to the allele frequencies in a reference group. This test requires the ancestries of the reference group to be nearly identical to those of the contributors to the mixture. Here, we first quantify the bias, the increase in type I and type II error, when the ancestries are not well matched. Then, we show that the test can also be biased if the number of subjects in the two groups differs or if the platforms used to measure SNP intensities differ. We then introduce a new test statistic and a test that only ...</description>
            <author>Statistical Applications in Genetics and Molecular Biology</author>
            <type>journals</type>
        <comments>http://www.medworm.com/rss/comments.php?id=2781028</comments>
            <pubDate>Wed, 09 Sep 2009 20:56:11 +0100</pubDate>
            <guid isPermaLink="false">2781028</guid>        </item>
        <item>
            <title>Prediction of Motifs Based on a Repeated-Measures Model for Integrating Cross-Species Sequence and Expression Data</title>
            <link>http://www.medworm.com/index.php?rid=2781029&amp;cid=s_36498_50_f&amp;fid=36498&amp;url=http%3A%2F%2Fwww.bepress.com%2Fsagmb%2Fvol8%2Fiss1%2Fart36</link>
            <description>De novo identification of transcription factor binding sites (TFBS) is a challenging computational problem because TFBSs are relatively short sequences buried in long genomic regions. Earlier methods incorporated genome-wide expression data and promoter sequences into a linear-model framework, regressing expression on counts of putative TFBSs in promoters for a single species. More recently, it has been shown that examining sequence data across multiple species improves the prediction of TFBSs. In this work, we describe an extension of the single-species, linear-model framework for the analysis of paired cross-species sequence and expression data. A repeated measures model for gene-expression measurements across species is used, accounting for phylogenetic relationships among species throu...</description>
            <author>Statistical Applications in Genetics and Molecular Biology</author>
            <type>journals</type>
        <comments>http://www.medworm.com/rss/comments.php?id=2781029</comments>
            <pubDate>Wed, 09 Sep 2009 20:56:08 +0100</pubDate>
            <guid isPermaLink="false">2781029</guid>        </item>
        <item>
            <title>Ancestral Recombination Graphs under Non-Random Ascertainment, with Applications to Gene Mapping</title>
            <link>http://www.medworm.com/index.php?rid=2781030&amp;cid=s_36498_50_f&amp;fid=36498&amp;url=http%3A%2F%2Fwww.bepress.com%2Fsagmb%2Fvol8%2Fiss1%2Fart35</link>
            <description>Consider a sample of apparently unrelated individuals, for which marker genotype and phenotype data is available. When individuals are sampled on phenotypes, we propose an ascertained ancestral recombination graph (ARG) that models shared ancestry of the sample chromosomes given phenotype data along a region that possibly harbors a disease susceptibility gene. The ascertained ARG is used to define a gene mapping algorithm by means of a lod score and associated p-values based on permutation testing. Under certain modeling simplifications, the lod score and p-values can be computed exactly, without any Monte Carlo approximations, even for unphased chromosome genotype data. Our method handles incomplete penetrance, varying marker allele frequencies and neutral mutations, and is based on a Hid...</description>
            <author>Statistical Applications in Genetics and Molecular Biology</author>
            <type>journals</type>
        <comments>http://www.medworm.com/rss/comments.php?id=2781030</comments>
            <pubDate>Wed, 09 Sep 2009 20:56:04 +0100</pubDate>
            <guid isPermaLink="false">2781030</guid>        </item>
        <item>
            <title>Rotation Testing in Gene Set Enrichment Analysis for Small Direct Comparison Experiments</title>
            <link>http://www.medworm.com/index.php?rid=2644235&amp;cid=s_36498_50_f&amp;fid=36498&amp;url=http%3A%2F%2Fwww.bepress.com%2Fsagmb%2Fvol8%2Fiss1%2Fart34</link>
            <description>Gene Set Enrichment Analysis (GSEA) is a method for analysing gene expression data with a focus on a priori defined gene sets. The permutation test generally used in GSEA for testing the significance of gene set enrichment involves permutation of a phenotype vector and is developed for data from an indirect comparison design, i.e. unpaired data. In some studies the samples representing two phenotypes are paired, e.g. samples taken from a patient before and after treatment, or if samples representing two phenotypes are hybridised to the same two-channel array (direct comparison design). In this paper we will focus on data from direct comparison experiments, but the methods can be applied to paired data in general. For these types of data, a standard permutation test for paired data that ran...</description>
            <author>Statistical Applications in Genetics and Molecular Biology</author>
            <type>journals</type>
        <comments>http://www.medworm.com/rss/comments.php?id=2644235</comments>
            <pubDate>Mon, 27 Jul 2009 18:14:34 +0100</pubDate>
            <guid isPermaLink="false">2644235</guid>        </item>
        <item>
            <title>A Multivariate Growth Curve Model for Ranking Genes in Replicated Time Course Microarray Data</title>
            <link>http://www.medworm.com/index.php?rid=2568182&amp;cid=s_36498_50_f&amp;fid=36498&amp;url=http%3A%2F%2Fwww.bepress.com%2Fsagmb%2Fvol8%2Fiss1%2Fart33</link>
            <description>The gene ranking problem in time course microarray experiments is challenging since gene expression levels between different time points are correlated. This is because expression values at successive time points are usually taken from the same organism, tissue or culture. Moreover, time dependency of gene expression values is usually of interest and often is the biological problem that motivates the experiment. We propose a multivariate growth curve model for ranking genes and estimating mean gene expression profiles in replicated time course microarray data. The approach takes the within-individual correlation as well as the temporal ordering into consideration. Moreover, time is incorporated as a continuous variable in the model to account for the temporal pattern. Polynomial profiles are ...</description>
            <author>Statistical Applications in Genetics and Molecular Biology</author>
            <type>journals</type>
        <comments>http://www.medworm.com/rss/comments.php?id=2568182</comments>
            <pubDate>Wed, 01 Jul 2009 22:39:40 +0100</pubDate>
            <guid isPermaLink="false">2568182</guid>        </item>
        <item>
            <title>Estimation of Selection Intensity under Overdominance by Bayesian Methods</title>
            <link>http://www.medworm.com/index.php?rid=2559446&amp;cid=s_36498_50_f&amp;fid=36498&amp;url=http%3A%2F%2Fwww.bepress.com%2Fsagmb%2Fvol8%2Fiss1%2Fart32</link>
            <description>A balanced pattern in the allele frequencies of polymorphic loci is a potential sign of selection, particularly of overdominance. Although this type of selection is of some interest in population genetics, there exist no likelihood-based approaches specifically tailored to make inference on selection intensity. To fill this gap, we present Bayesian methods to estimate selection intensity under k-allele models with overdominance. Our model allows for an arbitrary number of loci and alleles within a locus. The neutral and selected variability within each locus are modeled with corresponding k-allele models. To estimate the posterior distribution of the mean selection intensity in a multilocus region, a hierarchical setup between loci is used. The methods are demonstrated with data at the Hu...</description>
            <author>Statistical Applications in Genetics and Molecular Biology</author>
            <type>journals</type>
        <comments>http://www.medworm.com/rss/comments.php?id=2559446</comments>
            <pubDate>Wed, 01 Jul 2009 02:33:53 +0100</pubDate>
            <guid isPermaLink="false">2559446</guid>        </item>
        <item>
            <title>Model Selection Based on FDR-Thresholding Optimizing the Area under the ROC-Curve</title>
            <link>http://www.medworm.com/index.php?rid=2516676&amp;cid=s_36498_50_f&amp;fid=36498&amp;url=http%3A%2F%2Fwww.bepress.com%2Fsagmb%2Fvol8%2Fiss1%2Fart31</link>
            <description>We evaluate variable selection by multiple tests controlling the false discovery rate (FDR) to build a linear score for prediction of clinical outcome in high-dimensional data. Quality of prediction is assessed by the receiver operating characteristic curve (ROC) for prediction in independent patients. Thus we try to combine both goals: prediction and controlled structure estimation. We show that the FDR-threshold which provides the ROC-curve with the largest area under the curve (AUC) varies largely over the different parameter constellations not known in advance. Hence, we investigated a new cross validation procedure based on the maximum rank correlation estimator to determine the optimal selection threshold. This procedure (i) allows choosing an appropriate selection criterion, (ii) pr...</description>
            <author>Statistical Applications in Genetics and Molecular Biology</author>
            <type>journals</type>
        <comments>http://www.medworm.com/rss/comments.php?id=2516676</comments>
            <pubDate>Thu, 25 Jun 2009 21:08:55 +0100</pubDate>
            <guid isPermaLink="false">2516676</guid>        </item>
        <item>
            <title>Adaptive Transmission Disequilibrium Test for Family Trio Design</title>
            <link>http://www.medworm.com/index.php?rid=2500529&amp;cid=s_36498_50_f&amp;fid=36498&amp;url=http%3A%2F%2Fwww.bepress.com%2Fsagmb%2Fvol8%2Fiss1%2Fart30</link>
            <description>The transmission disequilibrium test (TDT) is a standard method to detect association using family trio design. It is optimal for an additive genetic model. Other TDT-type tests optimal for recessive and dominant models have also been developed. Association tests using family data, including the TDT-type statistics, have been unified to a class of more comprehensive and flexible family-based association tests (FBAT). TDT-type tests have high efficiency when the genetic model is known or correctly specified, but may lose power if the model is mis-specified. Hence tests that are robust to genetic model mis-specification yet efficient are preferred. Constrained likelihood ratio test (CLRT) and MAX-type test have been shown to be efficiency robust. In this paper we propose a new efficiency rob...</description>
            <author>Statistical Applications in Genetics and Molecular Biology</author>
            <type>journals</type>
        <comments>http://www.medworm.com/rss/comments.php?id=2500529</comments>
            <pubDate>Tue, 23 Jun 2009 20:48:37 +0100</pubDate>
            <guid isPermaLink="false">2500529</guid>        </item>
        <item>
            <title>A Non-Homogeneous Hidden-State Model on First Order Differences for Automatic Detection of Nucleosome Positions</title>
            <link>http://www.medworm.com/index.php?rid=2500530&amp;cid=s_36498_50_f&amp;fid=36498&amp;url=http%3A%2F%2Fwww.bepress.com%2Fsagmb%2Fvol8%2Fiss1%2Fart29</link>
            <description>The ability to map individual nucleosomes accurately across genomes enables the study of relationships between dynamic changes in nucleosome positioning/occupancy and gene regulation. However, the highly heterogeneous nature of nucleosome densities across genomes and short linker regions pose challenges in mapping nucleosome positions based on high-throughput microarray data of micrococcal nuclease (MNase) digested DNA. Previous works rely on additional detrending and careful visual examination to detect low-signal nucleosomes, which may exist in a subpopulation of cells. We propose a non-homogeneous hidden-state model based on first order differences of experimental data along genomic coordinates that bypasses the need for local detrending and can automatically detect nucleosome positions...</description>
            <author>Statistical Applications in Genetics and Molecular Biology</author>
            <type>journals</type>
        <comments>http://www.medworm.com/rss/comments.php?id=2500530</comments>
            <pubDate>Fri, 19 Jun 2009 20:15:30 +0100</pubDate>
            <guid isPermaLink="false">2500530</guid>        </item>
        <item>
            <title>Extensions of Sparse Canonical Correlation Analysis with Applications to Genomic Data</title>
            <link>http://www.medworm.com/index.php?rid=2467205&amp;cid=s_36498_50_f&amp;fid=36498&amp;url=http%3A%2F%2Fwww.bepress.com%2Fsagmb%2Fvol8%2Fiss1%2Fart28</link>
            <description>In recent work, several authors have introduced methods for sparse canonical correlation analysis (sparse CCA). Suppose that two sets of measurements are available on the same set of observations. Sparse CCA is a method for identifying sparse linear combinations of the two sets of variables that are highly correlated with each other. It has been shown to be useful in the analysis of high-dimensional genomic data, when two sets of assays are available on the same set of samples. In this paper, we propose two extensions to the sparse CCA methodology. (1) Sparse CCA is an unsupervised method; that is, it does not make use of outcome measurements that may be available for each observation (e.g., survival time or cancer subtype). We propose an extension to sparse CCA, which we call sparse super...</description>
            <author>Statistical Applications in Genetics and Molecular Biology</author>
            <type>journals</type>
        <comments>http://www.medworm.com/rss/comments.php?id=2467205</comments>
            <pubDate>Tue, 09 Jun 2009 18:18:40 +0100</pubDate>
            <guid isPermaLink="false">2467205</guid>        </item>
        <item>
            <title>Bayesian Unsupervised Learning with Multiple Data Types</title>
            <link>http://www.medworm.com/index.php?rid=2462358&amp;cid=s_36498_50_f&amp;fid=36498&amp;url=http%3A%2F%2Fwww.bepress.com%2Fsagmb%2Fvol8%2Fiss1%2Fart27</link>
            <description>We report a genetic signature for the basal-like subtype of breast cancer found across a number of previous gene expression array studies. Using the two algorithmic approaches we find that this signature also arises from clustering on the microRNA expression data and appears derivative from this data. (Source: Statistical Applications in Genetics and Molecular Biology)</description>
            <author>Statistical Applications in Genetics and Molecular Biology</author>
            <type>journals</type>
        <comments>http://www.medworm.com/rss/comments.php?id=2462358</comments>
            <pubDate>Fri, 05 Jun 2009 22:16:05 +0100</pubDate>
            <guid isPermaLink="false">2462358</guid>        </item>
        <item>
            <title>A Parametric Model for Analyzing Anticipation in Genetically Predisposed Families</title>
            <link>http://www.medworm.com/index.php?rid=2454472&amp;cid=s_36498_50_f&amp;fid=36498&amp;url=http%3A%2F%2Fwww.bepress.com%2Fsagmb%2Fvol8%2Fiss1%2Fart26</link>
            <description>Anticipation, i.e. a decreasing age-at-onset in subsequent generations, has been observed in a number of genetically triggered diseases. The impact of anticipation is generally studied in affected parent-child pairs. These analyses are restricted to pairs in which both individuals have been affected and are sensitive to right truncation of the data. We propose a normal random effects model that allows for right-censored observations and includes covariates, and draw statistical inference based on the likelihood function. We applied the model to the hereditary nonpolyposis colorectal cancer (HNPCC)/Lynch syndrome family cohort from the national Danish HNPCC register. Age-at-onset was analyzed in 824 individuals from 2-4 generations in 125 families with proved disease-predisposing mutations. ...</description>
            <author>Statistical Applications in Genetics and Molecular Biology</author>
            <type>journals</type>
        <comments>http://www.medworm.com/rss/comments.php?id=2454472</comments>
            <pubDate>Tue, 02 Jun 2009 23:09:51 +0100</pubDate>
            <guid isPermaLink="false">2454472</guid>        </item>
        <item>
            <title>Increase of Rejection Rate in Case-Control Studies with the Differential Genotyping Error Rates</title>
            <link>http://www.medworm.com/index.php?rid=2401929&amp;cid=s_36498_50_f&amp;fid=36498&amp;url=http%3A%2F%2Fwww.bepress.com%2Fsagmb%2Fvol8%2Fiss1%2Fart25</link>
            <description>This study extends previous work by examining this issue analytically using the non-centrality parameter of the asymptotic distribution of the chi-squared test and linear trend test (LTT) when there is no difference between case and control genotype frequencies, but there is differential misclassification with SNP data. The parameters examined are the minor allele frequency (MAF) and sample size. When MAF is less than 0.2, differential genotyping errors lead to a rejection rate much larger than the nominal significance level. As the MAF decreases to zero, the increase in the rejection rate becomes larger. The errors that most increase the rejection rate are differential recording of the more common homozygote as the other homozygote and differential recording of the more common homozygote ...</description>
            <author>Statistical Applications in Genetics and Molecular Biology</author>
            <type>journals</type>
        <comments>http://www.medworm.com/rss/comments.php?id=2401929</comments>
            <pubDate>Thu, 07 May 2009 20:49:21 +0100</pubDate>
            <guid isPermaLink="false">2401929</guid>        </item>
        <item>
            <title>Incorporating Duplicate Genotype Data into Linear Trend Tests of Genetic Association: Methods and Cost-Effectiveness</title>
            <link>http://www.medworm.com/index.php?rid=2395668&amp;cid=s_36498_50_f&amp;fid=36498&amp;url=http%3A%2F%2Fwww.bepress.com%2Fsagmb%2Fvol8%2Fiss1%2Fart24</link>
            <description>The genome-wide association (GWA) study is an increasingly popular way to attempt to identify the causal variants in human disease. Duplicate genotyping (or re-genotyping) a portion of the samples in a GWA study is common, though it is typical for these data to be ignored in subsequent tests of genetic association. We demonstrate a method for including duplicate genotype data in linear trend tests of genetic association which yields increased power. We also consider the cost-effectiveness of collecting duplicate genotype data and find that when the relative cost of genotyping to phenotyping and sample acquisition costs is less than or equal to the genotyping error rate it is more powerful to duplicate genotype the entire sample instead of spending the same money to increase the sample size...</description>
            <author>Statistical Applications in Genetics and Molecular Biology</author>
            <type>journals</type>
        <comments>http://www.medworm.com/rss/comments.php?id=2395668</comments>
            <pubDate>Tue, 05 May 2009 17:33:50 +0100</pubDate>
            <guid isPermaLink="false">2395668</guid>        </item>
        <item>
            <title>Weighted Multiple Hypothesis Testing Procedures</title>
            <link>http://www.medworm.com/index.php?rid=2500531&amp;cid=s_36498_50_f&amp;fid=36498&amp;url=http%3A%2F%2Fwww.bepress.com%2Fsagmb%2Fvol8%2Fiss1%2Fart23</link>
            <description>Multiple hypothesis testing is commonly used in genome research such as genome-wide studies and gene expression data analysis (Lin, 2005). The widely used Bonferroni procedure controls the family-wise error rate (FWER) for multiple hypothesis testing, but has limited statistical power as the number of hypotheses tested increases. The power of multiple testing procedures can be increased by using weighted p-values (Genovese et al., 2006). The weights for the p-values can be estimated by using certain prior information. Wasserman and Roeder (2006) described a weighted Bonferroni procedure, which incorporates weighted p-values into the Bonferroni procedure, and Rubin et al. (2006) and Wasserman and Roeder (2006) estimated the optimal weights that maximize the power of the weighted Bonferroni ...</description>
            <author>Statistical Applications in Genetics and Molecular Biology</author>
            <type>journals</type>
        <comments>http://www.medworm.com/rss/comments.php?id=2500531</comments>
            <pubDate>Thu, 16 Apr 2009 21:01:58 +0100</pubDate>
            <guid isPermaLink="false">2500531</guid>        </item>
        <item>
            <title>Multilevel Comparison of Dendrograms: A New Method with an Application for Genetic Classifications</title>
            <link>http://www.medworm.com/index.php?rid=2500532&amp;cid=s_36498_50_f&amp;fid=36498&amp;url=http%3A%2F%2Fwww.bepress.com%2Fsagmb%2Fvol8%2Fiss1%2Fart22</link>
            <description>Procedures are currently available for the evaluation of hierarchical classifications of produce tree dissimilarities or consensus dendrograms. Some tests of cluster validity operate by comparing all possible partitions from a tree with a reference partition. We propose an exhaustive search procedure to compare all partitions from one dendrogram with all partitions derived from the other to detect hierarchical levels at which the two dendrograms show maximum agreement. The method is illustrated using RAPD and microsatellite data in order to detect clones in reed populations. The utility of our approach is its ability to reveal extra information in different genetic data sets which would be hidden otherwise. The method is also useful in any field of science where hierarchical clustering is ...</description>
            <author>Statistical Applications in Genetics and Molecular Biology</author>
            <type>journals</type>
        <comments>http://www.medworm.com/rss/comments.php?id=2500532</comments>
            <pubDate>Tue, 14 Apr 2009 17:54:19 +0100</pubDate>
            <guid isPermaLink="false">2500532</guid>        </item>
        <item>
            <title>Univariate Shrinkage in the Cox Model for High Dimensional Data</title>
            <link>http://www.medworm.com/index.php?rid=2500533&amp;cid=s_36498_50_f&amp;fid=36498&amp;url=http%3A%2F%2Fwww.bepress.com%2Fsagmb%2Fvol8%2Fiss1%2Fart21</link>
            <description>We propose a method for prediction in Cox's proportional hazards model, when the number of features (regressors), p, exceeds the number of observations, n. The method assumes that the features are independent in each risk set, so that the partial likelihood factors into a product. As such, it is analogous to univariate thresholding in linear regression and nearest shrunken centroids in classification. We call the procedure Cox univariate shrinkage and demonstrate its usefulness on real and simulated data. The method has the attractive property of being essentially univariate in its operation: the features are entered into the model based on the size of their Cox score statistics. We illustrate the new method on real and simulated data, and compare it to other proposed methods for survival predicti...</description>
            <author>Statistical Applications in Genetics and Molecular Biology</author>
            <type>journals</type>
        <comments>http://www.medworm.com/rss/comments.php?id=2500533</comments>
            <pubDate>Tue, 14 Apr 2009 17:54:16 +0100</pubDate>
            <guid isPermaLink="false">2500533</guid>        </item>
        <item>
            <title>Balanced Gradient Boosting from Imbalanced Data for Clinical Outcome Prediction</title>
            <link>http://www.medworm.com/index.php?rid=2320920&amp;cid=s_36498_50_f&amp;fid=36498&amp;url=http%3A%2F%2Fwww.bepress.com%2Fsagmb%2Fvol8%2Fiss1%2Fart20</link>
            <description>In clinical outcome prediction, such as disease diagnosis and prognosis, it is often assumed that the class, e.g., disease and control, is equally distributed. However, in practice we often encounter biological or clinical data whose class distribution is highly skewed. Since standard supervised learning algorithms intend to maximize the overall prediction accuracy, a prediction model tends to show a strong bias toward the majority class when it is trained on such imbalanced data. Therefore, the class distribution should be incorporated appropriately to learn from imbalanced data. To address this practically important problem, we proposed balanced gradient boosting (BalaBoost) which reformulates gradient boosting to avoid the overfitting to the majority class and is sensitive to the minori...</description>
            <author>Statistical Applications in Genetics and Molecular Biology</author>
            <type>journals</type>
        <comments>http://www.medworm.com/rss/comments.php?id=2320920</comments>
            <pubDate>Wed, 08 Apr 2009 03:39:43 +0100</pubDate>
            <guid isPermaLink="false">2320920</guid>        </item>
        <item>
            <title>A Nonlinear Mixed-Effects Model for Estimating Calibration Intervals for Unknown Concentrations in Two-Color Microarray Data with Spike-Ins</title>
            <link>http://www.medworm.com/index.php?rid=2125888&amp;cid=s_36498_50_f&amp;fid=36498&amp;url=http%3A%2F%2Fwww.bepress.com%2Fsagmb%2Fvol8%2Fiss1%2Fart5</link>
            <description>In this study, we propose a calibration method for preprocessing spiked-in microarray experiments based on nonlinear mixed-effects models. This method uses a spike-in calibration curve to estimate normalized absolute expression values. Moreover, using the asymptotic properties of the calibration estimate, 100(1-α)% confidence intervals for the estimated expression values can be constructed. Simulations are used to show that the approximations on which the construction of the confidence intervals is based are sufficiently accurate to reach the desired coverage probabilities. We illustrate the applicability of our method by estimating the normalized absolute expression values together with the corresponding confidence intervals for two publicly available cDNA microarray experiments (Hilson et...</description>
            <author>Statistical Applications in Genetics and Molecular Biology</author>
            <type>journals</type>
        <comments>http://www.medworm.com/rss/comments.php?id=2125888</comments>
            <pubDate>Wed, 21 Jan 2009 18:51:53 +0100</pubDate>
            <guid isPermaLink="false">2125888</guid>        </item>
        <item>
            <title>Dimension Reduction of Microarray Data in the Presence of a Censored Survival Response: A Simulation Study</title>
            <link>http://www.medworm.com/index.php?rid=2125889&amp;cid=s_36498_50_f&amp;fid=36498&amp;url=http%3A%2F%2Fwww.bepress.com%2Fsagmb%2Fvol8%2Fiss1%2Fart4</link>
            <description>An important aspect of microarray studies involves the prediction of patient survival based on their gene expression levels. To cope with the high dimensionality of the microarray gene expression data, it is customary to first reduce the dimension of the gene expression data via dimension reduction methods, and then use the Cox proportional hazards model to predict patient survival. In this paper, we propose a variant of Partial Least Squares, denoted as Rank-based Modified Partial Least Squares (RMPLS), that is insensitive to outlying values of both the response and the gene expressions. We assess the performance of RMPLS and several dimension reduction methods using a simulation model for gene expression data with a censored response. In particular, Principal Component Analysis (PCA), mo...</description>
            <author>Statistical Applications in Genetics and Molecular Biology</author>
            <type>journals</type>
        <comments>http://www.medworm.com/rss/comments.php?id=2125889</comments>
            <pubDate>Wed, 21 Jan 2009 18:51:48 +0100</pubDate>
            <guid isPermaLink="false">2125889</guid>        </item>
        <item>
            <title>Sequential Analysis for Microarray Data Based on Sensitivity and Meta-Analysis</title>
            <link>http://www.medworm.com/index.php?rid=2125890&amp;cid=s_36498_50_f&amp;fid=36498&amp;url=http%3A%2F%2Fwww.bepress.com%2Fsagmb%2Fvol8%2Fiss1%2Fart3</link>
            <description>Motivation: Transcriptomic studies using microarray technology have become a standard tool in life sciences in the last decade. Nevertheless the cost of these experiments remains high and forces scientists to work with small sample sizes at the expense of statistical power. In many cases, little or no prior knowledge on the underlying variability is available, which would allow an accurate estimation of the number of samples (microarrays) required to answer a particular biological question of interest. We investigate sequential methods, also called group sequential or adaptive designs in the context of clinical trials, for microarray analysis. Through interim analyses at different stages of the experiment and application of a stopping rule a decision can be made as to whether more samples ...</description>
            <author>Statistical Applications in Genetics and Molecular Biology</author>
            <type>journals</type>
        <comments>http://www.medworm.com/rss/comments.php?id=2125890</comments>
            <pubDate>Wed, 21 Jan 2009 18:51:44 +0100</pubDate>
            <guid isPermaLink="false">2125890</guid>        </item>
    </channel>
</rss>

