Statistical Applications in Genetics and Molecular Biology
This page shows you the latest items in this publication.
87 records returned
A Unified Mixed Effects Model for Gene Set Analysis of Time Course Microarray Experiments
We describe simulation studies using gene expression data with "real life" correlations and we demonstrate the proposed random coefficient model using a mouse colon development time course dataset. The agreement between results of the proposed random coefficient model and the previous reports for this proof-of-concept trial further validates this methodology, which provides a unified statistical model for systems analysis of microarray experiments with complex experimental designs when re-sampling based methods are difficult to apply.
Source: Statistical Applications in Genetics and Molecular Biology - November 8, 2009 Category: Genetics & Stem Cells Tags: Microarrays Source Type: journals
Statistical Screening Method for Genetic Factors Influencing Susceptibility to Common Diseases in a Two-Stage Genome-Wide Association Study
A genome-wide association study (GWAS) is a standard strategy for detecting disease susceptibility genes, despite unsettled controversies on many aspects, including optimal study design and statistical analysis. As for study design, a two-stage design has been applied to maximize cost-effectiveness. However, there has been little consensus on the appropriate statistical analysis for a two-stage design, leaving researchers uncertain as to which statistical measures should be applied at the first stage and how to determine the significance level of the differences at the second stage. Here, using simulation studies, we compa...
Source: Statistical Applications in Genetics and Molecular Biology - November 4, 2009 Category: Genetics & Stem Cells Tags: Genetics Source Type: journals
A Regularized Regression Approach for Dissecting Genetic Conflicts that Increase Disease Risk in Pregnancy
Human diseases developed during pregnancy could be caused by the direct effects of both maternal and fetal genes, and/or by the indirect effects caused by genetic conflicts. Genetic conflicts exist when the effects of fetal genes are opposed by the effects of maternal genes, or when there is a conflict between the maternal and paternal genes within the fetal genome. The two types of genetic conflicts involve the functions of different genes in different genomes and are genetically distinct. Differentiating and further dissecting the two sets of genetic conflict effects that increase disease risk during pregnancy present st...
Source: Statistical Applications in Genetics and Molecular Biology - October 23, 2009 Category: Genetics & Stem Cells Tags: Disease Modeling Genetics Statistical Models Source Type: journals
Transmission Disequilibrium Test Power and Sample Size in the Presence of Locus Heterogeneity
Locus heterogeneity is one of the most important issues in gene mapping and can cause significant reductions in statistical power for gene mapping, yet no research to date has provided power and sample size calculations for family-based association methods in the presence of locus heterogeneity. The purpose of this research is three-fold: (i) to provide an analytic solution to the incorporation of locus heterogeneity into power and sample size calculations for the TDT statistic; (ii) to verify our analytic solution with simulations; and (iii) to study how different factors affect sample size requirement for the TDT in the ...
Source: Statistical Applications in Genetics and Molecular Biology - October 9, 2009 Category: Genetics & Stem Cells Tags: Computation Design of Experiments and Sample Surveys Genetics Statistical Theory and Methods Source Type: journals
Characterizing the D2 Statistic: Word Matches in Biological Sequences
Word matches are often used in sequence comparison methods, either as a measure of sequence similarity or in the first search steps of algorithms such as BLAST or BLAT. The D2 statistic is the number of matches of words of k letters between two sequences. Recent advances have been made in the characterization of this statistic and in the approximation of its distribution. Here, these results are extended to the case of approximate word matches. We compute the exact value of the variance of the D2 statistic for the case of a uniform letter distribution, and introduce a method to provide accurate approximations of the varianc...
Source: Statistical Applications in Genetics and Molecular Biology - October 8, 2009 Category: Genetics & Stem Cells Tags: Computation Computational Biology/Bioinformatics Statistical Theory and Methods Source Type: journals
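The abstract above defines the D2 statistic as the number of k-letter word matches between two sequences. A minimal sketch of that definition for exact matches follows; the function name `d2_statistic` is hypothetical, and the approximate-match extension and variance results discussed in the article are not reproduced here.

```python
from collections import Counter


def d2_statistic(seq_a: str, seq_b: str, k: int) -> int:
    """Count exact k-word matches over all pairs of positions in two sequences.

    Illustrative only: a word occurring n times in seq_a and m times in
    seq_b contributes n * m matches.
    """
    if k <= 0:
        raise ValueError("k must be a positive integer")
    # Tally every overlapping k-letter word in each sequence.
    counts_a = Counter(seq_a[i:i + k] for i in range(len(seq_a) - k + 1))
    counts_b = Counter(seq_b[j:j + k] for j in range(len(seq_b) - k + 1))
    # A Counter returns 0 for words that never occur in seq_b.
    return sum(n * counts_b[word] for word, n in counts_a.items())
```

Counting words once per sequence and multiplying the tallies avoids comparing every pair of positions directly.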
MC-Normalization: A Novel Method for Dye-Normalization of Two-Channel Microarray Data
Pre-processing plays a vital role in two-color microarray data analysis. An analysis is characterized by its ability to identify differentially expressed genes (its sensitivity) and its ability to provide unbiased estimators of the true regulation (its bias). It has been shown that microarray experiments regularly underestimate the true regulation of differentially expressed genes. We introduce the MC-normalization, where C stands for channel-wise normalization, with considerably lower bias than the commonly used standard methods. The idea behind the MC-normalization is that the channels' individual intensities determine t...
Source: Statistical Applications in Genetics and Molecular Biology - October 2, 2009 Category: Genetics & Stem Cells Tags: Computational Biology/Bioinformatics Microarrays Statistical Models Source Type: journals
M-quantile Regression Analysis of Temporal Gene Expression Data
We present a new method to approach this problem. Firstly, the temporal profiles of the genes are modelled by a parametric M-quantile regression model. This model is particularly appealing to small-sample gene expression data, as it is very robust against outliers and it does not make any assumption on the error distribution. Secondly, we further increase the robustness of the method by summarising the M-quantile regression models for a large range of quantile values into an M-quantile coefficient. Finally, we fit a polynomial M-quantile regression model to the M-quantile coefficients over time and employ a Hotelling T2-te...
Source: Statistical Applications in Genetics and Molecular Biology - September 22, 2009 Category: Genetics & Stem Cells Tags: General Biostatistics Microarrays Statistical Models Source Type: journals
Modeling Dependence in Methylation Patterns with Application to Ovarian Carcinomas
Changes in cytosine methylation at CpG nucleotides are observed in many cancers and offer great potential for translational research. Diseases such as ovarian cancer that are especially challenging to diagnose and treat are of particular interest, and abnormal methylation in the tandem repeats Sat2 and NBL2 has been observed in a collection of ovarian carcinomas. In earlier analyses of double-stranded methylation patterns in 0.2 kb regions of Sat2 and NBL2, we detected clusters of identically methylated sites in close proximity. These clusters could not be explained by random variation, and our findings suggested a high de...
Source: Statistical Applications in Genetics and Molecular Biology - September 22, 2009 Category: Genetics & Stem Cells Tags: Statistical Models Source Type: journals
Calculating Asymptotic Significance Levels of the Constrained Likelihood Ratio Test with Application to Multivariate Genetic Linkage Analysis
The asymptotic distribution of the multivariate variance component linkage analysis likelihood ratio test has provoked some contradictory accounts in the literature. In this paper we confirm that some previous results are not correct by deriving the asymptotic distribution in one special case. It is shown that this special case is a good approximation to the distribution in many situations. We also introduce a new approach to simulating from the asymptotic distribution of the likelihood ratio test statistic in constrained testing problems. It is shown that this method is very efficient for small p-values, and is applicable...
Source: Statistical Applications in Genetics and Molecular Biology - September 18, 2009 Category: Genetics & Stem Cells Tags: Genetics Source Type: journals
A Statistical Model for Genetic Mapping of Viral Infection by Integrating Epidemiological Behavior
Large-scale studies of genetic variation may be helpful for understanding the genetic control mechanisms of viral infection and, ultimately, predicting and eliminating infectious disease outbreaks. We propose a new statistical model for detecting specific DNA sequence variants that are responsible for viral infection. This model considers additive, dominance and epistatic effects of haplotypes from three different genomes, recipient, transmitter and virus, through an epidemiological process. The model is constructed within the maximum likelihood framework and implemented with the EM algorithm. A number of hypothesis tests ...
Source: Statistical Applications in Genetics and Molecular Biology - September 9, 2009 Category: Genetics & Stem Cells Tags: Genetics Source Type: journals
Identifying Individuals in a Complex Mixture of DNA with Unknown Ancestry
A new test was recently developed that could use a high-density set of single nucleotide polymorphisms (SNPs) to determine whether a specific individual contributed to a mixture of DNA. The test statistic compared the genotype for the individual to the allele frequencies in the mixture and to the allele frequencies in a reference group. This test requires the ancestries of the reference group to be nearly identical to those of the contributors to the mixture. Here, we first quantify the bias, the increase in type I and type II error, when the ancestries are not well matched. Then, we show that the test can also be biased i...
Source: Statistical Applications in Genetics and Molecular Biology - September 9, 2009 Category: Genetics & Stem Cells Tags: Computational Biology/Bioinformatics Genetics Source Type: journals
Prediction of Motifs Based on a Repeated-Measures Model for Integrating Cross-Species Sequence and Expression Data
De novo identification of transcription factor binding sites (TFBS) is a challenging computational problem because TFBSs are relatively short sequences buried in long genomic regions. Earlier methods incorporated genome-wide expression data and promoter sequences into a linear-model framework, regressing expression on counts of putative TFBSs in promoters for a single species. More recently, it has been shown that examining sequence data across multiple species improves the prediction of TFBSs. In this work, we describe an extension of the single-species, linear-model framework for the analysis of paired cross-species sequ...
Source: Statistical Applications in Genetics and Molecular Biology - September 9, 2009 Category: Genetics & Stem Cells Tags: Computational Biology/Bioinformatics Source Type: journals
Ancestral Recombination Graphs under Non-Random Ascertainment, with Applications to Gene Mapping
Consider a sample of apparently unrelated individuals, for which marker genotype and phenotype data is available. When individuals are sampled on phenotypes, we propose an ascertained ancestral recombination graph (ARG) that models shared ancestry of the sample chromosomes given phenotype data along a region that possibly harbors a disease susceptibility gene. The ascertained ARG is used to define a gene mapping algorithm by means of a lod score and associated p-values based on permutation testing. Under certain modeling simplifications, the lod score and p-values can be computed exactly, without any Monte Carlo approximat...
Source: Statistical Applications in Genetics and Molecular Biology - September 9, 2009 Category: Genetics & Stem Cells Tags: General Biostatistics Genetics Statistical Models Statistical Theory and Methods Source Type: journals
Rotation Testing in Gene Set Enrichment Analysis for Small Direct Comparison Experiments
Gene Set Enrichment Analysis (GSEA) is a method for analysing gene expression data with a focus on a priori defined gene sets. The permutation test generally used in GSEA for testing the significance of gene set enrichment involves permutation of a phenotype vector and is developed for data from an indirect comparison design, i.e. unpaired data. In some studies the samples representing two phenotypes are paired, e.g. samples taken from a patient before and after treatment, or if samples representing two phenotypes are hybridised to the same two-channel array (direct comparison design). In this paper we will focus on data f...
Source: Statistical Applications in Genetics and Molecular Biology - July 27, 2009 Category: Genetics & Stem Cells Tags: Computational Biology/Bioinformatics Microarrays Statistical Theory and Methods Source Type: journals
A Multivariate Growth Curve Model for Ranking Genes in Replicated Time Course Microarray Data
The gene ranking problem in time course microarray experiments is challenging since gene expression levels between different time points are correlated. This is because expression values at successive time points are usually taken from the same organism, tissue or culture. Moreover, time dependency of gene expression values is usually of interest and often is the biological problem that motivates the experiment. We propose a multivariate growth curve model for ranking genes and estimating mean gene expression profiles in replicated time course microarray data. The approach takes the within individual correlation as well as th...
Source: Statistical Applications in Genetics and Molecular Biology - July 1, 2009 Category: Genetics & Stem Cells Tags: General Biostatistics Genetics Longitudinal Data Analysis and Time Series Microarrays Multivariate Analysis Statistical Models Source Type: journals
Estimation of Selection Intensity under Overdominance by Bayesian Methods
A balanced pattern in the allele frequencies of polymorphic loci is a potential sign of selection, particularly of overdominance. Although this type of selection is of some interest in population genetics, there exist no likelihood-based approaches specifically tailored to make inference on selection intensity. To fill this gap, we present Bayesian methods to estimate selection intensity under k-allele models with overdominance. Our model allows for an arbitrary number of loci and alleles within a locus. The neutral and selected variability within each locus are modeled with corresponding k-allele models. To estimate the ...
Source: Statistical Applications in Genetics and Molecular Biology - July 1, 2009 Category: Genetics & Stem Cells Tags: General Biostatistics Genetics Statistical Models Statistical Theory and Methods Source Type: journals
Model Selection Based on FDR-Thresholding Optimizing the Area under the ROC-Curve
We evaluate variable selection by multiple tests controlling the false discovery rate (FDR) to build a linear score for prediction of clinical outcome in high-dimensional data. Quality of prediction is assessed by the receiver operating characteristic curve (ROC) for prediction in independent patients. Thus we try to combine both goals: prediction and controlled structure estimation. We show that the FDR-threshold which provides the ROC-curve with the largest area under the curve (AUC) varies largely over the different parameter constellations not known in advance. Hence, we investigated a new cross validation procedure ba...
Source: Statistical Applications in Genetics and Molecular Biology - June 25, 2009 Category: Genetics & Stem Cells Tags: General Biostatistics Source Type: journals
Adaptive Transmission Disequilibrium Test for Family Trio Design
The transmission disequilibrium test (TDT) is a standard method to detect association using family trio design. It is optimal for an additive genetic model. Other TDT-type tests optimal for recessive and dominant models have also been developed. Association tests using family data, including the TDT-type statistics, have been unified into a class of more comprehensive and flexible family-based association tests (FBAT). TDT-type tests have high efficiency when the genetic model is known or correctly specified, but may lose power if the model is mis-specified. Hence tests that are robust to genetic model mis-specification yet ...
Source: Statistical Applications in Genetics and Molecular Biology - June 23, 2009 Category: Genetics & Stem Cells Tags: Genetics Source Type: journals
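For readers unfamiliar with the classical TDT mentioned in the abstract above, the sketch below computes its usual McNemar-type statistic from transmission counts of heterozygous parents. This is the standard textbook form, not the adaptive test proposed in the article; the function name `tdt_statistic` and its argument names are hypothetical.

```python
import math


def tdt_statistic(transmitted: int, untransmitted: int) -> tuple[float, float]:
    """Classical TDT for one biallelic marker in family trios.

    `transmitted` is the number of heterozygous parents who passed the
    candidate allele to the affected child; `untransmitted` is the number
    who passed the other allele. Returns (chi-square statistic, p-value),
    using the asymptotic chi-square distribution with 1 degree of freedom.
    """
    total = transmitted + untransmitted
    if transmitted < 0 or untransmitted < 0 or total == 0:
        raise ValueError("need non-negative counts with at least one informative parent")
    stat = (transmitted - untransmitted) ** 2 / total
    # Upper tail of a chi-square(1) variable: P(X > x) = erfc(sqrt(x / 2)).
    p_value = math.erfc(math.sqrt(stat / 2.0))
    return stat, p_value
```

For example, 60 transmissions against 40 non-transmissions gives a statistic of 4.0.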
A Non-Homogeneous Hidden-State Model on First Order Differences for Automatic Detection of Nucleosome Positions
The ability to map individual nucleosomes accurately across genomes enables the study of relationships between dynamic changes in nucleosome positioning/occupancy and gene regulation. However, the highly heterogeneous nature of nucleosome densities across genomes and short linker regions pose challenges in mapping nucleosome positions based on high-throughput microarray data of micrococcal nuclease (MNase) digested DNA. Previous works rely on additional detrending and careful visual examination to detect low-signal nucleosomes, which may exist in a subpopulation of cells. We propose a non-homogeneous hidden-state model bas...
Source: Statistical Applications in Genetics and Molecular Biology - June 19, 2009 Category: Genetics & Stem Cells Tags: Computational Biology/Bioinformatics Source Type: journals
Extensions of Sparse Canonical Correlation Analysis with Applications to Genomic Data
In recent work, several authors have introduced methods for sparse canonical correlation analysis (sparse CCA). Suppose that two sets of measurements are available on the same set of observations. Sparse CCA is a method for identifying sparse linear combinations of the two sets of variables that are highly correlated with each other. It has been shown to be useful in the analysis of high-dimensional genomic data, when two sets of assays are available on the same set of samples. In this paper, we propose two extensions to the sparse CCA methodology. (1) Sparse CCA is an unsupervised method; that is, it does not make use of ...
Source: Statistical Applications in Genetics and Molecular Biology - June 9, 2009 Category: Genetics & Stem Cells Tags: Computational Biology/Bioinformatics General Biostatistics Genetics Laboratory and Basic Science Research Microarrays Multivariate Analysis Statistical Models Statistical Theory and Methods Source Type: journals
Bayesian Unsupervised Learning with Multiple Data Types
We report a genetic signature for the basal-like subtype of breast cancer found across a number of previous gene expression array studies. Using the two algorithmic approaches we find that this signature also arises from clustering on the microRNA expression data and appears derivative from this data.
Source: Statistical Applications in Genetics and Molecular Biology - June 5, 2009 Category: Genetics & Stem Cells Tags: Computation Computational Biology/Bioinformatics Microarrays Statistical Models Statistical Theory and Methods Survival Analysis Source Type: journals
A Parametric Model for Analyzing Anticipation in Genetically Predisposed Families
Anticipation, i.e. a decreasing age-at-onset in subsequent generations, has been observed in a number of genetically triggered diseases. The impact of anticipation is generally studied in affected parent-child pairs. These analyses are restricted to pairs in which both individuals have been affected and are sensitive to right truncation of the data. We propose a normal random effects model that allows for right-censored observations and includes covariates, and draw statistical inference based on the likelihood function. We applied the model to the hereditary nonpolyposis colorectal cancer (HNPCC)/Lynch syndrome family coho...
Source: Statistical Applications in Genetics and Molecular Biology - June 2, 2009 Category: Genetics & Stem Cells Tags: Genetics Source Type: journals
Increase of Rejection Rate in Case-Control Studies with the Differential Genotyping Error Rates
This study extends previous work by examining this issue analytically using the non-centrality parameter of the asymptotic distribution of the chi-squared test and linear trend test (LTT) when there is no difference between case and control genotype frequencies, but there is differential misclassification with SNP data. The parameters examined are the minor allele frequency (MAF) and sample size. When MAF is less than 0.2, differential genotyping errors lead to a rejection rate much larger than the nominal significance level. As the MAF decreases to zero, the increase in the rejection rate becomes larger. The errors that m...
Source: Statistical Applications in Genetics and Molecular Biology - May 7, 2009 Category: Genetics & Stem Cells Tags: Categorical Data Analysis Design of Experiments and Sample Surveys Genetics Source Type: journals
Incorporating Duplicate Genotype Data into Linear Trend Tests of Genetic Association: Methods and Cost-Effectiveness
The genome-wide association (GWA) study is an increasingly popular way to attempt to identify the causal variants in human disease. Duplicate genotyping (or re-genotyping) a portion of the samples in a GWA study is common, though it is typical for these data to be ignored in subsequent tests of genetic association. We demonstrate a method for including duplicate genotype data in linear trend tests of genetic association which yields increased power. We also consider the cost-effectiveness of collecting duplicate genotype data and find that when the relative cost of genotyping to phenotyping and sample acquisition costs is ...
Source: Statistical Applications in Genetics and Molecular Biology - May 5, 2009 Category: Genetics & Stem Cells Tags: Genetics Source Type: journals
Weighted Multiple Hypothesis Testing Procedures
Multiple hypothesis testing is commonly used in genome research such as genome-wide studies and gene expression data analysis (Lin, 2005). The widely used Bonferroni procedure controls the family-wise error rate (FWER) for multiple hypothesis testing, but has limited statistical power as the number of hypotheses tested increases. The power of multiple testing procedures can be increased by using weighted p-values (Genovese et al., 2006). The weights for the p-values can be estimated by using certain prior information. Wasserman and Roeder (2006) described a weighted Bonferroni procedure, which incorporates weighted p-value...
Source: Statistical Applications in Genetics and Molecular Biology - April 16, 2009 Category: Genetics & Stem Cells Tags: General Biostatistics Statistical Theory and Methods Source Type: journals
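The weighted Bonferroni idea referred to in the abstract above can be sketched as follows: hypothesis i is rejected when its p-value is at most alpha * w_i / m, with non-negative weights that average to one. This is the commonly stated form of the procedure, offered as an illustration under that assumption; the function name `weighted_bonferroni` is hypothetical, and the article's own procedures are not shown because the abstract is truncated.

```python
def weighted_bonferroni(p_values, weights, alpha=0.05):
    """Return a list of booleans marking which hypotheses are rejected.

    Weights are rescaled to have mean 1, so equal weights reduce to the
    ordinary Bonferroni threshold alpha / m.
    """
    m = len(p_values)
    if m == 0:
        return []
    if len(weights) != m:
        raise ValueError("p_values and weights must have the same length")
    if any(w < 0 for w in weights):
        raise ValueError("weights must be non-negative")
    mean_weight = sum(weights) / m
    if mean_weight == 0:
        raise ValueError("at least one weight must be positive")
    # Rescale so the weights average to 1, which keeps the FWER bound at alpha.
    scaled = [w / mean_weight for w in weights]
    return [p <= alpha * w / m for p, w in zip(p_values, scaled)]
```

Up-weighting a hypothesis raises its rejection threshold at the expense of the others, since the thresholds still sum to alpha.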
Multilevel Comparison of Dendrograms: A New Method with an Application for Genetic Classifications
Procedures are currently available for the evaluation of hierarchical classifications to produce tree dissimilarities or consensus dendrograms. Some tests of cluster validity operate by comparing all possible partitions from a tree with a reference partition. We propose an exhaustive search procedure to compare all partitions from one dendrogram with all partitions derived from the other to detect hierarchical levels at which the two dendrograms show maximum agreement. The method is illustrated using RAPD and microsatellite data in order to detect clones in reed populations. The utility of our approach is its ability to re...
Source: Statistical Applications in Genetics and Molecular Biology - April 14, 2009 Category: Genetics & Stem Cells Tags: Multivariate Analysis Source Type: journals
Univariate Shrinkage in the Cox Model for High Dimensional Data
We propose a method for prediction in Cox's proportional hazards model, when the number of features (regressors), p, exceeds the number of observations, n. The method assumes that the features are independent in each risk set, so that the partial likelihood factors into a product. As such, it is analogous to univariate thresholding in linear regression and nearest shrunken centroids in classification. We call the procedure Cox univariate shrinkage and demonstrate its usefulness on real and simulated data. The method has the attractive property of being essentially univariate in its operation: the features are entered into the mode...
Source: Statistical Applications in Genetics and Molecular Biology - April 14, 2009 Category: Genetics & Stem Cells Tags: Survival Analysis Source Type: journals
Balanced Gradient Boosting from Imbalanced Data for Clinical Outcome Prediction
In clinical outcome prediction, such as disease diagnosis and prognosis, it is often assumed that the class, e.g., disease and control, is equally distributed. However, in practice we often encounter biological or clinical data whose class distribution is highly skewed. Since standard supervised learning algorithms intend to maximize the overall prediction accuracy, a prediction model tends to show a strong bias toward the majority class when it is trained on such imbalanced data. Therefore, the class distribution should be incorporated appropriately to learn from imbalanced data. To address this practically important prob...
Source: Statistical Applications in Genetics and Molecular Biology - April 8, 2009 Category: Genetics & Stem Cells Tags: Computational Biology/Bioinformatics Microarrays Statistical Models Statistical Theory and Methods Source Type: journals
A Nonlinear Mixed-Effects Model for Estimating Calibration Intervals for Unknown Concentrations in Two-Color Microarray Data with Spike-Ins
In this study, we propose a calibration method for preprocessing spiked-in microarray experiments based on nonlinear mixed-effects models. This method uses a spike-in calibration curve to estimate normalized absolute expression values. Moreover, using the asymptotic properties of the calibration estimate, 100(1-α)% confidence intervals for the estimated expression values can be constructed. Simulations are used to show that the approximations on which the construction of the confidence intervals is based are sufficiently accurate to reach the desired coverage probabilities. We illustrate applicability of our method, by e...
Source: Statistical Applications in Genetics and Molecular Biology - January 21, 2009 Category: Genetics & Stem Cells Tags: Computational Biology/Bioinformatics Source Type: journals
Dimension Reduction of Microarray Data in the Presence of a Censored Survival Response: A Simulation Study
An important aspect of microarray studies involves the prediction of patient survival based on their gene expression levels. To cope with the high dimensionality of the microarray gene expression data, it is customary to first reduce the dimension of the gene expression data via dimension reduction methods, and then use the Cox proportional hazards model to predict patient survival. In this paper, we propose a variant of Partial Least Squares, denoted as Rank-based Modified Partial Least Squares (RMPLS), that is insensitive to outlying values of both the response and the gene expressions. We assess the performance of RMPLS...
Source: Statistical Applications in Genetics and Molecular Biology - January 21, 2009 Category: Genetics & Stem Cells Tags: Microarrays Statistical Theory and Methods Survival Analysis Source Type: journals
Sequential Analysis for Microarray Data Based on Sensitivity and Meta-Analysis
Motivation: Transcriptomic studies using microarray technology have become a standard tool in life sciences in the last decade. Nevertheless the cost of these experiments remains high and forces scientists to work with small sample sizes at the expense of statistical power. In many cases, little or no prior knowledge on the underlying variability is available, which would allow an accurate estimation of the number of samples (microarrays) required to answer a particular biological question of interest. We investigate sequential methods, also called group sequential or adaptive designs in the context of clinical trials, for...
Source: Statistical Applications in Genetics and Molecular Biology - January 21, 2009 Category: Genetics & Stem Cells Tags: Microarrays Source Type: journals
Orthology-Based Multilevel Modeling of Differentially Expressed Mouse and Human Gene Pairs
There is great interest in finding human genes expressed through pharmaceutical intervention, thus opening a genomic window into benefit and side-effect profiles of a drug. Human insight gained from FDA-required animal experiments has historically been limited, but in the case of gene expression measurements, proposed biological orthologies between mouse and human genes provide a foothold for animal-to-human extrapolation. We have investigated a five-component, multilevel, bivariate normal mixture model that incorporates mouse, as well as human, gene expression data. The goal is two-fold: to increase human differential gen...
Source: Statistical Applications in Genetics and Molecular Biology - January 13, 2009 Category: Genetics & Stem Cells Tags: Computational Biology/Bioinformatics Genetics Microarrays Statistical Models Source Type: journals
Sparse Canonical Correlation Analysis with Application to Genomic Data Integration
Large scale genomic studies with multiple phenotypic or genotypic measures may require the identification of complex multivariate relationships. In multivariate analysis a common way to inspect the relationship between two sets of variables based on their correlation is canonical correlation analysis, which determines linear combinations of all variables of each type with maximal correlation between the two linear combinations. However, in high dimensional data analysis, when the number of variables under consideration exceeds tens of thousands, linear combinations of the entire sets of features may lack biological plausib...
Source: Statistical Applications in Genetics and Molecular Biology - January 6, 2009 Category: Genetics & Stem Cells Tags: General Biostatistics Microarrays Multivariate Analysis Statistical Models Statistical Theory and Methods Source Type: journals
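The abstract above describes canonical correlation analysis as finding linear combinations of two variable sets with maximal correlation. As a rough illustration, here is a minimal NumPy sketch of classical (non-sparse) CCA for the first canonical pair. It is not the sparse method proposed in the article; the function name `first_canonical_correlation` and the small `ridge` stabilizer are assumptions introduced for this example only.

```python
import numpy as np


def first_canonical_correlation(X, Y, ridge=1e-8):
    """Return (rho, a, b): the first canonical correlation and weight vectors.

    X is an (n, p) array and Y an (n, q) array measured on the same n samples.
    `ridge` is a hypothetical stabilizer added to the covariance diagonals.
    """
    # Center each variable set.
    X = X - X.mean(axis=0)
    Y = Y - Y.mean(axis=0)
    n = X.shape[0]

    # Sample covariance blocks.
    Sxx = X.T @ X / (n - 1) + ridge * np.eye(X.shape[1])
    Syy = Y.T @ Y / (n - 1) + ridge * np.eye(Y.shape[1])
    Sxy = X.T @ Y / (n - 1)

    # Whiten with Cholesky factors: Sxx = Lx Lx^T and Syy = Ly Ly^T.
    Lx = np.linalg.cholesky(Sxx)
    Ly = np.linalg.cholesky(Syy)

    # M = Lx^{-1} Sxy Ly^{-T}; its singular values are the canonical correlations.
    A = np.linalg.solve(Lx, Sxy)
    M = np.linalg.solve(Ly, A.T).T
    U, s, Vt = np.linalg.svd(M, full_matrices=False)

    # Map the leading singular vectors back to the original coordinates.
    a = np.linalg.solve(Lx.T, U[:, 0])
    b = np.linalg.solve(Ly.T, Vt[0])
    return s[0], a, b
```

In this classical form every variable receives a nonzero weight in `a` and `b`, which is the interpretability problem the abstract raises for very high-dimensional data. The explicit covariance inversion also requires more samples than variables, so the sketch is only suitable for small illustrative inputs.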
Breast Cancer Diagnosis from Proteomic Mass Spectrometry Data: A Comparative Evaluation
The performance results of a wide range of different classifiers applied to proteomic mass spectra data, in a blind comparative assessment organised by Bart Mertens, are reviewed. The different approaches are summarised, issues of how to evaluate and compare the predictions are described, and the results of the different methods are examined. Although the different methods perform differently, their rank ordering varies according to how one measures performance, so that one cannot draw unequivocal conclusions about which is 'best.' Instead, it is clear that what matters is not the method by itself, but the interaction of m...
Source: Statistical Applications in Genetics and Molecular Biology - December 23, 2008 Category: Genetics & Stem Cells Tags: Computational Biology/Bioinformatics General Biostatistics Source Type: journals
Supervised Distance Matrices
We present consistent estimators of the resulting distance matrix, including an inverse probability of censoring weighted estimator for use with right-censored outcomes. Supervised distance matrices can be used with standard (unsupervised) clustering algorithms to identify groups of similarly predictive variables and to discover subpopulations of related samples. This approach is illustrated using simulations and an analysis of gene expression data with a censored survival outcome. The proposed methods are widely applicable in genomics and other fields where high-dimensional data is collected on each subject. (Source: Stat...
Source: Statistical Applications in Genetics and Molecular Biology - November 11, 2008 Category: Genetics & Stem Cells Tags: Computational Biology/Bioinformatics Statistical Theory and Methods Survival Analysis Source Type: journals
A Unification of Multivariate Methods for Meta-Analysis of Genetic Association Studies
Methods for multivariate meta-analysis of genetic association studies are reviewed, summarized and presented in a unified framework. Modifications of standard models are described in detail in order to be applied in genetic association studies. The model based on summary data is uniformly defined for both discrete and continuous outcomes and analytical expressions for the covariance of the two jointly modeled outcomes are derived for both cases. The models based on the binary nature of the data are fitted using both prospective and retrospective likelihood. Furthermore, formal tests for assessing the genetic model of inher...
Source: Statistical Applications in Genetics and Molecular Biology - October 24, 2008 Category: Genetics & Stem Cells Tags: Clinical Epidemiology Genetics Multivariate Analysis Statistical Models Source Type: journals
Pattern Classification of Phylogeny Signals
In this paper we propose the minimum entropy clustering (MEC) method for clustering genes based on their phylogenetic signals. This entropy-based method will cluster two genes together when their concatenation can decrease the entropy. An integral feature of MEC is that it chooses the number of clusters automatically, which is a major advantage over the other methods. Our simulation results show that this method is quite successful in clustering genes with a common phylogeny.
Source: Statistical Applications in Genetics and Molecular Biology - October 17, 2008 Category: Genetics & Stem Cells Tags: Computational Biology/Bioinformatics Statistical Theory and Methods Source Type: journals
Reducing Spatial Flaws in Oligonucleotide Arrays by Using Neighborhood Information
We present two similar procedures, of which one is intended solely for use with replicates and the other has wider applicability. By constructing a set of replicates, with one realistically flawed, we are able to examine the extent to which our procedures are capable of repairing the flaw. We find that, for this purpose, our procedures are superior to the existing 'Harshlight' procedure.
Source: Statistical Applications in Genetics and Molecular Biology - October 17, 2008 Category: Genetics & Stem Cells Tags: Microarrays Source Type: journals
Statistical Methods in Integrative Analysis for Gene Regulatory Modules
We propose a suite of statistical methods for inferring a cis-regulatory module, which is a combination of several transcription factors binding in the promoter regions to regulate gene expression. The approach is an integrative analysis that combines information from multiple types of biological data, including genomic DNA sequences, genome-wide location analysis (ChIP-chip experiments), and gene expression microarray. More specifically, we use a hidden Markov model to first predict a cluster of transcription factor binding sites in DNA sequences. The predictions are refined by regression analysis on gene expression micro...
Source: Statistical Applications in Genetics and Molecular Biology - October 10, 2008 Category: Genetics & Stem Cells Tags: Computational Biology/Bioinformatics Source Type: journals
Assessing the Validity Domains of Graphical Gaussian Models in Order
In this study we focused on recently published statistical methods. All rest on the assumption that the number of direct relationships between pairs of variables is very small relative to the number of possible relationships, p(p-1)/2. In the biological context, this assumption is not always satisfied over the whole graph. It is essential to know precisely how the methods behave with respect to the characteristics of the studied object before applying them. For this purpose, we evaluated the validity domain of each method from wide-ranging simulated datasets. We then illustrated our results using recently published biologic...
Source: Statistical Applications in Genetics and Molecular Biology - September 11, 2008 Category: Genetics & Stem Cells Tags: Statistical Models Source Type: journals
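The sparsity assumption in the abstract above compares the number of direct relationships with the p(p-1)/2 possible pairs among p variables. The following small sketch makes that arithmetic concrete; the helper names `possible_edges` and `edge_density` are hypothetical and do not come from the article.

```python
def possible_edges(p: int) -> int:
    """Number of unordered variable pairs among p variables: p(p-1)/2."""
    if p < 0:
        raise ValueError("p must be non-negative")
    return p * (p - 1) // 2


def edge_density(n_edges: int, p: int) -> float:
    """Fraction of possible pairwise relationships that are actually present."""
    total = possible_edges(p)
    if total == 0:
        raise ValueError("need at least two variables")
    if not 0 <= n_edges <= total:
        raise ValueError("n_edges must lie between 0 and p(p-1)/2")
    return n_edges / total
```

For example, p = 100 variables give 4950 possible pairs, so a graph with 99 direct relationships uses only 2% of them. A densely connected subgraph would push this fraction up locally, which is the situation the abstract notes may violate the assumption.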
A Composite-Conditional-Likelihood Approach for Gene Mapping Based on Linkage Disequilibrium in Windows of Marker Loci
A composite-conditional-likelihood (CCL) approach is proposed to map the position of a trait-influencing mutation (TIM) using the ancestral recombination graph (ARG) and importance sampling to reconstruct the genealogy of DNA sequences with respect to windows of marker loci and predict the linkage disequilibrium pattern observed in a sample of cases and controls. The method is designed to fine-map the location of a disease mutation, not as an association study. The CCL function proposed for the position of the TIM is a weighted product of conditional likelihood functions for windows of a given number of marker loci that en...
Source: Statistical Applications in Genetics and Molecular Biology - August 30, 2008 Category: Stem Cells Tags: Genetics Source Type: journals
Approximately Sufficient Statistics and Bayesian Computation
The analysis of high-dimensional data sets is often forced to rely upon well-chosen summary statistics. A systematic approach to choosing such statistics, which is based upon a sound theoretical framework, is currently lacking. In this paper we develop a sequential scheme for scoring statistics according to whether their inclusion in the analysis will substantially improve the quality of inference. Our method can be applied to high-dimensional data sets for which exact likelihood equations are not possible. We illustrate the potential of our approach with a series of examples drawn from genetics. In summary, in a context i...
Source: Statistical Applications in Genetics and Molecular Biology - August 30, 2008 Category: Stem Cells Tags: Computational Biology/Bioinformatics Statistical Models Source Type: journals
Data Distribution of Short Oligonucleotide Expression Arrays and Its Application to the Construction of a Generalized Intellectual Framework
This article proposes a parametric framework that can handle a wide range of experimental data obtained by GeneChip expression arrays. The framework is based on a parsimonious model, which has been developed according to thermodynamic estimations of the process of hybridization. Using the model, probe data were normalized and summarized into gene expression levels. Verification of the appropriateness of the model is demonstrated statistically by the use of real data obtained from several project series. Furthermore, improved stabilities in changes in expression are observed in comparison with other currently used methods. ...
Source: Statistical Applications in Genetics and Molecular Biology - August 21, 2008 Category: Stem Cells Tags: Computational Biology/Bioinformatics Laboratory and Basic Science Research Microarrays Statistical Theory and Methods Source Type: journals
Estimating Number of Clusters Based on a General Similarity Matrix with Application to Microarray Data
Many clustering methods require that the number of clusters believed present in a given data set be specified a priori, and a number of methods for estimating the number of clusters have been developed. However, the selection of the number of clusters is well recognized as a difficult and open problem and there is a need for methods which can shed light on specific aspects of the data. This paper adopts a model for clustering based on a specific structure for a similarity matrix. Publicly available gene expression data sets are analyzed to illustrate the method and the performance of our method is assessed by simulation. (...
Source: Statistical Applications in Genetics and Molecular Biology - August 3, 2008 Category: Stem Cells Tags: Computational Biology/Bioinformatics Microarrays Multivariate Analysis Source Type: journals
Predicting Protein Concentrations with ELISA Microarray Assays, Monotonic Splines and Monte Carlo Simulation
We present a method using monotonic spline statistical models (MS), penalized constrained least squares fitting (PCLS) and Monte Carlo simulation (MC) to predict ELISA microarray protein concentrations and estimate their prediction errors. We contrast the MSMC (monotone spline Monte Carlo) method with a LNLS (logistic nonlinear least squares) method using simulated and real ELISA microarray data sets. MSMC rendered good fits in almost all tests, including those with left and/or right clipped standard curves. MS predictions were nominally more accurate, especially at the extremes of the prediction curve. MC provided credible...
Source: Statistical Applications in Genetics and Molecular Biology - July 14, 2008 Category: Stem Cells Tags: Computational Biology/Bioinformatics Microarrays Statistical Models Source Type: journals
Modeling DNA Methylation in a Population of Cancer Cells
Little is known about how human cancers grow because direct observations are impractical. Cancers are clonal populations and the billions of cancer cells present in a visible tumor are progeny of a single transformed cell. Therefore, human cancers can be represented by somatic cell ancestral trees that start from a single transformed cell and end with billions of present day cancer cells. We use a genealogical approach to infer tumor growth from somatic trees, employing haplotype DNA methylation pattern variation, or differences between specific CpG sites or "tags," in the cancer genome. DNA methylation is an epigenetic ma...
Source: Statistical Applications in Genetics and Molecular Biology - June 22, 2008 Category: Stem Cells Tags: Computational Biology/Bioinformatics Genetics Source Type: journals
Detecting Two-Locus Gene-Gene Effects Using Monotonisation of the Penetrance Matrix
As more genetic loci are genotyped simultaneously and as the interest in effects of combinations of loci increases, the need for more powerful analysis methods grows. In this paper we present a method aimed at increasing the power of likelihood ratio tests for case-control studies investigating possible two-locus effects. The method is based on the notion that the expected effect pattern of one locus, as well as the expected pattern of a penetrance matrix representing the effect of two loci, is a monotone one. By using an algorithm for making the estimated penetrance matrix monotone, the alternative hypothesi...
Source: Statistical Applications in Genetics and Molecular Biology - June 10, 2008 Category: Stem Cells Tags: Disease Modeling Genetics Statistical Models Statistical Theory and Methods Source Type: journals
A SNP Streak Model for the Identification of Genetic Regions Identical-by-descent
The availability of very dense genetic maps is changing in a fundamental way the methods used to identify the genetic basis of both rare and common inherited traits. The ability to directly compare the genomes of two related individuals and quickly identify those regions that are inherited identical-by-descent (IBD) from a recent common ancestor would be of utility in a wide range of genetic mapping methods. Here, we describe a simple method for using dense SNP maps to identify regions of the genome likely to be inherited IBD by family members. This method is based on identifying obligate recombination events and examining...
Source: Statistical Applications in Genetics and Molecular Biology - May 10, 2008 Category: Stem Cells Tags: Computational Biology/Bioinformatics Genetics Microarrays Source Type: journals
Semi-Parametric Differential Expression Analysis via Partial Mixture Estimation
We develop an approach for microarray differential expression analysis, i.e. identifying genes whose expression levels differ between two or more groups. Current approaches to inference rely either on full parametric assumptions or on permutation-based techniques for sampling under the null distribution. In some situations, however, a full parametric model cannot be justified, or the sample size per group is too small for permutation methods to be valid. We propose a semi-parametric framework based on partial mixture estimation which only requires a parametric assumption for the null (equally expressed) distribution and can...
Source: Statistical Applications in Genetics and Molecular Biology - April 29, 2008 Category: Stem Cells Tags: Computational Biology/Bioinformatics Source Type: journals
Re-Cracking the Nucleosome Positioning Code
Nucleosomes, the fundamental repeating subunits of all eukaryotic chromatin, are responsible for packaging DNA into chromosomes inside the cell nucleus and controlling gene expression. While it has been well established that nucleosomes exhibit higher affinity for select DNA sequences, until recently it was unclear whether such preferences exerted a significant, genome-wide effect on nucleosome positioning in vivo. This question was seemingly and recently resolved in the affirmative: a wide-ranging series of experimental and computational analyses provided extensive evidence that the instructions for wrapping DNA around nu...
Source: Statistical Applications in Genetics and Molecular Biology - April 21, 2008 Category: Stem Cells Tags: Computational Biology/Bioinformatics General Biostatistics Multivariate Analysis Source Type: journals
