Medicine RSS Search Engine

Bioinformatics Research

This page shows you the most recent publications within this specialty of the MedWorm directory. This is page number 7.

PrePrint: High-Throughput Compression of FASTQ Data with SeqDB
Compression has become a critical step in storing Next-Generation Sequencing data sets because of both the increasing size and decreasing costs of such data. Recent research into efficiently compressing sequence data has focused largely on improving compression ratios. Yet, the throughputs of current methods now lag far behind the I/O bandwidths of modern storage systems. As biologists move their analyses to high-performance systems with greater I/O bandwidth, low-throughput compression becomes a limiting factor. To address this gap, we present a new storage model called SeqDB, which offers high-throughput compression of s...
Source: IEEE/ACM Transactions on Computational Biology and Bioinformatics - December 12, 2012 Category: Bioinformatics Source Type: research

PrePrint: Extending the Algebraic Formalism for Genome Rearrangements to Include Linear Chromosomes
We present linear time algorithms to compute it and to sort genomes. We show how to compute the rearrangement distance from the adjacency graph, for an easier comparison with other rearrangement distances. A thorough discussion on the relationship between the chromosomal and adjacency representation is also given, and we show how all classic rearrangement operations can be modeled using the algebraic theory.
Source: IEEE/ACM Transactions on Computational Biology and Bioinformatics - December 12, 2012 Category: Bioinformatics Source Type: research

Communication between the active site and the allosteric site in class A beta-lactamases.
Abstract Bacterial production of beta-lactamases, which hydrolyze beta-lactam type antibiotics, is a common antibiotic resistance mechanism. Antibiotic resistance is a high priority intervention area and one strategy to overcome resistance is to administer antibiotics with beta-lactamase inhibitors in the treatment of infectious diseases. Unfortunately, beta-lactamases are evolving at a rapid pace with new inhibitor resistant mutants emerging every day, driving the design and development of novel beta-lactamase inhibitors. Here, we examined the inhibitor recognition mechanism of two common beta-lactamases using mol...
Source: Computational Biology and Chemistry - December 12, 2012 Category: Bioinformatics Authors: Meneksedag D, Dogan A, Kanlikilicer P, Ozkirimli E Tags: Comput Biol Chem Source Type: research

Multivariate methods and software for association mapping in dose-response genome-wide association studies
Conclusion: Overall, MANOVA was found to be the most powerful method for detecting real signals, and was also the most robust method for detection using alternatives generated with the previous simulation study. This method is also attractive because test statistics follow their expected distributions under the null hypothesis for both simulated and real data. The success of this method inspired the creation of the software program MAGWAS. MAGWAS is a computationally efficient, user-friendly, open source software tool that works on most platforms and performs GWASs for individuals having multivariate responses using standard file formats.
Source: BioData Mining - December 12, 2012 Category: Bioinformatics Authors: Chad Brown, Tammy Havener, Marisa Medina, Ronald Krauss, Howard McLeod, Alison Motsinger-Reif Source Type: research

Fast detection of de novo copy number variants from SNP arrays for case-parent trios
Conclusions: Our results indicate that batch effects and genomic waves are important considerations for case-parent studies of de novo CNV, and that the minimum distance is an effective statistic for reducing technical variation contributing to false de novo discoveries. Coupled with segmentation and maximum a posteriori estimation, our algorithm compares favorably to the joint HMM, with MinimumDistance being much faster.
Source: BMC Bioinformatics - Latest articles - December 12, 2012 Category: Bioinformatics Authors: Robert Scharpf, Terri Beaty, Holger Schwender, Samuel Younkin, Alan Scott, Ingo Ruczinski Source Type: research

Biostatistics - Reference of manuscripts submitted mid-2011 to mid-2012
Source: Biostatistics - December 12, 2012 Category: Bioinformatics Tags: Index Source Type: research

Matrix variate logistic regression model with application to EEG data
Logistic regression has been widely applied in the field of biomedical research for a long time. In some applications, the covariates of interest have a natural structure, such as that of a matrix, at the time of collection. The rows and columns of the covariate matrix then have certain physical meanings, and they must contain useful information regarding the response. If we simply stack the covariate matrix as a vector and fit a conventional logistic regression model, relevant information can be lost, and the problem of inefficiency will arise. Motivated by these reasons, we propose in this paper the matrix variate logi...
Source: Biostatistics - December 12, 2012 Category: Bioinformatics Authors: Hung, H., Wang, C.-C. Tags: Articles Source Type: research

Efficient estimation of the attributable fraction when there are monotonicity constraints and interactions
The PAF for an exposure is the fraction of disease cases in a population that can be attributed to that exposure. One method of estimating the PAF involves estimating the probability of having the disease given the exposure and confounding variables. In many settings, the exposure will interact with the confounders and the confounders will interact with each other. Also, in many settings, the probability of having the disease is thought, based on subject matter knowledge, to be a monotone increasing function of the exposure and possibly of some of the confounders. We develop an efficient approach for estimating logistic re...
Source: Biostatistics - December 12, 2012 Category: Bioinformatics Authors: Traskin, M., Wang, W., Ten Have, T. R., Small, D. S. Tags: Articles Source Type: research

ROC curve estimation under test-result-dependent sampling
The receiver operating characteristic (ROC) curve is often used to evaluate the performance of a biomarker measured on continuous scale to predict the disease status or a clinical condition. Motivated by the need for novel study designs with better estimation efficiency and reduced study cost, we consider a biased sampling scheme that consists of a SRC and a supplemental TDC. Using this approach, investigators can oversample or undersample subjects falling into certain regions of the biomarker measure, yielding improved precision for the estimation of the ROC curve with a fixed sample size. Test-result-dependent sampling w...
Source: Biostatistics - December 12, 2012 Category: Bioinformatics Authors: Wang, X., Ma, J., George, S. L. Tags: Articles Source Type: research

Testing multiple variance components in linear mixed-effects models
Testing zero variance components is one of the most challenging problems in the context of linear mixed-effects (LME) models. The usual asymptotic chi-square distribution of the likelihood ratio and score statistics under this null hypothesis is incorrect because the null is on the boundary of the parameter space. During the last two decades many tests have been proposed to overcome this difficulty, but these tests cannot be easily applied for testing multiple variance components, especially for testing a subset of them. We instead introduce a simple test statistic based on the variance least square estimator of variance c...
Source: Biostatistics - December 12, 2012 Category: Bioinformatics Authors: Drikvandi, R., Verbeke, G., Khodadadi, A., Partovi Nia, V. Tags: Articles Source Type: research

Signal identification for rare and weak features: higher criticism or false discovery rates?
Signal identification in large-dimensional settings is a challenging problem in biostatistics. Recently, the method of higher criticism (HC) was shown to be an effective means for determining appropriate decision thresholds. Here, we study HC from a false discovery rate (FDR) perspective. We show that the HC threshold may be viewed as an approximation to a natural class boundary (CB) in two-class discriminant analysis which in turn is expressible as the FDR threshold. We demonstrate that in a rare–weak setting in the region of the phase space where signal identification is possible, both thresholds are practicably in...
Source: Biostatistics - December 12, 2012 Category: Bioinformatics Authors: Klaus, B., Strimmer, K. Tags: Articles Source Type: research

Bayesian analysis of RNA sequencing data by estimating multiple shrinkage priors
We present a novel, generic approach to model and analyze such data. Our approach aims at large flexibility of the likelihood (count) model and the regression model alike. Hence, a variety of count models is supported, such as the popular NB model, which accounts for overdispersion. In addition, complex, non-balanced designs and random effects are accommodated. Like some other methods, our method provides shrinkage of dispersion-related parameters. However, we extend it by enabling joint shrinkage of parameters, including those for which inference is desired. We argue that this is essential for Bayesian multiplicity correc...
Source: Biostatistics - December 12, 2012 Category: Bioinformatics Authors: Van De Wiel, M. A., Leday, G. G. R., Pardo, L., Rue, H., Van Der Vaart, A. W., Van Wieringen, W. N. Tags: Articles Source Type: research

Bayesian partitioning for mapping disease risk using a matched case-control approach to confounding
Disease maps are useful for exploring geographical heterogeneity in health outcomes. Typically interest lies in unearthing atypical regions after adjusting for known confounders. This paper presents a Bayesian partitioning approach for analyses when individual-level matching has been used to control confounding. The model makes few assumptions about the surface form and, in particular, permits discontinuity. The specification is inherently parsimonious and posterior sampling permits direct assessment of surface uncertainty; additional unmatched covariates can also be incorporated. The method is used to investigate spatial ...
Source: Biostatistics - December 12, 2012 Category: Bioinformatics Authors: Costain, D. A. Tags: Articles Source Type: research

Classification of patients from time-course gene expression
Classifying patients into different risk groups based on their genomic measurements can help clinicians design appropriate clinical treatment plans. To produce such a classification, gene expression data were collected on a cohort of burn patients, who were monitored across multiple time points. This led us to develop a new classification method using time-course gene expressions. Our results showed that making good use of time-course information of gene expression improved the performance of classification compared with using gene expression from individual time points only. Our method is implemented into an R-package: ti...
Source: Biostatistics - December 12, 2012 Category: Bioinformatics Authors: Zhang, Y., Tibshirani, R., Davis, R. Tags: Articles Source Type: research

Sequential stopping for high-throughput experiments
In high-throughput experiments, the sample size is typically chosen informally. Most formal sample-size calculations depend critically on prior knowledge. We propose a sequential strategy that, by updating knowledge when new data are available, depends less critically on prior assumptions. Experiments are stopped or continued based on the potential benefits in obtaining additional data. The underlying decision-theoretic framework guarantees the design to proceed in a coherent fashion. We propose intuitively appealing, easy-to-implement utility functions. As in most sequential design problems, an exact solution is prohibiti...
Source: Biostatistics - December 12, 2012 Category: Bioinformatics Authors: Rossell, D., Muller, P. Tags: Articles Source Type: research

Mark-specific proportional hazards model with multivariate continuous marks and its application to HIV vaccine efficacy trials
This article studies an extension of this approach to allow a multivariate continuum of competing risks, to better account for the fact that the candidate HIV vaccines tested in efficacy trials have contained multiple HIV sequences, with a purpose to elicit multiple types of immune response that recognize and block different types of HIV viruses. We develop inference for the proportional hazards model in which the regression parameters depend parametrically on the marks, to avoid the curse of dimensionality, and the baseline hazard depends nonparametrically on both time and marks. Goodness-of-fit tests are constructed base...
Source: Biostatistics - December 12, 2012 Category: Bioinformatics Authors: Sun, Y., Li, M., Gilbert, P. B. Tags: Articles Source Type: research

Non-parametric estimation of a time-dependent predictive accuracy curve
A major biomedical goal associated with evaluating a candidate biomarker or developing a predictive model score for event-time outcomes is to accurately distinguish incident cases from the controls surviving beyond t throughout the entire study period. Extensions of standard binary classification measures like time-dependent sensitivity, specificity, and receiver operating characteristic (ROC) curves have been developed in this context (Heagerty, P. J., and others, 2000. Time-dependent ROC curves for censored survival data and a diagnostic marker. Biometrics 56, 337–344). We propose a direct, non-paramet...
Source: Biostatistics - December 12, 2012 Category: Bioinformatics Authors: Saha-Chaudhuri, P., Heagerty, P. J. Tags: Articles Source Type: research

Marginal additive hazards model for case-cohort studies with multiple disease outcomes: an application to the Atherosclerosis Risk in Communities (ARIC) study
In the case-cohort studies conducted within the Atherosclerosis Risk in Communities (ARIC) study, it is of interest to assess and compare the effect of high-sensitivity C-reactive protein (hs-CRP) on the increased risks of incident coronary heart disease and incident ischemic stroke. Empirical cumulative hazards functions for different levels of hs-CRP reveal an additive structure for the risks for each disease outcome. Additionally, we are interested in estimating the difference in the risk for the different hs-CRP groups. Motivated by this, we consider fitting marginal additive hazards regression models for case-cohort s...
Source: Biostatistics - December 12, 2012 Category: Bioinformatics Authors: Kang, S., Cai, J., Chambless, L. Tags: Articles Source Type: research

Deriving benefit of early detection from biomarker-based prognostic models
Many prognostic models for cancer use biomarkers that have utility in early detection. For example, in prostate cancer, models predicting disease-specific survival use serum prostate-specific antigen levels. These models typically show that higher marker levels are associated with poorer prognosis. Consequently, they are often interpreted as indicating that detecting disease at a lower threshold of the biomarker is likely to generate a survival benefit. However, lowering the threshold of the biomarker is tantamount to early detection. For survival benefit to not be simply an artifact of starting the survival clock earlier,...
Source: Biostatistics - December 12, 2012 Category: Bioinformatics Authors: Inoue, L. Y. T., Gulati, R., Yu, C., Kattan, M. W., Etzioni, R. Tags: Articles Source Type: research

Targeted maximum likelihood estimation for marginal time-dependent treatment effects under density misspecification
Targeted maximum likelihood methods have been proposed to estimate treatment effects for longitudinal data in the presence of time-dependent confounders. This class of methods has been mathematically proven to be doubly robust and to optimize the asymptotic estimating efficiency among the class of regular, semi-parametric estimators when all estimated density components are correctly specified. We show that methods previously proposed to build a one-step estimator with a logistic loss function generalize to a generalized linear loss function, and so may be applied naturally to an outcome that can be described by any expone...
Source: Biostatistics - December 12, 2012 Category: Bioinformatics Authors: Schnitzer, M. E., Moodie, E. E. M., Platt, R. W. Tags: Articles Source Type: research

PrePrint: Rough-Fuzzy Clustering for Grouping Functionally Similar Genes from Microarray Data
Gene expression data clustering is one of the important tasks of functional genomics as it provides a powerful tool for studying functional relationships of genes in a biological process. Identifying co-expressed groups of genes represents the basic challenge in gene clustering problem. In this regard, a gene clustering algorithm, termed as rough-fuzzy c-means, is proposed judiciously integrating the merits of rough sets and fuzzy sets. While the concept of lower and upper approximations of rough sets deals with uncertainty, vagueness, and incompleteness in cluster definition, the integration of probabilistic and possibili...
Source: IEEE/ACM Transactions on Computational Biology and Bioinformatics - December 11, 2012 Category: Bioinformatics Source Type: research

PrePrint: Reconstruction of Transcriptional Regulatory Networks by Stability-based Network Component Analysis
Reliable inference of transcription regulatory networks is a challenging task in computational biology. Network component analysis (NCA) has become a powerful scheme to uncover regulatory networks behind complex biological processes. However, the performance of NCA is impaired by the high rate of false connections in binding information. In this paper, we integrate stability analysis with NCA to form a novel scheme, namely stability-based NCA (sNCA), for regulatory network identification. The method mainly addresses the inconsistency between gene expression data and binding motif information. Small perturbations are introd...
Source: IEEE/ACM Transactions on Computational Biology and Bioinformatics - December 11, 2012 Category: Bioinformatics Source Type: research

PrePrint: SP-Dock: Protein-Protein Docking using Shape and Physicochemical Complementarity
In this paper, a framework for protein-protein docking is proposed, which exploits both shape and physicochemical complementarity to generate improved docking predictions. Shape complementarity is achieved by matching local surface patches. However, unlike existing approaches, which are based on single-patch or two-patch matching, we developed a new algorithm that compares simultaneously, groups of neighboring patches from the receptor with groups of neighboring patches from the ligand. Taking into account the fact that shape complementarity in protein surfaces is mostly approximate rather than exact, the proposed group-ba...
Source: IEEE/ACM Transactions on Computational Biology and Bioinformatics - December 11, 2012 Category: Bioinformatics Source Type: research

IEEE/ACM Transactions on Computational Biology and Bioinformatics - Nov.-Dec. 2012 (Vol. 9, No. 6)
IEEE/ACM Transactions on Computational Biology and Bioinformatics
Source: IEEE/ACM Transactions on Computational Biology and Bioinformatics - December 11, 2012 Category: Bioinformatics Source Type: research

PrePrint: Multiscale Modelling and Analysis of Planar Cell Polarity in the Drosophila Wing
Modelling across multiple scales is a current challenge in Systems Biology, especially when applied to multicellular organisms. In this paper we present an approach to model at different spatial scales, using the new concept of hierarchically coloured Petri Nets (HCPN). We apply HCPN to model a tissue comprising multiple cells hexagonally packed in a honeycomb formation in order to describe the phenomenon of Planar Cell Polarity (PCP) signalling in Drosophila wing. We have constructed a family of related models, permitting different hypotheses to be explored regarding the mechanisms underlying PCP. In addition our models i...
Source: IEEE/ACM Transactions on Computational Biology and Bioinformatics - December 11, 2012 Category: Bioinformatics Source Type: research

PrePrint: Mining Quasi-Bicliques from HIV-1--Human Protein Interaction Network: A Multiobjective Biclustering Approach
In this work, we model the problem of mining quasi-bicliques from a weighted viral-host protein-protein interaction network as a biclustering problem for identifying strong interaction modules. In this regard, a multiobjective genetic algorithm based biclustering technique is proposed that simultaneously optimizes three objective functions to obtain dense biclusters having high mean interaction strengths. The performance of the proposed technique has been compared with that of other existing biclustering methods on an artificial data set. Subsequently, the proposed biclustering method is applied to the records of biologically va...
Source: IEEE/ACM Transactions on Computational Biology and Bioinformatics - December 11, 2012 Category: Bioinformatics Source Type: research

Architecture for interoperable software in biology.
Abstract Understanding biological complexity demands a combination of high-throughput data and interdisciplinary skills. One way to bring to bear the necessary combination of data types and expertise is by encapsulating domain knowledge in software and composing that software to create a customized data analysis environment. To this end, simple flexible strategies are needed for interconnecting heterogeneous software tools and enabling data exchange between them. Drawing on our own work and that of others, we present several strategies for interoperability and their consequences, in particular, a set of simple data...
Source: Briefings in Bioinformatics - December 11, 2012 Category: Bioinformatics Authors: Bare JC, Baliga NS Tags: Brief Bioinform Source Type: research

Generalization of the normal-exponential model: exploration of a more accurate parametrisation for the signal distribution on Illumina BeadArrays
Conclusions: This paper addresses the lack of fit of the usual normal-exponential model by proposing a more flexible parametrisation of the signal distribution as well as the associated background correction. This new model proves to be considerably more accurate for Illumina microarrays, but the improvement in terms of modeling does not lead to a higher sensitivity in differential analysis. Nevertheless, this realistic modeling makes way for future investigations, in particular to examine the characteristics of pre-processing strategies.
Source: BMC Bioinformatics - Latest articles - December 11, 2012 Category: Bioinformatics Authors: Sandra Plancade, Yves Rozenholc, Eiliv Lund Source Type: research

PrePrint: The Depth Problem: Identifying the Most Representative Units in a Data Group
This paper presents a solution to the problem of how to identify the units in groups or clusters that have the greatest degree of centrality and best characterize each group. This problem frequently arises in the classification of data such as types of tumor, gene expression profiles or general biomedical data. It is particularly important in the common context that many units do not properly belong to any cluster. Furthermore, in gene expression data classification, good identification of the most central units in a cluster enables recognition of the most important samples in a particular pathological process. We propose ...
Source: IEEE/ACM Transactions on Computational Biology and Bioinformatics - December 10, 2012 Category: Bioinformatics Source Type: research

Comparison of co-expression measures: mutual information, correlation, and model based indices
Conclusions: The biweight midcorrelation outperforms MI in terms of elucidating gene pairwise relationships. Coupled with the topological overlap matrix transformation, it often leads to more significantly enriched co-expression modules. Spline and polynomial networks form attractive alternatives to MI in case of non-linear relationships. Our results indicate that MI networks can safely be replaced by correlation networks when it comes to measuring co-expression relationships in stationary data.
Source: BMC Bioinformatics - Latest articles - December 9, 2012 Category: Bioinformatics Authors: Lin Song, Peter Langfelder, Steve Horvath Source Type: research
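The biweight midcorrelation named in the conclusions above can be illustrated with a minimal sketch. It uses the commonly cited definition (median-centred deviations, down-weighted by a bisquare weight with a cutoff of 9 times the raw median absolute deviation); the function names `bicor` and `_biweight_terms` are hypothetical, and this is not the authors' code or any package's API.

```python
from math import sqrt
from statistics import median


def _biweight_terms(values):
    """Return the weighted deviations (x_i - median) * w_i for one variable."""
    med = median(values)
    mad = median(abs(v - med) for v in values)  # raw, unscaled MAD
    if mad == 0:
        # Assumption of this sketch: refuse degenerate input instead of falling back.
        raise ValueError("MAD is zero; biweight weights are undefined")
    terms = []
    for v in values:
        u = (v - med) / (9.0 * mad)
        # Bisquare weight: points with |u| >= 1 are treated as outliers (weight 0).
        w = (1.0 - u * u) ** 2 if abs(u) < 1.0 else 0.0
        terms.append((v - med) * w)
    return terms


def bicor(x, y):
    """Biweight midcorrelation of two equal-length sequences of numbers."""
    if len(x) != len(y):
        raise ValueError("x and y must have the same length")
    a = _biweight_terms(x)
    b = _biweight_terms(y)
    norm = sqrt(sum(t * t for t in a)) * sqrt(sum(t * t for t in b))
    return sum(p * q for p, q in zip(a, b)) / norm


if __name__ == "__main__":
    clean = [1, 2, 3, 4, 5, 6, 7]
    with_outlier = [1, 2, 3, 4, 5, 6, 1000]
    print(bicor(clean, clean))
    print(bicor(with_outlier, clean))
```

Because the outlying value receives weight zero, a single extreme measurement influences the result far less than it would influence a Pearson correlation, which is the robustness property usually cited for this measure.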

A computational pipeline for the development of multi-marker bio-signature panels and ensemble classifiers
Conclusion: Proteo-genomic biomarker ensemble classifiers show promise in the diagnosis of acute renal allograft rejection and can improve classification performance beyond that of individual genomic or proteomic classifiers alone. Validation of our results in an international multicenter study is currently underway.
Source: BMC Bioinformatics - Latest articles - December 8, 2012 Category: Bioinformatics Authors: Oliver Günther, Virginia Chen, Gabriela Freue, Robert Balshaw, Scott Tebbutt, Zsuzsanna Hollander, Mandeep Takhar, W McMaster, Bruce McManus, Paul Keown, Raymond Ng Source Type: research

Improving stability and understandability of genotype-phenotype mapping in Saccharomyces using regularized variable selection in L-PLS regression
Conclusions: A modification of L-PLS with VIP in a stepwise regularized elimination procedure can improve the understandability and stability of selected genes and background information. The approach is recommended for genome wide association studies where background information is available.
Source: BMC Bioinformatics - Latest articles - December 8, 2012 Category: Bioinformatics Authors: Tahir Mehmood, Jonas Warringer, Lars Snipen, Solve Sæbø Source Type: research

CDRUG: a web server for predicting anticancer activity of chemical compounds
Summary: Cancer is the leading cause of death worldwide. Screening anticancer candidates from tens of millions of chemical compounds is expensive and time-consuming. A rapid and user-friendly web server, known as CDRUG, is described here to predict the anticancer activity of chemical compounds. In CDRUG, a hybrid score was developed to measure the similarity of different compounds. The performance analysis shows that CDRUG achieves an area under the curve of 0.878, indicating that CDRUG is effective at distinguishing active and inactive compounds. Availability: The CDRUG web server and the standalone version are freely available ...
Source: Bioinformatics - December 7, 2012 Category: Bioinformatics Authors: Li, G.-H., Huang, J.-F. Tags: DATA AND TEXT MINING Source Type: research

FFPopSim: an efficient forward simulation package for the evolution of large populations
Motivation: The analysis of the evolutionary dynamics of a population with many polymorphic loci is challenging, as a large number of possible genotypes needs to be tracked. In the absence of analytical solutions, forward computer simulations are an important tool in multi-locus population genetics. The run time of standard algorithms to simulate sexual populations increases as 8^L with the number of loci L, or with the square of the population size N. Results: We have developed algorithms to simulate large populations with arbitrary genetic maps, including multiple crossovers, with a run time that scales as 3^L. If the numb...
Source: Bioinformatics - December 7, 2012 Category: Bioinformatics Authors: Zanini, F., Neher, R. A. Tags: GENETICS AND POPULATION ANALYSIS Source Type: research
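To give a feel for the scaling claim in the FFPopSim abstract above, the short sketch below compares the two growth rates, reading both as exponential in the number of loci L (8^L for the standard approach, 3^L for the new one). The function names are hypothetical, and the numbers are operation-count proxies that ignore constant factors, not measured run times.

```python
def standard_cost(num_loci):
    """Cost proxy for the standard algorithms: grows as 8^L."""
    return 8 ** num_loci


def fast_cost(num_loci):
    """Cost proxy for the algorithms described in the abstract: grows as 3^L."""
    return 3 ** num_loci


def relative_speedup(num_loci):
    """Ratio of the two proxies, which equals (8/3)^L."""
    return standard_cost(num_loci) / fast_cost(num_loci)


if __name__ == "__main__":
    for num_loci in (5, 10, 15):
        print(num_loci, standard_cost(num_loci), fast_cost(num_loci),
              round(relative_speedup(num_loci), 1))
```

Under this reading the gap itself grows exponentially, by a factor of 8/3 for every additional locus.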

GWASTools: an R/Bioconductor package for quality control and analysis of genome-wide association studies
Summary: GWASTools is an R/Bioconductor package for quality control and analysis of genome-wide association studies (GWAS). GWASTools brings the interactive capability and extensive statistical libraries of R to GWAS. Data are stored in NetCDF format to accommodate extremely large datasets that cannot fit within R’s memory limits. The documentation includes instructions for converting data from multiple formats, including variants called from sequencing. GWASTools provides a convenient interface for linking genotypes and intensity data with sample and single nucleotide polymorphism annotation. Availability and implem...
Source: Bioinformatics - December 7, 2012 Category: Bioinformatics Authors: Gogarten, S. M., Bhangale, T., Conomos, M. P., Laurie, C. A., McHugh, C. P., Painter, I., Zheng, X., Crosslin, D. R., Levine, D., Lumley, T., Nelson, S. C., Rice, K., Shen, J., Swarnkar, R., Weir, B. S., Laurie, C. C. Tags: GENETICS AND POPULATION ANALYSIS Source Type: research

A high-performance computing toolset for relatedness and principal component analysis of SNP data
Summary: Genome-wide association studies are widely used to investigate the genetic basis of diseases and traits, but they pose many computational challenges. We developed gdsfmt and SNPRelate (R packages for multi-core symmetric multiprocessing computer architectures) to accelerate two key computations on SNP data: principal component analysis (PCA) and relatedness analysis using identity-by-descent measures. The kernels of our algorithms are written in C/C++ and highly optimized. Benchmarks show the uniprocessor implementations of PCA and identity-by-descent are ~8–50 times faster than the implementations provided ...
Source: Bioinformatics - December 7, 2012 Category: Bioinformatics Authors: Zheng, X., Levine, D., Shen, J., Gogarten, S. M., Laurie, C., Weir, B. S. Tags: GENETICS AND POPULATION ANALYSIS Source Type: research

TIBA: a tool for phylogeny inference from rearrangement data with bootstrap analysis
Summary: TIBA is a tool to reconstruct phylogenetic trees from rearrangement data that consist of ordered lists of synteny blocks (or genes), where each synteny block is shared with all of its homologues in the input genomes. The evolution of these synteny blocks, through rearrangement operations, is modelled by the uniform Double-Cut-and-Join model. Using a true distance estimate under this model and simple distance-based methods, TIBA reconstructs a phylogeny of the input genomes. Unlike any previous tool for inferring phylogenies from rearrangement data, TIBA uses novel methods of robustness estimation to provide suppor...
Source: Bioinformatics - December 7, 2012 Category: Bioinformatics Authors: Lin, Y., Rajan, V., Moret, B. M. E. Tags: PHYLOGENETICS Source Type: research

Interactive exploration of RNA22 microRNA target predictions
Summary: MicroRNA (miRNA) target prediction is an important problem. Given an miRNA sequence the task is to determine the identity of the messenger RNAs targeted by it, the locations within them where the interactions happen and the specifics of the formed heteroduplexes. Here, we describe a web-based application, RNA22-GUI, which we have designed and implemented for the interactive exploration and in-context visualization of predictions by RNA22, one of the popular miRNA target prediction algorithms. Central to our design has been the requirement to provide informative and comprehensive visualization that is integrated wi...
Source: Bioinformatics - December 7, 2012 Category: Bioinformatics Authors: Loher, P., Rigoutsos, I. Tags: SEQUENCE ANALYSIS Source Type: research

Olorin: combining gene flow with exome sequencing in large family studies of complex disease
Motivation: The existence of families with many individuals affected by the same complex disease has long suggested the possibility of rare alleles of high penetrance. In contrast to Mendelian diseases, however, linkage studies have identified very few reproducibly linked loci in diseases such as diabetes and autism. Genome-wide association studies have had greater success with such diseases, but these results explain neither the extreme disease load nor the within-family linkage peaks of some large pedigrees. Combining linkage information with exome or genome sequencing from large complex disease pedigrees might finally ...
Source: Bioinformatics - December 7, 2012 Category: Bioinformatics Authors: Morris, J. A., Barrett, J. C. Tags: GENOME ANALYSIS Source Type: research

MULTOVL: fast multiple overlaps of genomic regions
We present the MULTOVL application suite that detects and statistically analyses multiple overlaps of genomic regions in a fast and efficient manner. The package supports the detection of multiple region intersections, unions and ‘solitary’ genomic regions. The significance of actually observed overlaps is estimated by comparing them with empirical null distributions generated by random shuffling of the input regions. Availability and implementation: Source code and binaries are downloadable from: http://www.csf.ac.at/facilities/scc/tools. Contact: andras.aszodi@csf.ac.at
Source: Bioinformatics - December 7, 2012 Category: Bioinformatics Authors: Aszodi, A. Tags: GENOME ANALYSIS Source Type: research
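
The shuffling-based significance estimate described in this summary can be illustrated with a minimal sketch. It is not MULTOVL's implementation: the function names, the single-chromosome model, the half-open `(start, end)` intervals and the uniform relocation of regions are all assumptions made for illustration.

```python
import random

def count_overlaps(a, b):
    """Count pairs (x, y), x in a and y in b, whose half-open intervals intersect."""
    return sum(1 for (s1, e1) in a for (s2, e2) in b if s1 < e2 and s2 < e1)

def shuffle_regions(regions, chrom_len, rng):
    """Relocate each region to a uniformly random start, preserving its length.

    Hypothetical null model: one chromosome of length chrom_len, no exclusions.
    """
    shuffled = []
    for start, end in regions:
        length = end - start
        new_start = rng.randrange(0, chrom_len - length + 1)
        shuffled.append((new_start, new_start + length))
    return shuffled

def empirical_p(a, b, chrom_len, n_shuffles=1000, seed=0):
    """Return (observed overlap count, empirical one-sided p-value).

    The p-value is the fraction of shuffles of b that overlap a at least as
    often as observed, with the usual +1 correction so it is never zero.
    """
    rng = random.Random(seed)
    observed = count_overlaps(a, b)
    hits = sum(
        1
        for _ in range(n_shuffles)
        if count_overlaps(a, shuffle_regions(b, chrom_len, rng)) >= observed
    )
    return observed, (hits + 1) / (n_shuffles + 1)
```

A call such as `empirical_p(a, b, chrom_len=1_000_000)` returns the observed count together with its empirical p-value. The quadratic pair scan is chosen for clarity only; a sorted sweep or an interval tree would be the natural choice for genome-scale inputs.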

Real Time Metagenomics: Using k-mers to annotate metagenomes
Summary: Annotation of metagenomes involves comparing the individual sequence reads with a database of known sequences and assigning a unique function to each read. This is a time-consuming task that is computationally intensive (though not computationally complex). Here we present a novel approach to annotate metagenomes using unique k-mer oligopeptide sequences from 7 to 12 amino acids long. We demonstrate that k-mer-based annotations are faster and approach the sensitivity and precision of blastx-based annotations without losing accuracy. A last-common ancestor approach was also developed to describe the members of the...
Source: Bioinformatics - December 7, 2012 Category: Bioinformatics Authors: Edwards, R. A., Olson, R., Disz, T., Pusch, G. D., Vonstein, V., Stevens, R., Overbeek, R. Tags: GENOME ANALYSIS Source Type: research
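
The idea of annotating with function-specific k-mers can be sketched as follows. This is an illustrative toy, not the authors' pipeline: it assumes reads have already been translated to peptides, and `build_index`, `annotate` and the `min_hits` threshold are hypothetical names and parameters.

```python
from collections import Counter, defaultdict

def build_index(proteins, k):
    """Map each k-mer to a function, keeping only k-mers seen under one function.

    proteins is an iterable of (amino_acid_sequence, function_label) pairs.
    """
    seen = defaultdict(set)
    for seq, function in proteins:
        for i in range(len(seq) - k + 1):
            seen[seq[i:i + k]].add(function)
    # A k-mer that occurs under several functions is ambiguous and is dropped.
    return {kmer: next(iter(fs)) for kmer, fs in seen.items() if len(fs) == 1}

def annotate(peptide, index, k, min_hits=2):
    """Return the function with the most k-mer hits, or None if support is weak."""
    votes = Counter()
    for i in range(len(peptide) - k + 1):
        function = index.get(peptide[i:i + k])
        if function is not None:
            votes[function] += 1
    if not votes:
        return None
    function, hits = votes.most_common(1)[0]
    return function if hits >= min_hits else None
```

Each lookup is a dictionary access, so annotating a read costs time proportional to its length rather than to the size of the reference database, which is the intuition behind the speed advantage over alignment-based search.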

An ensemble correlation-based gene selection algorithm for cancer classification with gene expression data
Motivation: Gene selection for cancer classification is one of the most important topics in the biomedical field. However, microarray data pose a severe challenge for computational techniques. We need dimension reduction techniques that identify a small set of genes to achieve better learning performance. From the perspective of machine learning, the selection of genes can be considered to be a feature selection problem that aims to find a small subset of features that has the most discriminative information for the target. Results: In this article, we proposed an Ensemble Correlation-Based Gene Selection algorithm based o...
Source: Bioinformatics - December 7, 2012 Category: Bioinformatics Authors: Piao, Y., Piao, M., Park, K., Ryu, K. H. Tags: DATA AND TEXT MINING Source Type: research
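
Treating gene selection as feature selection can be illustrated with a very simple correlation filter. This sketch is not the ensemble algorithm proposed in the article, whose details are truncated above; it only ranks genes by the absolute Pearson correlation between expression and a numeric class label, and `pearson` and `top_genes` are hypothetical helper names.

```python
from math import sqrt

def pearson(x, y):
    """Pearson correlation of two equal-length sequences (0.0 if either is constant)."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        return 0.0
    return cov / sqrt(var_x * var_y)

def top_genes(expression, labels, n_genes):
    """Return the n_genes gene names most correlated (in absolute value) with labels.

    expression maps gene name -> list of expression values, one per sample;
    labels is a list of numeric class labels in the same sample order.
    """
    scores = {gene: abs(pearson(values, labels)) for gene, values in expression.items()}
    return sorted(scores, key=scores.get, reverse=True)[:n_genes]
```

A univariate filter like this ignores redundancy between selected genes, which is one reason more elaborate schemes, including ensemble methods, are studied.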

Inference of temporally varying Bayesian Networks
Motivation: When analysing gene expression time series data, an often overlooked but crucial aspect of the model is that the regulatory network structure may change over time. Although some approaches have addressed this problem previously in the literature, many are not well suited to the sequential nature of the data. Results: Here, we present a method that allows us to infer regulatory network structures that may vary between time points, using a set of hidden states that describe the network structure at a given time point. To model the distribution of the hidden states, we have applied the Hierarchical Dirichlet Proce...
Source: Bioinformatics - December 7, 2012 Category: Bioinformatics Authors: Thorne, T., Stumpf, M. P. H. Tags: SYSTEMS BIOLOGY Source Type: research

Bayesian correlated clustering to integrate multiple datasets
We present a Bayesian method for the unsupervised integrative modelling of multiple datasets, which we refer to as MDI (Multiple Dataset Integration). MDI can integrate information from a wide range of different datasets and data types simultaneously (including the ability to model time series data explicitly using Gaussian processes). Each dataset is modelled using a Dirichlet-multinomial allocation (DMA) mixture model, with dependencies between these models captured through parameters that describe the agreement among the datasets. Results: Using a set of six artificially constructed time series datasets, we show that MD...
Source: Bioinformatics - December 7, 2012 Category: Bioinformatics Authors: Kirk, P., Griffin, J. E., Savage, R. S., Ghahramani, Z., Wild, D. L. Tags: SYSTEMS BIOLOGY Source Type: research

A method for integrative structure determination of protein-protein complexes
Motivation: Structural characterization of protein interactions is necessary for understanding and modulating biological processes. On one hand, X-ray crystallography or NMR spectroscopy provides atomic resolution structures but the data collection process is typically long and the success rate is low. On the other hand, computational methods for modeling assembly structures from individual components frequently suffer from a high false-positive rate, rarely resulting in a unique solution. Results: Here, we present a combined approach that computationally integrates data from a variety of fast and accessible experimental tech...
Source: Bioinformatics - December 7, 2012 Category: Bioinformatics Authors: Schneidman-Duhovny, D., Rossi, A., Avila-Sakar, A., Kim, S. J., Velazquez-Muriel, J., Strop, P., Liang, H., Krukenberg, K. A., Liao, M., Kim, H. M., Sobhanifar, S., Dotsch, V., Rajpal, A., Pons, J., Agard, D. A., Cheng, Y., Sali, A. Tags: STRUCTURAL BIOINFORMATICS Source Type: research

Fast protein structure alignment using Gaussian overlap scoring of backbone peptide fragment similarity
Motivation: Aligning and comparing protein structures is important for understanding their evolutionary and functional relationships. With the rapid growth of protein structure databases in recent years, the need to align, superpose and compare protein structures rapidly and accurately has never been greater. Many structural alignment algorithms have been described in the past 20 years. However, achieving an algorithm that is both accurate and fast remains a considerable challenge. Results: We have developed a novel protein structure alignment algorithm called ‘Kpax’, which exploits the highly predictable coval...
Source: Bioinformatics - December 7, 2012 Category: Bioinformatics Authors: Ritchie, D. W., Ghoorah, A. W., Mavridis, L., Venkatraman, V. Tags: STRUCTURAL BIOINFORMATICS Source Type: research

FOLD-EM: automated fold recognition in medium- and low-resolution (4-15 Å) electron density maps
Motivation: Owing to the size and complexity of large multi-component biological assemblies, the most tractable approach to determining their atomic structure is often to fit high-resolution radiographic or nuclear magnetic resonance structures of isolated components into lower resolution electron density maps of the larger assembly obtained using cryo-electron microscopy (cryo-EM). This hybrid approach to structure determination requires that an atomic resolution structure of each component, or a suitable homolog, is available. If neither is available, then the amount of structural information regarding that component is ...
Source: Bioinformatics - December 7, 2012 Category: Bioinformatics Authors: Saha, M., Morais, M. C. Tags: STRUCTURAL BIOINFORMATICS Source Type: research

RAPHAEL: recognition, periodicity and insertion assignment of solenoid protein structures
Motivation: Repeat proteins form a distinct class of structures where folding is greatly simplified. Several classes have been defined, with solenoid repeats of periodicity between ca. 5 and 40 being the most challenging to detect. Such proteins evolve quickly and their periodicity may be rapidly hidden at sequence level. From a structural point of view, finding solenoids may be complicated by the presence of insertions or multiple domains. To the best of our knowledge, no automated methods are available to characterize solenoid repeats from structure. Results: Here we introduce RAPHAEL, a novel method for the detection of...
Source: Bioinformatics - December 7, 2012 Category: Bioinformatics Authors: Walsh, I., Sirocco, F. G., Minervini, G., Di Domenico, T., Ferrari, C., Tosatto, S. C. E. Tags: STRUCTURAL BIOINFORMATICS Source Type: research

A counting renaissance: combining stochastic mapping and empirical Bayes to quickly detect amino acid sites under positive selection
Motivation: Statistical methods for comparing relative rates of synonymous and non-synonymous substitutions maintain a central role in detecting positive selection. To identify selection, researchers often estimate the ratio of these relative rates (dN/dS) at individual alignment sites. Fitting a codon substitution model that captures heterogeneity in dN/dS across sites provides a reliable way to perform such estimation, but it remains computationally prohibitive for massive datasets. By using crude estimates of the numbers of synonymous and non-synonymous substitutions at each site, counting approaches scale well to large dataset...
Source: Bioinformatics - December 7, 2012 Category: Bioinformatics Authors: Lemey, P., Minin, V. N., Bielejec, F., Kosakovsky Pond, S. L., Suchard, M. A. Tags: PHYLOGENETICS Source Type: research

Discriminative modelling of context-specific amino acid substitution probabilities
Motivation: Protein sequence searching and alignment are fundamental tools of modern biology. Alignments are assessed using their similarity scores, essentially the sum of substitution matrix scores over all pairs of aligned amino acids. We previously proposed a generative probabilistic method that yields scores that take the sequence context around each aligned residue into account. This method showed drastically improved sensitivity and alignment quality compared with standard substitution matrix-based alignment. Results: Here, we develop an alternative discriminative approach to predict sequence context-specific substit...
Source: Bioinformatics - December 7, 2012 Category: Bioinformatics Authors: Angermuller, C., Biegert, A., Soding, J. Tags: SEQUENCE ANALYSIS Source Type: research