IEEE/ACM Transactions on Computational Biology and Bioinformatics

This page shows you the latest items in this publication.

273 records returned

PrePrint: Estimating Haplotype Frequencies by Combining Data from Large DNA Pools with Database Information
We present a Bayesian haplotyping method for pooled DNA based on a continuous approximation of the multinomial distribution. The proposed method is applicable when the sizes of the DNA pools and/or the number of considered loci exceed the limits of several earlier methods. In the example analyses the proposed model clearly outperforms a deterministic greedy algorithm on real data from the HapMap database. With a small number of loci the proposed method performs similarly to an EM-algorithm which uses a multinormal approximation for the pooled allele frequencies, but which does not utilize prior information about the haplot...
Source: IEEE/ACM Transactions on Computational Biology and Bioinformatics - October 16, 2009 Category: Bioinformatics Source Type: journals

PrePrint: A Markov Blanket-Based Model for Gene Regulatory Network Inference
An efficient two step Markov blanket method for modeling and inferring complex regulatory networks from large-scale microarray datasets is presented. The inferred gene regulatory network is based on the time series gene expression data capturing the underlying gene interactions. For constructing a highly accurate GRN, the proposed method performs (i) discovery of a gene's Markov Blanket (MB), (ii) formulation of a flexible measure to determine the network's quality, (iii) efficient searching with the aid of a guided genetic algorithm, (iv) pruning to obtain a minimal set of correct interactions. Investigations are carried ...
Source: IEEE/ACM Transactions on Computational Biology and Bioinformatics - October 16, 2009 Category: Bioinformatics Source Type: journals

PrePrint: Pairwise Statistical Significance of Local Sequence Alignment Using Sequence-Specific and Position-Specific Substitution Matrices
Pairwise sequence alignment is a central problem in bioinformatics which forms the basis of many other applications. Two related sequences are expected to have a high alignment score, but relatedness is usually judged by statistical significance rather than by alignment score. Recently, it was shown that pairwise statistical significance gives promising results as an alternative to database statistical significance for getting individual significance estimates of pairwise alignment scores. The improvement was mainly attributed to making the statistical significance estimation process more sequence-specific and database-ind...
Source: IEEE/ACM Transactions on Computational Biology and Bioinformatics - September 29, 2009 Category: Bioinformatics Source Type: journals

PrePrint: The Impact of Multiple Protein Sequence Alignment on Phylogenetic Estimation
Multiple sequence alignment is typically the first step in estimating phylogenetic trees, with the assumption being that as alignments improve, so will phylogenetic reconstructions. Over the last decade or so, new multiple sequence alignment methods have been developed to improve comparative analyses of protein structure, but these new methods have not been typically used in phylogenetic analyses. In this paper, we report on a simulation study that we performed to evaluate the consequences of using these new multiple sequence alignment methods in terms of the resultant phylogenetic reconstruction. We find that while alignm...
Source: IEEE/ACM Transactions on Computational Biology and Bioinformatics - September 3, 2009 Category: Bioinformatics Source Type: journals

IEEE/ACM Transactions on Computational Biology and Bioinformatics - July-September 2009 (Vol. 6, No. 3)
IEEE/ACM Transactions on Computational Biology and Bioinformatics
Source: IEEE/ACM Transactions on Computational Biology and Bioinformatics - July 30, 2009 Category: Bioinformatics Source Type: journals

PrePrint: A Weighted Principal Component Analysis and Its Application to Gene Expression Data
In the first part of this work we introduce new developments in Principal Component Analysis (PCA), and in the second part a new method to select variables (genes in our application). Our focus is on problems where the values taken by each variable do not all have the same importance and where the data may be contaminated with noise and contain outliers, as is the case with microarray data. The usual PCA is not appropriate for this kind of problem. In this context, we propose the use of a new correlation coefficient as an alternative to Pearson's. This leads to a so-called weighted PCA (WPCA). In order to illustra...
Source: IEEE/ACM Transactions on Computational Biology and Bioinformatics - July 21, 2009 Category: Bioinformatics Source Type: journals

PrePrint: Topology Improves Phylogenetic Motif Functional Site Predictions
Prediction of protein functional sites from sequence-derived data remains an open bioinformatics problem. We have developed a phylogenetic motif (PM) functional site prediction approach that identifies functional sites from alignment fragments that parallel the evolutionary patterns of the family. In our approach, PMs are identified by comparing tree topologies of each alignment fragment to that of the complete phylogeny. Herein, we bypass the phylogenetic reconstruction step and identify PMs directly from distance matrix comparisons. In order to optimize the new algorithm, we consider three different distance matrices and...
Source: IEEE/ACM Transactions on Computational Biology and Bioinformatics - July 14, 2009 Category: Bioinformatics Source Type: journals

PrePrint: On the Characterization and Selection of Diverse Conformational Ensembles, with Applications to Flexible Docking
To address challenging flexible docking problems, a number of docking algorithms pre-generate large collections of candidate conformers. To remove the redundancy from such ensembles, a central problem in this context is to report a selection of conformers maximizing some geometric diversity criterion. We make three contributions to this problem. First, we resort to geometric optimization so as to report selections maximizing the molecular volume or molecular surface area (MSA) of the selection. Greedy strategies are developed, together with approximation bounds. Second, to assess the efficacy of our algorithms, we investig...
Source: IEEE/ACM Transactions on Computational Biology and Bioinformatics - June 26, 2009 Category: Bioinformatics Source Type: journals

PrePrint: Influence of Prior Knowledge in Constraint-Based Learning of Gene Regulatory Networks
Constraint-based structure learning algorithms generally perform well on sparse graphs. Although sparsity is not uncommon, there are some domains where the underlying graph can have some dense regions; one of these domains is gene regulatory networks, which is the main motivation to undertake the study described in this paper. We propose a new constraint-based algorithm that can both increase the quality of output and decrease the computational requirements for learning the structure of gene regulatory networks. The algorithm is based on and extends the PC algorithm. Two different types of information are derived from the ...
Source: IEEE/ACM Transactions on Computational Biology and Bioinformatics - June 26, 2009 Category: Bioinformatics Source Type: journals

PrePrint: F²Dock: Fast Fourier Protein-Protein Docking
The functions of proteins are often realized through their mutual interactions. Determining a relative transformation for a pair of proteins and their conformations which form a stable complex, reproducible in nature, is known as docking. It is an important step in drug design, structure determination, and understanding function and structure relationships. We provide a scoring model for rigid docking and error-bounded approximation algorithms to predict docking sites. Translational search is sped up using the Fourier domain. Shape-based interactions are shown to give good results for a large range of pairs of proteins.
Source: IEEE/ACM Transactions on Computational Biology and Bioinformatics - June 6, 2009 Category: Bioinformatics Source Type: journals

PrePrint: Peak Tree: A New Tool for Multiscale Hierarchical Representation and Peak Detection of Mass Spectrometry Data
In mass spectrometry (MS) analysis, false peak detection results are unavoidable due to severe spectrum variations. However, most current peak detection methods are neither robust enough to resist spectrum variations nor flexible enough to revise false detection results. To improve flexibility, we introduce peak tree to represent the peak information in MS spectra. Each tree node is a peak judgment on a range of scales, and each tree decomposition, as a set of nodes, is a candidate peak detection result. To improve robustness, we combine peak detection and common peak alignment into a closed-loop framework, which finds the...
Source: IEEE/ACM Transactions on Computational Biology and Bioinformatics - June 6, 2009 Category: Bioinformatics Source Type: journals

PrePrint: Predicting Metabolic Fluxes Using Gene Expression Differences as Constraints
A standard approach to estimate intracellular fluxes on a genome-wide scale is flux balance analysis (FBA), which optimizes an objective function subject to constraints on (relations between) fluxes. The performance of FBA models heavily depends on the relevance of the formulated objective function and the completeness of the defined constraints. Previous studies indicated that FBA predictions can be improved by adding regulatory on/off constraints. These constraints were imposed based on either absolute (Shlomi2007a,Covert2004) or relative (Shlomi2008) gene expression values. We provide a new algorithm that directly uses ...
Source: IEEE/ACM Transactions on Computational Biology and Bioinformatics - June 6, 2009 Category: Bioinformatics Source Type: journals

PrePrint: A Partial Set Covering Model for Protein Mixture Identification Using Mass Spectrometry Data
Protein identification is a key and essential step in mass spectrometry (MS) based proteome research. To date, there are many protein identification strategies that employ either MS data or MS/MS data for database searching. While MS-based methods provide wider coverage than MS/MS-based methods, their identification accuracy is lower since MS data have less information than MS/MS data. Thus, it is desired to design more sophisticated algorithms that achieve higher identification accuracy using MS data. Peptide Mass Fingerprinting (PMF) has been widely used to identify single purified proteins from MS data for many years. I...
Source: IEEE/ACM Transactions on Computational Biology and Bioinformatics - June 6, 2009 Category: Bioinformatics Source Type: journals

PrePrint: Evolutionary Optimization of Kernel Weights Improves Protein Complex Comembership Prediction
In recent years, more and more high-throughput data sources useful for protein complex prediction have become available (e.g. gene sequence, mRNA expression, interactions). The integration of these different data sources can be challenging. Recently, it has been recognized that kernel based classifiers are well suited for this task. However, the different kernels (data sources) are often combined using equal weights. Although several methods have been developed to optimize kernel weights, no large scale example of an improvement in classifier performance has been shown yet. In this work, we employ an evolutionary algorithm...
Source: IEEE/ACM Transactions on Computational Biology and Bioinformatics - May 20, 2009 Category: Bioinformatics Source Type: journals

PrePrint: Fast Surface-Based Travel Depth Estimation Algorithm for Macromolecule Surface Shape Description
Travel Depth, introduced by Coleman and Sharp in 2006, is a physical interpretation of molecular depth, a term frequently used to describe the shape of a molecular active site or binding site. Travel Depth can be seen as the physical distance a solvent molecule would have to travel from a point of the surface, i.e., the Solvent Excluded Surface (SES), to its convex hull. Existing algorithms providing an estimation of the Travel Depth are based on a regular sampling of the molecule volume and on the use of Dijkstra's shortest path algorithm. Since Travel Depth is only defined on the molecular surface, this volume-b...
Source: IEEE/ACM Transactions on Computational Biology and Bioinformatics - May 20, 2009 Category: Bioinformatics Source Type: journals

PrePrint: Linear-Time Algorithms for the Multiple Gene Duplication Problems
A fundamental problem arising in evolutionary molecular biology is to discover the locations of gene duplications and multiple gene duplication episodes based on phylogenetic information. The solutions to the Multiple Gene Duplication problems can provide useful clues to place the gene duplication events onto the locations of a species tree and to expose the multiple gene duplication episodes. In this paper, we study two variations of the Multiple Gene Duplication problems: the Episode-Clustering (EC) problem and the Minimum Episodes (ME) problem. For the EC problem, we improve the results of Burleigh et al. with a...
Source: IEEE/ACM Transactions on Computational Biology and Bioinformatics - May 20, 2009 Category: Bioinformatics Source Type: journals

PrePrint: A General Framework for Analyzing Data from Two Short Time-Series Microarray Experiments
We propose a general theoretical framework for analyzing differentially expressed genes and behavior patterns from two homogeneous short time-course datasets. The framework generalizes the recently proposed Hilbert-Schmidt Independence Criterion (HSIC) based framework, adapting it to the time-series scenario by utilizing tensor analysis for data transformation. The proposed framework is effective in yielding criteria that can identify both the differentially expressed genes and time-course patterns of interest between two time-series experiments without requiring explicit clustering of the data. The results, obtained by applying ...
Source: IEEE/ACM Transactions on Computational Biology and Bioinformatics - May 16, 2009 Category: Bioinformatics Source Type: journals

PrePrint: Fuzzy ARTMAP Prediction of Biological Activities for Potential HIV-1 Protease Inhibitors Using A Small Molecular Dataset
We focus on the neuro-fuzzy prediction of biological activities of HIV-1 protease inhibitory compounds when inferring from small training sets. We propose two computational intelligence prediction techniques which are suitable for small training sets, at the expense of some computational overhead. Both techniques are based on the FAMR model. The FAMR is a Fuzzy ARTMAP (FAM) incremental learning system used for classification and probability estimation. During the learning phase, each sample pair is assigned a relevance factor proportional to the importance of that pair. The two proposed algorithms in this paper are: 1. The...
Source: IEEE/ACM Transactions on Computational Biology and Bioinformatics - May 16, 2009 Category: Bioinformatics Source Type: journals

PrePrint: Model Reduction Using Piecewise-Linear Approximations Preserves Dynamic Properties of the Carbon Starvation Response in Escherichia coli
The adaptation of the bacterium Escherichia coli to carbon starvation is controlled by a large network of biochemical reactions involving genes, mRNAs, proteins, and signalling molecules. The dynamics of these networks is difficult to analyze, notably due to a lack of quantitative information on parameter values. To overcome these limitations, model reduction approaches based on quasi-steady-state (QSS) and piecewise-linear (PL) approximations have been proposed, resulting in models that are easier to handle mathematically and computationally. The approximations are not supposed to affect the capability of the model to acc...
Source: IEEE/ACM Transactions on Computational Biology and Bioinformatics - May 8, 2009 Category: Bioinformatics Source Type: journals

PrePrint: Learning Genetic Regulatory Network Connectivity From Time Series Data
Recent experimental advances facilitate the collection of time series data that indicate which genes in a cell are expressed. This information can be used to understand the genetic regulatory network that generates the data. Typically, Bayesian analysis approaches are applied which neglect the time series nature of the experimental data, have difficulty in determining the direction of causality, and do not perform well on networks with tight feedback. This paper presents a method to learn genetic network connectivity which exploits the time series nature of experimental data to achieve better causal predictions. This metho...
Source: IEEE/ACM Transactions on Computational Biology and Bioinformatics - May 8, 2009 Category: Bioinformatics Source Type: journals

PrePrint: Efficient Formulations for Exact Stochastic Simulation of Chemical Systems
We present optimized implementations, available from http://cain.sourceforge.net, that offer better performance than previous work. There is no single method that is best for all problems. Simple formulations often work best for systems with a small number of reactions, while some sophisticated methods offer the best performance for large problems and scale well asymptotically. We investigate the performance of each formulation on simple biological systems using a wide range of problem sizes. We also consider the numerical accuracy of the direct and the next reaction method. We have found that special precautions must be t...
Source: IEEE/ACM Transactions on Computational Biology and Bioinformatics - May 4, 2009 Category: Bioinformatics Source Type: journals
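
For readers unfamiliar with the direct method named in the abstract above, the following is a minimal textbook-style sketch of exact stochastic simulation in Python. It is not the optimized implementation distributed at the URL in the abstract; the function name `direct_method`, the `(propensity function, state change)` reaction format, and the example decay system are assumptions made purely for illustration.

```python
import math
import random


def direct_method(state, reactions, t_end, rng):
    """Simulate one trajectory with a plain direct-method loop.

    state     -- dict mapping species name to molecule count
    reactions -- list of (propensity_fn, change) pairs (hypothetical format);
                 propensity_fn(state) returns a rate, change maps species to deltas
    t_end     -- stop once simulated time would exceed this value
    rng       -- a random.Random instance, passed in for reproducibility
    """
    t = 0.0
    state = dict(state)
    trajectory = [(t, dict(state))]
    while True:
        propensities = [fn(state) for fn, _ in reactions]
        total = sum(propensities)
        if total <= 0:
            break  # no reaction can fire any more
        # Waiting time to the next event is exponential with rate `total`.
        t += -math.log(1.0 - rng.random()) / total
        if t > t_end:
            break
        # Pick which reaction fires, with probability proportional to propensity.
        index = rng.choices(range(len(reactions)), weights=propensities)[0]
        for species, delta in reactions[index][1].items():
            state[species] += delta
        trajectory.append((t, dict(state)))
    return trajectory


# Example: first-order decay A -> (nothing) with an assumed rate constant of 0.5.
decay = [(lambda s: 0.5 * s["A"], {"A": -1})]
path = direct_method({"A": 10}, decay, t_end=1e9, rng=random.Random(0))
print(len(path) - 1, "reaction events; final state:", path[-1][1])
```

The linear scan hidden inside `rng.choices` is the kind of simple formulation the abstract says tends to suit systems with few reactions; larger systems would need a more sophisticated selection structure.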

IEEE/ACM Transactions on Computational Biology and Bioinformatics - April-June 2009 (Vol. 6, No. 2)
IEEE/ACM Transactions on Computational Biology and Bioinformatics
Source: IEEE/ACM Transactions on Computational Biology and Bioinformatics - April 30, 2009 Category: Bioinformatics Source Type: journals

PrePrint: Genetic Networks and Soft Computing
Analysis of gene regulatory networks provides enormous information on various fundamental cellular processes involving growth, development, hormone secretion and cellular communication. Their extraction from available gene expression profiles is a challenging problem. Such reverse engineering of genetic networks offers insight into cellular activity, and towards prediction of adverse effects of new drugs or possible identification of new drug targets. Tasks like classification, clustering and feature selection enable efficient mining of knowledge about gene interactions in the form of networks. It is known that biological ...
Source: IEEE/ACM Transactions on Computational Biology and Bioinformatics - April 29, 2009 Category: Bioinformatics Source Type: journals

PrePrint: Probabilistic Analysis of Probe Reliability in Differential Gene Expression Studies with Short Oligonucleotide Arrays
Probe defects are a major source of noise in gene expression studies. While existing approaches detect noisy probes based on external information such as genomic alignments, we introduce and validate a targeted probabilistic method for analyzing probe reliability directly from expression data and independently of the noise source. This provides insights into the various sources of probe-level noise and gives tools to guide probe design.
Source: IEEE/ACM Transactions on Computational Biology and Bioinformatics - April 25, 2009 Category: Bioinformatics Source Type: journals

PrePrint: Identification and Modeling of Genes with Diurnal Oscillations from Microarray Time Series Data
The behavior of living organisms is strongly modulated by the day and night cycle, giving rise to a cyclic pattern of activities. Such a pattern helps organisms coordinate their activities and maintain a balance between what could be performed during the 'day' and what could be relegated to 'night'. This cyclic pattern, called the 'Circadian Rhythm', is a biological phenomenon observed in a large number of organisms. In this paper, our goal is to analyze transcriptome data from Cyanothece for the purpose of discovering genes whose expressions are rhythmic. We cluster these genes into groups that are close in terms of thei...
Source: IEEE/ACM Transactions on Computational Biology and Bioinformatics - April 18, 2009 Category: Bioinformatics Source Type: journals

PrePrint: Nonnegative Principal Component Analysis for Cancer Molecular Pattern Discovery
In this study, we investigate the benefit of adding nonnegative constraints to PCA and develop a nonnegative principal component analysis algorithm (NPCA) to overcome the global nature of PCA. A novel classification algorithm, NPCA-SVM, is proposed for microarray data pattern discovery. We report strong classification results from the NPCA-SVM algorithm on five benchmark microarray datasets by direct comparison with other related algorithms. We have also proved mathematically and interpreted biologically that microarray data will inevitably encounter over-fitting for an SVM/PCA-SVM learning machine under a Gaussian kernel. In...
Source: IEEE/ACM Transactions on Computational Biology and Bioinformatics - April 18, 2009 Category: Bioinformatics Source Type: journals

PrePrint: Twin Removal in Genetic Algorithms for Protein Structure Prediction Using Low Resolution Model
This paper presents the impact of twins and the measures for their removal from the population of a genetic algorithm (GA) when applied to effective conformational searching. It is conclusively shown that a twin removal strategy for a GA provides considerably enhanced performance when investigating solutions to complex ab initio protein structure prediction (PSP) problems in a low-resolution model. Without twin removal, GA crossover and mutation operations can become ineffectual as generations lose their ability to produce significant differences, which can lead to the solution stalling. The paper relaxes the definition of chro...
Source: IEEE/ACM Transactions on Computational Biology and Bioinformatics - March 18, 2009 Category: Bioinformatics Source Type: journals

PrePrint: On Nakhleh's Metric for Reduced Phylogenetic Networks
We prove that Nakhleh's metric for reduced phylogenetic networks is also a metric on the classes of tree-child phylogenetic networks, of semi-binary tree-sibling time consistent phylogenetic networks, and of multi-labeled phylogenetic trees. We also prove that it separates distinguishable phylogenetic networks. In this way, it becomes the strongest dissimilarity measure for phylogenetic networks available so far. Furthermore, we propose a generalization of that metric that separates arbitrary phylogenetic networks.
Source: IEEE/ACM Transactions on Computational Biology and Bioinformatics - March 18, 2009 Category: Bioinformatics Source Type: journals

PrePrint: Computing the Distribution of a Tree Metric
The Robinson-Foulds (RF) distance is by far the most widely used measure of dissimilarity between trees. Although the distribution of these distances has been investigated for twenty years, an algorithm that is explicitly polynomial time has yet to be described for computing this distribution (which is also the distribution of trees around a given tree under the popular Robinson-Foulds metric). In this paper we derive a polynomial-time algorithm for this distribution. We show how the distribution can be approximated by a Poisson distribution determined by the proportion of leaves that lie in 'cherries' of the given tree. W...
Source: IEEE/ACM Transactions on Computational Biology and Bioinformatics - March 18, 2009 Category: Bioinformatics Source Type: journals
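
As background to the abstract above, the Robinson-Foulds distance itself (not the distribution the paper computes) can be sketched in a few lines of Python. The nested-tuple tree encoding and the function names `leaf_set`, `splits`, and `rf_distance` are assumptions chosen for illustration; the trees are treated as unrooted, so only non-trivial leaf bipartitions are compared.

```python
def leaf_set(tree):
    """Return the set of leaf labels of a nested-tuple tree."""
    if not isinstance(tree, tuple):
        return frozenset([tree])
    return frozenset().union(*(leaf_set(child) for child in tree))


def splits(tree):
    """Collect the non-trivial bipartitions (splits) induced by internal edges.

    Each split is stored as the side that does NOT contain a fixed reference
    leaf, so the same bipartition always gets the same representation.
    """
    all_leaves = leaf_set(tree)
    reference = min(all_leaves)
    found = set()

    def walk(node):
        if not isinstance(node, tuple):
            return frozenset([node])
        below = frozenset().union(*(walk(child) for child in node))
        side = all_leaves - below if reference in below else below
        # Keep only splits with at least two leaves on each side.
        if 1 < len(side) < len(all_leaves) - 1:
            found.add(side)
        return below

    walk(tree)
    return found


def rf_distance(tree_a, tree_b):
    """Number of splits present in exactly one of the two trees."""
    if leaf_set(tree_a) != leaf_set(tree_b):
        raise ValueError("trees must have the same leaf labels")
    return len(splits(tree_a) ^ splits(tree_b))


print(rf_distance((("A", "B"), ("C", "D")), (("A", "C"), ("B", "D"))))
```

Because splits are canonicalized against a reference leaf, two encodings of the same unrooted tree with different root placements give distance zero.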

PrePrint: Finding Significant Matches of Position Weight Matrices in Linear Time
Position weight matrices are an important method for modeling signals or motifs in biological sequences, both in DNA and protein contexts. In this paper we present fast algorithms for the problem of finding significant matches of such matrices. Our algorithms are of the on-line type, and they generalize classical multi-pattern matching, filtering, and super-alphabet techniques of combinatorial string matching to the problem of weight matrix matching. Several variants of the algorithms are developed, including multiple matrix extensions that perform the search for several matrices in one scan through the sequence database....
Source: IEEE/ACM Transactions on Computational Biology and Bioinformatics - March 16, 2009 Category: Bioinformatics Source Type: journals
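
To make the matching problem in the abstract above concrete, here is a naive baseline scan in Python: every window of the sequence is scored against the matrix and kept if it reaches a threshold. This is the straightforward O(nm) approach that faster algorithms aim to improve on, not the paper's method; the function name `pwm_matches`, the list-of-dicts matrix layout, and the toy scores are illustrative assumptions.

```python
def pwm_matches(sequence, pwm, threshold):
    """Return (start, score) for every window scoring at least `threshold`.

    pwm is a list with one dict per motif position, mapping each alphabet
    symbol to its score at that position (hypothetical layout).
    """
    width = len(pwm)
    hits = []
    for start in range(len(sequence) - width + 1):
        # Window score is the sum of per-position scores.
        score = sum(column[sequence[start + offset]]
                    for offset, column in enumerate(pwm))
        if score >= threshold:
            hits.append((start, score))
    return hits


# Toy two-column matrix that favours the motif "AC".
toy_pwm = [
    {"A": 2, "C": -1, "G": -1, "T": -1},
    {"A": -1, "C": 2, "G": -1, "T": -1},
]
print(pwm_matches("ACGAC", toy_pwm, threshold=4))
```

In practice the threshold would be derived from a significance level rather than chosen by hand as it is here.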

PrePrint: Evaluation of Geometric Complementarity between Molecular Surfaces Using Compactly Supported Radial Basis Functions
One of the challenges faced by all molecular docking algorithms is that of being able to discriminate between correct results and false positives obtained in the simulations. The scoring or energetic function is the one that must fulfill this task. Several scoring functions have been developed and new methodologies are still under development. In this paper we have employed the Compactly Supported Radial Basis Functions (CSRBF) to create analytical representations of molecular surfaces, which are then included as key components of a new scoring function for molecular docking. The method proposed here achieves a better rank...
Source: IEEE/ACM Transactions on Computational Biology and Bioinformatics - March 6, 2009 Category: Bioinformatics Source Type: journals

PrePrint: Quantifying the Degree of Self-Nestedness of Trees: Application to the Structural Analysis of Plants
In this paper we are interested in the problem of approximating trees by trees with a particular self-nested structure. Self-nested trees are such that all their subtrees of a given height are isomorphic. We show that these trees present remarkable compression properties, with high compression rates. In order to measure how far a tree is from being a self-nested tree, we then study how to quantify the degree of self-nestedness of any tree. For this, we define a measure of the self-nestedness of a tree by constructing a self-nested tree that minimizes the distance of the original tree to the set of self-nested trees that em...
Source: IEEE/ACM Transactions on Computational Biology and Bioinformatics - March 5, 2009 Category: Bioinformatics Source Type: journals
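
The definition quoted in the abstract above (all subtrees of a given height are isomorphic) can be checked directly on small examples. The following is a minimal sketch, not the authors' method: it assumes unordered rooted trees encoded as nested Python lists (a leaf is `[]`), and the function name `is_self_nested` is hypothetical.

```python
from collections import defaultdict


def is_self_nested(tree):
    """Return True if all subtrees of the same height are isomorphic.

    Assumed encoding: a tree is a list of child subtrees; a leaf is [].
    Trees are treated as unordered, so children are sorted into a
    canonical string before comparison.
    """
    shapes_by_height = defaultdict(set)

    def visit(node):
        # Returns (height, canonical form) of the subtree rooted at node.
        if not node:
            height, form = 0, "()"
        else:
            children = [visit(child) for child in node]
            height = 1 + max(h for h, _ in children)
            form = "(" + "".join(sorted(f for _, f in children)) + ")"
        shapes_by_height[height].add(form)
        return height, form

    visit(tree)
    # Self-nested means exactly one isomorphism class per height.
    return all(len(forms) == 1 for forms in shapes_by_height.values())


balanced = [[[], []], [[], []]]   # both height-1 subtrees have two leaves
unbalanced = [[[], []], [[]]]     # height-1 subtrees differ in shape
print(is_self_nested(balanced), is_self_nested(unbalanced))
```

This only tests the property; it does not construct the nearest self-nested tree or compute the distance measure the abstract refers to.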

PrePrint: TRIAL: A Tool for Finding Distant Structural Similarities
Finding structural similarities in distant proteins can reveal functional relationships that cannot be identified using sequence comparison. Given two proteins A and B and threshold ε Å, we develop an algorithm, TRiplet-based Iterative ALignment (TRIAL) for computing the transformation of B that maximizes the number of aligned residues such that the root mean square distance of the alignment is at most ε Å. Our algorithm is designed with the specific goal of effectively handling proteins with low similarity in primary structure, where existing algorithms perform particularly poorly. Experiments...
Source: IEEE/ACM Transactions on Computational Biology and Bioinformatics - March 5, 2009 Category: Bioinformatics Source Type: journals
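
The constraint in the abstract above (root mean square distance of the alignment at most ε Å) can be illustrated with a short sketch. It is not the TRIAL algorithm: it assumes the residues are already paired and B is already transformed, and the names `rmsd` and `within_threshold` are hypothetical.

```python
import math


def rmsd(coords_a, coords_b):
    """Root mean square distance between paired 3D points.

    Assumption: coords_a[i] is aligned with coords_b[i], and both are
    (x, y, z) tuples in angstroms after any superposition step.
    """
    if len(coords_a) != len(coords_b) or not coords_a:
        raise ValueError("need two non-empty lists of equal length")
    total = 0.0
    for (xa, ya, za), (xb, yb, zb) in zip(coords_a, coords_b):
        total += (xa - xb) ** 2 + (ya - yb) ** 2 + (za - zb) ** 2
    return math.sqrt(total / len(coords_a))


def within_threshold(coords_a, coords_b, epsilon):
    """True if the alignment satisfies the RMSD <= epsilon constraint."""
    return rmsd(coords_a, coords_b) <= epsilon


a = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)]
b = [(0.0, 0.0, 0.0), (1.0, 0.0, 2.0)]
print(rmsd(a, b), within_threshold(a, b, epsilon=1.5))
```

Finding the transformation that maximizes the number of aligned residues under this constraint is the hard part and is not attempted here.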

PrePrint: Heuristic Reusable Dynamic Programming: Efficient Updating of Local Sequence Alignment
In this study, we validate "relative node tolerance bound" (RNTB) in the pruned searching space. Furthermore, we improve the performance by quantifying the successful RNTB tolerance probability and switching to rDP on perturbation-resilient columns only. In our searching space derived by a threshold value of 90% of the optimal alignment score, we find that 98.3% of contours contain correctly updated paths, while the contour consumes only 25.36% of the cost of sparse dynamic programming (sDP) method, which corresponds to only 2.55% of a normal dynamic programming runtime with the Smith-Waterman algorithm.
Source: IEEE/ACM Transactions on Computational Biology and Bioinformatics - March 4, 2009 Category: Bioinformatics Source Type: journals
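
For readers unfamiliar with the baseline named in the abstract above, the following is a minimal sketch of the Smith-Waterman local alignment score using full dynamic programming. It does not implement RNTB, rDP, or sDP; the scoring values (match 2, mismatch -1, linear gap -1) are illustrative assumptions, not taken from the paper.

```python
def smith_waterman_score(a, b, match=2, mismatch=-1, gap=-1):
    """Best local alignment score of strings a and b.

    Uses a linear gap penalty and keeps only two rows of the DP matrix.
    The scoring parameters are assumed defaults for illustration.
    """
    prev = [0] * (len(b) + 1)
    best = 0
    for i in range(1, len(a) + 1):
        curr = [0] * (len(b) + 1)
        for j in range(1, len(b) + 1):
            diag = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            # Local alignment: a cell never drops below zero.
            curr[j] = max(0, diag, prev[j] + gap, curr[j - 1] + gap)
            best = max(best, curr[j])
        prev = curr
    return best


print(smith_waterman_score("ACGT", "ACGT"))
```

The runtime is proportional to len(a) * len(b), which is the "normal dynamic programming" cost that the reported percentages are measured against.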

PrePrint: The Metropolized Partial Importance Sampling MCMC Mixes Slowly on Minimum Reversal Rearrangement Paths
We present here a negative result on the rate of convergence of the generally used Markov chains. We prove that the relaxation time of the Markov chains walking on the optimal reversal sorting scenarios might grow exponentially with the size of the signed permutations, namely, with the number of synteny blocks.
Source: IEEE/ACM Transactions on Computational Biology and Bioinformatics - February 20, 2009 Category: Bioinformatics Source Type: journals

PrePrint: New Methods for Inference of Local Tree Topologies with Recombinant SNP Sequences in Populations
Partly due to recombination, the genealogical history of a set of DNA sequences in a population usually cannot be represented by a single tree. Instead, genealogy is better represented by a genealogical network, which is a compact representation of a set of correlated local genealogical trees, each for a short region of genome and possibly with different topology. Inference of genealogical history for a set of DNA sequences under recombination has many potential applications, including association mapping of complex diseases. In this paper, we present two new methods for reconstructing local tree topologies with the presence o...
Source: IEEE/ACM Transactions on Computational Biology and Bioinformatics - February 20, 2009 Category: Bioinformatics Source Type: journals

PrePrint: A Cluster Refinement Algorithm for Motif Discovery
Finding Transcription Factor Binding Sites, i.e., motif discovery, is crucial for understanding the gene regulatory relationship. Motifs are weakly conserved and motif discovery is an NP-hard problem. We propose a new approach called Cluster Refinement Algorithm for Motif Discovery (CRMD). CRMD employs a flexible statistical motif model allowing a variable number of motifs and motif instances. CRMD first uses a novel entropy-based clustering to find complete and good starting candidate motifs from the DNA sequences. CRMD then uses an effective greedy refinement to search for optimal motifs from the candidate motifs. The ref...
Source: IEEE/ACM Transactions on Computational Biology and Bioinformatics - February 17, 2009 Category: Bioinformatics Source Type: journals

PrePrint: Model Reduction of Multiscale Chemical Langevin Equations: A Numerical Case Study
We describe and illustrate the application of a semi-analytical reduction framework for chemical Langevin equations that results in significant gains in computational cost.
Source: IEEE/ACM Transactions on Computational Biology and Bioinformatics - February 13, 2009 Category: Bioinformatics Source Type: journals

PrePrint: Constructing Level-2 Phylogenetic Networks from Triplets
Jansson and Sung showed that, given a dense set of input triplets T (representing hypotheses about the local evolutionary relationships of triplets of taxa), it is possible to determine in polynomial time whether there exists a level-1 network consistent with T, and if so to construct such a network (Inferring a Level-1 Phylogenetic Network from a Dense Set of Rooted Triplets, Theoretical Computer Science, 363, pp. 60-68 (2006)). Here we extend this work by showing that this problem is even polynomial-time solvable for the construction of level-2 networks. This shows that, assuming density, it is tractable to construct pla...
Source: IEEE/ACM Transactions on Computational Biology and Bioinformatics - February 13, 2009 Category: Bioinformatics Source Type: journals

PrePrint: A Genetic Optimization Approach for Isolating Translational Efficiency Bias
We present a novel approach to isolating translational efficiency bias in microbial genomes. There are several existent methods for isolating translational efficiency bias. Previous approaches are susceptible to the confounding influences of other potentially dominant biases. Additionally, existing approaches to identifying translational efficiency bias generally require both genomic sequence information and prior knowledge of a set of highly expressed genes. This novel approach provides more accurate results from sequence information alone by resisting the confounding effects of other biases. We validate this increase in ...
Source: IEEE/ACM Transactions on Computational Biology and Bioinformatics - February 13, 2009 Category: Bioinformatics Source Type: journals

PrePrint: A Novel Heuristic for Local Multiple Alignment of Interspersed DNA Repeats
Pairwise local sequence alignment methods have been the prevailing technique to identify homologous nucleotides between related species. However, existing methods that identify and align all homologous nucleotides in one or more genomes have suffered from poor scalability and limited accuracy. We propose a novel method that couples a gapped extension heuristic with an efficient filtration method for identifying interspersed repeats in genome sequences. During gapped extension, we use the MUSCLE implementation of progressive global multiple alignment with iterative refinement. The resulting gapped extensions potentially con...
Source: IEEE/ACM Transactions on Computational Biology and Bioinformatics - January 28, 2009 Category: Bioinformatics Source Type: journals

IEEE/ACM Transactions on Computational Biology and Bioinformatics - January-March 2009 (Vol. 6, No. 1)
IEEE/ACM Transactions on Computational Biology and Bioinformatics
Source: IEEE/ACM Transactions on Computational Biology and Bioinformatics - January 27, 2009 Category: Bioinformatics Source Type: journals

PrePrint: A Sparse Learning Machine for High-Dimensional Data with Application to Microarray Gene Analysis
Extracting features from high-dimensional data is a critically important task for pattern recognition and machine learning applications. High-dimensional data typically have many more variables than observations, and contain significant noise, missing components, or outliers. Features extracted from high-dimensional data need to be discriminative and sparse, and to capture essential characteristics of the data. In this paper, we present a way to construct multivariate features and then classify the data into proper classes. The resulting small subset of features is nearly the best in the sense of Greenshtein's persistence...
Source: IEEE/ACM Transactions on Computational Biology and Bioinformatics - January 26, 2009 Category: Bioinformatics Source Type: journals

PrePrint: The Gene-Duplication Problem: Near-Linear Time Algorithms for NNI Based Local Searches
The gene-duplication problem is to infer a species supertree from a collection of gene trees that are confounded by complex histories of gene duplication events. This problem is NP-complete and thus requires efficient and effective heuristics. Existing heuristics perform a stepwise search of the tree space, where each step is guided by an exact solution to an instance of a local search problem. A classical local search problem is the NNI search problem, which is based on the nearest neighbor interchange operation. In this work we (i) provide a novel near-linear time algorithm for the NNI search problem, (ii) introduce exte...
Source: IEEE/ACM Transactions on Computational Biology and Bioinformatics - January 21, 2009 Category: Bioinformatics Source Type: journals

PrePrint: An Extended Kalman Filtering Approach to Modelling Nonlinear Dynamic Gene Regulatory Networks via Short Gene Expression Time Series
In this paper, the extended Kalman filter (EKF) algorithm is applied to model the gene regulatory network from gene time series data. The gene regulatory network is considered as a nonlinear dynamic stochastic model that consists of the gene measurement equation and the gene regulation equation. After specifying the model structure, we apply the EKF algorithm for identifying both the model parameters and the actual value of gene expression levels. It is shown that the EKF algorithm is an online estimation algorithm that can identify a large number of parameters (including parameters of nonlinear functions) through iterative ...
Source: IEEE/ACM Transactions on Computational Biology and Bioinformatics - January 21, 2009 Category: Bioinformatics Source Type: journals

PrePrint: On Subset Seeds for Protein Alignment
We apply the concept of subset seeds proposed in [1] to similarity search in protein sequences. The main question studied is the design of efficient seed alphabets to construct seeds with optimal sensitivity/selectivity trade-offs. We propose several different design methods and use them to construct several alphabets. We then perform a comparative analysis of seeds built over those alphabets and compare them with the standard BLASTP seeding method [2], [3], as well as with the family of vector seeds proposed in [4]. While the formalism of subset seeds is less expressive (but less costly to implement) than the cumulative p...
Source: IEEE/ACM Transactions on Computational Biology and Bioinformatics - January 21, 2009 Category: Bioinformatics Source Type: journals

PrePrint: An Approximation Algorithm for the Minimum Breakpoint Linearization Problem
In recent years there has been a growing interest in inferring the total order of genes or markers on a chromosome, since current genetic mapping efforts might only suffice to produce a partial order. Many interesting optimization problems were thus formulated in the framework of genome rearrangement. As an important one among them, the minimum breakpoint linearization (MBL) problem is to find the total order of a partially-ordered genome that minimizes its breakpoint distance to a reference genome whose genes are already totally ordered. It was previously shown to be NP-hard, and the algorithms proposed so far are all...
Source: IEEE/ACM Transactions on Computational Biology and Bioinformatics - January 21, 2009 Category: Bioinformatics Source Type: journals

PrePrint: A Metric on the Space of Reduced Phylogenetic Networks
Phylogenetic networks are leaf-labeled, rooted, acyclic, directed graphs that model reticulate evolutionary histories. Several measures for quantifying the topological dissimilarity between two phylogenetic networks have been devised for various classes of phylogenetic networks. A biologically-motivated class of phylogenetic networks, namely reduced phylogenetic networks, was recently introduced. None of the existing measures is a metric on the space of reduced phylogenetic networks. In this paper, we provide a polynomially computable...
Source: IEEE/ACM Transactions on Computational Biology and Bioinformatics - January 21, 2009 Category: Bioinformatics Source Type: journals

PrePrint: Information-Theoretic Model of Evolution over Protein Communication Channel
In this paper, we propose a communication model of evolution and investigate its information-theoretic bounds. The process of evolution is modeled as the retransmission of information over a protein communication channel, where the transmitted message is the organism’s proteome encoded in the DNA. We compute the capacity and the rate-distortion functions of the protein communication system for the three domains of life: Archaea, Bacteria and Eukaryotes. The tradeoff between the transmission rate and the distortion in noisy protein communication channels is analyzed. As expected, comparison between the optimal transm...
Source: IEEE/ACM Transactions on Computational Biology and Bioinformatics - January 21, 2009 Category: Bioinformatics Source Type: journals

PrePrint: Data Mining on DNA Sequences of Hepatitis B Virus
In this study, a data mining framework which includes molecular evolution analysis, clustering, feature selection, classifier learning and classification, is introduced. Our research group has collected HBV DNA sequences, either genotype B or C, from over 200 patients specifically for this project. In the molecular evolution analysis and clustering, three subgroups have been identified in genotype C and a clustering method has been developed to separate the subgroups. In the feature selection process, potential markers are selected based on Information Gain for further classifier learning. Then meaningful rules are learnt ...
Source: IEEE/ACM Transactions on Computational Biology and Bioinformatics - January 20, 2009 Category: Bioinformatics Source Type: journals
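
The abstract above says potential markers are selected based on Information Gain. As a hedged illustration of that standard quantity (not the authors' pipeline), the sketch below scores one categorical feature, such as the nucleotide at a hypothetical alignment position, against genotype labels. The toy data and function names are assumptions.

```python
import math
from collections import Counter, defaultdict


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())


def information_gain(feature_values, labels):
    """Entropy of labels minus the expected entropy after splitting on
    the feature. Higher values mean a more informative feature."""
    n = len(labels)
    groups = defaultdict(list)
    for value, label in zip(feature_values, labels):
        groups[value].append(label)
    conditional = sum(len(g) / n * entropy(g) for g in groups.values())
    return entropy(labels) - conditional


# Toy example: four sequences labelled genotype B or C (made-up data).
genotypes = ["B", "B", "C", "C"]
site_1 = ["A", "A", "G", "G"]   # separates the genotypes perfectly
site_2 = ["A", "G", "A", "G"]   # carries no information about genotype
print(information_gain(site_1, genotypes), information_gain(site_2, genotypes))
```

Ranking sites by this score and keeping the top few is one common way to shortlist candidate markers before classifier learning.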