Bioinformatics Research
This page shows you the most recent publications within this specialty of the MedWorm directory. This is page number 21.
PARADIGM-SHIFT predicts the function of mutations in multiple cancers using pathway impact analysis
Motivation: A current challenge in understanding cancer processes is to pinpoint which mutations influence the onset and progression of disease. Toward this goal, we describe a method called PARADIGM-SHIFT that can predict whether a mutational event in a tumor sample is neutral, gain-of-function or loss-of-function. The method uses a belief-propagation algorithm to infer gene activity from gene expression and copy number data in the context of a set of pathway interactions.
Results: The method was found to be both sensitive and specific on a set of positive and negative controls for multiple cancers for which pathway information was a...
Source: Bioinformatics - September 7, 2012 Category: Bioinformatics Authors: Ng, S., Collisson, E. A., Sokolov, A., Goldstein, T., Gonzalez-Perez, A., Lopez-Bigas, N., Benz, C., Haussler, D., Stuart, J. M. Tags: APPLIED AND TRANSLATIONAL BIOINFORMATICS Source Type: research
Finding differentially expressed regions of arbitrary length in quantitative genomic data based on marked point process model
Motivation: High-throughput nucleotide sequencing technologies provide large amounts of quantitative genomic data at nucleotide resolution, which are important for present and future biomedical research; for example, differential analysis of base-level RNA expression data will improve our understanding of the transcriptome, including both coding and non-coding genes. However, most studies of these data have relied on existing genome annotations and are thus limited to the analysis of known transcripts.
Results: In this article, we propose a novel method based on a marked point process model to find differentially expresse...
Source: Bioinformatics - September 7, 2012 Category: Bioinformatics Authors: Hatsuda, H. Tags: APPLIED AND TRANSLATIONAL BIOINFORMATICS Source Type: research
A novel approach for resolving differences in single-cell gene expression patterns from zygote to blastocyst
We present a novel framework based on Gaussian process latent variable models (GPLVMs) to analyse single-cell qPCR expression data of 48 genes from mouse zygote to blastocyst as presented by Guo et al. (2010). We extend GPLVMs by introducing gene relevance maps and gradient plots to provide interpretability as in the linear case. Furthermore, we take the temporal group structure of the data into account and introduce a new factor in the GPLVM likelihood which ensures that small distances are preserved for cells from the same developmental stage. Using our novel framework, it is possible to resolve differences in gene expr...
Source: Bioinformatics - September 7, 2012 Category: Bioinformatics Authors: Buettner, F., Theis, F. J. Tags: APPLIED AND TRANSLATIONAL BIOINFORMATICS Source Type: research
From phenotype to genotype: an association study of longitudinal phenotypic markers to Alzheimer's disease relevant SNPs
Motivation: Imaging genetic studies typically focus on identifying single-nucleotide polymorphism (SNP) markers associated with imaging phenotypes. Few studies perform regression of SNP values on phenotypic measures to examine how the SNP values change when phenotypic measures are varied. This alternative approach may help us discover important imaging genetic associations from a different perspective. In addition, imaging markers are often measured over time, and this longitudinal profile may provide increased power for differentiating genotype groups. How to identify the longitudinal phenotypic...
Source: Bioinformatics - September 7, 2012 Category: Bioinformatics Authors: Wang, H., Nie, F., Huang, H., Yan, J., Kim, S., Nho, K., Risacher, S. L., Saykin, A. J., Shen, L., for the Alzheimer's Disease Neuroimaging Initiative Tags: BIOINFORMATICS OF HEALTH AND DISEASE, BIOMARKERS AND PERSONALIZED MEDICINE Source Type: research
Drug target prediction using adverse event report systems: a pharmacogenomic approach
Motivation: Unexpected drug activities derived from off-targets are usually undesired and harmful; however, they can occasionally be beneficial for different therapeutic indications. There are many uncharacterized drugs whose target proteins (including the primary target and off-targets) remain unknown. The identification of all potential drug targets has become an important issue in drug repositioning, which reuses known drugs for new therapeutic indications.
Results: We defined pharmacological similarity for all possible drugs using the US Food and Drug Administration's (FDA's) adverse event reporting system (AERS) and develo...
Source: Bioinformatics - September 7, 2012 Category: Bioinformatics Authors: Takarabe, M., Kotera, M., Nishimura, Y., Goto, S., Yamanishi, Y. Tags: BIOINFORMATICS OF HEALTH AND DISEASE, BIOMARKERS AND PERSONALIZED MEDICINE Source Type: research
Bayesian assignment of gene ontology terms to gene expression experiments
This article proposes a probabilistic model for GO term inference. The model assumes that gene annotations to GO terms are available and that gene involvement in an experiment is represented by posterior probabilities over gene-specific indicator variables. Such probability measures result from many Bayesian approaches to expression data analysis. The proposed model combines these indicator probabilities in a probabilistic fashion and provides a probabilistic GO term assignment as a result. Experiments on synthetic and microarray data suggest that advantages of the proposed probabilistic GO term inference over statistical te...
Source: Bioinformatics - September 7, 2012 Category: Bioinformatics Authors: Sykacek, P. Tags: BIOINFORMATICS OF HEALTH AND DISEASE, BIOMARKERS AND PERSONALIZED MEDICINE Source Type: research
An accurate paired sample test for count data
Motivation: Recent technology platforms in proteomics and genomics produce count data for quantitative analysis. Previous work on statistical significance analysis for count data has mainly focused on the independent sample setting, which does not cover the case where pairs of measurements are taken from individual patients before and after treatment. This experimental setting requires paired sample testing, such as the paired t-test often used for continuous measurements. A state-of-the-art method uses a negative binomial distribution in a generalized linear model framework for paired sample testing. A paired sample desi...
Source: Bioinformatics - September 7, 2012 Category: Bioinformatics Authors: Pham, T. V., Jimenez, C. R. Tags: BIOINFORMATICS OF HEALTH AND DISEASE, BIOMARKERS AND PERSONALIZED MEDICINE Source Type: research
Improving HIV coreceptor usage prediction in the clinic using hints from next-generation sequencing data
Motivation: Due to the high mutation rate of human immunodeficiency virus (HIV), drug-resistant variants emerge frequently. Therefore, researchers are constantly searching for new ways to attack the virus. One new class of anti-HIV drugs is the class of coreceptor antagonists, which block cell entry by occupying a coreceptor on CD4 cells. This type of drug only affects the subset of HIV variants that use the inhibited coreceptor. An accurate prediction of whether the viral population inside a patient is susceptible to the treatment is hence very important for therapy decisions and a prerequisite to administering the respective dr...
Source: Bioinformatics - September 7, 2012 Category: Bioinformatics Authors: Pfeifer, N., Lengauer, T. Tags: BIOINFORMATICS OF HEALTH AND DISEASE, BIOMARKERS AND PERSONALIZED MEDICINE Source Type: research
Gene-gene interaction analysis for the survival phenotype based on the Cox model
Motivation: Over the past few decades, many statistical methods for genome-wide association studies (GWAS) have been developed to identify SNP–SNP interactions in case-control studies. However, there has been less work on prospective cohort studies involving survival time. Recently, Gui et al. (2011) proposed a novel method, called Surv-MDR, for detecting gene–gene interactions associated with survival time. Surv-MDR extends the multifactor dimensionality reduction (MDR) method to the survival phenotype by using the log-rank test to define a binary attribute. However, the Surv-MDR method has ...
Source: Bioinformatics - September 7, 2012 Category: Bioinformatics Authors: Lee, S., Kwon, M.-S., Oh, J. M., Park, T. Tags: BIOINFORMATICS OF HEALTH AND DISEASE, BIOMARKERS AND PERSONALIZED MEDICINE Source Type: research
Event extraction across multiple levels of biological organization
We present the ontological foundations, target types and guidelines for entity and event annotation and introduce the new multi-level event extraction (MLEE) corpus, manually annotated using a structured representation for event extraction. We further adapt and evaluate named entity and event extraction methods for the new task, demonstrating that both can be achieved with performance broadly comparable with that for established molecular entity and event extraction tasks.
Availability: The resources and methods introduced in this study are available from http://nactem.ac.uk/MLEE/.
Contact: pyysalos@cs.man.ac.uk
Supplement...
Source: Bioinformatics - September 7, 2012 Category: Bioinformatics Authors: Pyysalo, S., Ohta, T., Miwa, M., Cho, H.-C., Tsujii, J., Ananiadou, S. Tags: DATABASES, ONTOLOGIES, AND TEXT MINING Source Type: research
ReLiance: a machine learning and literature-based prioritization of receptor–ligand pairings
Motivation: The prediction of receptor–ligand pairings is an important area of research because intercellular communication is mediated by the successful interaction of these key proteins. As the exhaustive assaying of receptor–ligand pairs is impractical, a computational approach to predict pairings is necessary. We propose a workflow to carry out this interaction prediction task, using a text mining approach in conjunction with a state-of-the-art prediction method, as well as a widely accessible and comprehensive dataset.
Among several modern classifiers, random forests have been found to be the best at this pre...
Source: Bioinformatics - September 7, 2012 Category: Bioinformatics Authors: Iacucci, E., Tranchevent, L.-C., Popovic, D., Pavlopoulos, G. A., De Moor, B., Schneider, R., Moreau, Y. Tags: DATABASES, ONTOLOGIES, AND TEXT MINING Source Type: research
An approach to describing and analysing bulk biological annotation quality: a case study using UniProtKB
Motivation: Annotations are a key feature of many biological databases, used to convey our knowledge of a sequence to the reader. Ideally, annotations are curated manually; however, manual curation is costly, time-consuming and requires expert knowledge and training. Given these issues and the exponential increase of data, many databases implement automated annotation pipelines in an attempt to avoid un-annotated entries. Both manual and automated annotations vary in quality between databases and annotators, making assessment of annotation reliability problematic for users. The community lacks a generic measure for determin...
Source: Bioinformatics - September 7, 2012 Category: Bioinformatics Authors: Bell, M. J., Gillespie, C. S., Swan, D., Lord, P. Tags: DATABASES, ONTOLOGIES, AND TEXT MINING Source Type: research
Imaging, quantification and visualization of spatio-temporal patterning in mESC colonies under different culture conditions
Motivation: Mouse embryonic stem cells (mESCs) have developed into a prime system to study the regulation of pluripotency in stable cell lines. It is well recognized that different, established protocols for the maintenance of mESC pluripotency support morphologically and functionally different cell cultures. However, it is unclear how characteristic properties of cell colonies develop over time and how they are re-established after cell passage depending on the culture conditions. Furthermore, it appears that cell colonies have an internal structure with respect to cell size, marker expression or biomechanical properties,...
Source: Bioinformatics - September 7, 2012 Category: Bioinformatics Authors: Scherf, N., Herberg, M., Thierbach, K., Zerjatke, T., Kalkan, T., Humphreys, P., Smith, A., Glauche, I., Roeder, I. Tags: BIOIMAGING, SPATIAL-TEMPORAL MODELING AND DATA VISUALIZATION Source Type: research
Hybrid spatial Gillespie and particle tracking simulation
Motivation: Cellular signal transduction involves spatial–temporal dynamics and often stochastic effects due to the low particle abundance of some molecular species. Other species, however, can be highly abundant. If space and stochasticity are important, such a system can be simulated either with the spatial Gillespie/Stochastic Simulation Algorithm (SSA) or with Brownian/Smoluchowski dynamics. To combine the accuracy of particle-based methods with the superior performance of the SSA, we suggest a hybrid simulation.
Results: The proposed simulation allows an interactive or automated switching for regions or species of interes...
Source: Bioinformatics - September 7, 2012 Category: Bioinformatics Authors: Klann, M., Ganguly, A., Koeppl, H. Tags: BIOIMAGING, SPATIAL-TEMPORAL MODELING AND DATA VISUALIZATION Source Type: research
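The hybrid scheme above builds on the Gillespie SSA. As a reference point, here is a minimal sketch of Gillespie's direct method for a well-mixed (non-spatial) system. The birth-death reaction system and the rate names `k_birth` and `k_death` are hypothetical. The spatial and particle-tracking parts of the hybrid method are not reproduced.

```python
import math
import random


def gillespie_birth_death(x0, k_birth, k_death, t_end, seed=None):
    """Direct-method SSA for a hypothetical birth-death process.

    Reactions: 0 -> X at rate k_birth, and X -> 0 at rate k_death * X.
    Returns a list of (time, copy_number) pairs, one per reaction event,
    starting with (0.0, x0).
    """
    rng = random.Random(seed)
    t, x = 0.0, x0
    trajectory = [(t, x)]
    while True:
        a_birth = k_birth            # zeroth-order production propensity
        a_death = k_death * x        # first-order degradation propensity
        a_total = a_birth + a_death
        if a_total == 0.0:
            break                    # no reaction can fire any more
        # The waiting time to the next event is exponential with rate a_total.
        tau = -math.log(1.0 - rng.random()) / a_total
        if t + tau > t_end:
            break
        t += tau
        # Choose which reaction fires, with probability proportional to its propensity.
        if rng.random() * a_total < a_birth:
            x += 1
        else:
            x -= 1
        trajectory.append((t, x))
    return trajectory
```

For example, `gillespie_birth_death(10, 5.0, 0.5, 50.0, seed=1)` yields one exact stochastic trajectory. Its copy number fluctuates around k_birth / k_death = 10. The cost grows with the number of reaction events, and that cost is what motivates hybrid schemes for highly abundant species.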
REVEAL: visual eQTL analytics
We present Reveal, our visual analytics approach to this challenge. We introduce a graph-based visualization of associations between SNPs and gene expression and a detailed genotype view relating summarized patient cohort genotypes with data from individual patients and statistical analyses.
Availability: Reveal is included in Mayday, our framework for visual exploration and analysis. It is available at http://it.inf.uni-tuebingen.de/software/reveal/.
Contact: guenter.jaeger@uni-tuebingen.de
Source: Bioinformatics - September 7, 2012 Category: Bioinformatics Authors: Jager, G., Battke, F., Nieselt, K. Tags: BIOIMAGING, SPATIAL-TEMPORAL MODELING AND DATA VISUALIZATION Source Type: research
Trajectory-oriented Bayesian experiment design versus Fisher A-optimal design: an in depth comparison study
Motivation: Experiment design strategies for biomedical models with the purpose of parameter estimation or model discrimination are the focus of intense research. Experimental limitations such as sparse and noisy data result in unidentifiable parameters and render related design tasks challenging. Often, the temporal resolution of data is a limiting factor and the number of possible experimental interventions is finite. To address this issue, we propose a Bayesian experiment design algorithm to minimize the prediction uncertainty for a given set of experiments and compare it to traditional A-optimal design.
Res...
Source: Bioinformatics - September 7, 2012 Category: Bioinformatics Authors: Weber, P., Kramer, A., Dingler, C., Radde, N. Tags: REGULATION, PATHWAYS, AND SYSTEMS BIOLOGY Source Type: research
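The comparison above is against traditional A-optimal design. That criterion chooses the experiment that minimizes the trace of the inverse Fisher information matrix, which is the sum of the asymptotic parameter variances. The sketch below illustrates this criterion only. It assumes a hypothetical straight-line model y = a + b*t with Gaussian noise, where the Fisher information is X^T X / sigma^2, and it searches exhaustively over candidate sampling times. It is not the trajectory-oriented Bayesian algorithm described in the entry.

```python
from itertools import combinations


def fisher_information_linear(times, sigma=1.0):
    """Fisher information of (a, b) for y = a + b*t + N(0, sigma^2) noise."""
    n = len(times)
    s1 = sum(times)
    s2 = sum(t * t for t in times)
    scale = 1.0 / sigma**2
    return [[scale * n, scale * s1], [scale * s1, scale * s2]]


def a_criterion(fim):
    """Trace of the inverse of a 2x2 Fisher information matrix."""
    (p, q), (r, s) = fim
    det = p * s - q * r
    if abs(det) < 1e-12:
        return float("inf")          # singular FIM: parameters not identifiable
    return (s + p) / det             # trace of [[s, -q], [-r, p]] / det


def a_optimal_design(candidates, n_samples, sigma=1.0):
    """Exhaustively pick the n_samples time points that minimise the A-criterion."""
    return min(
        combinations(candidates, n_samples),
        key=lambda ts: a_criterion(fisher_information_linear(ts, sigma)),
    )
```

With candidate times `[0, 1, 2, 3, 4]` and two samples, the sketch selects `(0, 4)`, since spreading measurements apart minimizes the slope variance. Exhaustive search is only practical for small candidate sets.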
Comprehensive estimation of input signals and dynamics in biochemical reaction networks
This article presents a new approach that includes input estimation in the estimation of the dynamical model parameters by minimizing an objective function containing all parameters simultaneously. We applied this comprehensive approach to an illustrative model with simulated data and compared it to alternative methods. Statistical analyses revealed that our method improves the prediction of the model dynamics and of the confidence intervals, leading to proper coverage of the confidence intervals of the dynamic parameters. The method was applied to the JAK-STAT signaling pathway.
Availability: MATLAB code is a...
Source: Bioinformatics - September 7, 2012 Category: Bioinformatics Authors: Schelker, M., Raue, A., Timmer, J., Kreutz, C. Tags: REGULATION, PATHWAYS, AND SYSTEMS BIOLOGY Source Type: research
Relating drug-protein interaction network with drug side effects
Motivation: Identifying the emergence and underlying mechanisms of drug side effects is a challenging task in the drug development process. This underscores the importance of system-wide approaches for linking different scales of drug actions, namely drug–protein interactions (molecular scale) and side effects (phenotypic scale), toward side effect prediction for uncharacterized drugs.
Results: We performed a large-scale analysis to extract correlated sets of targeted proteins and side effects, based on the co-occurrence of drugs in protein-binding profiles and side effect profiles, using sparse canonical correlation ...
Source: Bioinformatics - September 7, 2012 Category: Bioinformatics Authors: Mizutani, S., Pauwels, E., Stoven, V., Goto, S., Yamanishi, Y. Tags: REGULATION, PATHWAYS, AND SYSTEMS BIOLOGY Source Type: research
Random sampling of elementary flux modes in large-scale metabolic networks
Motivation: The description of a metabolic network in terms of elementary (flux) modes (EMs) provides an important framework for metabolic pathway analysis. However, their application to large networks has been hampered by the combinatorial explosion in the number of modes. In this work, we develop a method for generating random samples of EMs without computing the whole set.
Results: Our algorithm is an adaptation of the canonical basis approach, where we add an additional filtering step which, at each iteration, selects a random subset of the new combinations of modes. In order to obtain an unbiased sample, all candidate...
Source: Bioinformatics - September 7, 2012 Category: Bioinformatics Authors: Machado, D., Soons, Z., Patil, K. R., Ferreira, E. C., Rocha, I. Tags: REGULATION, PATHWAYS, AND SYSTEMS BIOLOGY Source Type: research
The architecture of the gene regulatory networks of different tissues
Summary: The great variety of human cell types in morphology and function is due to the diverse gene expression profiles that are governed by the distinctive regulatory networks in different cell types. It is still a challenging task to explain how the regulatory networks achieve the diversity of different cell types. Here, we report on our studies of the design principles of the tissue regulatory system by constructing the regulatory networks of eight human tissues, which subsume the regulatory interactions between transcription factors (TFs), microRNAs (miRNAs) and non-TF target genes. The results show that there are in-...
Source: Bioinformatics - September 7, 2012 Category: Bioinformatics Authors: Li, J., Hua, X., Haubrock, M., Wang, J., Wingender, E. Tags: REGULATION, PATHWAYS, AND SYSTEMS BIOLOGY Source Type: research
Stoichiometric capacitance reveals the theoretical capabilities of metabolic networks
Motivation: Metabolic engineering aims at modulating the capabilities of metabolic networks by changing the activity of biochemical reactions. The existing constraint-based approaches for metabolic engineering have proven useful, but are limited to reactions catalogued in various pathway databases.
Results: We consider the alternative of designing synthetic strategies which can be used not only to characterize the maximum theoretically possible product yield but also to engineer networks with optimal conversion capability by using a suitable biochemically feasible reaction called ‘stoichiometric capacitance’...
Source: Bioinformatics - September 7, 2012 Category: Bioinformatics Authors: Larhlimi, A., Basler, G., Grimbs, S., Selbig, J., Nikoloski, Z. Tags: REGULATION, PATHWAYS, AND SYSTEMS BIOLOGY Source Type: research
Boolean approach to signalling pathway modelling in HGF-induced keratinocyte migration
Motivation: Cell migration is a complex process that is controlled through the time-sequential feedback regulation of protein signalling and gene regulation. Based on prior knowledge and our own experimental data, we developed a large-scale dynamic network describing the onset and maintenance of hepatocyte growth factor-induced migration of primary human keratinocytes. We applied Boolean logic to capture the qualitative behaviour as well as the short- and long-term dynamics of the complex signalling network involved in this process, comprising protein signalling, gene regulation and autocrine feedback.
Results: A Boolean model has ...
Source: Bioinformatics - September 7, 2012 Category: Bioinformatics Authors: Singh, A., Nascimento, J. M., Kowar, S., Busch, H., Boerries, M. Tags: REGULATION, PATHWAYS, AND SYSTEMS BIOLOGY Source Type: research
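The entry models signalling with Boolean logic. The sketch below shows the basic mechanics of a synchronously updated Boolean network: every node is updated from the same previous state, and the trajectory is iterated until a state repeats, which identifies an attractor. The four-node network and its rules are hypothetical. They illustrate the negative feedback that produces oscillation and are not the published keratinocyte migration model.

```python
def synchronous_step(state, rules):
    """Apply every Boolean update rule to the same previous state."""
    return {node: rule(state) for node, rule in rules.items()}


def simulate(state, rules, max_steps=50):
    """Iterate synchronous updates until a state repeats (an attractor is reached).

    Returns (trajectory, cycle_start); trajectory[cycle_start:] is the attractor,
    or cycle_start is None if no repeat occurs within max_steps.
    """
    trajectory = [state]
    seen = {tuple(sorted(state.items())): 0}
    for _ in range(max_steps):
        state = synchronous_step(state, rules)
        key = tuple(sorted(state.items()))
        if key in seen:
            return trajectory, seen[key]
        seen[key] = len(trajectory)
        trajectory.append(state)
    return trajectory, None


# Hypothetical toy rules: a constant ligand activates a receptor; the receptor
# activates a kinase unless a phosphatase is on; the kinase induces the phosphatase.
RULES = {
    "ligand": lambda s: s["ligand"],
    "receptor": lambda s: s["ligand"],
    "kinase": lambda s: s["receptor"] and not s["phosphatase"],
    "phosphatase": lambda s: s["kinase"],
}
```

Starting with only the ligand on, the toy network enters a four-state oscillation driven by the kinase-phosphatase feedback. With every node off, it stays at a fixed point. Asynchronous update schemes are a common alternative and can yield different attractors.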
Identification of chemogenomic features from drug-target interaction networks using interpretable classifiers
Motivation: Drug effects are mainly caused by the interactions between drug molecules and their target proteins including primary targets and off-targets. Identification of the molecular mechanisms behind overall drug–target interactions is crucial in the drug design process.
Results: We develop a classifier-based approach to identify chemogenomic features (the underlying associations between drug chemical substructures and protein domains) that are involved in drug–target interaction networks. We propose a novel algorithm for extracting informative chemogenomic features by using L1 regularized classifiers over...
Source: Bioinformatics - September 7, 2012 Category: Bioinformatics Authors: Tabei, Y., Pauwels, E., Stoven, V., Takemoto, K., Yamanishi, Y. Tags: PROTEIN INTERACTIONS, MOLECULAR NETWORKS, AND PROTEOMICS Source Type: research
Graphlet-based edge clustering reveals pathogen-interacting proteins
Motivation: Prediction of protein function from protein interaction networks has received attention in the post-genomic era. A popular strategy has been to cluster the network into functionally coherent groups of proteins and assign a function to the entire cluster based on the functions of its annotated members. Traditionally, network research has focused on clustering of nodes. However, clustering of edges may be preferred: nodes belong to multiple functional groups, but clustering of nodes typically cannot capture the group overlap, while clustering of edges can. Clustering of adjacent edges that share many neighbors was ...
Source: Bioinformatics - September 7, 2012 Category: Bioinformatics Authors: Solava, R. W., Michaels, R. P., Milenkovic, T. Tags: PROTEIN INTERACTIONS, MOLECULAR NETWORKS, AND PROTEOMICS Source Type: research
Identifying functional modules in interaction networks through overlapping Markov clustering
Motivation: In recent years, Markov clustering (MCL) has emerged as an effective algorithm for clustering biological networks, for instance clustering protein–protein interaction (PPI) networks to identify functional modules. However, a limitation of MCL and its variants (e.g. regularized MCL) is that they only support hard clustering, often leading to an impedance mismatch, given that proteins often overlap significantly across functional modules.
Results: In this article, we seek to redress this limitation. We propose a soft variation of Regularized MCL (R-MCL) based on the idea of iteratively (re-)ex...
Source: Bioinformatics - September 7, 2012 Category: Bioinformatics Authors: Shih, Y.-K., Parthasarathy, S. Tags: PROTEIN INTERACTIONS, MOLECULAR NETWORKS, AND PROTEOMICS Source Type: research
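The entry modifies regularized MCL. For orientation, the sketch below implements only the basic MCL iteration on a column-stochastic matrix. Expansion takes a random-walk step by squaring the matrix, and inflation raises entries to a power and renormalizes, which strengthens strong flows. The self-loop convention, inflation value and cluster read-out are common choices assumed here. Neither R-MCL nor the soft, overlapping variant described in the entry is implemented.

```python
def normalize_columns(m):
    """Scale each column to sum to 1 (all-zero columns are left as zeros)."""
    n = len(m)
    sums = [sum(m[i][j] for i in range(n)) for j in range(n)]
    return [[m[i][j] / sums[j] if sums[j] else 0.0 for j in range(n)] for i in range(n)]


def matmul(a, b):
    """Plain square-matrix product."""
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)] for i in range(n)]


def markov_cluster(adjacency, inflation=2.0, iterations=50, tol=1e-9):
    """Basic MCL on a symmetric 0/1 adjacency matrix; returns clusters as sorted lists."""
    n = len(adjacency)
    # Adding self-loops is a common convention that stabilises the iteration.
    m = [[adjacency[i][j] + (1.0 if i == j else 0.0) for j in range(n)] for i in range(n)]
    m = normalize_columns(m)
    for _ in range(iterations):
        expanded = matmul(m, m)                                   # expansion step
        inflated = normalize_columns(
            [[x ** inflation for x in row] for row in expanded]   # inflation step
        )
        delta = max(abs(inflated[i][j] - m[i][j]) for i in range(n) for j in range(n))
        m = inflated
        if delta < tol:
            break
    # Each row that retains mass defines a cluster: the columns it still attracts.
    clusters = set()
    for i in range(n):
        members = frozenset(j for j in range(n) if m[i][j] > 1e-6)
        if members:
            clusters.add(members)
    return sorted(sorted(c) for c in clusters)
```

For two disconnected triangles, the sketch returns `[[0, 1, 2], [3, 4, 5]]`. This read-out assigns each node to whatever its attractor rows retain, so the basic algorithm produces a hard partition in typical cases. That limitation is the one the entry addresses.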
Techniques to cope with missing data in host-pathogen protein interaction prediction
Motivation: Approaches that use supervised machine learning techniques for protein–protein interaction (PPI) prediction typically use features obtained by integrating several sources of data. Often certain attributes of the data are not available, resulting in missing values. In particular, our host–pathogen PPI datasets have a large fraction of missing values, in the range of 58–85%, which makes it challenging to apply machine learning algorithms.
Results: We show that specialized techniques for missing value imputation can improve the performance of the models significantly. We use cross species informa...
Source: Bioinformatics - September 7, 2012 Category: Bioinformatics Authors: Kshirsagar, M., Carbonell, J., Klein-Seetharaman, J. Tags: PROTEIN INTERACTIONS, MOLECULAR NETWORKS, AND PROTEOMICS Source Type: research
LocTree2 predicts localization for all domains of life
In this study, we introduced a framework to predict localization in life's three domains, including globular and membrane proteins (3 classes for archaea; 6 for bacteria and 18 for eukaryota). The resulting method, LocTree2, works well even for protein fragments. It uses a hierarchical system of support vector machines that imitates the cascading mechanism of cellular sorting. The method reaches high levels of sustained performance (eukaryota: Q18=65%, bacteria: Q6=84%). LocTree2 also accurately distinguishes membrane and non-membrane proteins. In our hands, it compared favorably with top methods when tested on new data.
A...
Source: Bioinformatics - September 7, 2012 Category: Bioinformatics Authors: Goldberg, T., Hamp, T., Rost, B. Tags: PROTEIN INTERACTIONS, MOLECULAR NETWORKS, AND PROTEOMICS Source Type: research
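The performance figures above (Q18=65%, Q6=84%) are Qn scores. The sketch assumes the usual definition of Qn: the percentage of proteins whose predicted class, out of n localization classes, matches the experimentally observed class. The labels in the example are hypothetical.

```python
def q_accuracy(true_labels, predicted_labels):
    """Qn overall accuracy: percentage of proteins assigned to their observed class."""
    if len(true_labels) != len(predicted_labels):
        raise ValueError("label lists must have equal length")
    if not true_labels:
        raise ValueError("no proteins to score")
    correct = sum(t == p for t, p in zip(true_labels, predicted_labels))
    return 100.0 * correct / len(true_labels)
```

For instance, if three of four hypothetical proteins are placed in the correct compartment, the score is 75.0. Qn says nothing about per-class performance, so rare classes can be predicted poorly even when Qn is high.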
EnrichNet: network-based gene set enrichment analysis
Motivation: Assessing functional associations between an experimentally derived gene or protein set of interest and a database of known gene/protein sets is a common task in the analysis of large-scale functional genomics data. For this purpose, a frequently used approach is to apply an over-representation-based enrichment analysis. However, this approach has four drawbacks: (i) it can only score functional associations of overlapping gene/protein sets; (ii) it disregards genes with missing annotations; (iii) it does not take into account the network structure of physical interactions between the gene/protein sets of inte...
Source: Bioinformatics - September 7, 2012 Category: Bioinformatics Authors: Glaab, E., Baudot, A., Krasnogor, N., Schneider, R., Valencia, A. Tags: PROTEIN INTERACTIONS, MOLECULAR NETWORKS, AND PROTEOMICS Source Type: research
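The over-representation analysis criticized above is commonly scored with a one-sided hypergeometric test. The test asks how likely it is to see at least the observed overlap between a query set and an annotated set, given random draws from a gene universe. The sketch below shows that baseline only, not EnrichNet's network-based score, and the parameter names are illustrative. It makes drawback (i) concrete: with zero overlap, the p-value is 1 and no association can be detected.

```python
from math import comb


def overrepresentation_pvalue(universe_size, set_size, query_size, overlap):
    """P(X >= overlap) for X ~ Hypergeometric(N=universe_size, K=set_size, n=query_size)."""
    upper = min(set_size, query_size)
    total = comb(universe_size, query_size)
    # Sum the probabilities of every overlap at least as large as the observed one.
    tail = sum(
        comb(set_size, k) * comb(universe_size - set_size, query_size - k)
        for k in range(overlap, upper + 1)
    )
    return tail / total
```

For example, drawing 5 genes from a universe of 10 and hitting all 5 members of a 5-gene set has probability 1/252. Exact integer arithmetic via `math.comb` (Python 3.8+) avoids overflow for large universes.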
Protein domain recurrence and order can enhance prediction of protein functions
Motivation: Burgeoning sequencing technologies have generated massive amounts of genomic and proteomic data. Annotating the functions of proteins identified in these data has become a major and crucial problem. Various computational methods have been developed to infer protein functions based on either the sequences or the domains of proteins. The existing methods, however, ignore the recurrence and the order of the protein domains in this function inference.
Results: We developed two new methods to infer protein functions based on protein domain recurrence and domain order. Our first method, DRDO, calculates the posterior pr...
Source: Bioinformatics - September 7, 2012 Category: Bioinformatics Authors: Messih, M. A., Chitale, M., Bajic, V. B., Kihara, D., Gao, X. Tags: MACROMOLECULAR STRUCTURE, DYNAMICS AND FUNCTION Source Type: research
SANS: high-throughput retrieval of protein sequences allowing 50% mismatches
We present a novel word filter, suffix array neighborhood search (SANS), to identify protein sequence similarities in the range of 50–100% identity with sensitivity comparable to BLAST and 10 times the speed of USEARCH. In contrast to these previous approaches, the complexity of the search is proportional only to the length of the query sequence and independent of database size, enabling fast searching and functional annotation into the future despite rapidly expanding databases.
Availability and implementation: The software is freely available to non-commercial users from our website http://ekhidna.biocenter.helsink...
Source: Bioinformatics - September 7, 2012 Category: Bioinformatics Authors: Koskinen, J. P., Holm, L. Tags: MACROMOLECULAR STRUCTURE, DYNAMICS AND FUNCTION Source Type: research
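SANS builds on a suffix array. The sketch below shows only the basic data structure: sorting suffix start positions and finding exact occurrences of a word by binary search over the sorted suffixes. The neighborhood search that lets SANS tolerate mismatches is not implemented. The naive construction and the toy sequence are assumptions made for brevity.

```python
def build_suffix_array(text):
    """Naive construction: sort suffix start positions lexicographically."""
    return sorted(range(len(text)), key=lambda i: text[i:])


def find_word(text, suffix_array, word):
    """Return sorted start positions of exact occurrences of word via binary search."""
    def bound(strict):
        # strict=False: first suffix whose prefix >= word; strict=True: first prefix > word.
        lo, hi = 0, len(suffix_array)
        while lo < hi:
            mid = (lo + hi) // 2
            start = suffix_array[mid]
            prefix = text[start:start + len(word)]
            if prefix < word or (strict and prefix == word):
                lo = mid + 1
            else:
                hi = mid
        return lo

    first, last = bound(False), bound(True)
    return sorted(suffix_array[first:last])
```

On the toy sequence `"MKVLAMKVG"`, the word `"MKV"` is found at positions `[0, 5]`. The binary search works because truncating sorted suffixes to a fixed length keeps them in non-decreasing order.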
A structure-based protocol for learning the family-specific mechanisms of membrane-binding domains
Conclusions: The high accuracy of the learned models and good agreement between the rules discovered using the ADtree classifier and mechanisms reported in the literature reflect the value of machine learning protocols in both prediction and biological knowledge discovery. Our protocol can thus potentially be used as a general function annotation and knowledge mining tool for other protein domains.
Availability: metador.bioengr.uic.edu
Contact: huilu@uic.edu
Source: Bioinformatics - September 7, 2012 Category: Bioinformatics Authors: Kallberg, M., Bhardwaj, N., Langlois, R., Lu, H. Tags: MACROMOLECULAR STRUCTURE, DYNAMICS AND FUNCTION Source Type: research
Side-chain rotamer changes upon ligand binding: common, crucial, correlate with entropy and rearrange hydrogen bonding
Motivation: Protein movements form a continuum from large domain rearrangements (including folding and restructuring) to side-chain rotamer changes and small rearrangements. Understanding side-chain flexibility upon binding is important to understand molecular recognition events and predict ligand binding.
Methods: In the present work, we developed a well-curated non-redundant dataset of 188 proteins in pairs of structures in the Apo (unbound) and Holo (bound) forms to study the extent and the factors that guide side-chain rotamer changes upon binding.
Results: Our analysis shows that side-chain rotamer changes are widespr...
Source: Bioinformatics - September 7, 2012 Category: Bioinformatics Authors: Gaudreault, F., Chartier, M., Najmanovich, R. Tags: MACROMOLECULAR STRUCTURE, DYNAMICS AND FUNCTION Source Type: research
Multiple instance learning of Calmodulin binding sites
We present a novel algorithm (MI-1 SVM) for binding site prediction and evaluate its performance on a set of CaM-binding proteins extracted from the Calmodulin Target Database. Our approach directly models the problem of binding site prediction as a large-margin classification problem, and is able to take into account uncertainty in binding site location. We show that the proposed algorithm performs better than the standard SVM formulation, and illustrate its ability to recover known CaM binding motifs. A highly accurate cascaded classification approach using the proposed binding site prediction method to predict CaM bindi...
Source: Bioinformatics - September 7, 2012 Category: Bioinformatics Authors: Minhas, F. u. A. A., Ben-Hur, A. Tags: MACROMOLECULAR STRUCTURE, DYNAMICS AND FUNCTION Source Type: research
Inferring duplications, losses, transfers and incomplete lineage sorting with nonbinary species trees
We present an algorithm to reconcile a binary gene tree with a nonbinary species tree under a DTLI parsimony criterion. This is the first reconciliation algorithm to capture all four evolutionary processes driving tree incongruence and the first to reconcile non-binary species trees with a transfer model. Our algorithm infers all optimal solutions and reports complete, temporally feasible event histories, giving the gene and species lineages in which each event occurred. It is fixed-parameter tractable, with polytime complexity when the maximum species outdegree is fixed. Application of our algorithms to prokaryotic and eu...
Source: Bioinformatics - September 7, 2012 Category: Bioinformatics Authors: Stolzer, M., Lai, H., Xu, M., Sathaye, D., Vernot, B., Durand, D. Tags: EVOLUTION, PHYLOGENY, AND COMPARATIVE GENOMICS Source Type: research
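Full DTLI reconciliation with nonbinary species trees does not fit in a short example. As a hedged sketch of only the simpler, classic duplication-loss case (binary trees, no transfers, no incomplete lineage sorting), the LCA mapping below sends each gene-tree node to the lowest common ancestor of its children's species and labels it a duplication when it maps to the same species node as one of its children. The encodings (nested tuples for the gene tree, a child-to-parent dictionary for the species tree) are assumptions for illustration.

```python
from typing import Dict, List, Tuple, Union

# Hypothetical encoding: a gene-tree leaf is the name of its species;
# an internal node is a pair of subtrees.
GeneTree = Union[str, Tuple["GeneTree", "GeneTree"]]


def ancestors(node: str, parent: Dict[str, str]) -> List[str]:
    """Return the node followed by its ancestors up to the species-tree root."""
    path = [node]
    while node in parent:
        node = parent[node]
        path.append(node)
    return path


def lca(a: str, b: str, parent: Dict[str, str]) -> str:
    """Lowest common ancestor of two species-tree nodes."""
    seen = set(ancestors(a, parent))
    for node in ancestors(b, parent):
        if node in seen:
            return node
    raise ValueError("nodes are not in the same species tree")


def reconcile(tree: GeneTree, parent: Dict[str, str], events: List[str]) -> str:
    """Map a gene-tree node to a species node; append events in postorder."""
    if isinstance(tree, str):
        return tree  # a leaf maps to its own species
    left = reconcile(tree[0], parent, events)
    right = reconcile(tree[1], parent, events)
    mapped = lca(left, right, parent)
    # Mapping to the same species node as a child signals a duplication.
    events.append("duplication" if mapped in (left, right) else "speciation")
    return mapped


# Species tree ((A,B)AB,C)root and gene tree ((A,B),(A,C)).
species_parent = {"A": "AB", "B": "AB", "AB": "root", "C": "root"}
events: List[str] = []
root_map = reconcile((("A", "B"), ("A", "C")), species_parent, events)
```

Here the gene-tree root maps to the species root and is labeled a duplication, while both child nodes are speciations.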
Fractionation, rearrangement and subgenome dominance
Motivation: Fractionation is arguably the greatest cause of gene order disruption following whole genome duplication, causing severe biases in chromosome rearrangement-based estimates of evolutionary divergence.
Results: We show how to correct for this bias almost entirely by means of a ‘consolidation’ algorithm for detecting and suitably transforming identifiable regions of fractionation. We characterize the process of fractionation and the performance of the algorithm through realistic simulations. We apply our method to a number of core eudicot genomes and, by studying the fractionation regions detected, ...
Source: Bioinformatics - September 7, 2012 Category: Bioinformatics Authors: Sankoff, D., Zheng, C. Tags: EVOLUTION, PHYLOGENY, AND COMPARATIVE GENOMICS Source Type: research
Genomic context analysis reveals dense interaction network between vertebrate ultraconserved non-coding elements
Motivation: Genomic context analysis, also known as phylogenetic profiling, is widely used to infer functional interactions between proteins but is rarely applied to non-coding cis-regulatory DNA elements. We asked whether this approach could provide insights into ultraconserved non-coding elements (UCNEs). These elements are organized in large clusters, so-called gene regulatory blocks (GRBs), around key developmental genes. Their molecular functions and the reasons for their high degree of conservation remain enigmatic.
Results: In a special setting of genomic context analysis, we analyzed the fate of GRBs after a ...
Source: Bioinformatics - September 7, 2012 Category: Bioinformatics Authors: Dimitrieva, S., Bucher, P. Tags: EVOLUTION, PHYLOGENY, AND COMPARATIVE GENOMICS Source Type: research
Uncovering the co-evolutionary network among prokaryotic genes
Motivation: Correlated gain and loss events enable inference of co-evolutionary relations. Reconstructing the co-evolutionary interaction network in prokaryotic species may elucidate functional associations among genes.
Results: We developed a novel probabilistic methodology for the detection of co-evolutionary interactions between pairs of genes. Using this method we inferred the co-evolutionary network among 4593 Clusters of Orthologous Genes (COGs). The number of co-evolutionary interactions substantially differed among COGs. Over 40% were found to co-evolve with at least one partner. We partitioned the netwo...
Source: Bioinformatics - September 7, 2012 Category: Bioinformatics Authors: Cohen, O., Ashkenazy, H., Burstein, D., Pupko, T. Tags: EVOLUTION, PHYLOGENY, AND COMPARATIVE GENOMICS Source Type: research
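The authors' probabilistic model is not reproduced here. As a much simpler stand-in that captures the intuition behind correlated gains and losses, the sketch below assumes gain/loss events have already been mapped onto the branches of a species tree and asks, with a one-sided hypergeometric test, whether two genes share more event branches than chance would predict. The 0/1 branch-indicator input format is hypothetical.

```python
from math import comb
from typing import Sequence, Tuple


def cooccurrence_test(events_a: Sequence[int], events_b: Sequence[int]) -> Tuple[int, float]:
    """Count branches where both genes have a gain/loss event and return
    (shared, P(X >= shared)) under a hypergeometric null of independence."""
    if len(events_a) != len(events_b):
        raise ValueError("both genes must be scored on the same branches")
    total = len(events_a)
    k_a = sum(1 for x in events_a if x)  # branches with an event for gene A
    k_b = sum(1 for y in events_b if y)  # branches with an event for gene B
    shared = sum(1 for x, y in zip(events_a, events_b) if x and y)
    # Overlap X of k_b draws from `total` branches, k_a of which are "successes".
    tail = sum(
        comb(k_a, i) * comb(total - k_a, k_b - i)
        for i in range(shared, min(k_a, k_b) + 1)
    )
    return shared, tail / comb(total, k_b)
```

A small p-value suggests the two genes tend to be gained or lost on the same branches, which is the signal a co-evolution network is built from.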
Evolution of gene neighborhoods within reconciled phylogenies
We describe a polynomial-time algorithm that, given a species tree and a set of gene trees whose leaves are connected by adjacencies, computes an adjacency forest minimizing the number of adjacency gains and breakages (caused by rearrangements). We use this algorithm to reconstruct contiguous regions of mammalian and plant ancestral genomes in a few minutes for a dozen species and several thousand genes. We show that this method reduces conflicts between ancestral adjacencies. We detect duplications involving several genes and compare the different modes of evolution between phyla and among ...
Source: Bioinformatics - September 7, 2012 Category: Bioinformatics Authors: Berard, S., Gallien, C., Boussau, B., Szollosi, G. J., Daubin, V., Tannier, E. Tags: EVOLUTION, PHYLOGENY, AND COMPARATIVE GENOMICS Source Type: research
Nonlinear dimension reduction with Wright-Fisher kernel for genotype aggregation and association mapping
Motivation: Association tests based on next-generation sequencing data are often underpowered because of rare variants and large numbers of neutral or protective variants. A successful strategy is to aggregate genetic information within meaningful single-nucleotide polymorphism (SNP) sets, e.g. genes or pathways, and test association on SNP sets. Many existing methods for group-wise tests require specific assumptions about the direction of individual SNP effects and/or perform poorly in the presence of interactions.
Results: We propose a joint association test strategy based on two key components: a nonlinear s...
Source: Bioinformatics - September 7, 2012 Category: Bioinformatics Authors: Zhu, H., Li, L., Zhou, H. Tags: MUTATIONS, VARIATIONS, AND POPULATION GENOMICS Source Type: research
An exome sequencing pipeline for identifying and genotyping common CNVs associated with disease with application to psoriasis
Motivation: Despite the prevalence of copy number variation (CNV) in the human genome, only a handful of confirmed associations have been reported between common CNVs and complex disease. This may be partially attributed to the difficulty in accurately genotyping CNVs in large cohorts using array-based technologies. Exome sequencing is now widely applied to case–control cohorts and presents an exciting opportunity to look for common CNVs associated with disease.
Results: We developed ExoCNVTest: an exome sequencing analysis pipeline to identify disease-associated CNVs and to generate absolute copy number genoty...
Source: Bioinformatics - September 7, 2012 Category: Bioinformatics Authors: Coin, L. J. M., Cao, D., Ren, J., Zuo, X., Sun, L., Yang, S., Zhang, X., Cui, Y., Li, Y., Jin, X., Wang, J. Tags: MUTATIONS, VARIATIONS, AND POPULATION GENOMICS Source Type: research
Stitching gene fragments with a network matching algorithm improves gene assembly for metagenomics
We present a novel way of using the de Bruijn graph assembly of metagenomes to improve the assembly of genes. A network matching algorithm is proposed for matching the de Bruijn graph of contigs against reference genes, to derive ‘gene paths’ in the graph (sequences of contigs containing gene fragments) that have the highest similarities to known genes, allowing gene fragments contained in multiple contigs to be connected to form more complete (or intact) genes. Tests on simulated and real datasets show that our approach (called GeneStitch) is able to significantly improve the assembly of genes from metagenomic...
Source: Bioinformatics - September 7, 2012 Category: Bioinformatics Authors: Wu, Y.-W., Rho, M., Doak, T. G., Ye, Y. Tags: SEQUENCING AND SEQUENCE ANALYSIS Source Type: research
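GeneStitch's network matching against reference genes is not shown here. The sketch below covers only the underlying data structure: a de Bruijn graph built from reads, with (k-1)-mers as nodes and each k-mer as an edge, plus a helper that spells out the sequence of a path through the graph (the kind of path a ‘gene path’ strings together). Function names are hypothetical.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Set


def de_bruijn_graph(reads: Iterable[str], k: int) -> Dict[str, Set[str]]:
    """Map each (k-1)-mer to the set of (k-1)-mers that follow it in some read."""
    graph: Dict[str, Set[str]] = defaultdict(set)
    for read in reads:
        for i in range(len(read) - k + 1):
            kmer = read[i:i + k]
            # Each k-mer is an edge from its prefix to its suffix.
            graph[kmer[:-1]].add(kmer[1:])
    return dict(graph)


def spell_path(path: List[str]) -> str:
    """Sequence spelled by consecutive overlapping (k-1)-mer nodes."""
    if not path:
        return ""
    return path[0] + "".join(node[-1] for node in path[1:])


graph = de_bruijn_graph(["ACGT", "CGTA"], k=3)
```

Overlapping reads share nodes, so the two reads above merge into a single path AC, CG, GT, TA that spells ACGTA.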
MetaCluster 5.0: a two-round binning approach for metagenomic data for low-abundance species in a noisy sample
Motivation: Metagenomic binning remains an important topic in metagenomic analysis. Existing unsupervised binning methods for next-generation sequencing (NGS) reads do not perform well on (i) samples with low-abundance species or (ii) samples (even with high abundance) when there are many extremely low-abundance species. These two problems are common for real metagenomic datasets. Binning methods that can solve these problems are desirable.
Results: We propose a two-round binning method (MetaCluster 5.0) that aims to identify both low-abundance and high-abundance species in the presence of a large amount of noise due t...
Source: Bioinformatics - September 7, 2012 Category: Bioinformatics Authors: Wang, Y., Leung, H. C. M., Yiu, S. M., Chin, F. Y. L. Tags: SEQUENCING AND SEQUENCE ANALYSIS Source Type: research
Accurate estimation of short read mapping quality for next-generation genome sequencing
Motivation: Several software tools specialize in aligning short next-generation sequencing reads to a reference sequence. Some of these tools report a mapping quality score for each alignment; in principle, this score tells researchers the probability that the alignment is correct. However, the reported mapping quality often correlates weakly with actual accuracy, and the qualities of many mappings are underestimated, encouraging researchers to discard correct mappings. Further, these low-quality mappings tend to correlate with variations in the genome (both single nucleotide and structural), and suc...
Source: Bioinformatics - September 7, 2012 Category: Bioinformatics Authors: Ruffalo, M., Koyuturk, M., Ray, S., LaFramboise, T. Tags: SEQUENCING AND SEQUENCE ANALYSIS Source Type: research
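The SAM format defines mapping quality on the Phred scale, MAPQ = -10 log10 P(mapping position is wrong), rounded to an integer. The sketch below converts an error probability to MAPQ and, as an assumption for illustration only (not the authors' estimator), derives that probability by normalizing the log-likelihoods of candidate alignment locations into a posterior. The cap of 60 is a hypothetical choice; aligners differ here.

```python
import math
from typing import Sequence


def error_probability(log_likelihoods: Sequence[float]) -> float:
    """P(best candidate is wrong), assuming the listed candidate locations are
    the only possibilities and their likelihoods normalize into a posterior."""
    best = max(log_likelihoods)
    # Subtract the maximum before exponentiating for numerical stability;
    # the best candidate then contributes a weight of exactly 1.
    total = sum(math.exp(x - best) for x in log_likelihoods)
    return 1.0 - 1.0 / total


def mapq(p_wrong: float, cap: int = 60) -> int:
    """Phred-scaled mapping quality: -10 * log10(P(mapping is wrong)), capped."""
    if p_wrong > 1.0:
        raise ValueError("probability must be at most 1")
    if p_wrong <= 0.0:
        return cap  # unique placement: report the cap instead of infinity
    return min(cap, int(round(-10.0 * math.log10(p_wrong))))
```

Two equally good candidate locations give an error probability of 0.5 and thus MAPQ 3; an error probability of 0.01 gives MAPQ 20.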
Decoding properties of tRNA leave a detectable signal in codon usage bias
This study addresses the codon reading properties of tRNAs and their evolutionary impact on codon usage bias.
Results: Using three different computational methods, we identify the signal of tRNA decoding in codon usage bias. The predictions of the three methods generally agree with each other and compare well with experimental evidence of codon reading. This analysis suggests a revised codon reading for cytosolic tRNAs in the yeast genome (Saccharomyces cerevisiae) that is more accurate than the common assignment by wobble rules. The results confirm the earlier observation that the wobble rules are not sufficient for a complete...
Source: Bioinformatics - September 7, 2012 Category: Bioinformatics Authors: Roth, A. C. Tags: SEQUENCING AND SEQUENCE ANALYSIS Source Type: research
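The excerpt does not describe the paper's three methods. A common way to quantify codon usage bias, shown here as a hedged illustration rather than any of those methods, is relative synonymous codon usage (RSCU): the observed count of a codon divided by the count expected if all synonymous codons for its amino acid were used equally, so values above 1 mark preferred codons. The codon table is deliberately truncated to three amino acids of the standard genetic code for brevity.

```python
from collections import Counter
from typing import Dict

# Truncated synonymous-codon table (standard genetic code, three amino acids only).
SYNONYMOUS = {
    "Phe": ["TTT", "TTC"],
    "Lys": ["AAA", "AAG"],
    "Gly": ["GGT", "GGC", "GGA", "GGG"],
}


def rscu(cds: str) -> Dict[str, float]:
    """RSCU for codons of the tabulated amino acids that occur in cds."""
    cds = cds.upper()
    # Count codons in reading frame 0; a trailing partial codon is ignored.
    counts = Counter(cds[i:i + 3] for i in range(0, len(cds) - 2, 3))
    values: Dict[str, float] = {}
    for codons in SYNONYMOUS.values():
        total = sum(counts[c] for c in codons)
        if total == 0:
            continue  # amino acid absent: RSCU is undefined
        expected = total / len(codons)  # count under uniform synonymous usage
        for codon in codons:
            values[codon] = counts[codon] / expected
    return values
```

For the sequence AAAAAAAAG (two AAA codons, one AAG), AAA has RSCU 4/3 and AAG has 2/3.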
DELLY: structural variant discovery by integrated paired-end and split-read analysis
Motivation: The discovery of genomic structural variants (SVs) at high sensitivity and specificity is an essential requirement for characterizing naturally occurring variation and for understanding pathological somatic rearrangements in personal genome sequencing data. Of particular interest are integrated methods that accurately identify simple and complex rearrangements in heterogeneous sequencing datasets at single-nucleotide resolution, as an optimal basis for investigating the formation mechanisms and functional consequences of SVs.
Results: We have developed an SV discovery method, called DELLY, that integrates short...
Source: Bioinformatics - September 7, 2012 Category: Bioinformatics Authors: Rausch, T., Zichner, T., Schlattl, A., Stutz, A. M., Benes, V., Korbel, J. O. Tags: SEQUENCING AND SEQUENCE ANALYSIS Source Type: research
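DELLY's integration of paired-end and split-read evidence is not reproduced here. The sketch below shows only a generic first step of paired-end SV detection: flagging read pairs whose insert size is unusually large, the typical signature of a deletion between the mates. The robust cutoff (median plus three median absolute deviations) is a hypothetical choice, and read orientation and split-read refinement are ignored.

```python
from statistics import median
from typing import List, Sequence


def flag_long_inserts(insert_sizes: Sequence[int], n_mads: float = 3.0) -> List[int]:
    """Indices of read pairs whose insert size exceeds median + n_mads * MAD."""
    sizes = list(insert_sizes)
    center = median(sizes)
    # Median absolute deviation: a spread estimate robust to the outliers we seek.
    mad = median([abs(x - center) for x in sizes])
    cutoff = center + n_mads * mad
    return [i for i, x in enumerate(sizes) if x > cutoff]
```

With insert sizes 300, 310, 290, 305, 295 and 1500, the median is 302.5, the MAD is 10, and only the 1500 bp pair (index 5) exceeds the cutoff of 332.5.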
Indel-tolerant read mapping with trinucleotide frequencies using cache-oblivious kd-trees
Motivation: Mapping billions of reads from next-generation sequencing experiments to reference genomes is a crucial task, which can require hundreds of hours of running time on a single CPU even with the fastest known implementations. Traditional approaches have difficulty dealing with matches of large edit distance, particularly in the presence of frequent or large insertions and deletions (indels). This is a serious obstacle both in determining the spectrum and abundance of genetic variations and in personal genomics.
Results: For the first time, we adopt the approximate string matching paradigm of geometric embedding t...
Source: Bioinformatics - September 7, 2012 Category: Bioinformatics Authors: Mahmud, M. P., Wiedenhoeft, J., Schliep, A. Tags: SEQUENCING AND SEQUENCE ANALYSIS Source Type: research
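The cache-oblivious kd-tree is not reproduced here. The sketch below illustrates the geometric-embedding idea under simplifying assumptions: each sequence is mapped to a 64-dimensional vector of trinucleotide counts, and a read is placed at the reference window whose vector is nearest in L1 distance, with a brute-force scan standing in for the kd-tree. A single substitution alters at most three overlapping trinucleotides, so it moves a profile by at most 6 in L1 distance; strings differing by few substitutions therefore stay close in this space.

```python
from itertools import product
from typing import List, Optional, Sequence, Tuple

TRINUCLEOTIDES = ["".join(p) for p in product("ACGT", repeat=3)]
INDEX = {t: i for i, t in enumerate(TRINUCLEOTIDES)}


def profile(seq: str) -> List[int]:
    """Counts of each of the 64 trinucleotides in seq."""
    counts = [0] * len(TRINUCLEOTIDES)
    for i in range(len(seq) - 2):
        idx = INDEX.get(seq[i:i + 3])
        if idx is not None:  # skip trinucleotides containing N or other symbols
            counts[idx] += 1
    return counts


def l1_distance(u: Sequence[int], v: Sequence[int]) -> int:
    return sum(abs(a - b) for a, b in zip(u, v))


def place_read(read: str, reference: str) -> Optional[Tuple[int, int]]:
    """(distance, start) of the first reference window with the closest profile."""
    target = profile(read)
    best: Optional[Tuple[int, int]] = None
    for start in range(len(reference) - len(read) + 1):
        d = l1_distance(target, profile(reference[start:start + len(read)]))
        if best is None or d < best[0]:
            best = (d, start)
    return best
```

A kd-tree (or another spatial index) replaces the linear scan in practice; the embedding itself is unchanged.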
Long read alignment based on maximal exact match seeds
We present CUSHAW2, a parallelized, accurate and memory-efficient long-read aligner. Our aligner is based on the seed-and-extend approach and uses maximal exact matches as seeds to find gapped alignments. We have evaluated CUSHAW2 against three other long-read aligners (BWA-SW, Bowtie2 and GASSST) by aligning simulated and real datasets to the human genome. The performance evaluation shows that CUSHAW2 is consistently among the highest-ranked aligners in terms of alignment quality for both single-end and paired-end alignment, while demonstrating highly competitive speed. Furthermore, our aligner shows good para...
Source: Bioinformatics - September 7, 2012 Category: Bioinformatics Authors: Liu, Y., Schmidt, B. Tags: SEQUENCING AND SEQUENCE ANALYSIS Source Type: research
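As a brute-force illustration of the kind of seed CUSHAW2 uses (the aligner itself relies on index structures not shown here), the function below enumerates maximal exact matches between a read and a reference: exact matches that cannot be extended by one base to the left or to the right. The min_len threshold is a hypothetical parameter, and the quadratic scan suits only toy inputs.

```python
from typing import List, Tuple


def maximal_exact_matches(read: str, ref: str, min_len: int) -> List[Tuple[int, int, int]]:
    """Return (read_start, ref_start, length) for every MEM of at least min_len."""
    mems = []
    for i in range(len(read)):
        for j in range(len(ref)):
            # Left-maximal: skip starts that could be extended one base leftward.
            if i > 0 and j > 0 and read[i - 1] == ref[j - 1]:
                continue
            # Extend rightward until a mismatch or the end of either string,
            # which makes the match right-maximal by construction.
            length = 0
            while (i + length < len(read) and j + length < len(ref)
                   and read[i + length] == ref[j + length]):
                length += 1
            if length >= min_len:
                mems.append((i, j, length))
    return mems
```

For the read ACGT and the reference TTACGTTT with min_len 3, the only MEM is (0, 2, 4); a seed-and-extend aligner would then extend such seeds into gapped alignments.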
Telescoper: de novo assembly of highly repetitive regions
Motivation: With advances in sequencing technology, it has become faster and cheaper to obtain short-read data from which to assemble genomes. Although there has been considerable progress in the field of genome assembly, producing high-quality de novo assemblies from short reads remains challenging, primarily because of the complex repeat structures found in the genomes of most higher organisms. The telomeric regions of many genomes are particularly difficult to assemble, though much could be gained from the study of these regions, as their evolution has not been fully characterized and they have been linked to aging.
Res...
Source: Bioinformatics - September 7, 2012 Category: Bioinformatics Authors: Bresler, M., Sheehan, S., Chan, A. H., Song, Y. S. Tags: SEQUENCING AND SEQUENCE ANALYSIS Source Type: research
ECCB 2012 Organization
Source: Bioinformatics - September 7, 2012 Category: Bioinformatics Tags: EDITORIAL Source Type: research
ECCB 2012: The 11th European Conference on Computational Biology
Source: Bioinformatics - September 7, 2012 Category: Bioinformatics Authors: Schwede, T., Iber, D. Tags: EDITORIAL Source Type: research

