International Journal of Data Mining and Bioinformatics

This page shows you the latest items in this publication.

An effective convergence independent loop closure method using Forward-Backward Cyclic Coordinate Descent.
Cyclic Coordinate Descent (CCD) is a popular robotic approach to generate a possible loop that closes the gap between two constrained portions of a protein chain (Canutescu and Dunbrack 2003). In this paper, we describe an effective Forward-Backward CCD (FBCCD) method to connect the two constrained portions of a protein chain without requiring the loop to converge. A test of 30 loops of length 4, 8 and 12 suggests that our method takes fewer cycles than the CCD method to produce loops of comparable accuracy and a more accurate second portion of the chain. PMID: 19623775 [PubMed - in process]
Source: International Journal of Data Mining and Bioinformatics - July 27, 2009 Category: Bioinformatics Authors: Al-Nasr K, He J Tags: Int J Data Min Bioinform Source Type: journals
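
To make the basic CCD idea concrete, here is a minimal sketch of plain CCD on a planar chain of rigid links. It is an illustration only: it works in 2D rather than on protein torsion angles, it is not the FBCCD method of the abstract, and the function name `ccd_close`, the tolerance and the cycle limit are all assumptions made for this example.

```python
import math

def ccd_close(points, target, tol=1e-3, max_cycles=1000):
    """Planar CCD: rotate each joint in turn so the chain end moves toward target.

    points[0] is the fixed base; link lengths are preserved by every rotation.
    Returns the adjusted points and the number of cycles used.
    """
    pts = [tuple(p) for p in points]
    for cycle in range(1, max_cycles + 1):
        # Sweep from the joint nearest the free end back to the fixed base.
        for i in range(len(pts) - 2, -1, -1):
            px, py = pts[i]
            ex, ey = pts[-1]
            # Angle that aligns (pivot -> end) with (pivot -> target).
            theta = (math.atan2(target[1] - py, target[0] - px)
                     - math.atan2(ey - py, ex - px))
            c, s = math.cos(theta), math.sin(theta)
            # Rotate every point downstream of the pivot about the pivot.
            for j in range(i + 1, len(pts)):
                dx, dy = pts[j][0] - px, pts[j][1] - py
                pts[j] = (px + c * dx - s * dy, py + s * dx + c * dy)
        if math.dist(pts[-1], target) < tol:
            return pts, cycle
    return pts, max_cycles

# Hypothetical example: three unit links, gap to close at (1.5, 1.5).
chain = [(0.0, 0.0), (1.0, 0.0), (2.0, 0.0), (3.0, 0.0)]
closed, cycles = ccd_close(chain, (1.5, 1.5))
print(cycles, math.dist(closed[-1], (1.5, 1.5)))
```

Each inner step changes one rotation at a time, which is what makes the method cheap per cycle; the number of cycles needed is the quantity the abstract compares.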

Semantic similarity based feature extraction from microarray expression data.
Previous studies have proven that it is feasible to build sample classifiers using gene expression profiles. To build an effective sample classifier, a dimension reduction process is necessary since classic pattern recognition algorithms do not work well in high-dimensional space. In this paper, we present a novel feature extraction algorithm that integrates microarray expression data with Gene Ontology (GO). Applying semantic similarity measures, we identify the groups of genes, called virtual genes, which potentially interact with each other for a biological function. The correlation in expressions of virtual genes is u...
Source: International Journal of Data Mining and Bioinformatics - July 27, 2009 Category: Bioinformatics Authors: Cho YR, Zhang A, Xu X Tags: Int J Data Min Bioinform Source Type: journals

Tracking multiple interacting subcellular structure by sequential Monte Carlo method.
With the wide application of Green Fluorescent Proteins (GFP) in the study of live cells, there is a surging need for computer-aided analysis of the huge amount of image sequence data acquired by advanced microscopy devices. In this paper, a framework based on Sequential Monte Carlo (SMC) is proposed for multiple interacting object tracking. The distribution of the dimension-varying joint state is sampled efficiently by a Reversible Jump Markov Chain Monte Carlo (RJMCMC) algorithm with a novel height swap move. Experiments were performed on synthetic and real confocal microscopy image sequences. PMID: ...
Source: International Journal of Data Mining and Bioinformatics - July 27, 2009 Category: Bioinformatics Authors: Wen Q, Luby-Phelps K, Gao J Tags: Int J Data Min Bioinform Source Type: journals

Clinical text classification under the Open and Closed Topic Assumptions.
This paper investigates multi-topic aspects in automatic classification of clinical free text in comparison with general text. We consider two different views on multi-topics: the Closed Topic Assumption (CTA) and the Open Topic Assumption (OTA). Experimental results show that the multi-topic assignments in the Computational Medicine Centre (CMC) Medical NLP Challenge Data are strongly OTA-oriented, whereas those in the general text corpus Reuters-21578 fall in the middle of the OTA and CTA spectrum. PMID: 19623772 [PubMed - in process]
Source: International Journal of Data Mining and Bioinformatics - July 27, 2009 Category: Bioinformatics Authors: Sasaki Y, Rea B, Ananiadou S Tags: Int J Data Min Bioinform Source Type: journals

Stroma classification for neuroblastoma on graphics processors.
Neuroblastoma is one of the most common childhood cancers. We are developing an image analysis system to assist pathologists in their prognosis. Since this system operates on relatively large-scale images and requires sophisticated algorithms, computerised analysis takes a long time to execute. In this paper, we propose a novel approach to benefit from high memory bandwidth and strong floating-point capabilities of graphics processing units. The proposed approach achieves a promising classification accuracy of 99.4% and an execution performance with a gain factor up to 45 times compared to hand-optimised C++ code runni...
Source: International Journal of Data Mining and Bioinformatics - July 27, 2009 Category: Bioinformatics Authors: Ruiz A, Sertel O, Ujaldón M, Catalyurek U, Saltz J, Gurcan MN Tags: Int J Data Min Bioinform Source Type: journals

Clustering sequences by overlap.
A clustering algorithm is introduced that combines the strengths of clustering and motif finding techniques. Clusters are identified based on unambiguously defined sequence sections as in motif finding algorithms. The definition of similarity within clusters allows transitive matches and, thereby, enables the discovery of remote homologies that cannot be found through motif-finding algorithms. Directed Acyclic Graph (DAG) structures are constructed that link short clusters to the longer ones. We compare the clustering results to the corresponding domains in the InterPro database. A second comparison shows that annotati...
Source: International Journal of Data Mining and Bioinformatics - July 27, 2009 Category: Bioinformatics Authors: Dorr DH, Denton AM Tags: Int J Data Min Bioinform Source Type: journals

A semi-supervised approach to projected clustering with applications to microarray data.
Recent studies have suggested that extremely low dimensional projected clusters exist in real datasets. Here, we propose a new algorithm for identifying them. It combines object clustering and dimension selection, and allows the input of domain knowledge in guiding the clustering process. Theoretical and experimental results show that even a small amount of input knowledge could already help detect clusters with only 1% of the relevant dimensions. We also show that this semi-supervised algorithm can perform knowledge-guided selective clustering when there are multiple meaningful object groupings. The algorithm is also ...
Source: International Journal of Data Mining and Bioinformatics - July 27, 2009 Category: Bioinformatics Authors: Yip KY, Cheung L, Cheung DW, Jing L, Ng MK Tags: Int J Data Min Bioinform Source Type: journals

Spherical-harmonic decomposition for molecular recognition in electron-density maps.
Several methods for automatically constructing a protein model from an electron-density map require searching for many small protein-fragment templates in the density. We propose to use the spherical-harmonic decomposition of the template and the map's density to speed this matching. Unlike other template-matching approaches, this allows us to eliminate large portions of the map unlikely to match any templates. We train several first-pass filters for this elimination task. We show our new template-matching method improves accuracy and reduces running time, compared to previous approaches. Finally, we extend our method t...
Source: International Journal of Data Mining and Bioinformatics - June 13, 2009 Category: Bioinformatics Authors: DiMaio FP, Soni AB, Phillips GN, Shavlik JW Tags: Int J Data Min Bioinform Source Type: journals

A space-efficient algorithm for three sequence alignment and ancestor inference.
We propose a novel algorithm to simultaneously align three biological sequences under an affine gap model and infer their common ancestral sequence. It applies the divide-and-conquer strategy to reduce the memory usage from O(n^3) to O(n^2). At the same time, it is based on dynamic programming and thus the optimal alignment is guaranteed. We implemented the algorithm and tested it extensively with both the BAliBASE dataset and simulation data generated by Random Model of Sequence Evolution (ROSE). Compared with other popular multiple sequence alignment tools such as ClustalW and T-Coffee, our program produces not only better ali...
Source: International Journal of Data Mining and Bioinformatics - June 13, 2009 Category: Bioinformatics Authors: Yue F, Tang J Tags: Int J Data Min Bioinform Source Type: journals

Feature cluster selection for high-throughput data analysis.
Feature selection is effective in selecting predictive gene sets for microarray classification. However, the large number of predictive gene sets and the disparity among them present a challenge for identifying potential biomarkers. To facilitate biomarker identification, we present a new data mining task, feature cluster selection, which selects from a full set of features a small number of coherent and predictive feature clusters. We provide both a theoretical definition and an empirical formulation for the new problem, and propose an efficient 3M algorithm. Experiments on microarray data have shown that the 3M algorithm...
Source: International Journal of Data Mining and Bioinformatics - June 13, 2009 Category: Bioinformatics Authors: Yu L Tags: Int J Data Min Bioinform Source Type: journals

Computational identification of protein-coding sequences by comparative analysis.
Gene prediction is an essential step in understanding the genome of a species once it has been sequenced. A promising direction in current research on gene finding is the comparative genomics approach. In this paper, we present a novel approach to identifying evolutionarily conserved protein-coding sequences in genomes. The method takes advantage of the specific substitution pattern of coding sequences together with the consistency of reading frames. It has been implemented in a software tool called PROTEA. Large-scale experimentation shows good results. PROTEA is intended to be a useful complement to existing tools...
Source: International Journal of Data Mining and Bioinformatics - June 13, 2009 Category: Bioinformatics Authors: Fontaine A, Touzet H Tags: Int J Data Min Bioinform Source Type: journals

Study of microarray time series data based on Forward-Backward Linear Prediction and Singular Value Decomposition.
We propose a method to analyse the periodicities of gene expression profiles based on the spectral domain approach. Our spectral reconstruction method outperforms three other recently proposed methods, which do not require any prior knowledge. It is proven that an alternative method for studying cell-cycle regulation is possible even where very little prior knowledge is available. We also investigate the potential of combining signals with similar frequency components to form an overdetermined system of equations, and use least squares solution to estimate the spectral frequency. Results show that this new method is ab...
Source: International Journal of Data Mining and Bioinformatics - June 13, 2009 Category: Bioinformatics Authors: Choong MK, Levy D, Yan H Tags: Int J Data Min Bioinform Source Type: journals

Double iterative optimisation for metabolic network-based drug target identification.
We present novel and scalable algorithms for finding a set of enzymes whose inhibition stops the production of a given set of target compounds while eliminating a minimal number of non-target compounds. Experimental results are presented for the E. coli metabolic network to demonstrate the accuracy and efficiency of our iterative method. PMID: 19517985 [PubMed - in process]
Source: International Journal of Data Mining and Bioinformatics - June 13, 2009 Category: Bioinformatics Authors: Song B, Sridhar P, Kahveci T, Ranka S Tags: Int J Data Min Bioinform Source Type: journals

Discovering implicit associations among critical biological entities.
We propose an approach to predicting implicit gene-disease associations based on the inference network, whereby genes and diseases are represented as nodes and are connected via two types of intermediate nodes: gene functions and phenotypes. To estimate the probabilities involved in the model, two learning schemes are compared; one baseline using co-annotations of keywords and the other taking advantage of free text. Additionally, we explore the use of domain ontologies to complement data sparseness and examine the impact of full text documents. The validity of the proposed framework is demonstrated on the benchmark da...
Source: International Journal of Data Mining and Bioinformatics - June 13, 2009 Category: Bioinformatics Authors: Seki K, Mostafa J Tags: Int J Data Min Bioinform Source Type: journals

Irrelevant gene elimination for partial least squares based dimension reduction by using feature probes.
It is hard to analyse gene expression data that has only a few observations but thousands of measured genes. Partial Least Squares based Dimension Reduction (PLSDR) is superior for handling such high dimensional problems, but irrelevant features will introduce errors into the dimension reduction process. Here, feature selection is applied to filter the data, and an algorithm named PLSDRg is described that integrates PLSDR with gene elimination guided by t-statistic scores on standardised probes. Experimental results on six microarray data sets show that PLSDRg is effective and reliabl...
Source: International Journal of Data Mining and Bioinformatics - June 13, 2009 Category: Bioinformatics Authors: Zeng XQ, Li GZ, Wu GF, Yang JY, Yang MQ Tags: Int J Data Min Bioinform Source Type: journals
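
As a rough illustration of t-statistic based gene filtering, the sketch below scores each gene with Welch's two-sample t-statistic and keeps those above a cutoff. This is not the PLSDRg algorithm: the abstract does not say how the cutoff is derived from the probes, so the fixed `threshold`, the function names and the toy data are all assumptions.

```python
import math
from statistics import mean, variance

def welch_t(a, b):
    """Welch's two-sample t-statistic for one gene across two sample classes."""
    se = math.sqrt(variance(a) / len(a) + variance(b) / len(b))
    return (mean(a) - mean(b)) / se if se > 0 else 0.0

def filter_genes(expr, labels, threshold):
    """Keep genes whose |t| between class 0 and class 1 exceeds threshold."""
    kept = []
    for gene, values in expr.items():
        a = [v for v, y in zip(values, labels) if y == 0]
        b = [v for v, y in zip(values, labels) if y == 1]
        if abs(welch_t(a, b)) > threshold:
            kept.append(gene)
    return kept

# Hypothetical toy data: six samples, three per class.
labels = [0, 0, 0, 1, 1, 1]
expr = {
    "geneA": [1.0, 1.2, 0.9, 3.1, 2.9, 3.0],  # separates the classes
    "geneB": [2.0, 2.1, 1.9, 2.0, 2.1, 1.9],  # no class difference
}
print(filter_genes(expr, labels, threshold=2.0))  # ['geneA']
```

Only the surviving genes would then be passed on to the dimension reduction step.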

A hybrid graph-theoretic method for mining overlapping functional modules in large sparse protein interaction networks.
Modular architecture, which encompasses groups of genes/proteins involved in elementary biological functional units, is a basic form of the organisation of interacting proteins. Here, we propose a method that combines the Line Graph Transformation (LGT) and a clique percolation-clustering algorithm to detect network modules, which may overlap each other, in large sparse PPI networks. The modules produced by the present method show high coverage in yeast, fly, and worm PPI networks. Our analysis of the yeast PPI network suggests that most of these modules are biologically significant in the context of ...
Source: International Journal of Data Mining and Bioinformatics - June 13, 2009 Category: Bioinformatics Authors: Zhang S, Liu HW, Ning XM, Zhang XS Tags: Int J Data Min Bioinform Source Type: journals
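
The line graph transformation itself is easy to state in code: every edge of the original network becomes a node, and two such nodes are linked when the original edges share an endpoint. The sketch below shows only this transformation, not the clique percolation step; the function name and the toy network are assumptions.

```python
from itertools import combinations

def line_graph(edges):
    """Return the edges of the line graph of an undirected graph.

    Each original edge becomes a node; two nodes are linked when the
    original edges share an endpoint.
    """
    nodes = [tuple(sorted(e)) for e in edges]
    return [(e, f) for e, f in combinations(nodes, 2) if set(e) & set(f)]

# Hypothetical toy interaction network: a path A-B-C-D.
ppi = [("A", "B"), ("B", "C"), ("C", "D")]
print(line_graph(ppi))
# [(('A', 'B'), ('B', 'C')), (('B', 'C'), ('C', 'D'))]
```

Because a protein's interactions become separate nodes in the line graph, clustering those nodes can place one protein in several modules, which is how overlap arises.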

Predicting protein-protein interfaces as clusters of optimal docking area points.
The desolvation property is used here to predict protein-protein binding sites, exploiting the fact that lower-valued 'optimal docking area' (ODA) points (Fernandez-Recio et al., 2005) form clusters at the interface. The proposed method involves two steps: clustering the ODA points and representing ODA points by average ODA values. On 51 nonredundant proteins, results show that the success rate improved considerably. Considering only significant ODA, the previous ODA method has obtained a success rate of 65% with overall success rate of 39%. The proposed method improved the overall success rate to 61%. Further, comparable results w...
Source: International Journal of Data Mining and Bioinformatics - June 13, 2009 Category: Bioinformatics Authors: Arafat Y, Kamruzzaman J, Karmakar GC, Fernandez-Recio J Tags: Int J Data Min Bioinform Source Type: journals

An on demand data integration model for biological databases.
This paper presents a user-centric biological query system for information integration and knowledge acquisition from distributed, semantically heterogeneous data sources. The proposed system, BioXBase, extracts user requested query information over the internet from multiple biological sources and organises this information into a homogeneous unified view to the user. This entire process is done in real time on-the-fly. The BioXBase system has improved the results retrieved by 30% compared to a system that has only a local database. The BioXBase system is further enhanced by 20% while combining the results with a loca...
Source: International Journal of Data Mining and Bioinformatics - June 13, 2009 Category: Bioinformatics Authors: Palakal M, Naidu P Tags: Int J Data Min Bioinform Source Type: journals

Finding new core promoter elements using backward-looking strategies.
Core Promoter Elements (CPEs) are key players in transcription initiation. Identifying CPEs is crucial for understanding gene expression. In this paper, a framework for finding new CPEs is proposed. An experiment was performed on the sequences of the Eukaryotic Promoter Database (EPD). From the results, the known CPEs were all recovered; in addition, five new motifs were discovered in Drosophila and three in human. By comparing the results with currently known CPEs, it is shown that the proposed system is feasible and reliable, and these new CPEs are worthy of further exploration. PMID: 19432374 [PubMed - indexed for MEDLINE]
Source: International Journal of Data Mining and Bioinformatics - June 13, 2009 Category: Bioinformatics Authors: Huang YF, Jhan YC, Liou SW Tags: Int J Data Min Bioinform Source Type: journals

A cube framework for incorporating inter-gene information into biological data mining.
Large volumes of microarray data are registered daily in public repositories such as SMD (Belkin and Niyogi, 2003) and GEO (Ashburner et al., 2000). Such repositories have quickly become a community resource. However, due to the inherent heterogeneity of the microarray experiments, the data generated from different experiments could not be directly integrated and hence the resources have not been fully utilised. To address this problem, we propose a new microarray integration framework that achieves high-quality integration through exploiting invariant features such as relative information among genes. We also show how...
Source: International Journal of Data Mining and Bioinformatics - June 13, 2009 Category: Bioinformatics Authors: Lin KM, Kang J, Shin H, Lee J Tags: Int J Data Min Bioinform Source Type: journals

22nd annual ACM symposium on applied computing.
PMID: 19432372 [PubMed - indexed for MEDLINE]
Source: International Journal of Data Mining and Bioinformatics - June 13, 2009 Category: Bioinformatics Authors: Palakal M Tags: Int J Data Min Bioinform Source Type: journals

Classification techniques with minimal labelling effort and application to medical reports.
There are a number of approaches to classify text documents. Here, we use Partially Supervised Classification (PSC) and argue that it is an effective and efficient approach for real-world problems. PSC uses a two-step strategy to cut down on the labelling effort. There are a number of methods that have been proposed for each step. An evaluation of various methods is conducted using real-world medical documents. The results show that using EM to build the classifier yields better results than SVM. We also experimentally show that careful selection of a subset of features to represent the documents can improve performanc...
Source: International Journal of Data Mining and Bioinformatics - November 27, 2008 Category: Bioinformatics Authors: Saad FH, Bell GD, de la Iglesia B Tags: Int J Data Min Bioinform Source Type: journals

A Bayesian framework for knowledge driven regression model in micro-array data analysis.
We presented a full Bayesian framework to effectively exploit the similarity information of the input variables for linear regression. Empirical studies with gene expression data show that the regression errors can be reduced significantly by incorporating the similarity information derived from gene ontology. PMID: 19024497 [PubMed - in process]
Source: International Journal of Data Mining and Bioinformatics - November 27, 2008 Category: Bioinformatics Authors: Jin R, Si L, Chan C Tags: Int J Data Min Bioinform Source Type: journals

Sparse p-norm Nonnegative Matrix Factorization for clustering gene expression data.
Nonnegative Matrix Factorization (NMF) is a powerful tool for gene expression data analysis as it reduces thousands of genes to a few compact metagenes, especially in clustering gene expression samples for cancer class discovery. Enhancing sparseness of the factorisation can find only a few dominantly coexpressed metagenes and improve the clustering effectiveness. Sparse p-norm (p > 1) Nonnegative Matrix Factorization (Sp-NMF) is a more sparse representation method using high order norm to normalise the decomposed components. In this paper, we investigate the benefit of high order normalisation for clustering cancer...
Source: International Journal of Data Mining and Bioinformatics - November 27, 2008 Category: Bioinformatics Authors: Liu W, Yuan K Tags: Int J Data Min Bioinform Source Type: journals

Scoring and summarising gene product clusters using the Gene Ontology.
We propose an approach for quantifying the biological relatedness between gene products, based on their properties, and measure their similarities using exclusively statistical NLP techniques and Gene Ontology (GO) annotations. We also present a novel similarity figure of merit, based on the vector space model, which assesses gene expression analysis results and scores gene product clusters' biological coherency, making sole use of their annotation terms and textual descriptions. We define query profiles which rapidly detect a gene product cluster's dominant biological properties. Experimental results validate our appr...
Source: International Journal of Data Mining and Bioinformatics - November 27, 2008 Category: Bioinformatics Authors: Denaxas SC, Tjortjis C Tags: Int J Data Min Bioinform Source Type: journals
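
A vector space model comparison of two gene products can be sketched as a cosine similarity between bags of annotation terms. The example below uses raw term counts; the abstract does not give the weighting scheme or the exact figure of merit, so the function name, the counting scheme and the sample terms are assumptions.

```python
import math
from collections import Counter

def cosine(terms_a, terms_b):
    """Cosine similarity between two bags of annotation terms (raw counts)."""
    va, vb = Counter(terms_a), Counter(terms_b)
    dot = sum(va[t] * vb[t] for t in va.keys() & vb.keys())
    norm = (math.sqrt(sum(c * c for c in va.values()))
            * math.sqrt(sum(c * c for c in vb.values())))
    return dot / norm if norm else 0.0

# Hypothetical annotation terms for three gene products.
g1 = ["kinase activity", "ATP binding"]
g2 = ["kinase activity", "membrane"]
g3 = ["DNA repair"]
print(cosine(g1, g2))  # one of two terms shared
print(cosine(g1, g3))  # no terms shared
```

Averaging such pairwise scores over a cluster is one simple way to turn them into a cluster coherence score, though that step is likewise an assumption here.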

Word Sense Disambiguation in biomedical ontologies with term co-occurrence analysis and document clustering.
With more and more genomes being sequenced, a lot of effort is devoted to their annotation with terms from controlled vocabularies such as the Gene Ontology. Manual annotation based on relevant literature is tedious, but automation of this process is difficult. One particularly challenging problem is word sense disambiguation. Terms such as 'development' can refer to developmental biology or to the more general sense. Here, we present two approaches to address this problem by using term co-occurrences and document clustering. To evaluate our method we defined a corpus of 331 documents on development and developmental bi...
Source: International Journal of Data Mining and Bioinformatics - November 27, 2008 Category: Bioinformatics Authors: Andreopoulos B, Alexopoulou D, Schroeder M Tags: Int J Data Min Bioinform Source Type: journals

Discovery of metabolite features for the modelling and analysis of high-resolution NMR spectra.
This study presents three feature selection methods for identifying the metabolite features in nuclear magnetic resonance spectra that contribute to the distinction of samples among varying nutritional conditions. Principal component analysis, Fisher discriminant analysis, and Partial Least Square Discriminant Analysis (PLS-DA) were used to calculate the importance of individual metabolite features in the spectra. Moreover, an Orthogonal Signal Correction (OSC) filter was used to eliminate unnecessary variations in the spectra. We evaluated the presented methods by comparing the ability of classification based on the features selec...
Source: International Journal of Data Mining and Bioinformatics - September 6, 2008 Category: Bioinformatics Authors: Cho HW, Kim SB, Jeong MK, Park Y, Miller NG, Ziegler TR, Jones DP Tags: Int J Data Min Bioinform Source Type: journals

Protein homology detection with biologically inspired features and interpretable statistical models.
Computational classification of proteins using methods such as string kernels and Fisher-SVM has demonstrated great success. However, the resulting models do not offer an immediate interpretation of the underlying biological mechanisms. In this work, we propose a biologically motivated feature set combined with a sparse classifier, based on a small subset of positions and residues in protein sequences, for protein superfamily detection and show the performance of our models is comparable to that of the state-of-the-art methods on a benchmark dataset. The set of sparse critical features discovered by the models is consi...
Source: International Journal of Data Mining and Bioinformatics - September 6, 2008 Category: Bioinformatics Authors: Huang PH, Pavlovic V Tags: Int J Data Min Bioinform Source Type: journals

Large-scale Protein-Protein Interaction prediction using novel kernel methods.
Knowledge of Protein-Protein Interactions (PPIs) can give us new insights into molecular mechanisms and properties of the cell. In this paper, we propose a novel domain-based kernel method to predict PPIs. A new kernel that measures the similarity between protein pairs based on a new feature representation is developed and applied to a large scale PPI database. Experimental results demonstrate its effectiveness. Furthermore, we evaluate the problem of cross-species PPI prediction and the effect of the number of negative samples on the performance of PPI predictions, which are two fundamental problems in most in silico ...
Source: International Journal of Data Mining and Bioinformatics - September 6, 2008 Category: Bioinformatics Authors: Chen XW, Han B, Fang J, Haasl RJ Tags: Int J Data Min Bioinform Source Type: journals

Handling gene redundancy in microarray data using Grey Relational Analysis.
Gene selection is one of the important and frequently used techniques for microarray data classification. In this paper, we introduce a new metric to measure gene-class relevance and gene-gene redundancy. The new metric, called Grey Relational Grade (GRG), is based on Grey Relational Analysis (GRA) and has not been used in gene selection before. Based on the GRG, we develop a new gene selection method, which uses GRG to group similar genes into clusters and then selects informative genes from each cluster to avoid redundancy. Experiments on public data sets demonstrate the effectiveness of the proposed method. PMID: 1876735...
Source: International Journal of Data Mining and Bioinformatics - September 6, 2008 Category: Bioinformatics Authors: Zhang LJ, Li ZJ, Chen HW Tags: Int J Data Min Bioinform Source Type: journals
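
For readers unfamiliar with the Grey Relational Grade, the sketch below follows the textbook Grey Relational Analysis definition: a grey relational coefficient is computed at each position from the absolute difference between a reference series and a compared series, and the grade is the mean of those coefficients. The distinguishing coefficient `zeta=0.5` is the conventional default. How the paper normalises expression values or applies the grade is not stated in the abstract, so the function name, the single-pair simplification (minimum and maximum differences taken over one pair rather than over all compared series) and the sample values are assumptions.

```python
def grey_relational_grade(reference, series, zeta=0.5):
    """Grey relational grade between a reference series and one compared series.

    Simplification: the minimum and maximum differences are taken over this
    single pair; with several compared series they are usually taken globally.
    """
    deltas = [abs(r - s) for r, s in zip(reference, series)]
    d_min, d_max = min(deltas), max(deltas)
    if d_max == 0:
        return 1.0  # identical series
    coeffs = [(d_min + zeta * d_max) / (d + zeta * d_max) for d in deltas]
    return sum(coeffs) / len(coeffs)

# Hypothetical expression profiles across three samples.
print(grey_relational_grade([1.0, 2.0, 3.0], [1.0, 2.0, 3.0]))  # 1.0
print(grey_relational_grade([1.0, 2.0, 3.0], [1.0, 2.0, 5.0]))  # about 0.778
```

A grade near 1 marks two profiles as closely related, so it can serve both as a gene-class relevance score and as a gene-gene redundancy score.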

Identification of Intrinsically Unstructured Proteins using hierarchical classifier.
It is suggested that a protein functions only when folded into a particular 3-D structure. Recently, many protein regions and some entire proteins have been identified with no definite tertiary structure, but presenting instead as dynamic, disordered ensembles under different physicochemical circumstances. These proteins and regions are known as Intrinsically Unstructured regions and Proteins (IUP). We constructed a Recursive Maximum Contrast Tree (RMCT) based classifier to identify IUP. The classifier has been benchmarked against the industry-standard PONDR VLXT on out-of-sample data by external evaluators. The IUP predictor...
Source: International Journal of Data Mining and Bioinformatics - September 6, 2008 Category: Bioinformatics Authors: Yang JY, Yang MQ Tags: Int J Data Min Bioinform Source Type: journals

Message Passing Clustering (MPC): a knowledge-based framework for clustering under biological constraints.
A new clustering algorithm, Message Passing Clustering (MPC), is proposed. MPC employs the concept of message passing to describe a parallel and spontaneous clustering process by allowing data objects to communicate with each other. MPC also provides an extensible framework to accommodate additional features into clustering, such as adaptive feature weights scaling, stochastic cluster merging, and semi-supervised constraints guiding. Extensive experiments were performed using both simulation and real microarray gene expression and phylogenetic data. The results showed that MPC performed favourably to other popular cluste...
Source: International Journal of Data Mining and Bioinformatics - September 6, 2008 Category: Bioinformatics Authors: Geng H, Deng X, Ali HH Tags: Int J Data Min Bioinform Source Type: journals

A rule-based approach for RNA pseudoknot prediction.
RNA plays a critical role in mediating every step of cellular information transfer from genes to functional proteins. Pseudoknots are functionally important and widely occurring structural motifs found in all types of RNA. Therefore predicting their structures is an important problem. In this paper, we present a new RNA pseudoknot structure prediction method based on term rewriting. The method is implemented using the Mfold RNA/DNA folding package and the term rewriting language Maude. In our method, RNA structures are treated as terms and rules are discovered for predicting pseudoknots. Our method was tested on 211 ps...
Source: International Journal of Data Mining and Bioinformatics - June 18, 2008 Category: Bioinformatics Authors: Fu XZ, Wang H, Harrison RW, Harrison WL Tags: Int J Data Min Bioinform Source Type: journals

An integrative approach for biological data mining and visualisation.
We present a system to integrate data across multiple bioinformatics databases and enable mining across various conceptual levels of biological information. The results are represented as complex networks. Context dependent mining of these networks is achieved by use of distances. Our approach is demonstrated with three applications: full metabolic network retrieval with network topology study, exploration of properties and relationships of a set of selected proteins, and combined visualisation and exploration of gene expression data with related pathways and ontologies. PMID: 18399328 [PubMed - indexed for MEDLINE]
Source: International Journal of Data Mining and Bioinformatics - June 18, 2008 Category: Bioinformatics Authors: Gopalacharyulu PV, Lindfors E, Miettinen J, Bounsaythip CK, Oresic M Tags: Int J Data Min Bioinform Source Type: journals

Temporal representation for gene networks: towards a qualitative temporal data mining.
Processing literature (i.e., text corpora) to capture gene regulation events is not easy and can be driven by the final data representation. We propose to build, manually, an example of temporal representation (whole gene networks for coat formation in Bacillus subtilis). Our temporal representation is based on a generalised formal language theory (S-languages). We propose an algorithm to link bags of relations with representation, by ordering interactions. In this paper, starting from the network made manually from text data, we show that S-languages are quite relevant to encapsulate gene properties, and infer knowled...
Source: International Journal of Data Mining and Bioinformatics - June 18, 2008 Category: Bioinformatics Authors: Turenne N, Schwer SR Tags: Int J Data Min Bioinform Source Type: journals

Segmentation of short human exons based on spectral features of double curves.
This paper presents a new segmentation method based on spectral analysis to locate borders between short protein coding regions and non-coding regions. We formulate the innovative double curve representation of a DNA sequence and apply local three-codon measurement on the discrete Fourier spectral features at 1/3 frequency to identify short protein coding regions. The proposed spectral segmentation method based on double curves requires no prior knowledge of the DNA data. Our simulation results show that the proposed spectral method greatly improves the accuracy of identifying short coding regions in DNA sequences comp...
Source: International Journal of Data Mining and Bioinformatics - June 18, 2008 Category: Bioinformatics Authors: Jiang R, Yan H Tags: Int J Data Min Bioinform Source Type: journals
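
The abstract above relies on discrete Fourier spectral features at frequency 1/3, which reflect the three-base periodicity typical of protein-coding DNA. The sketch below is not the paper's double curve method. It assumes the common textbook approach of four binary indicator sequences (one per base) and sums their spectral power at frequency 1/3. The function name `period3_power` is hypothetical.

```python
import cmath


def period3_power(seq):
    """Total spectral power at frequency 1/3 over the four base indicator sequences.

    Assumption: each base (A, C, G, T) gets a 0/1 indicator sequence, and the
    DFT of each indicator is evaluated only at the single frequency 1/3.
    """
    total = 0.0
    for base in "ACGT":
        # DFT coefficient at frequency 1/3: sum of exp(-2*pi*i*k/3) over positions k holding this base
        coeff = sum(
            cmath.exp(-2j * cmath.pi * k / 3)
            for k, b in enumerate(seq)
            if b == base
        )
        total += abs(coeff) ** 2
    return total


# A strictly 3-periodic toy sequence concentrates power at frequency 1/3,
# while a 4-periodic one of length 12 has essentially none.
periodic = period3_power("ATG" * 10)
non_periodic = period3_power("ACGT" * 3)
```

In a segmentation setting, a value like this would typically be computed over a sliding window so that peaks mark candidate coding regions. The window length is a tuning choice that the abstract does not specify.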

Gene Regulatory Network modelling: a state-space approach.
This study proposes a state-space model with control portion for inferring Gene Regulatory Networks (GRNs). The proposed model views genes as the observation variables, whose expression values depend on the current internal state variables and control variables, and views the means of clusters of gene expression as the control variables of the internal state equation. Bayesian Information Criterion (BIC) and Probabilistic Principal Component Analysis (PPCA) are used to estimate the internal states from observation data. The proposed approach is applied to two gene expression datasets. Computational results show that inferr...
Source: International Journal of Data Mining and Bioinformatics - June 18, 2008 Category: Bioinformatics Authors: Wu FX Tags: Int J Data Min Bioinform Source Type: journals

Biomedical text summarisation using concept chains.
BioChainSumm is a biomedical text summariser utilising concept chaining (called BioChain) to link semantically-related concepts within biomedical text together. The BioChain process is adapted from existing lexical chaining approaches which chain semantically-related terms rather than concepts. The BioChain concept chains are used to identify salient candidate sentences which are extracted to produce a summary of the biomedical text. The Unified Medical Language System Metathesaurus and Semantic Network semantic resources identify related biomedical concepts. BioChainSumm is evaluated using the ROUGE system along with ...
Source: International Journal of Data Mining and Bioinformatics - January 1, 2007 Category: Bioinformatics Authors: Reeve LH, Han H, Brooks AD Tags: Int J Data Min Bioinform Source Type: journals

A Merge-Decoupling Dead End Elimination algorithm for protein side-chain conformation.
We present a Merge-Decoupling DEE (MD-DEE) that further reduces the number of rotamers after SG-DEE. MD-DEE works by forming residue-pairs but is fast and, like SG-DEE, is practical even for large proteins. Our experiments show that MD-DEE achieves further reduction in residue elimination (up to 25%) after SG-DEE. PMID: 18402048 [PubMed - indexed for MEDLINE]
Source: International Journal of Data Mining and Bioinformatics - January 1, 2007 Category: Bioinformatics Authors: Chong KF, Leong HW Tags: Int J Data Min Bioinform Source Type: journals

A constraint logic programming approach to associate 1D and 3D structural components for large protein complexes.
The paper describes a novel framework, constructed using Constraint Logic Programming (CLP) and parallelism, to determine the association between parts of the primary sequence of a protein and alpha-helices extracted from 3D low-resolution descriptions of large protein complexes. The association is determined by extracting constraints from the 3D information, regarding length, relative position and connectivity of helices, and solving these constraints with the guidance of a secondary structure prediction algorithm. Parallelism is employed to enhance performance on large proteins. The framework provides a fast, inexpen...
Source: International Journal of Data Mining and Bioinformatics - January 1, 2007 Category: Bioinformatics Authors: Dal Palù A, Pontelli E, He J, Lu Y Tags: Int J Data Min Bioinform Source Type: journals

Transductive learning with EM algorithm to classify proteins based on phylogenetic profiles.
We proposed a novel method for protein classification based on phylogenetic profiles. Each protein's profile was extended with extra bits encoding the phylogenetic tree structure and the likelihood, in the form of weights on profile indices, of the protein's functional family membership in each of the reference genomes. The extended profiles were then integrated as part of a kernel of a support vector machine, which was trained in a transductive learning scheme using the EM algorithm to update the weights. Classification accuracy was greatly increased when tested on the proteome of Saccharomyces cerevisiae using the MI...
Source: International Journal of Data Mining and Bioinformatics - January 1, 2007 Category: Bioinformatics Authors: Craig RA, Liao L Tags: Int J Data Min Bioinform Source Type: journals

Simulating the cellular passive transport of glucose using a time-dependent extension of Gillespie algorithm for stochastic pi-calculus.
Realistic simulations of the evolution of biological systems require a mathematical model of the stochasticity of the involved processes and a formalism for specifying the concurrent nature of the biochemical interactions. A time-dependent extension of the Gillespie algorithm implementing the race condition of the stochastic pi-calculus formalism satisfies both these requirements. This paper formulates those modifications to the original Gillespie algorithm necessary when the time dependence of the reaction propensity is due to changes either of volume or temperature. This re-formulation has been incorporated in the frame...
Source: International Journal of Data Mining and Bioinformatics - January 1, 2007 Category: Bioinformatics Authors: Lecca P Tags: Int J Data Min Bioinform Source Type: journals

Exploring alternative knowledge representations for protein secondary-structure prediction.
Methods for 3-class secondary-structure prediction are thought to be reaching the highest achievable accuracy. Their accuracy on the beta-sheet residue class is considerably lower than for the other two classes. We analysed the relevance of 315 individual input attributes for a predictor with the usual framework of using sequence-profile based data with an input window of fixed size. We propose two alternative knowledge representations with significantly smaller sets of input attributes. We also investigated the possibility of exploiting the prediction of connected pairs of beta-sheet residues and the prediction of residue...
Source: International Journal of Data Mining and Bioinformatics - January 1, 2007 Category: Bioinformatics Authors: Midic U, Dunker AK, Obradovic Z Tags: Int J Data Min Bioinform Source Type: journals

Granular kernel trees with parallel genetic algorithms for drug activity comparisons.
With growing interest in biological and chemical data prediction, more powerful and flexible kernels need to be designed so that the prior knowledge and relationships within data can be expressed effectively in kernel functions. In this paper, Granular Kernel Trees (GKTs) are proposed and parallel Genetic Algorithms (GAs) are used to optimise the parameters of GKTs. In applications, SVMs with new kernel trees are employed for drug activity comparisons. The experimental results show that GKTs and evolutionary GKTs can achieve better performance than traditional RBF kernels in terms of prediction ac...
Source: International Journal of Data Mining and Bioinformatics - January 1, 2007 Category: Bioinformatics Authors: Jin B, Zhang YQ, Wang B Tags: Int J Data Min Bioinform Source Type: journals

Prediction of Protein Secondary Structure with two-stage multi-class SVMs.
Bioinformatics techniques for Protein Secondary Structure (PSS) prediction mostly depend on the information available in amino acid sequences. In this paper, we propose a two-stage Multi-class Support Vector Machine (MSVM) approach, where the second MSVM predictor is introduced at the output of the first stage MSVM to capture the contextual relationship among secondary structure elements in order to minimise the generalisation error in the prediction. By using position-specific scoring matrices generated by PSI-BLAST, the two-stage MSVM approach achieves Q3 accuracies of 78.0% and 76.3% on the RS126 dataset of 126 non-h...
Source: International Journal of Data Mining and Bioinformatics - January 1, 2007 Category: Bioinformatics Authors: Nguyen MN, Rajapakse JC Tags: Int J Data Min Bioinform Source Type: journals
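
The Q3 accuracy quoted in the abstract above is conventionally the percentage of residues whose predicted three-state label (helix, strand or coil) matches the observed one. A minimal sketch of that measure follows. The function name `q3_accuracy` and the single-letter labels H, E and C are illustrative assumptions, not taken from the paper.

```python
def q3_accuracy(predicted, observed):
    """Percentage of residues whose predicted state matches the observed state.

    Both arguments are equal-length strings over a three-letter alphabet,
    here assumed to be H (helix), E (strand) and C (coil).
    """
    if len(predicted) != len(observed):
        raise ValueError("predicted and observed must have the same length")
    if not observed:
        raise ValueError("sequences must not be empty")
    correct = sum(p == o for p, o in zip(predicted, observed))
    return 100.0 * correct / len(observed)


# Four of the five residues agree in this toy pair.
score = q3_accuracy("HHEEC", "HHECC")
```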

A parallel edge-betweenness clustering tool for Protein-Protein Interaction networks.
The increasing availability of protein-protein interaction (PPI) graphs requires new efficient tools capable of extracting valuable biological knowledge from these networks. Among the wide range of clustering algorithms, Girvan and Newman's edge betweenness algorithm showed remarkable performance in discovering clustering structures in several real-world networks. Unfortunately, their algorithm suffers from high computational cost and it is impractical for inputs of the size of large PPI networks. Here we report on a novel parallel implementation of Girvan and Newman's clustering algorithm that achieves almost linear ...
Source: International Journal of Data Mining and Bioinformatics - January 1, 2007 Category: Bioinformatics Authors: Yang Q, Lonardi S Tags: Int J Data Min Bioinform Source Type: journals

Simulation study in Probabilistic Boolean Network models for genetic regulatory networks.
The Probabilistic Boolean Network (PBN) is widely used to model genetic regulatory networks. The evolution of a PBN is governed by its transition probability matrix. Steady-state (long-run behaviour) analysis is a key aspect in studying the dynamics of genetic regulatory networks. In this paper, an efficient method to construct the sparse transition probability matrix is proposed, and the power method based on the sparse matrix-vector multiplication is applied to compute the steady-state probability distribution. Such methods provide a tool for us to study the sensitivity of the steady-state distribution to the influence of...
Source: International Journal of Data Mining and Bioinformatics - January 1, 2007 Category: Bioinformatics Authors: Zhang SQ, Ching WK, Ng MK, Akutsu T Tags: Int J Data Min Bioinform Source Type: journals
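
The abstract above computes the steady-state distribution with the power method on a sparse transition probability matrix. The sketch below illustrates the general idea only, not the authors' construction. It assumes the sparse matrix is stored as a hypothetical dictionary mapping each source state to its list of (target state, probability) pairs, and it repeatedly applies that matrix to a probability vector until the vector stops changing.

```python
def steady_state(transitions, n_states, tol=1e-12, max_iter=10_000):
    """Power iteration for the stationary distribution of a Markov chain.

    transitions: dict mapping source state j to a list of
                 (target state i, probability of moving j -> i) pairs.
                 Only non-zero entries are stored (assumed sparse layout).
    """
    x = [1.0 / n_states] * n_states  # start from the uniform distribution
    for _ in range(max_iter):
        nxt = [0.0] * n_states
        # Sparse matrix-vector product: only stored transitions are visited
        for j, outgoing in transitions.items():
            for i, p in outgoing:
                nxt[i] += p * x[j]
        change = sum(abs(a - b) for a, b in zip(nxt, x))
        x = nxt
        if change < tol:
            break
    return x


# Toy two-state chain: state 0 stays with probability 0.9, state 1 with 0.5.
toy = {0: [(0, 0.9), (1, 0.1)], 1: [(0, 0.5), (1, 0.5)]}
pi = steady_state(toy, 2)
```

For the toy chain the balance equation 0.1 * pi[0] = 0.5 * pi[1] gives a stationary distribution of (5/6, 1/6), which the iteration approaches. A full PBN with n genes has 2^n states, which is why a sparse representation matters.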

Text analysis of MEDLINE for discovering functional relationships among genes: evaluation of keyword extraction weighting schemes.
One of the key challenges of microarray studies is to derive biological insights from the gene-expression patterns. Clustering genes by functional keyword association can provide direct information about the functional links among genes. However, the quality of the keyword lists significantly affects the clustering results. We compared two keyword weighting schemes: normalised z-score and term frequency-inverse document frequency (TFIDF). Two gene sets were tested to evaluate the effectiveness of the weighting schemes for keyword extraction for gene clustering. Using established measures of cluster quality, the results...
Source: International Journal of Data Mining and Bioinformatics - January 1, 2006 Category: Bioinformatics Authors: Liu Y, Navathe SB, Pivoshenko A, Dasigi VG, Dingledine R, Cilia BJ Tags: Int J Data Min Bioinform Source Type: journals
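
One of the two weighting schemes compared in the abstract above is term frequency-inverse document frequency (TFIDF). The abstract does not say which TFIDF variant was used, so the sketch below assumes a common one: term frequency normalised by document length, multiplied by the natural logarithm of (number of documents / number of documents containing the term). The function name `tfidf` and the toy keyword lists are hypothetical.

```python
import math
from collections import Counter


def tfidf(docs):
    """Return one {term: weight} dict per document.

    docs: list of token lists, e.g. keywords drawn from the abstracts linked to a gene.
    Assumed variant: tf = count / document length, idf = ln(N / document frequency).
    """
    n_docs = len(docs)
    # Document frequency: in how many documents each term appears at least once
    doc_freq = Counter(term for doc in docs for term in set(doc))
    weights = []
    for doc in docs:
        counts = Counter(doc)
        weights.append(
            {
                term: (count / len(doc)) * math.log(n_docs / doc_freq[term])
                for term, count in counts.items()
            }
        )
    return weights


# "binding" occurs in every document, so its weight is zero under this variant.
docs = [["kinase", "binding", "kinase"], ["binding", "receptor"]]
w = tfidf(docs)
```

Under this variant, terms shared by every document carry no weight, which is the property that makes TFIDF useful for separating gene-specific keywords from generic vocabulary.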

State-space approach with the maximum likelihood principle to identify the system generating time-course gene expression data of yeast.
We use linear Gaussian state-space models to analyse time-course gene expression data of yeast. The data are modelled as being generated from hidden state variables in a system. To identify the system, we estimate the parameters of the model by the EM algorithm and determine the dimension of the state variable by BIC. PMID: 18402043 [PubMed - indexed for MEDLINE]
Source: International Journal of Data Mining and Bioinformatics - January 1, 2006 Category: Bioinformatics Authors: Yamaguchi R, Higuchi T Tags: Int J Data Min Bioinform Source Type: journals

Kernel design for RNA classification using Support Vector Machines.
Support Vector Machines (SVMs) are a state-of-the-art machine learning tool widely used in speech recognition, image processing and biological sequence analysis. An essential step in SVMs is to devise a kernel function to compute the similarity between two data points. In this paper we review recent advances in using SVMs for RNA classification. In particular we present a new kernel that takes advantage of both global and local structural information in RNAs and uses the information together to classify RNAs. Experimental results demonstrate the good performance of the new kernel and show that it outperforms existing k...
Source: International Journal of Data Mining and Bioinformatics - January 1, 2006 Category: Bioinformatics Authors: Wang JT, Wu X Tags: Int J Data Min Bioinform Source Type: journals