Publication View

Received on: (2008)

Abstract
Kernel methods, such as the well-known Support Vector Machine (SVM), have attracted growing interest in recent years for designing QSAR/QSPR models with high predictive strength. One of the key concepts of SVMs is the use of a so-called kernel function, which can be thought of as a special similarity measure. In this paper we consider kernels for molecular structures that are based on a graph representation of chemical compounds. The similarity score is calculated by computing an optimal assignment of the atoms of one molecule to those of another, including information on specific chemical properties, membership in a substructure (e.g. aromatic ring, carbonyl group) and the neighborhood of each atom. We show that with this kernel we can achieve a generalization performance comparable to a classical model with a few descriptors that are known a priori to be relevant for the problem, and significantly better results than descriptor-based models with and without automatic descriptor selection. For this purpose we investigate ADME classification and regression datasets for predicting bioavailability (Yoshida), human intestinal absorption (HIA), blood-brain barrier (BBB) penetration and a dataset consisting of 4 different inhibitor classes (SOL). We further explore the effect of combining our kernel with a problem-dependent descriptor set. We also demonstrate the usefulness of an extension of our method to a reduced graph representation of molecules, in which certain structural features, such as rings, donors or acceptors, are represented as a single node in the molecular graph.
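The optimal assignment idea described in the abstract can be illustrated with a minimal sketch. Everything here is an assumption made for illustration and is not taken from the paper: the atom representation (element, aromaticity flag, partial charge), the toy function atom_kernel, the brute-force search over assignments, and the normalization in normalized_score. Neighborhood information is left out, and a practical implementation would likely replace the brute-force search with a polynomial-time assignment solver such as the Hungarian algorithm.

```python
from itertools import permutations
from math import exp, sqrt

# Hypothetical atom representation: (element, is_aromatic, partial_charge).


def atom_kernel(a, b, gamma=1.0):
    """Toy atom similarity in [0, 1]; atoms of different elements score 0."""
    if a[0] != b[0]:
        return 0.0
    ring_match = 1.0 if a[1] == b[1] else 0.5
    return ring_match * exp(-gamma * (a[2] - b[2]) ** 2)


def assignment_score(mol_a, mol_b):
    """Best total atom similarity over all one-to-one atom assignments.

    Every atom of the smaller molecule is mapped to a distinct atom of the
    larger one. Brute force: only feasible for a handful of atoms.
    """
    small, large = sorted((mol_a, mol_b), key=len)
    best = 0.0
    for perm in permutations(range(len(large)), len(small)):
        total = sum(atom_kernel(small[i], large[j]) for i, j in enumerate(perm))
        best = max(best, total)
    return best


def normalized_score(mol_a, mol_b):
    """Scale the score so that a molecule compared with itself gives 1."""
    denom = sqrt(assignment_score(mol_a, mol_a) * assignment_score(mol_b, mol_b))
    return assignment_score(mol_a, mol_b) / denom if denom else 0.0


# Usage with two made-up molecules (feature values are illustrative only).
mol_1 = [("C", False, 0.1), ("O", False, -0.3)]
mol_2 = [("C", False, 0.0), ("O", False, -0.4), ("H", False, 0.2)]
similarity = normalized_score(mol_1, mol_2)
```

The resulting similarity value could then be supplied to an SVM as a precomputed kernel entry, which is the role the abstract assigns to the kernel function.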

Publication details
Download http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.73.3162
Source http://www-ra.informatik.uni-tuebingen.de/publikationen/2005/froehlich05QSAR&CombSci.pdf
Contributors CiteSeerX
Repository CiteSeerX - Scientific Literature Digital Library and Search Engine (United States)
Keywords molecular graph mining, graph representation, reduced graph representation, molecular similarity, Kernel Methods, Support Vector Machines
Abbreviations Support Vector Machine – SVM, Human Intestinal Absorption – HIA, Blood Brain Barrier – BBB, Search and Optimization of Lead Structures – SOL
Type text
Language English
Relation 10.1.1.15.9362, 10.1.1.11.2062, 10.1.1.30.525, 10.1.1.127.6527, 10.1.1.30.3875, 10.1.1.3.8934, 10.1.1.33.5447, 10.1.1.16.1922, 10.1.1.122.7088, 10.1.1.42.1588, 10.1.1.102.7476, 10.1.1.90.7556, 10.1.1.18.8133, 10.1.1.113.103, 10.1.1.28.576, 10.1.1.3.9076, 10.1.1.95.6608, 10.1.1.53.757, 10.1.1.84.5276, 10.1.1.60.6128, 10.1.1.1.8977