Mass spectrometry tools and metabolite-specific databases for molecular identification in metabolomics


Brown M, Dunn WB, Dobson P, Patel Y, Winder CL, Francis-McIntyre S, Begley P, Carroll K, Broadhurst D, Tseng A, Swainston N, Spasic I, Goodacre R, Kell DB

Analyst. 2009;134(7):1322-32

Abstract
The chemical identification of mass spectrometric signals in metabolomic applications is important to provide conversion of analytical data to biological knowledge about metabolic pathways. The complexity of electrospray mass spectrometric data acquired from a range of samples (serum, urine, yeast intracellular extracts, yeast metabolic footprints, placental tissue metabolic footprints) has been investigated and has defined the frequency of different ion types routinely detected. Although some ion types were expected (protonated and deprotonated peaks, isotope peaks, multiply charged peaks), others were not expected (sodium formate adduct ions). In parallel, the Manchester Metabolomics Database (MMD) has been constructed with data from genome scale metabolic reconstructions, HMDB, KEGG, Lipid Maps, BioCyc and DrugBank to provide knowledge on 42,687 endogenous and exogenous metabolite species. The combination of accurate mass data for a large collection of metabolites, theoretical isotope abundance data and knowledge of the different ion types detected allowed a greater number of electrospray mass spectrometric signals to be putatively identified, and with greater confidence, in the samples studied. To provide definitive identification, metabolite-specific mass spectral libraries for UPLC-MS and GC-MS have been constructed for 1,065 commercially available authentic standards. The MMD data are available at http://dbkgroup.org/MMD/.
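The idea of combining accurate mass with knowledge of ion types can be illustrated with a minimal sketch. It is not the MMD implementation: the adduct shifts are standard textbook values, the two-entry candidate table is a hypothetical stand-in for a metabolite database, and the 5 ppm tolerance is an assumed default. A real search would also restrict adducts to the ionisation mode in use and check isotope patterns.

```python
# Illustrative adduct mass shifts in Da (assumed textbook values,
# not taken from the paper).
ADDUCT_SHIFTS = {
    "[M+H]+": 1.007276,
    "[M+Na]+": 22.989218,
    "[M-H]-": -1.007276,
}

# Hypothetical stand-in for a metabolite database:
# name -> monoisotopic neutral mass in Da.
CANDIDATES = {
    "glucose": 180.06339,
    "citric acid": 192.02700,
}


def ppm_error(observed, theoretical):
    """Signed mass error in parts per million."""
    return (observed - theoretical) / theoretical * 1e6


def putative_matches(observed_mz, tolerance_ppm=5.0):
    """Return (metabolite, adduct) pairs whose theoretical m/z lies
    within tolerance_ppm of the observed m/z (singly charged ions only)."""
    matches = []
    for name, mass in CANDIDATES.items():
        for adduct, shift in ADDUCT_SHIFTS.items():
            theoretical = mass + shift
            if abs(ppm_error(observed_mz, theoretical)) <= tolerance_ppm:
                matches.append((name, adduct))
    return matches
```

A signal at m/z 203.05261 is matched here only when the sodium adduct is considered, which is the sense in which knowing the ion types increases the number of putative identifications.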

The cellular uptake of pharmaceutical drugs is mainly carrier-mediated and is thus an issue not so much of biophysics but of systems biology

Kell, D.B. and Dobson, P.D. 
Systems Chemistry (Beilstein Institute). 2009; pp.149-168

Abstract
It is widely believed that most drug molecules are transported across the phospholipid bilayer portion of biological membranes via passive diffusion at a rate related to their lipophilicity (expressed as log P, a calculated c log P or as log D, the octanol:water partition coefficient). However, studies of this using purely phospholipid bilayer membranes have been very misleading since transfer across these typically occurs via the solvent reservoirs or via aqueous pore defects, neither of which are prevalent in biological cells. Since the types of biophysical forces involved in the interaction of drugs with lipid membranes are no different from those involved in their interaction with proteins, arguments based on lipophilicity also apply to drug uptake by membrane transporters or carriers. A similar story attaches to the history of mechanistic explanations of the mode of action of general anaesthetics (narcotics). Carrier-mediated and active uptake of drugs is far more common than is usually assumed. This has considerable implications for the design of libraries for drug discovery and development, as well as for chemical genetics/genomics and systems chemistry.

Implications of the dominant role of transporters in drug uptake by cells

Dobson PD, Lanthaler K, Oliver SG, Kell DB.

Curr Top Med Chem. 2009; 9(2):163-81

Abstract
Drug entry into cells was previously believed to be via diffusion through the lipid bilayer of the cell membrane, with the contribution to uptake by transporter proteins being of only marginal importance. Now, however, drug uptake is understood to be mainly transporter-mediated. This suggests that uptake transporters may be a major determinant of idiosyncratic drug response and a site at which drug-drug interactions occur. Accurately modelling drug pharmacokinetics is a problem of Systems Biology and requires knowledge of both the transporters with which a drug interacts and where those transporters are expressed in the body. Current physiology-based pharmacokinetic models mostly attempt to model drug disposition from the biophysical properties of the drug, drug uptake by diffusion being thereby an implicit assumption. It is clear that the incorporation of transporter proteins and their drug interactions into such models will greatly improve them. We discuss methods by which tissue localisations and transporter interactions can be determined. We propose a yeast-based transporter expression system for the initial screening of drugs for their cognate transporters. Finally, the central importance of computational modelling of transporter substrate preferences by structure-activity relationships is discussed.

'Metabolite-likeness' as a criterion in the design and selection of pharmaceutical drug libraries

Dobson PD, Patel Y, Kell DB.
Drug Discov Today. 2009;14(1-2):31-40

This paper is the subject of a DDT Feature.

Abstract
Present drug screening libraries are constrained by biophysical properties that predict desirable pharmacokinetics and structural descriptors of 'drug-likeness' or 'lead-likeness'. Recent surveys, however, indicate that to enter cells most drugs require solute carriers that normally transport the naturally occurring intermediary metabolites and many drugs are likely to interact similarly. The existence of increasingly comprehensive summaries of the human metabolome allows the assessment of the concept of 'metabolite-likeness'. We compare the similarity of known drugs and library compounds to naturally occurring metabolites (endogenites) using relevant cheminformatics molecular descriptor spaces in which known drugs are more akin to such endogenites than are most library compounds.
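One common way to score similarity between molecules is the Tanimoto coefficient over sets of structural features. The sketch below assumes that measure and uses hypothetical integer feature identifiers in place of real fingerprints; the abstract does not state which descriptors or similarity measure the paper used, so this is only an illustration of the general idea of scoring a compound against a set of metabolites.

```python
def tanimoto(features_a, features_b):
    """Tanimoto (Jaccard) coefficient between two feature sets."""
    a, b = set(features_a), set(features_b)
    union = a | b
    if not union:
        return 0.0  # two empty feature sets: define similarity as zero
    return len(a & b) / len(union)


def metabolite_likeness(compound, metabolites):
    """Assumed scoring rule: similarity to the nearest metabolite.

    `compound` is a set of feature identifiers and `metabolites`
    is a non-empty list of such sets (both hypothetical inputs).
    """
    return max(tanimoto(compound, m) for m in metabolites)
```

Under this assumed rule, a compound sharing most of its features with at least one endogenous metabolite scores close to 1, while a compound unlike every metabolite scores close to 0.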

A consensus yeast metabolic network reconstruction obtained from a community approach to systems biology

Herrgård MJ, Swainston N, Dobson P, Dunn WB, Arga KY, Arvas M, Blüthgen N, Borger S, Costenoble R, Heinemann M, Hucka M, Le Novère N, Li P, Liebermeister W, Mo ML, Oliveira AP, Petranovic D, Pettifer S, Simeonidis E, Smallbone K, Spasić I, Weichart D, Brent R, Broomhead DS, Westerhoff HV, Kirdar B, Penttilä M, Klipp E, Palsson BØ, Sauer U, Oliver SG, Mendes P, Nielsen J, Kell DB 

Nat Biotechnol. 2008; 26(10):1155-60

Abstract
Genomic data allow the large-scale manual or semi-automated assembly of metabolic network reconstructions, which provide highly curated organism-specific knowledge bases. Although several genome-scale network reconstructions describe Saccharomyces cerevisiae metabolism, they differ in scope and content, and use different terminologies to describe the same chemical entities. This makes comparisons between them difficult and underscores the desirability of a consolidated metabolic network that collects and formalizes the 'community knowledge' of yeast metabolism. We describe how we have produced a consensus metabolic network reconstruction for S. cerevisiae. In drafting it, we placed special emphasis on referencing molecules to persistent databases or using database-independent forms, such as SMILES or InChI strings, as this permits their chemical structure to be represented unambiguously and in a manner that permits automated reasoning. The reconstruction is readily available via a publicly accessible database and in the Systems Biology Markup Language (http://www.comp-sys-bio.org/yeastnet). It can be maintained as a resource that serves as a common denominator for studying the systems biology of yeast. Similar strategies should benefit communities studying genome-scale metabolic networks of other organisms.

Carrier-mediated cellular uptake of pharmaceutical drugs: an exception or the rule?

Dobson P.D. and Kell D.B.
Nat Rev Drug Discov. 2008; 7(3):205-20

Abstract
It is generally thought that many drug molecules are transported across biological membranes via passive diffusion at a rate related to their lipophilicity. However, the types of biophysical forces involved in the interaction of drugs with lipid membranes are no different from those involved in their interaction with proteins, and so arguments based on lipophilicity could also be applied to drug uptake by membrane transporters or carriers. In this article, we discuss the evidence supporting the idea that rather than being an exception, carrier-mediated and active uptake of drugs may be more common than is usually assumed - including a summary of specific cases in which drugs are known to be taken up into cells via defined carriers - and consider the implications for drug discovery and development.

The band assignment parser: a tool to identify band assignments in research publications


Paul D. Dobson and Ewan W. Blanch
Applied Spectroscopy. 2007; 61(3):346-7

No abstract

A Simple Approach to Normalization for Spectroscopic Data Mining


Paul D. Dobson, Andrew J. Doig, Ewan W. Blanch
Applied Spectroscopy. 2005; 59(4): 542-544

No abstract

Length preferences and periodicity in beta-strands. Antiparallel edge beta-sheets are more likely to finish in non-hydrogen bonded rings

Penel S, Morrison R, Dobson PD, Mortishire-Smith R, Doig A.
Protein engineering. 2003;16(12):957-961.

Abstract
We analysed the length distributions of different types of beta-strand in a high resolution, non-homologous set of 500 protein structures, finding differences in their mean lengths. Antiparallel edge strands in strand-turn-strand motifs show a preference for an even number of residues. This propensity is enhanced if the length is corrected for beta-bulges, which insert an extra residue into the strand. Residues in antiparallel edge beta-strands alternate between being in hydrogen bonded and non-hydrogen bonded rings. Antiparallel edges with an even number of residues are more likely to have their final beta residue in a non-hydrogen bonded ring. This suggests that non-hydrogen bonded rings are intrinsically more stable than hydrogen bonded rings, perhaps because their side chain packing is closer. Therefore, we suggest that a simple way to increase beta-hairpin stability, or the stability of an antiparallel edge strand, is to have a non-hydrogen bonded ring at the end of the strand.

Prediction of protein function in the absence of significant sequence similarity


Paul D. Dobson, Yudong Cai, Benjamin J. Stapley, Andrew J. Doig
Current Medicinal Chemistry. 2004;11(16):2135-2142

Abstract
Tremendous progress in DNA sequencing has yielded the genomes of a host of important organisms. The utilisation of these resources requires understanding of the function of each gene. Standard methods of functional assignment involve sequence alignment to a gene of known function; however such methods often fail to find any significant matches. Here we discuss a number of recent alternative methods that may be of use when sequence alignment fails. Function can be defined in a number of ways including E.C. number and MIPS and KEGG functional classes. Phylogenetic profiles show the pattern of presence or absence of a protein between genomes. Protein-protein interactions can be identified by searching for interacting pairs of proteins that are fused to a single protein chain in another organism. The gene neighbour method uses the observation that if the genes that encode two proteins are close on a chromosome, the proteins tend to be functionally related. More general methods use sequence properties such as amino acid composition, mean hydrophobicity, predicted secondary structure and post-translational modification sites. Data mining methods devise rules in the form of IF...THEN statements that make predictions of function using sequence based attributes, predicted secondary structure and sequence similarity. Finally, structural features can be used, after modelling the structure of a protein from its sequence or solving its structure. Protein fold class can be strongly indicative of function, while other structural features, such as secondary structure content, cleft size and 3D structural motifs are also useful.
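The phylogenetic profile method mentioned above can be sketched in a few lines. A profile is a presence/absence vector across genomes, and proteins with similar profiles are candidates for a functional link. The protein names, the five-genome profiles and the choice of Hamming distance are all illustrative assumptions, not details taken from the review.

```python
def hamming(profile_a, profile_b):
    """Number of genomes in which two presence/absence profiles differ."""
    if len(profile_a) != len(profile_b):
        raise ValueError("profiles must cover the same genomes")
    return sum(x != y for x, y in zip(profile_a, profile_b))


# Hypothetical profiles: 1 = protein present in that genome, 0 = absent.
PROFILES = {
    "protA": (1, 0, 1, 1, 0),
    "protB": (1, 0, 1, 1, 0),
    "protC": (0, 1, 0, 0, 1),
    "protD": (1, 0, 1, 0, 0),
}


def most_similar(query, profiles):
    """Name of the protein whose profile is closest to the query's."""
    distances = [
        (hamming(profiles[query], profile), name)
        for name, profile in profiles.items()
        if name != query
    ]
    return min(distances)[1]
```

Here `protB` has the same profile as `protA`, so it would be the first candidate for sharing a function with it, while `protC` has the opposite pattern.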

Predicting enzyme class from protein structure without alignments


Paul D. Dobson, Andrew J. Doig
Journal of Molecular Biology. 2005;345(1):187-199

Abstract
Methods for predicting protein function from structure are becoming more important as the rate at which structures are solved increases more rapidly than experimental knowledge. As a result, protein structures now frequently lack functional annotations. The majority of methods for predicting protein function are reliant upon identifying a similar protein and transferring its annotations to the query protein. This method fails when a similar protein cannot be identified, or when any similar proteins identified also lack reliable annotations. Here, we describe a method that can assign function from structure without the use of algorithms reliant upon alignments. Using simple attributes that can be calculated from any crystal structure, such as secondary structure content, amino acid propensities, surface properties and ligands, we describe each enzyme in a non-redundant set. The set is split according to Enzyme Classification (EC) number. We combine the predictions of one-class versus one-class support vector machine models to make overall assignments of EC number to an accuracy of 35% with the top-ranked prediction, rising to 60% accuracy with the top two ranks. In doing so we demonstrate the utility of simple structural attributes in protein function prediction and shed light on the link between structure and function. We apply our methods to predict the function of every currently unclassified protein in the Protein Data Bank.
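Combining one-versus-one models into a ranked list of classes can be illustrated by simple vote counting. The abstract does not say how the pairwise predictions were combined, so majority voting is an assumption here, and `toy_winner` with its scores is a hypothetical stand-in for trained pairwise support vector machines.

```python
from collections import Counter
from itertools import combinations


def rank_classes(classes, pairwise_winner):
    """Rank classes by votes won across all pairwise contests.

    `pairwise_winner(a, b)` returns whichever of the two classes the
    pairwise model prefers for the protein being classified.
    Ties in vote count are broken by the original class order.
    """
    votes = Counter({c: 0 for c in classes})
    for a, b in combinations(classes, 2):
        votes[pairwise_winner(a, b)] += 1
    return sorted(classes, key=lambda c: (-votes[c], classes.index(c)))


# Hypothetical stand-in for trained pairwise models: each class gets a
# made-up score and the higher-scoring class wins every contest.
EC_CLASSES = ["EC1", "EC2", "EC3", "EC4", "EC5", "EC6"]
TOY_SCORES = {"EC1": 0.2, "EC2": 0.9, "EC3": 0.5,
              "EC4": 0.1, "EC5": 0.3, "EC6": 0.4}


def toy_winner(a, b):
    return a if TOY_SCORES[a] >= TOY_SCORES[b] else b


ranking = rank_classes(EC_CLASSES, toy_winner)
```

The first element of `ranking` corresponds to a top-ranked prediction and the first two elements to a "top two ranks" evaluation of the kind the abstract reports.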

Distinguishing enzyme structures from non-enzymes without alignments


Paul D. Dobson, Andrew J. Doig
Journal of Molecular Biology. 2003;330(4):771-783.

Abstract
The ability to predict protein function from structure is becoming increasingly important as the number of structures resolved is growing more rapidly than our capacity to study function. Current methods for predicting protein function are mostly reliant on identifying a similar protein of known function. For proteins that are highly dissimilar or are only similar to proteins also lacking functional annotations, these methods fail. Here, we show that protein function can be predicted as enzymatic or not without resorting to alignments. We describe 1178 high-resolution proteins in a structurally non-redundant subset of the Protein Data Bank using simple features such as secondary-structure content, amino acid propensities, surface properties and ligands. The subset is split into two functional groupings, enzymes and non-enzymes. We use the support vector machine-learning algorithm to develop models that are capable of assigning the protein class. Validation of the method shows that the function can be predicted to an accuracy of 77% using 52 features to describe each protein. An adaptive search of possible subsets of features produces a simplified model based on 36 features that predicts at an accuracy of 80%. We compare the method to sequence-based methods that also avoid calculating alignments and predict a recently released set of unrelated proteins. The most useful features for distinguishing enzymes from non-enzymes are secondary-structure content, amino acid frequencies, number of disulphide bonds and size of the largest cleft. This method is applicable to any structure as it does not require the identification of sequence or structural similarity to a protein of known function.