INTERNATIONAL MEDICAL UNIVERSITY
Introduction to Bioinformatics
Drug Design Using Bioinformatics
Aniqah Zulfa Binti Abdul Latif
MB0710029885
Medical Biotechnology 1/10
Table of Contents
Introduction
Drug Discovery
Bioinformatics Approach to Drug Design
Sequence Annotation Databases
Structure Prediction
High-Throughput and Virtual Screening
1. High-Throughput Screening
2. Virtual Screening
a. Docking Screening
b. Similarity Screening
ADMET
Future Outlook
References
INTRODUCTION
Bioinformatics is a field that combines computer science with pure sciences such as biology, chemistry and medicine. It plays an important role in organizing, managing and interpreting biological data, and genomics and proteomics form its backbone. In this essay, we examine the role of bioinformatics in the drug design process. [1] Bioinformatics has grown to the point where it has transformed conventional approaches to drug design and development, which increasingly favor computational methodologies. Methods such as high-throughput screening, microarrays, two-dimensional (2D) gel experiments, large-scale mass spectrometry and chemical library screens are acknowledged for their contribution in introducing many promising and reliable drugs. Besides deepening the molecular and chemical understanding of drug design and development, these methods have also been used to speed up the overall process of drug discovery. [2]
Cited 1 - http://www.ittc.ku.edu/bioinfo_seminar/F07.html
Since the use of computation in scientific research has increased drastically, the major challenge scientists face today lies not in collecting data but in interpreting, analyzing, retrieving and storing it. Most scientific data are held in large-scale databases containing experimental results, gene sequences, mutations and millions of nucleotide polymorphisms. For example, GenBank contains 39,000,000 sequences comprising 43 billion bases and occupying 100 gigabytes of disk space. More than 1,000 viral genomes, 200 bacterial genomes and over a dozen eukaryotic genomes have been sequenced. Likewise, the PubMed database contained 15 million abstracts from more than 4,600 journals, occupying more than 40 gigabytes of textual data. [3] Life scientists have been working side by side with computer scientists to manage this so-called “data explosion”, and this collaboration has led to the rise of two new arenas in information science: bioinformatics and cheminformatics. Cheminformatics focuses on the chemical basis of the data, and chemistry is the backbone of drug design. By combining both fields, scientists are able to predict the pharmaceutical importance of a drug by retrieving and visualizing stored experimental data. [4]
In this essay, we look at the features of bioinformatics that are significant in pharmaceutical research, specifically in drug design and development. Since these features are so wide-ranging, we concentrate on the bioinformatics tools that address important pharmaceutical factors, especially the structural prediction of drug targets. We also discuss how bioinformatics resources and relevant software are used to predict and understand drug metabolism and toxicity. [5]
Drug Discovery
For a drug to have high efficacy and potency, it should be as specific as possible and its side effects as few as possible. Therefore, chemists should identify the drug's target before designing the drug, and the drug should be designed according to the specificity of that target and its action. Consider proteases, enzymes that catabolize proteins. Proteases take part in many metabolic activities in the body, but they also play an important role in human diseases. For example, the Human Immunodeficiency Virus (HIV), which causes AIDS, uses its protease to cleave large viral precursor proteins into the functional proteins needed to assemble new virus particles. In osteoporosis, osteoclasts attached to the bone surface secrete proteases that degrade bone, making it more fragile. In such cases, drugs should be designed to specifically inhibit the action of the protease involved. The major challenge is to achieve enough specificity to keep side effects low. [6]
Previously, most of the human genome was still unknown, so drug development was constrained to a small percentage of possible drug targets. Thanks to bioinformatics, the task of selecting drug targets has become much easier as more and more genome sequences have been identified and stored in gene databases. [7] In the drug design process, it is also important to understand the function of the proteins that make up a particular drug. To achieve this, bioinformaticians perform computational analyses that predict the three-dimensional (3D) structure of the proteins. Software tools can then generate a 3D structure containing the coordinates of the desired epitopes. [8]
Bioinformatics Approach to Drug Design
Bioinformatics can contribute to drug design through numerous approaches aimed at developing effective and reliable drugs. These approaches include: [9]
Identification and characterization of genes
Promoter finding and analysis
Identification of transcription factor binding sites
Identification and characterization of proteins
Molecular phylogeny
Determination of protein structure
Identification and analysis of splice sites
Analysis of genomes and proteomes
Simulation of biochemical
Analysis of DNA microarrays
Analysis and identification of motifs
Sequence Annotation Databases
In pharmaceutical research, it is important to understand and interpret the gene and protein sequences of a particular organism in order to gain an overview of its possible protein drug targets. For example, routine sequencing of bacteria, parasites and other pathogenic organisms can help scientists identify the basis of their pathogenicity. Moreover, sequencing mammalian genomes has helped in cataloguing various drug-metabolizing enzymes, and this gene information is widely used to study protein expression in many pharmacology and toxicology experiments. From sequence annotation data, we are able to predict the proteins that a drug acts upon, the drug's mechanism of action and its metabolism. [10] The two main providers of sequence annotation data are:
National Center for Biotechnology Information (NCBI)
European Bioinformatics Institute (EBI)
In general, NCBI's data are DNA-rich, whereas EBI's information is protein-rich. Other sequence annotation databases include:
GenBank
GenBank Stats
Ensembl
EntrezGene
UCSC-GoldenPath
RefSeq
SwissProt
UniProt
TrEMBL
GeneCards
Mouse Genome Database (MGD)
Rat Genome Database (RGD)
MAGPIE/BLUEJAY
SymAtlas
CypAlleles DB
Directory of P-450 containing systems
Cytochrome P-450 interaction table
Human Membrane Transporter Database (HMTD)
Transporter page
Human ABC transporter database
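To make the DNA-rich versus protein-rich distinction concrete, here is a minimal Python sketch of how a coding DNA sequence, such as one a DNA-centred resource stores, maps onto the protein sequence a protein-centred resource records. It assumes the standard genetic code and an in-frame sequence that starts at its first base; the function name `translate` and the example sequence are illustrative and are not taken from any of the databases listed above.

```python
from itertools import product

# Standard genetic code (NCBI translation table 1). Codons are enumerated with
# each of the three positions cycling through T, C, A, G in that order.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def translate(dna: str, to_stop: bool = True) -> str:
    """Translate an in-frame coding sequence into one-letter amino acid codes.

    '*' marks a stop codon; with to_stop=True translation ends at the first one.
    Trailing bases that do not complete a codon are ignored, and ambiguous
    bases such as 'N' are not handled (they raise KeyError).
    """
    dna = dna.upper().replace("U", "T")  # also accept mRNA-style input
    protein = []
    for i in range(0, len(dna) - len(dna) % 3, 3):
        aa = CODON_TABLE[dna[i:i + 3]]
        if aa == "*" and to_stop:
            break
        protein.append(aa)
    return "".join(protein)


if __name__ == "__main__":
    print(translate("ATGGCCATTGTAATGGGCCGCTGA"))  # MAIVMGR
```

Building the table from a 64-character string avoids typing out every codon by hand; in practice a library such as Biopython provides this translation.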
Structure Prediction
One application of bioinformatics in drug design is understanding the connection between a protein's amino acid sequence and its 3D structure, since the structure of a protein gives an overview of how it will function. The most vital step is therefore the identification and classification of the protein, because its 2D and 3D structures need to be visualized; this is how the protein's structure is predicted. [11] Understanding the structure of the target protein facilitates the drug design process. Prediction starts by identifying the gene and amino acid sequences before moving on to the purified protein, which results in a more accurate prediction of the protein's structure. [11]
Thanks to bioinformatics, various databases now provide the 3D structures of proteins and other macromolecules; examples include the Molecular Modeling Database (MMDB) and the Protein Data Bank (PDB). [12]
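Structures in the PDB are commonly distributed as fixed-column text files. A minimal Python sketch of reading the alpha-carbon (CA) atoms from such ATOM records and measuring a distance between residues follows. It relies only on the standard ATOM column layout; the helper `atom_line` and the example coordinates are hypothetical, and real structure work would normally use an established parser rather than this sketch.

```python
import math
from dataclasses import dataclass


@dataclass
class Atom:
    name: str
    res_name: str
    chain: str
    res_seq: int
    xyz: tuple  # (x, y, z) in angstroms


def parse_atoms(pdb_text: str, atom_name: str = "CA") -> list:
    """Collect ATOM records with the given atom name from PDB-format text.

    Relies on the fixed-column layout of ATOM records (0-based slices below):
    atom name [12:16], residue name [17:20], chain [21], residue number [22:26],
    x [30:38], y [38:46], z [46:54].
    """
    atoms = []
    for line in pdb_text.splitlines():
        if not line.startswith("ATOM"):
            continue  # skip HETATM, HEADER, REMARK and other record types
        name = line[12:16].strip()
        if name != atom_name:
            continue
        atoms.append(Atom(
            name=name,
            res_name=line[17:20].strip(),
            chain=line[21],
            res_seq=int(line[22:26]),
            xyz=(float(line[30:38]), float(line[38:46]), float(line[46:54])),
        ))
    return atoms


def distance(a: Atom, b: Atom) -> float:
    """Euclidean distance between two atoms (angstroms)."""
    return math.dist(a.xyz, b.xyz)


def atom_line(serial, name, res_name, chain, res_seq, x, y, z):
    """Hypothetical helper: format a minimal ATOM record (columns 1-54 only)."""
    return (f"ATOM  {serial:5d}  {name:<3s} {res_name:>3s} {chain}{res_seq:4d}    "
            f"{x:8.3f}{y:8.3f}{z:8.3f}")


if __name__ == "__main__":
    pdb_text = "\n".join([
        atom_line(1, "N", "MET", "A", 1, -1.2, 0.5, 0.0),
        atom_line(2, "CA", "MET", "A", 1, 0.0, 0.0, 0.0),
        atom_line(3, "CA", "ALA", "A", 5, 3.0, 4.0, 0.0),
    ])
    ca = parse_atoms(pdb_text)
    print([a.res_name for a in ca], distance(ca[0], ca[1]))  # ['MET', 'ALA'] 5.0
```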
The methods by which protein structure is predicted fall into three standard categories:
Ab initio / de novo prediction
Homology modeling
Fold recognition (threading)
De novo prediction is used when the protein sequence has little or no similarity to any sequence of known structure; it is based on the chemistry and physics of protein structure. Secondly, homology modeling predicts a structure by comparison with a homologous sequence of known structure, since homologous sequences tend to produce similar structures. However, not every homologous sequence will produce the structure that we need. Thirdly, the threading or fold recognition method is used when two proteins have similar three-dimensional structures but distinct primary sequences, and structural alignment is then used to verify the otherwise unknown alignment between them. MAMMOTH is one program used for structural alignment, and SCOP is a database that classifies proteins according to their structural relationships. [12]
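Homology modeling begins by finding a template of known structure whose sequence aligns well with the target. The following is a minimal sketch of that comparison: a Needleman-Wunsch global alignment followed by a percent-identity calculation. The scoring scheme (match +1, mismatch -1, gap -1) is a simplifying assumption; real template searches use substitution matrices such as BLOSUM62 and affine gap penalties, and the example sequences are invented.

```python
def needleman_wunsch(a: str, b: str, match: int = 1, mismatch: int = -1, gap: int = -1):
    """Global alignment of a and b; returns (score, aligned_a, aligned_b)."""
    n, m = len(a), len(b)
    score = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = i * gap
    for j in range(1, m + 1):
        score[0][j] = j * gap

    def pair(x: str, y: str) -> int:
        return match if x == y else mismatch

    # Fill the dynamic-programming table.
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            score[i][j] = max(score[i - 1][j - 1] + pair(a[i - 1], b[j - 1]),
                              score[i - 1][j] + gap,   # gap in b
                              score[i][j - 1] + gap)   # gap in a

    # Trace back one optimal path from the bottom-right corner.
    i, j, top, bottom = n, m, [], []
    while i > 0 or j > 0:
        if i > 0 and j > 0 and score[i][j] == score[i - 1][j - 1] + pair(a[i - 1], b[j - 1]):
            top.append(a[i - 1]); bottom.append(b[j - 1]); i -= 1; j -= 1
        elif i > 0 and score[i][j] == score[i - 1][j] + gap:
            top.append(a[i - 1]); bottom.append("-"); i -= 1
        else:
            top.append("-"); bottom.append(b[j - 1]); j -= 1
    return score[n][m], "".join(reversed(top)), "".join(reversed(bottom))


def percent_identity(top: str, bottom: str) -> float:
    """Identical, non-gap columns as a percentage of the alignment length."""
    if not top:
        return 0.0
    same = sum(1 for x, y in zip(top, bottom) if x == y and x != "-")
    return 100.0 * same / len(top)


if __name__ == "__main__":
    score, top, bottom = needleman_wunsch("ACGT", "AGT")
    print(score, top, bottom, percent_identity(top, bottom))  # 2 ACGT A-GT 75.0
```

Percent identity can also be normalized by the shorter sequence rather than the alignment length; the choice changes the number, so it is worth stating whichever definition is used.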
High-Throughput and Virtual Drug Screening
After drug identification, structure prediction and functional recognition, candidate drugs need to be tested for their efficacy in vitro as well as in vivo. Several screening approaches are available, and they can be classified into high-throughput screening and virtual drug screening. [13]
High-Throughput Screening
High-throughput screening is the traditional approach used to establish a drug candidate's activity. Large collections of chemical compounds are tested systematically in vitro against a biological target. The whole process is automated, so that 100,000 molecules can be screened per day. The assays may be cell-based or may use whole organisms. [13]
Virtual Drug Screening
Virtual drug screening is an inexpensive yet precise approach for testing a drug's activity. It draws on many separate databases that provide sequence and structure information on genes, and uses this gene sequence information to predict the 3D structures of proteins for virtual screening. The precision of virtual screening depends on the accuracy and completeness of these data. [13]
Virtual screening includes several methods, two of which are:
Docking-Based Virtual Screening
Similarity-Based Virtual Screening
Docking-Based Virtual Screening
This method of virtual screening involves identifying and characterizing the binding sites of the drug's target proteins. The surfaces of the target proteins can be visualized with modeling programs such as DOCK and AutoDock, which can screen compound databases such as ZINC to identify potential ligands that may bind to the proteins' binding sites. This approach also examines the conformations of the protein's side chains when selecting ligands and characterizes the side chains as conserved or non-conserved. Conserved side chains are found in the binding sites of many different proteins and are therefore non-specific, whereas non-conserved side chains are expected to be more specific. We therefore need to determine how specifically the ligands target the protein's binding sites. [14] The significance of this for drug design is that, if we treat the drug we want to design as the ligand, we can use this approach to determine how specifically our drug binds to the target protein's binding sites. [14]
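The conserved/non-conserved distinction can be illustrated with a small sketch. Given a target sequence aligned with related proteins and a list of binding-site positions, it counts how often the related proteins carry the same residue as the target at each position. Everything here is an assumption for illustration: the function names, the sequences, the binding-site positions (0-based alignment columns), and the rule that a position counts as "conserved" only when the matching fraction reaches a chosen threshold. DOCK and AutoDock do not provide such a function.

```python
def classify_binding_site(target: str, homologues: list, site_positions: list,
                          threshold: float = 1.0) -> dict:
    """Label each binding-site column as 'conserved' or 'non-conserved'.

    target and homologues must be pre-aligned strings of equal length ('-' = gap).
    A column is conserved when the fraction of homologues sharing the target's
    residue is at least `threshold` (an illustrative cut-off, not a standard).
    """
    if not homologues:
        raise ValueError("need at least one homologous sequence to compare")
    if any(len(h) != len(target) for h in homologues):
        raise ValueError("sequences must be aligned to the same length")
    labels = {}
    for pos in site_positions:
        residue = target[pos]
        same = sum(1 for h in homologues if h[pos] == residue)
        fraction = same / len(homologues)
        labels[pos] = "conserved" if fraction >= threshold else "non-conserved"
    return labels


def selectivity_handles(labels: dict) -> list:
    """Positions a selective ligand might exploit: the non-conserved ones."""
    return sorted(pos for pos, label in labels.items() if label == "non-conserved")


if __name__ == "__main__":
    # Hypothetical target and two related proteins, already aligned.
    labels = classify_binding_site("MKDGHTRW", ["MKDGATRW", "MKDGHTKW"], [2, 4, 6])
    print(labels)  # {2: 'conserved', 4: 'non-conserved', 6: 'non-conserved'}
    print(selectivity_handles(labels))  # [4, 6]
```

A ligand that interacts mainly with positions 4 and 6 would, on this reasoning, be expected to bind the target more selectively than one that relies on the conserved position 2.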
Similarity-Based Virtual Screening
This method of virtual screening involves small-molecule alignment: test ligands are screened against databases of known ligands, and the most similar known ligands serve as references for the test ligands. The similarity of each ligand alignment is scored by the overlap of molecular groups. Examples of programs that use this concept are GASO and FlexS. [15]
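GASO and FlexS score overlays of molecular groups in three dimensions. A much simpler stand-in for the same idea, shown only as an illustration, represents each ligand as a set of the molecular groups it contains and scores the overlap with the Tanimoto coefficient, |A ∩ B| / |A ∪ B|. The most similar known ligands are then taken as references. This is a swapped-in 2D technique, not how GASO or FlexS work, and the group labels and ligand names are hypothetical.

```python
def tanimoto(a: set, b: set) -> float:
    """Tanimoto (Jaccard) similarity between two sets of molecular groups."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def rank_references(test_ligand: set, known_ligands: dict, top_n: int = 3) -> list:
    """Return (name, score) for the top_n known ligands most similar to the test ligand."""
    scored = [(name, tanimoto(test_ligand, groups)) for name, groups in known_ligands.items()]
    scored.sort(key=lambda item: item[1], reverse=True)
    return scored[:top_n]


if __name__ == "__main__":
    # Hypothetical ligands described only by the groups they contain.
    known = {
        "ligand_A": {"aromatic_ring", "carboxylic_acid", "amide"},
        "ligand_B": {"aromatic_ring", "hydroxyl"},
        "ligand_C": {"amine", "sulfonamide"},
    }
    test = {"aromatic_ring", "amide", "hydroxyl"}
    ranking = rank_references(test, known, top_n=2)
    print([name for name, _ in ranking])  # ['ligand_B', 'ligand_A']
```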
In addition to ligand alignment, identifying the structure of the ligand's binding site can also be used to recognize possible drug targets. This approach uses both protein structure databases and ligand binding affinity databases, such as the Binding Database. Combining the two, we can examine proteins with comparable functions; since proteins with similar functions may also possess similar binding sites, we can predict the interactions between ligands and the targets of our drugs of interest. Relibase is a tool used to analyze ligand binding; it can retrieve significant data such as binding-pocket conformations, interactions with water molecules and the degree of specificity of the ligands. [15] In addition, databases such as the Comprehensive Medicinal Chemistry Database and the MACCS-II Drug Data Report are specially designed screening libraries that report how drug molecules perform in vivo. These databases also record the chemical properties of drug molecules, for instance their hydrogen-bonding properties, log P and molecular weight, as well as the possible attachment of certain functional groups. [15]
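The properties listed above (hydrogen bonding, log P and molecular weight) are the inputs to a widely used drug-likeness filter, Lipinski's rule of five, which the essay itself does not mention. A minimal sketch follows. It assumes the usual thresholds (molecular weight at most 500 Da, log P at most 5, at most 5 hydrogen-bond donors and at most 10 acceptors) and tolerates at most one violation. The compound records are hypothetical values, not entries from the databases named above.

```python
from dataclasses import dataclass


@dataclass
class Compound:
    name: str
    mol_weight: float  # daltons
    log_p: float       # octanol-water partition coefficient (log10)
    h_donors: int
    h_acceptors: int


def rule_of_five_violations(c: Compound) -> list:
    """List which of Lipinski's four criteria the compound breaks."""
    checks = {
        "molecular weight > 500": c.mol_weight > 500,
        "log P > 5": c.log_p > 5,
        "H-bond donors > 5": c.h_donors > 5,
        "H-bond acceptors > 10": c.h_acceptors > 10,
    }
    return [rule for rule, broken in checks.items() if broken]


def is_drug_like(c: Compound, max_violations: int = 1) -> bool:
    """Pass the filter when no more than max_violations criteria are broken."""
    return len(rule_of_five_violations(c)) <= max_violations


if __name__ == "__main__":
    small = Compound("hypothetical_1", mol_weight=320.4, log_p=2.1, h_donors=2, h_acceptors=5)
    large = Compound("hypothetical_2", mol_weight=712.9, log_p=6.3, h_donors=4, h_acceptors=12)
    print(is_drug_like(small), rule_of_five_violations(small))  # True []
    print(is_drug_like(large), rule_of_five_violations(large))
```

A filter like this is only a coarse first pass. It flags compounds that are likely to have poor oral absorption but says nothing about binding to the target.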
ADMET
Bioinformatics plays a very important role in describing the ADMET (absorption, distribution, metabolism, excretion and toxicity) of drugs during drug design and development. Many clinical trials have failed to describe the ADMET of drugs in sufficient detail, because the ADMET of a drug is an extremely complicated picture: scientists need to understand what happens to the compound from its entry into the digestive tract to its arrival at its target. Along the way many chemical reactions take place, and each of their details is crucial in predicting the ADMET of that compound. [16]
What bioinformatics does in this case is to predict ADMET from collected data, for instance the size of the compound, its lipophilicity and the presence of probable functional groups. From this information, a QSAR (Quantitative Structure-Activity Relationship) model can be built. A QSAR model attempts to quantitatively relate the structure and properties of a compound to a well-defined process, in this case a biological process. [16]
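In its simplest form, a QSAR model is a linear regression of a measured activity on numeric descriptors such as molecular size and lipophilicity. The sketch below fits such a model by ordinary least squares with NumPy. The choice of descriptors, the synthetic training values and the function names are assumptions for illustration. Commercial tools use far richer descriptors and models.

```python
import numpy as np


def _design_matrix(descriptors) -> np.ndarray:
    """Prepend a column of ones so the model has an intercept term."""
    descriptors = np.asarray(descriptors, dtype=float)
    return np.column_stack([np.ones(len(descriptors)), descriptors])


def fit_qsar(descriptors, activity) -> np.ndarray:
    """Fit activity = w0 + w1*d1 + ... + wk*dk by ordinary least squares.

    descriptors: shape (n_compounds, n_descriptors); activity: shape (n_compounds,).
    Returns [w0, w1, ..., wk] with the intercept first.
    """
    coef, *_ = np.linalg.lstsq(_design_matrix(descriptors),
                               np.asarray(activity, dtype=float), rcond=None)
    return coef


def predict(coef: np.ndarray, descriptors) -> np.ndarray:
    """Apply a fitted linear QSAR model to new descriptor rows."""
    return _design_matrix(descriptors) @ coef


def r_squared(observed, predicted) -> float:
    """Coefficient of determination of the fit."""
    observed = np.asarray(observed, dtype=float)
    ss_res = np.sum((observed - predicted) ** 2)
    ss_tot = np.sum((observed - observed.mean()) ** 2)
    return float(1.0 - ss_res / ss_tot)


if __name__ == "__main__":
    # Hypothetical training set: columns are molecular weight (Da) and log P.
    X = np.array([[250.0, 1.2], [310.0, 2.5], [180.0, 0.4], [420.0, 3.9], [275.0, 2.0]])
    # Synthetic, noise-free "activity" generated from a known linear rule.
    y = 1.0 - 0.005 * X[:, 0] + 0.8 * X[:, 1]
    coef = fit_qsar(X, y)
    print(coef)                                     # approximately [1.0, -0.005, 0.8]
    print(predict(coef, np.array([[300.0, 2.0]])))  # approximately [1.1]
```

Because the synthetic data follow the linear rule exactly, the fit recovers its coefficients. With real assay data, the model would need validation on compounds left out of training before its predictions could be trusted.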
Various QSAR programs have been created specifically to predict the ADMET of a compound; one example is ADMET Predictor from Simulations Plus. [16]
Figure 1: ADMET predictive software
Because they must predict the behavior of such a complex system, these ADMET prediction tools achieve about 60 to 70% accuracy. Certain toxicity models, however, give more reliable results, because each is designed for only one specific type of toxicity.
Future Outlook of Bioinformatics in Drug Design
In the future, it is not unrealistic to expect that the data collected will not be limited to the molecular basis of organisms: their physiological and even epidemiological information may also be collected and interpreted. This information could give a more accurate interpretation of a specific disease across populations and racial or ethnic groups. If such data are integrated with high-throughput in vitro ADMET data, we could predict the likelihood of adverse effects, toxicity and pharmacokinetic differences across a population. Given the rapid development of bioinformatics, a new and innovative era of medical and health sciences can be expected.
References
[1] Special issue: Biological databases, Nucleic Acids Res., 29, 1 (2001).
[2] K. Rutherford, J. Parkhill, J. Crook, T. Horsnell, et al., Bioinformatics, 16, 944 (2000).
[3] A.A. Schaffer, Y.I. Wolf, C.P. Ponting, E.V. Koonin, et al., Bioinformatics, 15, 1000 (1999).
[4] M.J. Callow, S. Dudoit, E.L. Gong, T.P. Speed, et al., Genome Res., 10, 2022 (2000).
[5] http://www.genomicglossaries.com/content/chapterinfosourcestext.asp
[6] http://www.scfbio-iitd.res.in/tutorial/drugdiscovery.htm
[7] http://www.b-eye-network.com/view/852
[8] http://www.pharmainfo.net/reviews/computer-aided-drug-design-and-bioinformatics-current-tool-designing
[9] http://www.ncbi.nlm.nih.gov/pmc/articles/PMC1609333/
[10] http://www.vls3d.com/courses_talk/Villoutreix_intro_drug_design.pdf
[11] http://www.mrc-lmb.cam.ac.uk/genomes/madanm/pdfs/medinfo.pdf
[12] R. Rodriguez, G. Chinea, N. Lopez, T. Pons, G. Vriend, Homology modeling, model and software evaluation: three related resources, Bioinformatics, 14, 523-528 (1998).
[13] http://biospectrumindia.ciol.com/content/careers/10306091.asp
[14] A.R. Ortiz, P. Gomez-Puertas, A. Leo-Macias, et al., Computational approaches to model ligand selectivity in drug design, Curr. Top. Med. Chem., 6(1), 41-55 (2006).
[15] http://www.slideshare.net/bknanjwade/applications-of-bioinformatics-in-drug-discovery-and-process-presentation
[16] http://phobos.ramapo.edu/~pbagga/binf/binf_future.htm