Bioinformatics And Super Computers Computer Science

Add: 24-11-2017, 10:19   /   Views: 149

Cancer is a common disease today that has affected and claimed many lives.

It is marked by uncontrolled growth of body cells that results in tumor formation and sometimes invasion of other body parts.

At present, cancer is treated with chemotherapy or radiotherapy in addition to surgery.

But all these techniques have severe side effects.

In the effort to find a better medicine for cancer, scientists are trying to make use of a fundamental biochemical property of cancer cells called the Warburg effect.

The high glucose consumption of tumor cells even in an oxygen-rich environment is referred to as the Warburg effect.

A normal cell has a low rate of glycolysis followed by oxidation of pyruvate in mitochondria.

But in a cancer cell, there is a high rate of glycolysis followed by lactic acid fermentation in the cytosol, which becomes the predominant source of energy.

One novel way of killing cancer cells would be to alter them biochemically so that their metabolic rate changes.

That is, by using the right compound, the binding energy in the cancer cells can be altered, and the glycolysis process can thereby be suppressed.

This results in preferential killing of cancer cells without affecting the normal cells.

PFKFB3 inhibitor is one such compound that is believed to cause preferential killing of cancer cells.

The biological department faculty at LSU is performing structure-based design of a drug that behaves like a PFKFB3 inhibitor.

Molecular docking based virtual screening is an important tool in drug discovery that is used to significantly reduce the number of possible chemical compounds to be tested.

But the ligand database is vast, and virtual screening for structure-based design is very complex and requires high computational power.

Doing this on a single ordinary machine would take an enormous amount of time.

Hence, the two main requirements for such huge programs are:

The ability to launch parallel docking jobs through a queuing system.

The ability to process millions of compounds in reasonable time.

These requirements can be met using supercomputers, which have the computational power to fit the needs of the biologists.

1.2 Motivation

Autodock is a broadly used docking program developed at the Scripps Research Institute.

It is used for estimating the binding energy of small molecules or drug candidates with a receptor when the 3D structure is given.

Processing the given ligand database on a single CPU using Autodock would take about 500,000 hours, that is, around fifty-seven years.

Running Autodock in parallel on approximately six hundred CPUs would produce millions of parallel single processor jobs and could reduce this time to around a month.
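The estimates above can be checked with simple arithmetic. The sketch below assumes ideal parallel scaling (no queuing, scheduling or I/O overhead) and a 365-day year; the variable names are illustrative only.

```python
# Back-of-the-envelope check of the run-time estimates quoted in the text.
# Assumption: perfect parallel scaling, which a real cluster will not achieve.
HOURS_PER_DAY = 24
DAYS_PER_YEAR = 365

total_cpu_hours = 500_000   # single-CPU estimate from the text
cpus = 600                  # approximate number of CPUs from the text

years_on_one_cpu = total_cpu_hours / (HOURS_PER_DAY * DAYS_PER_YEAR)
days_on_cluster = total_cpu_hours / cpus / HOURS_PER_DAY

print(f"Single CPU: about {years_on_one_cpu:.0f} years")
print(f"{cpus} CPUs: about {days_on_cluster:.0f} days")
```

Under these assumptions the single-CPU figure comes to roughly 57 years and the 600-CPU figure to roughly 35 days, which agrees with the "around a month" estimate.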

But, as Autodock is a single processor job, only a single core in multi core machines is utilized.

As most of the machines are quad core, three cores are left idle on each machine.

For complete utilization of resources, the Autodock software needs to be wrapped in scripts that would run Autodock in parallel on the four cores.

Also, pre processing steps of dividing the given database and post processing requirements of collecting data are to be addressed.

Further, submitting jobs on cluster machines from the command line would impose a steep learning curve on the biologists.

So, a graphical user interface is required for easy usage of the resources.

1.3 Outline of Our Work

In order to satisfy the requirements mentioned in section 1.2, we have designed a mechanism to submit jobs on cluster machines that utilizes the processors completely.

For this the Autodock software was studied, and it was observed that Autodock has four major steps of conversion.

.pdb files are converted to .pdbq which is further converted into .dpf and .dlg files.

In batch jobs, each level of processing is normally applied to all the files at the same time. That is, all the files would be converted from .pdb to .pdbq before the next file conversion takes place.

But a better solution would be to streamline each job into a set of batch jobs.

This is depicted in the figure below.
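The difference between the two orderings can also be sketched in code. This is only an illustration: the stage labels are derived from the file formats named above, the ligand names are made up, and the function names are hypothetical.

```python
# Two ways to order the same (ligand, stage) work items.
# Stage labels follow the file formats in the text; ligand names are invented.
STAGES = ["pdb->pdbq", "pdbq->dpf", "dpf->dlg"]

def level_by_level(ligands):
    """Finish one stage for every ligand before starting the next stage."""
    return [(lig, stage) for stage in STAGES for lig in ligands]

def per_ligand_pipeline(ligands):
    """Carry each ligand through all stages as one independent job."""
    return [[(lig, stage) for stage in STAGES] for lig in ligands]

ligands = ["lig1", "lig2"]
print(level_by_level(ligands))       # one long list, grouped by stage
print(per_ligand_pipeline(ligands))  # one self-contained job per ligand
```

In the second form each inner list depends on no other ligand, so the jobs can be handed to separate processors without waiting for a whole stage to finish.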

The wrapper scripts designed for this purpose carry out a few pre docking and post docking steps to make this possible.

Pre Docking: Ligand input is in sdf, mol2 or pdbq format.

If the format is sdf or mol2, the Babel software is used to convert it into pdbq format.

The converted ligand set is then divided into N blocks.

Parallel Docking: Each block is allotted to a CPU for processing.

Post Docking: Collects the ligands from the consolidated list.
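The three steps above can be sketched as follows. This is a minimal illustration under stated assumptions, not the wrapper scripts themselves: dock_block is a hypothetical placeholder that returns fake scores, and a thread pool stands in for launching one docking process per CPU.

```python
from concurrent.futures import ThreadPoolExecutor

def split_into_blocks(ligands, n_blocks):
    """Pre docking: divide the ligand list into n_blocks chunks of near-equal size."""
    return [ligands[i::n_blocks] for i in range(n_blocks)]

def dock_block(block):
    """Hypothetical placeholder for the real docking run on one block.

    Returns a fake score per ligand so that the sketch is runnable.
    """
    return [(ligand, -float(len(ligand))) for ligand in block]

def run_screening(ligands, n_blocks):
    blocks = split_into_blocks(ligands, n_blocks)
    # Parallel docking: one worker per block.
    with ThreadPoolExecutor(max_workers=n_blocks) as pool:
        per_block = list(pool.map(dock_block, blocks))
    # Post docking: merge the per-block results into one consolidated list.
    return [result for block in per_block for result in block]

if __name__ == "__main__":
    print(run_screening(["ligA", "ligBB", "ligCCC", "ligD", "ligEE"], n_blocks=2))
```

The round-robin slicing keeps block sizes within one ligand of each other, which helps balance the load when ligands take similar time to dock.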

We observed that the DOVIS software provided the same functionality as the designed mechanism and we have been using the DOVIS software with a few modifications to run on LONI machines.

Further, we are developing a web portal that would activate the user credentials, take input files and parameters from the user, perform job submission and return the status of the submitted job.

There is also functionality to transfer files and add resources to the portal.

So the user can perform most of the operations from the portal instead of working on the command line.

Chapter 2

Project Background

2.1 Structure-based drug design and Virtual Screening

Structure-based design is an approach to anticancer drug design that starts with the discovery of a specific protein within cancer cells that contributes to the growth and survival of the tumor.

First, a synthetic chemical compound is found that binds to the selected protein and alters its biological function.

There are millions of possible element combinations for this, and those selected for clinical testing need considerable optimization to enhance their anticancer activity.

After this, the researchers add atoms to or remove atoms from the chemical starting point to fine-tune the molecule into a shape that better fits the protein in the binding site.

The structure based design of PFKFB3 is further described in the diagram below.

Virtual Screening: Virtual screening is the computational analog of biological screening.

It is used to score, rank and filter a set of structures using one or more computational procedures.

A large database of ligands undergoes the virtual screening process, and a few compounds are selected from it.

It is further used to select which libraries to synthesize and what compounds to purchase from an external source.
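As a rough illustration of scoring, ranking and filtering, the sketch below works on a table of estimated binding energies. The scores are made up, the function name and thresholds are hypothetical, and the only assumption carried over from docking practice is that a lower (more negative) estimated binding energy indicates stronger predicted binding.

```python
def rank_and_filter(scores, max_energy, top_n):
    """Keep ligands at or below max_energy, best first, and return the first top_n.

    scores: dict mapping ligand name -> estimated binding energy (kcal/mol).
    """
    # Filter: drop ligands whose predicted binding is too weak.
    passing = [(name, energy) for name, energy in scores.items() if energy <= max_energy]
    # Rank: most negative energy first.
    passing.sort(key=lambda item: item[1])
    return passing[:top_n]

# Made-up scores for illustration only.
example_scores = {"lig1": -9.2, "lig2": -4.1, "lig3": -7.5, "lig4": -6.0}
print(rank_and_filter(example_scores, max_energy=-5.0, top_n=2))
```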

There are different ways to perform virtual screening:

Use a previously derived mathematical model that predicts the biological activity of each structure

Run substructure queries to eliminate molecules with undesirable functionality

Use a docking program to identify structures predicted to bind strongly to the active site of a protein (if the target structure is known).

This is the procedure currently followed for docking, and the docking program used is Autodock4.

This protein-ligand docking aims to predict 3D structures when a molecule "docks" to a protein.

Functional Screening: Further, the collection of molecules selected by virtual screening is clinically tested.

The drug-likeness of the compound is analyzed.

The drug is designed based on its Absorption, Distribution, Metabolism, Excretion and Toxicity (ADMET).

Lead Molecules: Lead molecules tend to increase in complexity during the optimization phase.

So the lead-likeness of the compound built is also tested.

Structure Guided Optimization: Further, the hydrogen bonding descriptors are calculated by counting the number of donors and acceptors.

Hydrogen bonding descriptors are also required for modeling solubility, the octanol/water partition coefficient, and blood-brain barrier permeability.

Then the polar surface area is calculated, mainly for estimating the oral absorption and brain penetration of the compound.

Higher Level Test: The compound found after all these tests is further tested using clinical and animal specimens.

2.2 AutoDock

2.2.1 Introduction

Autodock is a broadly used docking program developed at the Scripps Research Institute.

It is used for estimating the binding energy of small molecules or drug candidates with a receptor when the 3D structure is given.

Autodock mainly consists of two programs:

Program to perform Docking of a ligand with the grids specified.

These are the scripts that are required to be implemented.

They are displayed in boxes in the figure.

Program to calculate the grids used for docking.

Input and Output Files:

In case of ligand input, *.sdf or *.pdb files are taken as input.

The protein file is the *.pdb file that is prepared by the biologists.

2.2.2 Steps involved in running Autodock

Each of the scripts has a specific functionality.

The diagram below describes the functionality of each script.

Ex01.csh-Ex03.csh are used for converting the format and computing the ligand files.

Ex04.csh-Ex07.csh are used for computing gridmaps from the protein files.

This step is being performed manually by the biologists in our case.

In Ex08.csh and Ex09.csh, Autodock is run on the files computed earlier to give the output.

Fig 1: Description and Flow of Autodock Scripts

2.3 DOVIS Software

2.3.1 Introduction

DOVIS is open source software used to perform large-scale virtual screening of small molecules in parallel on Linux clusters, using Autodock 4.0.

Input and Output files:

Input files given to the software are of pdb or mol2 format for proteins and sdf format for ligands.

Output files contain the docked ligand structures in sdf format.

Also, a score list is provided to list the highest binding energy in each group of ligands.

2.3.2 Steps to Run Dovis and Changes Made

DOVIS is run in a sequential two step process.

Setting up a DOVIS project directory.

Executing a DOVIS docking run.

A DOVIS project directory contains subdirectories and parameter files generated with default parameters.

First the energy grids are calculated and then parallel docking processes are launched.

The shaded area shows the steps being run in parallel.

Fig 3: DOVIS Workflow

2.4 Grid Software

2.4.1 Globus Toolkit

Globus is an open source Grid middleware package that bundles together services and libraries that allow users and other application software to perform basic tasks such as resource monitoring, discovery, security and file management.

"Its core services, interfaces and protocols allow users to access remote resources as if they were located within their own machine room while simultaneously preserving local control over who can use resources and when."[ref] The Globus ToolKit has two implementations, one using C-based Unix services and the other using the web services reference framework (WSRF) in C and Java.

Both of these are installed on resources that are part of the Grids at Teragrid and LONI.

These services are used by GridPortlets, which is being used for job submission.

2.4.2 Schedulers

Globus provides jobmanagers that interface with PBS to execute batch jobs submitted to Globus.

This makes it possible to submit parallel MPI jobs as well as serial batch jobs using


AIX-based grid resources at Teragrid and LONI use LoadLeveler, an IBM solution for scheduling and resource management.

The LoadLeveler package provides jobmanagers that can be installed so that applications using Globus can submit all types of jobs to the LoadLeveler resource manager.
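For the PBS case mentioned above, a batch job is described by a short script of scheduler directives followed by the command to run. The sketch below generates such a script as text. It assumes a Torque/PBS-style scheduler; the job name, resource values and the run_docking.sh command are hypothetical, and the exact directives accepted depend on the site configuration.

```python
def make_pbs_script(job_name, nodes, cores_per_node, walltime, command):
    """Return the text of a PBS batch script that runs `command`."""
    lines = [
        "#!/bin/bash",
        f"#PBS -N {job_name}",                          # job name shown in the queue
        f"#PBS -l nodes={nodes}:ppn={cores_per_node}",  # nodes and processors per node
        f"#PBS -l walltime={walltime}",                 # maximum run time, HH:MM:SS
        "cd $PBS_O_WORKDIR",                            # start in the submission directory
        command,
    ]
    return "\n".join(lines) + "\n"

# Hypothetical values for illustration only.
script = make_pbs_script("docking", nodes=2, cores_per_node=4,
                         walltime="12:00:00", command="./run_docking.sh")
print(script)
```

Requesting four processors per node matches the quad-core machines described earlier, so that no core is left idle by the scheduler's allocation.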

2.5 Portal Related Software

2.5.1 Java Cog Kit

Java Cog Kit allows Grid users, Grid application developers, and Grid administrators to use, program, and administer Grids from a higher-level framework.

The Java Cog Kit makes it possible to use the Globus Toolkit through a Java API in the services written.

2.5.2 GridSphere Portal Framework

GridSphere is a JSR-168 compliant portal framework for third-party portlet application developers.

It has some out-of-the-box portlets such as Login Portlet, Layout Manager Portlet and Profile Settings Portlet, as well as administration portlets such as Personalized Layout Manager, Groups, Roles, etc.

The GridSphere Portlet API provided by the framework enables developers to build portal applications for grids easily.

It is also completely JSR-168 compliant.

The following features are more specific to the GridSphere framework.

The portal presentation is described in XML files (layout.xml, group.xml), which are modified to create customized portal layouts.

Portal configuration files are also in XML format (portlet.xml, web.xml) and are modified to specify which portlets are displayed.

Built-in support for Role Based Access Control separating users into guests, users, admins, and super users

Portlet service model allows creation of "user services," where service methods can be limited according to user rights.

There is a group of built-in portlets in the GridSphere framework for carrying out administrative tasks such as portlet and layout changes, user management, etc.

It also has portlets that provide the basic functionality, which are listed below:

Login Portlet: This portlet is used to restrict access to the Portal.

Welcome Portlet

Settings Portlet: It has two components.

One is Profile Settings, where the user can modify his/her contact information.


The other is the Group Membership component, which allows the user to manage his/her membership to all authorized portlets.

The portlet groups to be displayed in the user's account are selected here.

Without this the user cannot see the portlets even after deploying them.

Layout Portlet: Here a new layout can be created as a separate tab, and the theme and portlets to be displayed can be chosen.

Administration Portlets: These are a set of six portlets for the administrators to change portlet settings.

Portlets: This portlet is used to deploy portal applications and configure various options such as login, authentication modules, error handling, etc.

This portlet also shows all the users currently logged onto the portal.

Users: This portlet is used to create a new user if the portal uses the GridSphere-based local authentication system.

It displays all the users that have ever logged into the portal and their contact details.

Groups: This portlet is used to create a group and associate a single or set of portlets with it.

It allows the Administrator to specify the role that is required to access the group.

This portlet is also used for managing the existing groups.

Roles: This Portlet is used to create new roles and associate users with those roles.

Layouts: This Portlet is used to edit the Portal header and footer information, choose a default global theme, edit the guest user layout and edit the layout associated with each of the groups.

Messaging: This Portlet is used to configure messaging services like AOL AIM, Cingular, SMS and Email.

These components can be used for notifications by any of the portlets.

2.5.3 GridPortlets

GridSphere can be integrated with a Grid portal toolkit called GridPortlets to create custom Grid-based portal applications that abstract the details of underlying Grid technologies.

The GridPortlets services provide functionality for resources, proxy credentials, remote files, jobs, and they support persistent information about credentials, resources, and jobs submitted by users.

The GridPortlet service API currently supports Globus Toolkit[ref].

GridPortlets comes with five well-designed, easy-to-use Globus-based portlets: resource registry, resource browser, credential management, job submission, and file management.

These portlets can be changed and extended for customization based on the portal requirements.

Resource Registry Portlet: This is a portlet that is available only to Administrators and is used to configure the grid resources that the users can access and use to carry out their grid tasks.

Grid resources including the portal's host resource, proxy server and portal credentials are configured in this portlet.

GRAM resources and other hardware and software details can also be configured in this portlet.

Credential Retrieval Portlet: This portlet lets the user retrieve credentials from the MyProxy[ref] server that is configured in Resource Registry Portlet to accept the user's grid credentials and gain single sign-on access to all the Grid resources.

Resource Browser Portlet: This portlet has three unique sub portlets that show system information, services running on each of the resources and queue and scheduler information on all grid resources to which the user has access.

File Browser Portlet: This portlet allows users to browse and manage physical files on Grid resources.

It uses GridFTP.

Job Submission Portlet: This portlet allows users to submit and manage jobs on their resources using Globus-PreWS and Globus-WS.

Chapter 3

Related Work

3.1 Similar Computing Architectures Studied

3.2 Similar Bioinformatics Software

3.3 AutoDock Vina

3.3.1 Introduction

3.3.2 Comparison of Autodock Vina with Autodock 4

3.3.3 Possible Use of Autodock Vina in the System

Chapter 4

Design and Architecture of the Software System

4.1 Technologies Used

For running Autodock on cluster machines, a virtual screening tool, DOVIS 2.0, is used.

This in turn uses Autodock4 for virtual screening and Open Babel to convert files into the required format.

For building the portal, the Gridsphere 2.2.10 framework is used, and Gridportlets 1.4 are plugged into it.

Tomcat 5.5.25 is used as the Portlet container and Ant 1.7.0 is used as the build tool.

4.2 Architecture of the Software System

Fig: System Architecture

Chapter 5


5.1 MVC Model

To develop portlets, we write the code along the lines of the MVC architecture.

In the MVC architecture, the code is divided into separate classes, where each class falls into a particular category.

The classes belonging to the Model, the View and the Controller are kept separate.


The model is a collection of Java classes that form a software application intended to store, and optionally separate, data.

It exposes a single front end class that can communicate with any user interface (for example: a console, a graphical user interface, or a web application).


The view is represented by a JavaServer Page, with data being transported to the page in the HttpServletRequest or HttpSession.


The Controller servlet communicates with the front end of the model and loads the HttpServletRequest or HttpSession with appropriate data, before forwarding the HttpServletRequest and Response to the JSP using a RequestDispatcher.
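The division of responsibilities described above can be illustrated without the servlet machinery. The sketch below is a plain-Python analogue, not the portal's Java code: a dictionary stands in for the HttpServletRequest, a function stands in for the JSP, and all class, function and job names are hypothetical.

```python
class JobModel:
    """Model: stores job data and knows nothing about presentation."""
    def __init__(self):
        self._jobs = {}

    def add_job(self, job_id, status):
        self._jobs[job_id] = status

    def get_status(self, job_id):
        return self._jobs.get(job_id, "unknown")


def render_status_view(request):
    """View: turns the data placed in the request into display text."""
    return f"Job {request['job_id']}: {request['status']}"


class JobController:
    """Controller: queries the model, loads the request, forwards to the view."""
    def __init__(self, model):
        self.model = model

    def handle(self, job_id):
        request = {"job_id": job_id, "status": self.model.get_status(job_id)}
        return render_status_view(request)


model = JobModel()
model.add_job("42", "running")
print(JobController(model).handle("42"))
```

Because the view only reads what the controller placed in the request, the model can be reused behind a console or a web front end without change.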

5.2 Portal Implementation

The DOVIS portal would mainly have Credential Management, Job Submission, File Management and Resource Management services.

Credential Management Service:

In order to access the machine from the portal the user should have trusted certificates.

The certificate proxy confirms that you are authorized by a trusted authority to access grid resources.

This certificate should be invoked from the portal.

Job Submission Service:

This is the service now being developed.

The job submission service submits the DOVIS job from the portal.

The job submission mechanism in DOVIS portal can be divided into three parts.

Running the script

Taking Input Parameters

Running the script.

File Management Service:

The file management service would help in accessing the files from the portal.

This will help the users see the files on the machine.


7. Future Work