BioNetBuilder Plugin

Instructions for adding a new mySQL interaction database to BioNetBuilder

Iliana Avila-Campillo July 14th, 2006
Updated: Kevin Drew July 18th, 2006

Overview

To add a new interactions database to BioNetBuilder you need to:

1. Create and populate a mySQL database that stores the database's interactions

2. Write a Java class that implements org.isb.bionet.datasource.interactions.InteractionsDataSource or that extends org.isb.bionet.datasource.interactions.SimpleInteractionsSource

3. If you wish to allow users to customize parameters for retrieving interactions from the new database (from a user interface that pops up from the BioNetBuilder wizard), write a Java class that implements org.isb.bionet.gui.InteractionsSourceGui

4. Add a few lines of Java code to class org.isb.bionet.gui.wizard.EdgeSourcesPanel (we explain more in this document)

5. Compile, jar, and re-start the BioNetBuilder server

As you can see, this is an involved process that requires experience with Java programming, basic mySQL operations (creating a database and populating it), and experience with the SQL language.


Detailed Instructions

1. Create and populate a mySQL database that stores interactions
There is plenty of documentation on how to create mySQL databases and populate them here: http://dev.mysql.com/doc/refman/5.0/en/

In particular, look here: http://dev.mysql.com/doc/refman/5.0/en/database-use.html

You most likely want to have the default BioNetBuilder interaction databases available (BIND, DIP, Prolinks, HPRD, and KEGG) in addition to your new database. If this is the case, you will need to do one of the following:

- Send an email to one of the BioNetBuilder server administrators (contacts are at the bottom of this document) and see if they are willing to include your newly created database and needed additional code in the BioNetBuilder server (they will most likely be happy to, since this is something that benefits everyone).

- Set up your own BioNetBuilder server so that you can manage it and add your own new databases. We provide instructions on how to do this in a separate document. Go to the BioNetBuilder website: http://err.bio.nyu.edu/cytoscape/bioNetBuilder

Most publicly available interaction databases, like BIND or DIP, provide easy-to-parse flat files of their databases. You will need to download these files, and most likely write scripts to parse them and obtain the interaction information you wish to store in your database. I used Perl and the "DBI" Perl module (which you can obtain here: http://search.cpan.org/~timb/DBI/DBI.pm) to call mySQL methods to create tables and populate them. The scripts are available in the BioNetBuilder release under the "dbScripts" directory.

A few publicly available databases provide mySQL dumps. These usually contain ALL of the data in their database, of which you may only use a small fraction. I recommend storing in your database only the information that you need for interactions. This will save you space, make your SQL queries faster, and make your SQL code easier to write.

Interactions usually have the following fields:

- interactor 1
- interactor 2
- type of interaction
- species of interactor 1, species of interactor 2 (most of the time the same, but not necessarily)
- additional information about this interaction like the method used to infer it, publication IDs, etc.

So a simple interactions table in mySQL may look like this:

mysql> desc interactions;

+-----------------+-------------+------+-----+---------+-------+
| Field           | Type        | Null | Key | Default | Extra |
+-----------------+-------------+------+-----+---------+-------+
| id              | varchar(25) | YES  |     | NULL    |       |
| i1              | varchar(15) | YES  | MUL | NULL    |       |
| interactionType | varchar(2)  | YES  |     | NULL    |       |
| i2              | varchar(15) | YES  |     | NULL    |       |
| taxid1          | int(11)     | YES  | MUL | NULL    |       |
| taxid2          | int(11)     | YES  | MUL | NULL    |       |
+-----------------+-------------+------+-----+---------+-------+

In fact, if the interactions table for your new database looks like this, the following steps in the process will be much easier since I have written a Java class to query this table (org.isb.bionet.datasource.interactions.SimpleInteractionsSource, more on this later).
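As a rough illustration, the simple table above could be created and filled from Java with SQL text like the following. This is only a sketch: the class name and the index names (idx_i1, idx_taxid1, idx_taxid2) are made up for this example, and the indexes are inferred from the "MUL" entries in the Key column. The strings are meant to be passed to java.sql.Statement and java.sql.PreparedStatement once you have a connection to your mySQL server.

```java
// Sketch only: builds SQL text for the simple "interactions" table shown above.
// The class name and index names are hypothetical; the columns mirror the
// "desc interactions" output.
class SimpleInteractionsSchema {

    /** CREATE TABLE statement matching the six-column layout. */
    static String createTableSql() {
        return "CREATE TABLE interactions ("
             + "id VARCHAR(25), "
             + "i1 VARCHAR(15), "
             + "interactionType VARCHAR(2), "
             + "i2 VARCHAR(15), "
             + "taxid1 INT(11), "
             + "taxid2 INT(11), "
             + "INDEX idx_i1 (i1), "          // "MUL" key on i1
             + "INDEX idx_taxid1 (taxid1), "  // "MUL" key on taxid1
             + "INDEX idx_taxid2 (taxid2))";  // "MUL" key on taxid2
    }

    /** Parameterized INSERT, intended for java.sql.PreparedStatement. */
    static String insertSql() {
        return "INSERT INTO interactions "
             + "(id, i1, interactionType, i2, taxid1, taxid2) "
             + "VALUES (?, ?, ?, ?, ?, ?)";
    }

    public static void main(String[] args) {
        System.out.println(createTableSql());
        System.out.println(insertSql());
    }
}
```

Using placeholders in the INSERT keeps the parsing script free of manual quoting when interactor IDs contain unusual characters.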

If you want to have additional information about interactions, your table may look like this:

mysql> desc interactions;

+------------------+--------------+------+-----+---------+-------+
| Field            | Type         | Null | Key | Default | Extra |
+------------------+--------------+------+-----+---------+-------+
| id               | varchar(25)  | YES  | MUL | NULL    |       |
| i1               | varchar(25)  | YES  |     | NULL    |       |
| interactionType  | varchar(2)   | YES  |     | NULL    |       |
| i2               | varchar(25)  | YES  |     | NULL    |       |
| taxid1           | int(11)      | YES  | MUL | NULL    |       |
| taxid2           | int(11)      | YES  | MUL | NULL    |       |
| primaryPubMed    | int(11)      | YES  |     | NULL    |       |
| secondaryPubMeds | varchar(100) | YES  |     | NULL    |       |
| detectionMethod  | varchar(10)  | YES  |     | NULL    |       |
+------------------+--------------+------+-----+---------+-------+

In which case, you can still use the Java class I wrote to query the earlier simpler table, except you may have to expand it to implement a few more specialized methods (more on this later).

If the design of the tables above does not meet your needs, you can create a table that looks nothing like them, or use more than one table if you wish. In this case, you will need to implement class org.isb.bionet.datasource.interactions.InteractionsDataSource.

One last step you MUST not forget is to add an entry to the table named "db_name" in the database "bionetbuilder_info" (which exists in the default BioNetBuilder server, whether you are using ours, or your own). This table contains a unique name for a database, like "bind" and a name of the mySQL database that currently holds the interactions in that database, like "bind0". This is in place so that if you are updating the bind interactions by running a script that takes too long, you can do so in a new database called "bind1". When "bind1" is completely updated, you can set "bind" in "db_name" to point to "bind1" so that the server uses this newly updated database and you can still hold on to the older "bind0" if you wish to do so.
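The indirection that "db_name" provides can be pictured with a small in-memory model. This is only an illustration of the idea: the real mapping lives in the mySQL table, whose column layout is not shown in this document, and the class and method names below are hypothetical.

```java
import java.util.HashMap;
import java.util.Map;

// Illustration only: an in-memory stand-in for the "db_name" table.
// The class and its method names are hypothetical, not BioNetBuilder code.
class DbNameRegistry {
    private final Map<String, String> logicalToPhysical = new HashMap<>();

    /** Points a unique database name (e.g. "bind") at a mySQL database (e.g. "bind0"). */
    void point(String logicalName, String physicalDb) {
        logicalToPhysical.put(logicalName, physicalDb);
    }

    /** Returns the mySQL database currently serving the given name, or null if none. */
    String resolve(String logicalName) {
        return logicalToPhysical.get(logicalName);
    }

    public static void main(String[] args) {
        DbNameRegistry registry = new DbNameRegistry();
        registry.point("bind", "bind0");  // the server queries bind0
        // ... a long-running update script fills bind1 in the meantime ...
        registry.point("bind", "bind1");  // switch over once bind1 is complete
        System.out.println(registry.resolve("bind"));
    }
}
```

Because clients only ever ask for "bind", the switch from "bind0" to "bind1" needs no change on their side, and "bind0" can stay around as a fallback.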

A quick word on IDs for interactors. If you can, try to use one of these IDs to store your interactors:

- RefSeq protein accessions (for example, NP_013739)
- GI RefSeq amino acid sequence numbers (if you do use these, precede each ID with "GI:")

BioNetBuilder's server has ID translation tables for these IDs, so if you use one of them you won't have to create your own translation tables. If you do use a different ID, you will have to do more work than described in this document. There are examples of what to do in this case in the BioNetBuilder code. You can study the code for the KEGG or the Prolinks databases, since neither of them uses one of the ID types above to store their interactors. You will also need to study the code of the synonyms handler: org.isb.bionet.datasource.synonyms.SQLSynonymsHandler.
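If your source files list bare GI numbers, a small helper can add the "GI:" prefix before the IDs are written to your table. The following is a minimal sketch; the class and method names are hypothetical and not part of BioNetBuilder.

```java
// Sketch: puts GI numbers into the "GI:" form described above.
// The class and method names are hypothetical, not part of BioNetBuilder.
class InteractorIds {

    /** Returns the ID with a leading "GI:", leaving already-prefixed IDs unchanged. */
    static String toGiId(String rawGi) {
        String trimmed = rawGi.trim();
        if (trimmed.startsWith("GI:")) {
            return trimmed;
        }
        return "GI:" + trimmed;
    }

    public static void main(String[] args) {
        System.out.println(toGiId("6323667"));
        System.out.println(toGiId("GI:6323667"));
    }
}
```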

2. Write a Java class that implements org.isb.bionet.datasource.interactions.InteractionsDataSource or that extends org.isb.bionet.datasource.interactions.SimpleInteractionsSource

I already explained when to use SimpleInteractionsSource or InteractionsDataSource in the previous step. As a reminder, extend class SimpleInteractionsSource if your interactions table looks like one of the simple interaction table descriptions above. Implement InteractionsDataSource if the table (or tables) holding your interactions differs from the examples above. Class SimpleInteractionsSource itself implements InteractionsDataSource assuming the basic table structure shown above.

InteractionsDataSource contains methods like (pseudocode):

- getIDtype(): gets the ID type used to store the interactors (for example, GI, RefSeq, etc)
- getAllInteractions(String species)
- getNumAllInteractions(String species)
- getAllInteractions(String species, Hashtable args)
- getFirstNeighbors(Vector interactors, String species)
- getNumFirstNeighbors(Vector interactors, String species)
- getFirstNeighbors(Vector interactors, String species, Hashtable args)
- getConnectingInteractions(Vector interactors, String species)
- etc.

When you explore this interface, you will notice this general pattern:

- method(species, otherArgs)
- getNumMethod(species, otherArgs)
- method(species, otherArgs, Hashtable args)

The last version of "method" takes a Hashtable in addition to the arguments of the first version. This Hashtable's purpose is to contain specific arguments that a particular interactions source understands. These are obtained from a user interface where users specify parameter values for a particular database. For example, an entry in this table may look like this:

"INTERACTION_TYPES" -> "pp,pd"

This entry indicates that the user only wishes to obtain interactions whose interaction type is "pp" or "pd". This is a simple example. The idea is to provide a general way of passing arguments to the server that come from a particular client.
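One way a data source might turn such an entry into a query restriction is sketched below. This is not how BindInteractionsSource does it (study that class for the real handling); the class name and helper methods are hypothetical, and the only names taken from this document are the "INTERACTION_TYPES" key and the interactionType column.

```java
import java.util.ArrayList;
import java.util.Hashtable;
import java.util.List;

// Sketch: reads "INTERACTION_TYPES" from the args Hashtable and builds a
// SQL fragment for it. Class and method names are hypothetical.
class InteractionTypeFilter {
    static final String INTERACTION_TYPES = "INTERACTION_TYPES";

    /** Splits the comma-separated value into types; empty list if the key is absent. */
    static List<String> parseTypes(Hashtable<String, Object> args) {
        List<String> types = new ArrayList<>();
        Object value = (args == null) ? null : args.get(INTERACTION_TYPES);
        if (value == null) {
            return types;
        }
        for (String part : value.toString().split(",")) {
            String trimmed = part.trim();
            if (!trimmed.isEmpty()) {
                types.add(trimmed);
            }
        }
        return types;
    }

    /** Builds " AND interactionType IN (?, ...)" with one placeholder per type, or "". */
    static String whereFragment(List<String> types) {
        if (types.isEmpty()) {
            return "";
        }
        StringBuilder sql = new StringBuilder(" AND interactionType IN (");
        for (int i = 0; i < types.size(); i++) {
            sql.append(i == 0 ? "?" : ", ?");
        }
        return sql.append(")").toString();
    }

    public static void main(String[] args) {
        Hashtable<String, Object> table = new Hashtable<>();
        table.put(INTERACTION_TYPES, "pp,pd");
        List<String> types = parseTypes(table);
        System.out.println(types + " ->" + whereFragment(types));
    }
}
```

When the key is missing the fragment is empty, so the version of "method" without a Hashtable and the version with an empty Hashtable can share the same query.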

To learn more about how to implement InteractionsDataSource, or extend SimpleInteractionsSource, study these classes:

- org.isb.bionet.datasource.interactions.BindInteractionsSource (extends SimpleInteractionsSource)
- org.isb.bionet.datasource.interactions.KeggInteractionsSource (implements InteractionsDataSource)

3. Write a Java class that implements org.isb.bionet.gui.InteractionsSourceGui

Most interaction databases contain specific information that users may want to use to filter the interactions they retrieve for their networks. Examples are:

- interaction types
- p-value thresholds
- methods used to infer interactions (for example, two hybrid)
- more specific parameters, like the threshold parameter for KEGG interactions

If you wish to allow users to specify "filters" to retrieve interactions, you can implement InteractionsSourceGui. This interface only contains ONE method:

public Hashtable getArgsTable();

This method is called by BioNetBuilder's client to send the server a Hashtable of arguments for the particular interactions database that is being queried. You do not have to worry about how this Hashtable reaches the right place on the server at the right time. All you have to do is make a pretty dialog, implement getArgsTable(), and make sure that the class you wrote in step 2 uses the arguments in this table to query interactions. This Hashtable is the same Hashtable I mentioned in the previous step in "method(species, otherArgs, Hashtable args)".

Once more, if you are like most programmers, you will understand this better by looking at code:

- org.isb.bionet.gui.BindGui
- org.isb.bionet.datasource.interactions.BindInteractionsSource (to see how Hashtable args is handled)
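Before opening those classes, it may help to see the bare shape of an implementation. The sketch below leaves out the dialog itself and only shows how user choices could end up in the Hashtable. The interface declared here is a local stand-in so the example compiles on its own; in real code you would implement org.isb.bionet.gui.InteractionsSourceGui instead. MyDbGui and selectType are hypothetical names.

```java
import java.util.Hashtable;
import java.util.LinkedHashSet;
import java.util.Set;

// Local stand-in so this sketch compiles alone. The real interface is
// org.isb.bionet.gui.InteractionsSourceGui, with the single method below.
interface InteractionsSourceGui {
    public Hashtable getArgsTable();
}

// Hypothetical GUI class for a new database; the Swing dialog is omitted.
class MyDbGui implements InteractionsSourceGui {
    // Interaction types the user has ticked in the (omitted) dialog.
    private final Set<String> selectedTypes = new LinkedHashSet<>();

    /** Would be called by the dialog's widgets when the user picks a type. */
    void selectType(String type) {
        selectedTypes.add(type);
    }

    /** Packs the current selections into the table that is sent to the server. */
    @Override
    public Hashtable getArgsTable() {
        Hashtable<String, String> args = new Hashtable<>();
        if (!selectedTypes.isEmpty()) {
            args.put("INTERACTION_TYPES", String.join(",", selectedTypes));
        }
        return args;
    }

    public static void main(String[] argv) {
        MyDbGui gui = new MyDbGui();
        gui.selectType("pp");
        gui.selectType("pd");
        System.out.println(gui.getArgsTable());
    }
}
```

The keys you put in the table are a private contract between your GUI class and your data source class, so both must agree on their spelling.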

4. Add a few lines of Java code to class org.isb.bionet.gui.wizard.EdgeSourcesPanel

This is the only step that makes me feel uncomfortable since it could be avoided with a better design (like an object factory!). This is on our TODO list.

For now, just read this, and do what I say. Open class org.isb.bionet.gui.wizard.EdgeSourcesPanel in an editor and search for this string (or a substring of it): "INSERT CODE FOR YOUR OWN DATABASE HERE".

You will see an if-else statement that basically creates a GUI (which you implemented in step 3) for each database source, stores it in a map, and attaches an action listener to a button that pops up the GUI. Just look at a couple of code blocks within one of the if statements, and you will know exactly what to do for your own database.

5. Compile and re-start server

The BioNetBuilder plugin has a build.xml file with ant tasks to compile, create jars, create Java Docs, etc.

To compile simply type:
ant compile

Then make a new jar of the plugin:
ant jar

The new "bionetbuilder.jar" file should be copied to the lib/ directory of the Tomcat servlet 'bionet_server' (i.e., webapps/bionet_server/WEB-INF/lib/). Then restart the Tomcat server. If the BioNetBuilder server administrators agreed to add your new database and code to the server, they will do all these steps for you (compile, jar, re-start). If you set up your own server by following the instructions posted in www.db.systemsbiology.net/cytoscape/bioNetBuilder, you know how to re-start the server already.

CONTACTS

The default server is housed at New York University. The principal database administrator is Kevin Drew; his email is kdrew [at] nyu.edu

Please send any questions or suggestions to Kevin Drew: kdrew [at] nyu.edu
Last updated 09.15.2006