Adding New Databases to BioQuery

The BioQuery application is designed to make it as simple as possible to add new databases to the list of accessible data sources. This is possible because the Query package is a framework that models a query but does not know the database-specific details of how to submit it. By creating one Java file and modifying one text file, any moderately skilled Java programmer can add a new database to the application. The existing code does not need to be modified or recompiled.
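One common way for a framework to pick up a new class without being recompiled is to read the class name from a configuration file (for BioQuery, presumably the full class name you put in querys.xml) and instantiate the class through reflection. The following is a minimal sketch of that idea under that assumption, using only standard Java reflection. It is not BioQuery's actual loading code, and DatabasePlugin, ExamplePlugin and PluginLoader are hypothetical names.

```java
// Hypothetical plugin type; it stands in for the role QuerySubmitter plays in BioQuery.
interface DatabasePlugin {
    String databaseName();
}

// A plugin class that only needs to be present as a compiled .class file.
// The implicit no-argument constructor is what the loader calls.
class ExamplePlugin implements DatabasePlugin {
    @Override
    public String databaseName() {
        return "ExampleDB";
    }
}

class PluginLoader {
    // Instantiates a plugin from its fully qualified class name, e.g. a name read
    // from a configuration file. Nothing here refers to ExamplePlugin directly,
    // so new plugins can be added without recompiling this class.
    static DatabasePlugin load(String className) throws ReflectiveOperationException {
        Class<? extends DatabasePlugin> cls =
                Class.forName(className).asSubclass(DatabasePlugin.class); // ClassCastException if not a plugin
        return cls.getDeclaredConstructor().newInstance();
    }
}

class PluginLoaderDemo {
    public static void main(String[] args) throws ReflectiveOperationException {
        // Hard-coded here; a real application would read it from its config file.
        String configuredClassName = "ExamplePlugin";
        DatabasePlugin plugin = PluginLoader.load(configuredClassName);
        System.out.println(plugin.databaseName()); // prints ExampleDB
    }
}
```

In this sketch the configured name must be the fully qualified class name and the class needs a no-argument constructor, which is why a config-driven design asks for the full class name rather than a short label.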

To add a new database, follow these steps:

  1. First, make sure you have the BioQuery client application installed on a test machine you can use for development. The client installer can be downloaded from the BioQuery website.
  2. Next, identify your target database and determine how to query it over a network. You should understand how to connect to it and the format it uses to receive and execute queries. This will be different for every database. Databases with a web interface are often easy to query by building a URL that contains the search terms, or by placing the terms in the body of a POST request.
  3. Create a prototype Java class that takes an example boolean search, converts it to a format the database understands, submits it over the Internet, and prints the results. Once this works, most of the hard part is done.
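  4. Turn your prototype into a class that extends QuerySubmitter, and make sure all of its abstract methods are implemented. When given a search string, a QuerySubmitter class should do three things: translate the search into a format the database understands, submit it to the database, and parse and format the results for display.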
  5. Add the details of your new query type to querys.xml: the database name, result formats, and searchable fields, plus the full class name of your new QuerySubmitter class. Use the existing entries in querys.xml as a model.
  6. Compile your QuerySubmitter class and put the class file in the BioQuery directory of your client application. The next time you run the client, the new database should be available. If you want clients to receive your new database through automatic updates, you must also update querys.xml and add your new class file on the server. If you're using our server, send us your new Query and we'll put it up.
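The three responsibilities from step 4 can be sketched as a template method. The real QuerySubmitter's abstract methods may be named and structured differently, so treat SubmitterSketch, translate, submit, format and search as hypothetical stand-ins and check the QuerySubmitter source before writing your subclass. EchoSubmitter stubs out the network call so the example runs offline.

```java
import java.io.IOException;

// Hypothetical stand-in for QuerySubmitter: the real class's abstract methods
// may differ. It shows the three jobs a submitter performs, in order.
abstract class SubmitterSketch {
    // 1. Translate a BioQuery boolean search into the database's own syntax.
    protected abstract String translate(String booleanQuery);

    // 2. Send the translated query to the database and return the raw response.
    protected abstract String submit(String nativeQuery) throws IOException;

    // 3. Parse the raw response into text or HTML that BioQuery can display.
    protected abstract String format(String rawResponse);

    public final String search(String booleanQuery) throws IOException {
        return format(submit(translate(booleanQuery)));
    }
}

// Offline example: submit() is a stub, so this runs without a network.
class EchoSubmitter extends SubmitterSketch {
    @Override
    protected String translate(String booleanQuery) {
        // Only collapses whitespace; a real submitter would rewrite fields and operators.
        return booleanQuery.trim().replaceAll("\\s+", " ");
    }

    @Override
    protected String submit(String nativeQuery) {
        // Stub: a real submitter would contact the database here.
        return "results for " + nativeQuery;
    }

    @Override
    protected String format(String rawResponse) {
        // Wraps the text for HTML display; real code should also escape <, > and &.
        return "<pre>" + rawResponse + "</pre>";
    }
}

class SubmitterDemo {
    public static void main(String[] args) throws IOException {
        System.out.println(new EchoSubmitter().search("(kinase[ALL]   AND atpase[ALL])"));
    }
}
```

Keeping search() final in this sketch means every subclass runs the steps in the same order and only fills in the database-specific parts.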

Details and Troubleshooting


Database Considerations

The Query framework is designed to construct boolean searches. For example:
(calcium[TITLE] AND calmodulin[TITLE]) OR (kinase[ALL] AND atpase[ALL]). Any new QuerySubmitter class should be able to translate this into a format the database can understand. This may involve standardizing the boolean operators, taking out spaces, making joins to produce the same effect as the parentheses, or even mapping the logic into SQL. These efforts should be transparent to the user.
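As a minimal sketch of the translation step, the class below rewrites field-tagged terms such as calcium[TITLE] and the boolean operators using lookup tables, and copies parentheses and spaces through unchanged. The target syntax (field:term with lowercase operators) and the field names are invented for illustration; substitute whatever your database actually expects. A database that lacks parentheses or field tags would need a real parser rather than this token-by-token rewrite.

```java
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Translates BioQuery-style boolean searches into a hypothetical target syntax.
class BooleanQueryTranslator {
    // Either a term with a field tag, e.g. calcium[TITLE], or a boolean operator.
    private static final Pattern TOKEN =
            Pattern.compile("([^\\s()\\[\\]]+)\\[([A-Z]+)\\]|\\b(AND|OR|NOT)\\b");

    private final Map<String, String> fields;    // BioQuery field -> database field
    private final Map<String, String> operators; // BioQuery operator -> database operator

    BooleanQueryTranslator(Map<String, String> fields, Map<String, String> operators) {
        this.fields = fields;
        this.operators = operators;
    }

    String translate(String query) {
        Matcher m = TOKEN.matcher(query);
        StringBuilder out = new StringBuilder();
        while (m.find()) {
            String replacement;
            if (m.group(1) != null) {
                String field = fields.get(m.group(2));
                if (field == null) {
                    throw new IllegalArgumentException("Unsupported field: " + m.group(2));
                }
                replacement = field + ":" + m.group(1);
            } else {
                replacement = operators.getOrDefault(m.group(3), m.group(3));
            }
            // Text between tokens (parentheses, spaces) is copied through unchanged.
            m.appendReplacement(out, Matcher.quoteReplacement(replacement));
        }
        m.appendTail(out);
        return out.toString();
    }
}

class TranslatorDemo {
    public static void main(String[] args) {
        BooleanQueryTranslator t = new BooleanQueryTranslator(
                Map.of("TITLE", "title", "ALL", "any"),
                Map.of("AND", "and", "OR", "or", "NOT", "not"));
        System.out.println(t.translate(
                "(calcium[TITLE] AND calmodulin[TITLE]) OR (kinase[ALL] AND atpase[ALL])"));
        // prints (title:calcium and title:calmodulin) or (any:kinase and any:atpase)
    }
}
```

Rejecting unknown fields with an exception, rather than silently dropping them, makes it obvious when querys.xml lists a searchable field the submitter cannot map.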

The QuerySubmitter can make whatever network connections are required to contact the database. The NCBIQuerySubmitter installed with BioQuery makes only HTTP connections (so it will usually work behind a firewall) and does not require any non-standard Java classes or packages, which keeps the query package portable. This is the ideal, but it may not always be possible.
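For a database with a web interface, a plain HTTP GET is often enough, and it can be done with only the standard java.net classes. The sketch below assumes a hypothetical endpoint that takes the translated query in a term parameter and returns a UTF-8 response; both the URL and the parameter name are placeholders. It needs Java 10 or later for the Charset overload of URLEncoder.encode.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.net.HttpURLConnection;
import java.net.URI;
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;

// Minimal HTTP client built only from standard Java classes (Java 10+).
// The endpoint and the "term" parameter name are placeholders.
class HttpQueryClient {
    private final String baseUrl;

    HttpQueryClient(String baseUrl) {
        this.baseUrl = baseUrl;
    }

    // Puts the URL-encoded query into a GET parameter.
    String buildUrl(String nativeQuery) {
        return baseUrl + "?term=" + URLEncoder.encode(nativeQuery, StandardCharsets.UTF_8);
    }

    // Performs the GET request and returns the body, assuming a UTF-8 response.
    String fetch(String nativeQuery) throws IOException {
        HttpURLConnection conn =
                (HttpURLConnection) URI.create(buildUrl(nativeQuery)).toURL().openConnection();
        conn.setRequestMethod("GET");
        conn.setConnectTimeout(10_000); // milliseconds
        conn.setReadTimeout(30_000);
        try {
            int status = conn.getResponseCode();
            if (status != HttpURLConnection.HTTP_OK) {
                throw new IOException("Unexpected HTTP status " + status);
            }
            StringBuilder body = new StringBuilder();
            try (BufferedReader in = new BufferedReader(
                    new InputStreamReader(conn.getInputStream(), StandardCharsets.UTF_8))) {
                String line;
                while ((line = in.readLine()) != null) {
                    body.append(line).append('\n');
                }
            }
            return body.toString();
        } finally {
            conn.disconnect();
        }
    }
}

class HttpQueryDemo {
    public static void main(String[] args) {
        HttpQueryClient client = new HttpQueryClient("http://example.org/search");
        System.out.println(client.buildUrl("kinase AND atpase"));
        // client.fetch("kinase AND atpase") would perform the request (needs network access).
    }
}
```

For a database that expects a POST request instead, you could call setRequestMethod("POST") and setDoOutput(true), then write the encoded parameters to conn.getOutputStream() before reading the response.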


Formatting Results

The databases installed with BioQuery use a number of different return formats. In querys.xml, each format is paired with a file extension that tells BioQuery how to display it. BioQuery currently supports displaying only plain text and HTML, so the two file extensions are txt and html. When an NCBIQuery returns results in GenBank format or in XML, they are usually embedded in an HTML document. It is up to the QuerySubmitter to parse the response and format the text or HTML correctly. The BioQuery user interface can display HTML and open external links that use absolute addresses (by launching the computer's web browser). However, it cannot run JavaScript or follow relative links, because it doesn't know the parent address. Parsing HTML pages and expanding relative links is therefore usually the job of the QuerySubmitter. If you're adventurous enough to extend the GUI as well as the query package, you can add custom subclasses of DataView that display other kinds of data and use different file extensions.
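The relative-link expansion described above can be sketched with java.net.URI.resolve, which resolves a link against the address of the page it came from. This minimal version rewrites only href attributes using a regular expression; that is an approximation, and a real HTML parser will handle unusual markup more reliably. LinkExpander is a hypothetical helper, not part of BioQuery.

```java
import java.net.URI;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper: rewrites relative href links as absolute URLs so that
// BioQuery's viewer, which does not know the page's address, can open them.
class LinkExpander {
    // href="..." or href='...'; group 2 is the quote, group 3 the link.
    private static final Pattern HREF =
            Pattern.compile("(href\\s*=\\s*)([\"'])(.*?)\\2", Pattern.CASE_INSENSITIVE);

    // pageUrl is the address the HTML was fetched from; it should include a path.
    static String expand(String html, String pageUrl) {
        URI base = URI.create(pageUrl);
        Matcher m = HREF.matcher(html);
        StringBuilder out = new StringBuilder();
        while (m.find()) {
            String link = m.group(3);
            String absolute;
            try {
                // Absolute links come back unchanged; relative ones are resolved.
                absolute = base.resolve(link).toString();
            } catch (IllegalArgumentException e) {
                absolute = link; // leave links that are not valid URIs alone
            }
            String replacement = m.group(1) + m.group(2) + absolute + m.group(2);
            m.appendReplacement(out, Matcher.quoteReplacement(replacement));
        }
        m.appendTail(out);
        return out.toString();
    }
}

class LinkExpanderDemo {
    public static void main(String[] args) {
        String html = "<a href=\"/viewer?id=123\">record</a> <a href='help.html'>help</a>";
        System.out.println(LinkExpander.expand(html, "http://example.org/db/query.cgi"));
    }
}
```

The same approach could be extended to src attributes on images if a database returns pages that reference them relatively.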


Location of Files

The querys.xml file is located in the META-INF directory under the BioQuery directory on the client, and in the META-INF directory under the base directory of the virtual machine on the server (see the Server Installation Instructions for details). You can create and test your Query on the client, but you will have to modify the server to get the Auto-Update feature working. If you're using our server, just send us the new Query and we'll host it for you. Write to us for more details.

Place your new subclass of QuerySubmitter in the BioQuery directory on the client, and in the bioquery/WEB-INF/classes directory on the server.


Use the Current Files as Examples

When expanding the set of databases BioQuery can search, you only need to work with two files: your new subclass of QuerySubmitter, and querys.xml. The existing entries in querys.xml are self-explanatory, so use them as a guide. You can also use the existing NCBIQuerySubmitter as a guide, although it queries eight different databases, which makes it a bit more complicated. You do not need to recompile the source code of the original BioQuery program to get your extension to work, but you should download the source code from our website for examples and for a better understanding of the program.


Adding a new database does not require writing much code; it mostly takes a conceptual understanding of the query package. This page is just a primer. For more details and support in your efforts, please write to us at: support@bioquery.org
