The Pathway Tools Advanced Query Pages
The Structured Advanced Query Page
and the Free Form Advanced Query Page
Note: two webinars on using the Structured Advanced Query Page are
available on this Web site.
The Advanced Query Pages allow you to write queries to extract
data from Pathway/Genome Data Bases (PGDBs), hosted on a Pathway Tools
server. A complex database query is a database expression that selects a
subset of data from a PGDB by specifying constraints on the values of data
fields, by combining information from different regions
of the DB, and by operating on data fields. Example: "Find Reactions
that have a Reactant that is a Small-Molecule and the Common-Name of the Reactant is ATP."
There are two different interfaces for formulating queries.
- The Structured Advanced Query Page: This is the
initial page provided when you first click the Advanced Query button
(see Section 2).
- The Free Form Advanced Query Page: This is accessible from
the Structured Advanced Query Page by clicking the similarly
labeled wide button at the top of the web page (see Section 3).
Both interfaces use an underlying query language called
BioVelo, but the Structured Advanced Query Page does not require
you to know this language because it translates your input into
BioVelo. The Free Form Advanced Query
Page gives full and direct access to the BioVelo language.
You can switch back and forth between the Structured page and the
Free Form page, using the button near the top of the pages. Switching
does not modify the contents of either page. In particular, you can
enter a query on one page, submit it, and then switch to the other
page and submit a different query. The output format selection,
however, is shared between the two pages.
The databases queried by BioVelo contain objects
belonging to various classes: metabolic pathways, reactions,
proteins, genes, and so on. Each class has a set of attributes
associated with it. For example, the class Proteins
has attributes that include pI (its isoelectric
point), and Gene (the gene encoding the protein). That
means that each protein object (instance) of this class has the
attribute Gene, although in some objects, the
attribute may have no value. The Pathway Tools schema (ontology) is
described in several documents. The most comprehensive is the Pathway
Tools User's Guide, which is available as part of the Pathway Tools
software download package. See also several publications listed on the
BioCyc publications page.
The better you know the Pathway Tools schema, the more adept you will be
at writing BioVelo queries, because you will need to know what classes to
base your queries on, and which attributes to filter in your queries.
The Advanced Query Pages have been tested with Internet Explorer 6 and 7,
Mozilla/Firefox 1.5, and Safari 2.0.3. The functionality of these pages
varies slightly from browser to browser.
2. The Structured Advanced Query Page
The Structured Advanced Query Page has been designed to facilitate
writing simple as well as complex queries. This page is formed
dynamically and its content expands depending on your selections. This
interface lets you formulate a query without knowing the underlying
query language (BioVelo). When you submit your query, it is
translated into BioVelo before being sent to the server. As mentioned above, this
page does not provide complete access to the BioVelo language,
which is more expressive than this interface. Nevertheless, the
page allows a powerful range of queries to be formulated. The
Free Form Advanced Query Page
(see Section 3)
provides full access to the BioVelo language.
The Structured Advanced Query Page contains two main sections.
Your first step should be to specify the query in the section labeled
Specify your query below. Only after specifying the query
should you specify the output contents of the query in the section labeled
Specify the contents of the output of your query below.
The selection of the query output contents depends on
the specified query.
The default output format is HTML, for viewing
in your Web browser. You can change the format by
selecting the radio button labeled Tab Delimited Text
instead, which may be preferable in some cases. Click the wide button
labeled Submit Query at the bottom of the page
only after specifying your query, the desired output format,
and the contents of the output.
To get started quickly, here are some query examples, with descriptions of how to build those
queries using the Structured Advanced Query Page.
More details on using the Structured Advanced Query Page are given in the next subsections.
Example Query 1: Find all the proteins of E. coli K-12.
Starting from the initial page, select the database
E. coli K-12, and set the selector next to search
for to Proteins. Column 1
in the contents of the output section is preselected to
NAME, which is a good selection for this query. When you click
the Submit Query button, the query is sent and a new
browser window opens to display the result (this may take a while,
depending on the server): a table of one column listing all known
proteins defined in the PGDB for E. coli K-12. Clicking on a
protein name brings you to the PGDB information page about that protein.
Example Query 2: Find all the proteins of E. coli K-12 for which the
DNA-FOOTPRINT-SIZE is smaller than 10.
As in Example 1, select database E. coli K-12 and
Proteins for the first two selectors. Since we want to
add a condition to this search, click add a condition; a Where clause will
appear. Select DNA-FOOTPRINT-SIZE next to
Where; then select is smaller than (the operator "<")
from the selector next to it. Enter the value
10 in the last free input text box. Since you would
probably like to see the value of DNA-FOOTPRINT-SIZE
for each protein, add an output column by clicking add
column in the bottom part of the web page and selecting
DNA-FOOTPRINT-SIZE from the pull-down menu of this new
column. Submit the query by clicking the Submit Query button.
This query is intended to select transcription factors, and although it
scans all proteins in the PGDB, only transcription factors will have the
DNA-FOOTPRINT-SIZE attribute set.
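To make the effect of the condition concrete, here is a small Python model of Example Query 2. It is only a sketch of the filtering semantics, not BioVelo and not the server's implementation; the Protein class, its fields, and the sample names are hypothetical. An unset DNA-FOOTPRINT-SIZE is modeled as None, so proteins without the attribute never satisfy the comparison.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Protein:
    # Hypothetical stand-in for a PGDB protein object
    name: str
    dna_footprint_size: Optional[int] = None  # None models an unset attribute


def small_footprint(proteins, limit=10):
    """Return (NAME, DNA-FOOTPRINT-SIZE) rows, one per matching protein."""
    return [
        (p.name, p.dna_footprint_size)
        for p in proteins
        # Proteins whose attribute is unset cannot satisfy "< limit"
        if p.dna_footprint_size is not None and p.dna_footprint_size < limit
    ]


proteins = [Protein("TF-A", 8), Protein("TF-B", 22), Protein("ENZYME-C")]
print(small_footprint(proteins))  # [('TF-A', 8)]
```

The two tuple fields correspond to the two output columns (NAME and DNA-FOOTPRINT-SIZE) set up in the example.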
Example Query 3: Find all E. coli K-12 proteins that have No
information and 2006 in the comment attribute, meaning
that curators found no information about this protein during literature searches
performed in 2006.
As in Example 1, select database E. coli K-12 and
Proteins for the first two selectors, then click
add a condition; a
Where clause will appear. Select
COMMENT from the selector next to
Where (COMMENT is an attribute of
protein objects) and contains the substring from the next selector. In the last box of
this line, which is a free text input box, not a selector, enter
No information. The repeat operator should already be
set to at least one element of. On the next line
there is a selector box labeled add a condition; select
and from it. A new term appears on its right;
perform essentially the same operations as on the first line, but enter
2006 in the free text input box on the right.
Finally, click the Submit Query button; the proteins
that satisfy this query will be displayed in a new browser window.
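The two terms of Example Query 3 each carry their own at least one element of repeat operator and are joined with and. The Python sketch below models that structure, assuming COMMENT is a list of strings (as the preselected repeat operator suggests). The protein identifiers and comment texts are hypothetical. Under this model the two substrings may be found in different elements of the list, as for PROT-3.

```python
def has_both(comments):
    # Term 1: at least one element of COMMENT contains "No information"
    term1 = any("No information" in c for c in comments)
    # Term 2 (joined with "and"): at least one element contains "2006"
    term2 = any("2006" in c for c in comments)
    return term1 and term2


# Hypothetical COMMENT values, keyed by protein identifier
comments_by_protein = {
    "PROT-1": ["No information found in literature searches, 2006."],
    "PROT-2": ["No information found."],
    "PROT-3": ["No information found.", "Searched again in 2006."],
}
print([p for p, c in comments_by_protein.items() if has_both(c)])
# ['PROT-1', 'PROT-3']
```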
Example Query 4: Search for all pathways in MetaCyc
that are in the taxonomic range of metazoa.
Select database MetaCyc and class Pathways,
then click add a condition; a Where clause will appear.
Select Taxonomic-Range from the selector next to
Where. The repeat operator for some object ...
will automatically appear on the left of the Taxonomic-Range
attribute, and a we have subcondition will be created underneath
them. A repeat operator appears because the attribute Taxonomic-Range
is a list of objects, not a single value. The subcondition applies to the objects of
the Taxonomic-Range attribute. Enter metazoa
in the green box to the right of the attribute NAME,
which was automatically selected when the we have
subcondition was created. Finally, click the Submit Query button.
Example Query 5: Search for all reactions in
MetaCyc that have D-glucose on the left (reactant) and
D-glucose-6-phosphate on the right (product).
The next subsections explain the Structured Advanced Query Page in more detail.
In the initial state of the Structured Advanced Query Page, only one simple search
component is shown, with two selectors (also called pull-down
menus): a database selector and a class
selector. You can select the desired database and class by using
these pull-down menus, and submit such a query
by clicking the Submit Query button. This is a global
search for all objects of the given class in the given database,
and it will typically return many results.
Note that the class selector shows the class names in a
hierarchical manner to present the subclass relation between
them. That is, if class S is a subclass of class T,
S would be shown underneath T and indented to the right
with at least one dash. For example, the class of
reactions is divided into smaller subclasses such as
small-molecule-reactions. Subclasses can themselves
have subclasses, which are shown indented further, with more dashes.
Each class name is followed by a number in parentheses: the
number of instances of that class in the selected database
on this server. For example, --Genes
(4819) says that there are 4819 genes in the
selected database (e.g., MetaCyc). If the server is busy, this
number may be absent for a few seconds when you first access the web
page as the server needs to calculate it once a database is
Often there is a need to add one or more conditions to the search
to select a subset of all instances of a class. You can add
conditions to a search by clicking the button add a condition, causing a where clause to
open up below the two main selectors. (The clicked button will also
change to a different state with the label remove condition.) The where clause will show one
term. A term is composed of a left operand (a
pull-down menu, typically with the attribute NAME
preselected), a relational operator (a second pull-down menu),
and a right operand, which is initially a free input text
box, or a pull-down menu if the left
operand attribute is of an enumerated or Boolean type. You can formulate
a condition on the selected attribute by selecting the appropriate
operator and a value for the right operand. For example, to express the
condition that the attribute NAME contains the
substring tr, select the relational operator
contains the substring and enter tr in
the green text box. (Do not enter the surrounding double quotes for a
string; they are automatically inserted by the user interface when the
query is sent to the server.)
When you select an attribute in a condition or in the query output,
the list shown is in alphabetical order. Moreover, any
attribute that refers to another object, or list of objects, is shown
with a light blue background. These attributes allow you, among other
things, to go from one class of objects to another class of objects.
For example, the attribute Product of class
Genes refers to a list of Polypeptides
or RNA, so it is shown with a light blue background in the
attribute selector. (Note: the color is set in the server's style sheet,
so it may vary from one
server to another.) When you select such an attribute, a subcondition
is created to specify a condition based on the attribute(s)
of the object or objects of this selected attribute.
Several relational operators are available when an
attribute is a string. Typical ones are "is equal
to", "contains the substring", etc. It is also possible to use the
more complex "is similar to (regular expression)" or "is not similar
to (regular expression)". In this case, the right operand entered in
the green box should be a regular expression. The regular expression
syntax follows the Perl language syntax rules (see the Wikipedia article
on Perl regular expressions). For example, the regular expression t[a-z]*b
corresponds to all strings that contain the letter 't', followed by
zero or more lower-case letters, followed by a 'b'. A string is similar
to this regular expression if it contains a substring that
matches the regular expression. If you want to search for strings that
match the regular expression in their entirety, and not merely in one of their proper
substrings, you must use the Perl beginning-of-string and end-of-string
anchors, '^' and '$' respectively. For example,
^t[a-z]*b$ matches all strings that start with a 't',
continue with any number of lower-case letters, and end with a 'b',
but not strings that only contain such a string as a
proper substring: trpb matches this regular
expression, but trpba does not, as it does not end with a
'b'. Note that all the letters in a regular expression are
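The substring-versus-whole-string distinction can be tried out locally with Python's re module, whose syntax for these simple constructs (character classes, *, ^, $) is the same as Perl's. This is a sketch of the matching rule described above, not the server's code; is_similar is a hypothetical helper name.

```python
import re


def is_similar(text, pattern):
    """Model of "is similar to": some substring of text matches pattern."""
    return re.search(pattern, text) is not None


for s in ["trpb", "trpba", "atrpb"]:
    print(s, is_similar(s, r"t[a-z]*b"), is_similar(s, r"^t[a-z]*b$"))
# trpb True True
# trpba True False
# atrpb True False
```

Without anchors, trpba and atrpb qualify because each contains trpb; with '^' and '$', only strings that match from first to last character qualify.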
A variable selector is provided for the left operand if more than one
variable is active at the location of the term. Under the same condition,
the right operand has a button on its right labeled switch to variable
entry. If clicked, it turns the right
operand into a variable/attribute pair of pull-down menus (selectors). See Subsection
2.8 for more information about this
button, and Subsection 2.10 for more information about variables.
When the right operand is a variable/attribute pair of
selectors, the list of attributes that can be selected depends on both
operands: the attributes shown depend on the
type of the selected right-hand variable, and some attributes may be
grayed out because their type admits no valid operation
with the left attribute.
Several conditions can be added by selecting a logical operator from
the pull-down menu labeled add a condition. There are four
provided logical operators: and, or, and not, or
not. When selecting an operator, an initial term is created to its
right. The add a condition button will always be at the
bottom of the list of conditions. To remove one specific condition
(i.e., a term), you can use the pull-down menu of the term and select remove condition.
The grouping of the terms is as follows: the first two terms are
combined together; then this combined term is combined with the third,
and so on. That is, if the terms that appear from top to bottom are
written down from left to right, the operations are done from left to right.
No other grouping is available.
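Because terms are combined strictly left to right, the result can differ from the usual convention in which "and" binds more tightly than "or". The Python sketch below folds term values in display order. It assumes that and not and or not negate the newly added term, which the operator names suggest but this page does not spell out; OPS and combine are hypothetical names.

```python
# Hypothetical model of the four logical operators; "and not" and "or not"
# are assumed to negate the newly added term.
OPS = {
    "and": lambda acc, term: acc and term,
    "or": lambda acc, term: acc or term,
    "and not": lambda acc, term: acc and not term,
    "or not": lambda acc, term: acc or not term,
}


def combine(first, rest):
    """Combine terms strictly left to right (top to bottom on the page).

    first is the truth value of the first term; rest is a list of
    (operator, truth value) pairs for the following terms.
    """
    result = first
    for op, term in rest:
        result = OPS[op](result, term)
    return result


# Terms T1=True, T2=False, T3=False entered as: T1 / or T2 / and T3
print(combine(True, [("or", False), ("and", False)]))  # False: (T1 or T2) and T3
print(True or False and False)  # True: Python gives "and" higher precedence
```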
Some database attributes can contain a list of values.
For example, the attribute APPEARS-IN-BINDING-REACTIONS of class
GENES has type list of Binding-Reactions. That is,
this attribute has a value that is a list of objects belonging to the
class Binding-Reactions. (This list may be empty depending on
the database selected.)
For attributes that can have lists of values, a repeat operator
(e.g., for some object ...) is provided on the attribute's left.
An appropriate repeat operator should be selected before adding any
condition for this attribute.
For attributes of type list of some class, the repeat operators
are at least one object of, every object
of, exactly one object of, for no
object of, the number of objects of,
for some object ..., for all objects
..., for exactly one object ..., for
no objects .... These are explained in more detail in the
next subsection. Note that a repeat operator name ending with an ellipsis
(...) means that, once it is selected, a subcondition will open up where a
specific attribute can be selected.
For attributes of type list whose elements are not objects (e.g.,
strings), the repeat operators are named similarly, with
object replaced by element, except for the operators with an ellipsis
(e.g., for some object ...), which exist only for objects, not elements.
For example, the attribute SYNONYMS has the type list of string; in this
case the first repeat operator is at least one element of.
In this subsection we describe the repeat operators associated with
objects, but the descriptions apply to non-objects (e.g., numbers,
strings) as well. All the repeat operators other than the for each
object ... can be applied to any attribute of type list.
The four repeat operators at least one object of, every
object of, exactly one object of, and for no object of
are similar. Once one is selected, the term on its right is used to specify a
condition to be met by a certain number of elements. For example, in the case of
the operator at least one object of, the number of elements
satisfying the condition must be greater than 0. For every object
of, this number must be equal to the number of elements in the
list, that is, all of them. For exactly one object of, this number
is 1. Finally, for for no object of, this number is 0.
The operator the number of objects of counts the number of
objects in the list of the attribute and compares it to a value
provided as the right operand. The desired relation (e.g., is equal to)
should be selected. The condition will be true if the number of
objects in the list of the attribute satisfies the relation.
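Each of these operators reduces to counting the list elements that satisfy the term and comparing that count with a threshold. The Python sketch below restates the rules given above; the function names and the sample SYNONYMS value are hypothetical. Note that, taken literally, the rule for every object of is true for an empty list, since 0 equals the list length.

```python
import operator


def _count(items, cond):
    """Number of list elements that satisfy the condition."""
    return sum(1 for x in items if cond(x))


def at_least_one(items, cond):   # at least one object of
    return _count(items, cond) > 0


def every(items, cond):          # every object of (true for an empty list)
    return _count(items, cond) == len(items)


def exactly_one(items, cond):    # exactly one object of
    return _count(items, cond) == 1


def for_no(items, cond):         # for no object of
    return _count(items, cond) == 0


def number_of(items, relation, value):  # the number of objects of
    return relation(len(items), value)


def starts_trp(s):
    return s.startswith("trp")


synonyms = ["trpB", "tryptophan synthase beta", "trpB1"]  # hypothetical value
print(at_least_one(synonyms, starts_trp), every(synonyms, starts_trp))  # True False
print(exactly_one(synonyms, starts_trp), for_no(synonyms, starts_trp))  # False False
print(number_of(synonyms, operator.eq, 3))  # True
```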
In some cases an attribute of an object is a list of objects.
For example, the attribute product of an object of the
class Genes is a list of Polypeptides
or RNA; if you want to find every gene that
does not have an RNA as a product, you need to apply a
condition to all objects of the list of products. This is a nested
search through a list of objects inside another search (e.g.,
over genes). The following repeat operators allow
you to do this.
The repeat operator for all objects ... is provided for
attributes that are a list of some class -- not for a list of other types
like strings or numbers. For example, the attribute REACTION-LIST of
Pathways has type list of generalized-reactions, so
for all objects ... is provided for it. On the other hand, the
attribute NAMES of class GENES has type list of
string, so the operator for all objects ... is not provided for
it. Note that this is not a limitation of the BioVelo language but
rather a design decision to keep the graphical interface simple.
When for all objects ... is selected, it creates an initial
conditional expression that starts with the text we have and
introduces a new variable. It is essentially a where clause
in which each object of the attribute's list is bound in turn to
the newly introduced variable. The conditional expression can refer to
this new variable and any previous ones already active. The condition
will be true if for every object of the list the condition is satisfied.
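The genes-without-an-RNA-product search described above can be modeled in Python as a nested loop over each gene's product list. This is an illustrative sketch only: the Gene and Product classes, the kind field, and the sample data are hypothetical, and naming the loop variable x2 merely echoes the interface's variable naming. Python's all() returns True for an empty list, which matches the counting rule for every object of.

```python
from dataclasses import dataclass, field


@dataclass
class Product:
    name: str
    kind: str  # hypothetical: "Polypeptide" or "RNA"


@dataclass
class Gene:
    name: str
    product: list = field(default_factory=list)  # list of Product objects


def no_rna_product(gene):
    # "for all objects ...": bind each product in turn to a new variable
    # (x2 here) and require the subcondition to hold for every one of them
    return all(x2.kind != "RNA" for x2 in gene.product)


genes = [
    Gene("g1", [Product("p1", "Polypeptide")]),
    Gene("g2", [Product("r1", "RNA")]),
    Gene("g3", [Product("p2", "Polypeptide"), Product("r2", "RNA")]),
]
print([g.name for g in genes if no_rna_product(g)])  # ['g1']
```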
There are some cases where the attribute of an object is also an
object (i.e., a single object, not a list of objects). In this case,
there is a need to be able to access the attributes of this
object. The is an object ... operator is provided for this
purpose. (Note: this is a rare case since most attributes whose values are
objects have a list of objects, not a single object.)
For example, enzyme) has a single value which is an object.
When is an object ... is selected, a conditional expression
similar to a where clause opens under the attribute. It
also introduces a new variable, which can be used in the
conditional expression to refer to the object bound to the attribute.
2.8 Referencing an Attribute for the Right Operand
Most of the time, the right operand to an operation can be freely
entered. For example, a number (e.g. 10) can be entered by simply
typing it in the box provided as a right operand to is greater
than. But there are situations where you want to compare to
the attribute of an object. In this case, the button labeled
switch to variable entry is provided. If you click it, two
selectors replace the free entry box. One selector allows you to
select a variable, the other an attribute. The button that you just
clicked is then labeled switch to constant
entry, so that you can return to the free entry box if you want.
Essentially, each search component allows you to search different
classes in the same or different databases. Each search component does
an iterative search of the objects of a class. By combining several
search components, a multidimensional Cartesian search is performed. For
example, if the first search component is done over proteins of
E. coli K-12, and the second search component is done over genes of the same
organism, the search is potentially over all combinations of proteins
and genes. More precisely, all the conditions of the first search
component must be true before the second search component starts.
Clicking the button labeled insert a new search component
here introduces a new search component at the location of
the button. A search component is visually delineated by a rectangular
box around it. The order of the search components is important. You
can remove a search component, but not the first one, by clicking the
x icon on its right.
For efficiency, it is important to order your search components
appropriately by specifying the first search component as the most
restrictive. Indeed, the example above is potentially a time-consuming query,
since E. coli K-12 has more than 4500 genes and more than 5100
proteins: the search space is potentially 4500 x 5100 (over 22
million) pairs of proteins and genes. Nevertheless, it is possible to
do a multidimensional search if the number of satisfying
combinations is reasonable (say, fewer than 10,000). If your search is
too time-consuming, the server may stop processing your search with an
error message to that effect.
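Why ordering matters can be seen in a small Python model of two search components: x1 ranges over the first class, x2 over the second, and the conditions on x1 are checked before the second component is entered. This is a sketch of the evaluation order described above, not the server's algorithm; the function name, the integer stand-ins for objects, and the conditions are hypothetical.

```python
def two_component_search(first, second, cond1, cond2):
    """Model of two search components with variables x1 and x2.

    cond1 tests x1 alone and is checked before the second component
    starts; cond2 may refer to both x1 and x2. Returns the matching
    pairs and the number of (x1, x2) combinations examined.
    """
    results, examined = [], 0
    for x1 in first:
        if not cond1(x1):
            continue  # the second component never runs for this x1
        for x2 in second:
            examined += 1
            if cond2(x1, x2):
                results.append((x1, x2))
    return results, examined


print(4500 * 5100)  # 22950000 combinations with no restriction
proteins = range(5100)  # hypothetical stand-ins for protein objects
genes = range(4500)     # hypothetical stand-ins for gene objects
pairs, examined = two_component_search(
    proteins, genes, lambda p: p < 3, lambda p, g: p == g)
print(len(pairs), examined)  # 3 13500
```

With a restrictive first component, only 3 x 4500 combinations are examined instead of roughly 23 million.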
When more than one search component is specified, variables are
introduced. The first search component is always associated with
the variable x1. The other search components use variables
with a higher index (e.g., x3). The variables allow
cross-referencing between the search components and in the output.
2.10 Using Variables
Variables are introduced to reference different objects in one query.
For example, if two search components are specified, the first one
has its main objects (the objects from the class specified in the head
of the search component) associated with x1. We also say that
the objects are bound to x1. The second search component
will have a different variable name, say x2. If a where clause
is added to it, the variable x1 can be used in it to refer to the
current object from the first search component.
Some operations introduce variables in a query, such as adding
search components, or using one of the operators for each
object ... or is an object .... The variable
names are prefixed by the letter x as in
x1 or x2. These variables are
automatically introduced by the interface when they are needed; you
cannot change their names and there is no need to do so.
The interface takes care of adding a pull-down menu to select a
variable next to an attribute pull-down menu when such a variable
selection makes sense. The list of selectable variables is always
complete and non-redundant. That is, you can select any variable in
such pull-down menus without worrying about a syntactical error in the
resulting query; when no such pull-down menu is available it is not
possible to reference such a variable.
The output contents will also provide pull-down menus to select a
variable if more than one variable exists in your query.
When using some of the repeat operators (e.g., for every element
of) internal variables will be automatically created. These are not
directly visible in the interface although they can be seen in the
translated BioVelo query when the submit button is clicked. You do not have to
know their names but it can be instructive to see how they are used if
you want to better understand BioVelo.
2.11 Controlling the Presentation of Query Results
The result of your query will be a table. The number of
rows of this table is the number of objects that satisfied your
query. The number of columns is specified by the user in the contents of
the output section of the web page. Initially, this section has only
one column specified, with the attribute NAME preselected,
which is the unique PGDB identifier of each object. The
button add a column will add a column to this section. It
specifies an additional column in the result. In the resulting table,
this column will contain the value of the selected attribute. You can
select an attribute of your choice other than NAME.
If two or more search components exist in your query
specification, a variable selector is present in each column. You
should select the desired variable and its attribute.
You can remove a column by clicking its x icon. All columns to its
right are moved to the left and a renumbering of the columns is done.
That is, if you have four columns and you delete the second column,
the third column becomes the second one and the fourth becomes the
third one with their headers renumbered correspondingly.
2.12 The Output Formats
This subsection applies to both the Structured Advanced Query Page
and the Free Form Advanced Query Page.
There are two possible output formats: HTML and
Tab Delimited Text. To select the tab-delimited format, click
the radio button next to Tab Delimited Text. When you load the
initial Advanced Query web page, the default is the HTML format.
The HTML format provides links to the Pathway Tools display page for
each object found. This is the format preferred by most users.
The Tab Delimited Text format creates a text
formatted table whose columns are separated by the tab character. The web page returned
has a MIME type of text/plain and can be saved as a parsable text file that
can be imported into a spreadsheet for offline processing.
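For offline processing in a script rather than a spreadsheet, a saved Tab Delimited Text result can be split into rows and columns as in the Python sketch below. It assumes one result row per line and no tab or newline characters inside values; whether the server emits a header row is not stated here, so the function simply returns every non-blank line. The function name and sample data are hypothetical.

```python
def read_tab_delimited(text):
    """Split a saved Tab Delimited Text result into rows of column values."""
    rows = []
    for line in text.splitlines():
        if line.strip():                   # ignore blank lines
            rows.append(line.split("\t"))  # one value per output column
    return rows


sample = "TF-A\t8\nTF-B\t12\n"  # hypothetical two-column result
print(read_tab_delimited(sample))  # [['TF-A', '8'], ['TF-B', '12']]
```

To read a saved file instead, pass the file's contents, e.g. read_tab_delimited(open(path, encoding="utf-8").read()), where path and the encoding are assumptions about how you saved the page.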
Query submission is always performed by clicking the Submit
Query button at the bottom of the page. If some simple error
is detected (e.g., an input box is empty) an error box will be
displayed. You should correct this error and retry. The result of the
query will be displayed in a new Web page. When this page opens, it is blank,
and depending on the complexity of your query, it may take some time
before the results are shown. If the server detects an error in your
query (e.g., a syntax error or a type error), it will send back an
error report.
3. The Free Form Advanced Query Page
This form allows you to enter more advanced queries than the
Structured Advanced Query Page because the full BioVelo query language is accessible from
the Free Form page. The Free Form page is more complex to use since it requires knowledge of the syntax
of the BioVelo language. Please consult the BioVelo documentation
for the syntax and semantics
of the BioVelo language.
The Free Form page can be reached by clicking the button labeled
Switch to the Free Form Advanced Query Page near the top of
the initial page. (If this button says Switch to the
Structured Advanced Query Page,
you are already on the Free Form Advanced Query Page.)
In this form, the query must be entered in the text box area on the
left -- this is the query text box area. The text box on the right
contains a list of query examples.
A query can be any BioVelo expression. Such expressions are more powerful
than the queries of the Structured Advanced Query Page, which can only return tables. In
most cases you will formulate a query that starts with [ and ends
with ]; such queries return results in a table. But you can also write a
query in the form of a tuple (i.e., of the form (..., ...)), or even a query that
returns a single numerical value, as in #dbs.
For a table resulting from a query of the form [...], the head of
the query, that is what comes before the colon :, is either an
expression that is not a tuple or a tuple of two or more expressions.
In the former case, this will return a table of one column; in the latter
case this will return a table of as many columns as there are
expressions in the tuple. Since the head of the query determines the
number of columns, the Free Form Advanced Query Page does not provide an output content
section as in the Structured Advanced Query Page.
A query that is a tuple will return as many results as there are expressions
in the tuple. It can thus return several tables.
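The relationship between the head of a query and the shape of its result resembles a Python list comprehension. The sketch below is an analogy only: it is written in Python, not BioVelo, it does not show BioVelo syntax, and the protein records and their pi values are hypothetical.

```python
proteins = [
    {"name": "P1", "pi": 5.2},  # hypothetical objects and attribute values
    {"name": "P2", "pi": 9.1},
]

# Head is a single expression: a table with one column
one_column = [p["name"] for p in proteins]

# Head is a tuple of two expressions: a table with two columns
two_columns = [(p["name"], p["pi"]) for p in proteins if p["pi"] > 7]

# A tuple query: one result per tuple element, here a table and a number
tuple_result = (one_column, len(proteins))

print(one_column)    # ['P1', 'P2']
print(two_columns)   # [('P2', 9.1)]
print(tuple_result)  # (['P1', 'P2'], 2)
```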
Under these two text areas is a row of selectable options. These
are not used for formulating the query, but are provided as
reference documentation. Selectors are provided for the available
database names, class names, attribute names, and operators of the
BioVelo language. They are primarily useful as a reference
for the correct spelling of database names (note: database names appear in
parentheses next to the species names; database names contain no spaces, whereas
species names are usually longer and may contain spaces), class names,
attribute names, and operators. For some browsers, a small yellow box
(i.e., a tooltip) appears when you hover the mouse pointer over the
attribute and operator options of these selectors. The tooltips
work in Mozilla/Firefox 1.5 and Safari 2.0.3 but not
in IE 6 and 7.
Selecting a class name will change the list of attributes.
Under these selectors you can specify the output format to
either HTML (the default) or Tab Delimited
Text. Consult Subsection 2.12 for more information on these output
formats.
Click the Submit Button at the bottom of the page
to submit your query. A new browser window will open containing the
query result, or an error report if an error is found in
the query. You can then edit the query, or enter a completely different one,
in the query form and submit again. A new web page will appear with the result,
allowing you to compare the results of different queries.
You should be able to cut and paste a query, or any parts of it,
into the query text box area. You could store your queries in a
separate document on your computer and copy them back in the query
text box area.
For the complete BioVelo query language syntax and semantics please
consult the BioVelo Documentation.