Download a Pathway as SBML

Download a Pathway as SBML#

In this tutorial we will show how can to export an enviPath Pathway to SBML in a few lines of code. The pathway that we will download is the Deprenyl pathway from EAWAG-SLUDGE.

For this tutorial we will import libsbml, a Python API that facilitates the generation of SBML documents from various data formats.

In the following cell we will import this package as well as enviPath_python and we will define some constants that will help us with the forecoming code.

from libsbml import *
from enviPath_python import enviPath
from enviPath_python.objects import Pathway
from enviPath_python import enums

HOST_INSTANCE = "https://envipath.org/"
eP = enviPath(HOST_INSTANCE)
pwid = HOST_INSTANCE + 'package/7932e576-03c7-4106-819d-fe80dc605b8a/pathway/b21b1d65-e0d1-4060-b890-45bf3713924a' # Deprenyl sludge
package = 'SLUDGE'
pathway = Pathway(eP.requester, id=pwid)

We will be using some helper functions to ensure data integrity, while following the best practices applied on the tutorial createSimpleModel.py. The helper functions are briefly described as follows:

  • check(value, message) method: Ensures that the method used to update the SBML document executes successfully, otherwise message will be returned.

  • get_xml_from_scenarios(node) method: allows to automatically add the additional information contained in the scenarios of a given node as SBML Annotations.

  • is_float(value) method: Returns True if the passed value is casteable as a float.

  • get_valid_id(ID) and get_original_id(ID) methods: encode and decode an enviPath-shapped URL into a SBML-valid format, respectively.

def check(value, message):
    """If 'value' is None, prints an error message constructed using
    'message' and then exits with status code 1.  If 'value' is an integer,
    it assumes it is a libSBML return status code.  If the code value is
    LIBSBML_OPERATION_SUCCESS, returns without further action; if it is not,
    prints an error message constructed using 'message' along with text from
    libSBML explaining the meaning of the code, and exits with status code 1.
    """
    if value == None:
        raise SystemExit('LibSBML returned a null value trying to ' + message + '.')
    elif type(value) is int:
        if value == LIBSBML_OPERATION_SUCCESS:
            return
        else:
            err_msg = 'Error encountered trying to ' + message + '.' \
                      + 'LibSBML returned error code ' + str(value) + ': "' \
                      + OperationReturnValue_toString(value).strip() + '"'
            raise SystemExit(err_msg)
    else:
        return

def get_xml_from_scenarios(node):
    """Parses the scenarios contained in our nodes in XML syntax"""
    xml_string = ""
    for scenario in node.get_scenarios():
        valid_id = get_valid_id(scenario.get_id())
        tmp_ai_string = ""
        for ai in scenario.get_additional_information():
            tmp_string = ""
            for (key, value) in ai.params.items():
                if value is None or value == "":
                    continue
                elif not is_float(value):
                    value = get_valid_id(value)
                tmp_string += "<" + key + ">" + str(value) + "</" + key + ">"
            tmp_string = "<" + ai.name + ">" + tmp_string + "</" + ai.name + ">"
            tmp_ai_string += tmp_string
        xml_string += "<" + valid_id + ">" + tmp_ai_string + "</" + valid_id + ">"
    if xml_string != "":
        xml_string = "<scenarios>" + xml_string + "</scenarios>"
    return xml_string


def is_float(value):
    """Whether the value can be casted as float or not"""
    try:
        float(value)
        return True
    except ValueError:
        return False


def get_valid_id(ID):
    """Parses the ID to replace non-valid-SBML characters as the only valid
    special character '_' """
    ID = ID.replace(HOST_INSTANCE, "")
    valid_id = ""
    for char in ID:
        if (char.isdigit() or char.isalpha()):
            valid_id += char
        else:
            valid_id += "_"
    return valid_id


def get_original_id(ID):
    """Gets an enviPath URL that has been processed with 'get_valid_id' method
    and return the original URL"""
    original_id = ""
    random_chars = []
    for sequence in ID.split("_"):
        try:
            enums.Endpoint(sequence)
            random_chars_string = "-".join(random_chars)
            if len(random_chars) > 0:
                original_id += random_chars_string + "/" + sequence + "/"
            else:
                original_id += sequence + "/"
            random_chars = []
        except ValueError:
            random_chars.append(sequence)
    if len(random_chars) > 0:
        original_id += "-".join(random_chars)
    return HOST_INSTANCE + original_id

We first will create an SBMLDocument object passing the corresponding level and version, in this tutorial we will use 2 and 1, respectively. After that one must create a model and set its Id, which should be unique for the whole SBML Document.

Eventually the compartment must be created. A compartment can be understood as the matrix where the chemicals interact with each other. In our case we define this to be a the Deprenyl pathway, because all biodegradation reactions are happening under the same conditions. A compartment has 2 mandatory parameters:

  • An Id, which has to be unique for the whole SBML Document

  • A boolean named Constant, which defines whether the compartment changes over time

A more detailed explanation those objects can be found on the SBML website of Model and Compartment

document = SBMLDocument(2, 1) # (SBML level, version)

model = document.createModel()
check(model, 'create model')
check(model.setId("My_Pathway"), "name Model as 'My_Pathway'")

# Create compartment
c1 = model.createCompartment()
compartment_id = get_valid_id(pathway.get_id())
check(c1, 'create compartment')
check(c1.setId(compartment_id), 'set compartment id')
# Mandatory on version 3
# check(c1.setConstant(True), 'set compartment "constant"')

With a compartment created we can start adding species and reactions to it. With enviPath-python it is very easy to achieve this, we will loop over each node on a pathway using pathway.get_nodes() and we will extract the URL of the node, the name, the external references to PubChem, KEGG and ChEBI and the information stored on all the scenarios using the methods, node.get_id() (and the helper function get_valid_id(ID)), node.get_name(), structure.get_pubchem_references() and structure.get_external_references(), and the helper function get_xml_from_scenarios(node), respectively.

A SBML Species has a set of mandatory fields to be set, these are:

  • The ID: a unique identifier for the whole SBML Document

  • On SBML version 3:

    • The boolean HasOnlySubstanceUnits: which states whether there are only substances

    • The boolean setConstant and setBoundaryCondition: which determines whether and how the quantity of that species can vary during a simulation

More parameters are accepted by SBML and the methods to set them and their descriptions can be found on the SBML Species page

# Create species
for node in pathway.get_nodes():
    cpd = model.createSpecies()
    name = node.get_name()
    ID = get_valid_id(node.get_id())
    
    check(cpd, 'create compound {}'.format(name))
    check(cpd.setName(name), 'set name {}'.format(name))
    # Mandatory:
    check(cpd.setId(ID),'set id {}'.format(ID))
    check(cpd.setMetaId(ID),'set id {}'.format(ID))
    check(cpd.setCompartment(compartment_id), 'set compartment')
    # Mandatory on version 3
    # check(cpd.setBoundaryCondition(True),     'set "boundaryCondition"')
    # check(cpd.setConstant(True), 'set compartment "constant"')
    # check(cpd.setHasOnlySubstanceUnits(False), 'set "hasOnlySubstanceUnits"')

    structure = node.get_default_structure()

    for link in structure.get_pubchem_references():
        cv = CVTerm()
        check(cv.setQualifierType(BIOLOGICAL_QUALIFIER), "Adding the type of qualifier")
        check(cv.setBiologicalQualifierType(BQB_IS_VERSION_OF), "Adding the biological qualifier type")
        check(cv.addResource(link), "Adding the resource to the CV")
        check(cpd.addCVTerm(cv), "Adding CV to the corresponding substance")

    for links in structure.get_external_references().values():
        for link in links:
            cv = CVTerm()
            check(cv.setQualifierType(BIOLOGICAL_QUALIFIER), "Adding the type of qualifier")
            check(cv.setBiologicalQualifierType(BQB_IS_VERSION_OF), "Adding the biological qualifier type")
            check(cv.addResource(link), "Adding the resource to the CV")
            check(cpd.addCVTerm(cv), "Adding CV to the corresponding substance")


    xml_string = get_xml_from_scenarios(node)
    attribute_xml = XMLNode.convertStringToXMLNode(xml_string)
    check(cpd.appendAnnotation(attribute_xml), 'set annotation')

Hide code cell output

---------------------------------------------------------------------------
JSONDecodeError                           Traceback (most recent call last)
File ~/checkouts/readthedocs.org/user_builds/envipath-python/envs/develop/lib/python3.10/site-packages/requests/models.py:976, in Response.json(self, **kwargs)
    975 try:
--> 976     return complexjson.loads(self.text, **kwargs)
    977 except JSONDecodeError as e:
    978     # Catch JSON-related errors and raise as requests.JSONDecodeError
    979     # This aliases json.JSONDecodeError and simplejson.JSONDecodeError

File ~/.asdf/installs/python/3.10.17/lib/python3.10/json/__init__.py:346, in loads(s, cls, object_hook, parse_float, parse_int, parse_constant, object_pairs_hook, **kw)
    343 if (cls is None and object_hook is None and
    344         parse_int is None and parse_float is None and
    345         parse_constant is None and object_pairs_hook is None and not kw):
--> 346     return _default_decoder.decode(s)
    347 if cls is None:

File ~/.asdf/installs/python/3.10.17/lib/python3.10/json/decoder.py:337, in JSONDecoder.decode(self, s, _w)
    333 """Return the Python representation of ``s`` (a ``str`` instance
    334 containing a JSON document).
    335 
    336 """
--> 337 obj, end = self.raw_decode(s, idx=_w(s, 0).end())
    338 end = _w(s, end).end()

File ~/.asdf/installs/python/3.10.17/lib/python3.10/json/decoder.py:355, in JSONDecoder.raw_decode(self, s, idx)
    354 except StopIteration as err:
--> 355     raise JSONDecodeError("Expecting value", s, err.value) from None
    356 return obj, end

JSONDecodeError: Expecting value: line 3 column 1 (char 2)

During handling of the above exception, another exception occurred:

JSONDecodeError                           Traceback (most recent call last)
Cell In[4], line 2
      1 # Create species
----> 2 for node in pathway.get_nodes():
      3     cpd = model.createSpecies()
      4     name = node.get_name()

File ~/checkouts/readthedocs.org/user_builds/envipath-python/envs/develop/lib/python3.10/site-packages/enviPath_python/objects.py:2293, in Pathway.get_nodes(self)
   2287 def get_nodes(self) -> List[Node]:
   2288     """
   2289     Retrieves the nodes of the Pathway
   2290 
   2291     :return: a List of Node objects
   2292     """
-> 2293     nodes = self._get('nodes')
   2295     # Remove pseudo nodes
   2296     non_pseudo_nodes = []

File ~/checkouts/readthedocs.org/user_builds/envipath-python/envs/develop/lib/python3.10/site-packages/enviPath_python/objects.py:83, in enviPathObject._get(self, field)
     72 """
     73 Tries to get a field of the object. As objects are only initialized with 'name' and 'id' all other
     74 fields must be fetched from the enviPath instance. This fetches data only once.
   (...)
     80 :return: The value of the field.
     81 """
     82 if not self.loaded:
---> 83     obj_fields = self._load()
     84     for k, v in obj_fields.items():
     85         setattr(self, k, v)

File ~/checkouts/readthedocs.org/user_builds/envipath-python/envs/develop/lib/python3.10/site-packages/enviPath_python/objects.py:130, in enviPathObject._load(self)
    124 def _load(self):
    125     """
    126     Fetches data from the enviPath instance via the enviPathRequester provided at objects creation.
    127 
    128     :return: json containing the server response.
    129     """
--> 130     res = self.requester.get_request(self.id).json()
    131     return res

File ~/checkouts/readthedocs.org/user_builds/envipath-python/envs/develop/lib/python3.10/site-packages/requests/models.py:980, in Response.json(self, **kwargs)
    976     return complexjson.loads(self.text, **kwargs)
    977 except JSONDecodeError as e:
    978     # Catch JSON-related errors and raise as requests.JSONDecodeError
    979     # This aliases json.JSONDecodeError and simplejson.JSONDecodeError
--> 980     raise RequestsJSONDecodeError(e.msg, e.doc, e.pos)

JSONDecodeError: Expecting value: line 3 column 1 (char 2)

Finally we add the reactions of the pathway using the method pathway.get_edges() and adding the information stored in there in SBML Reaction objects. There are a few mandatory fields for a Reaction and additional ones can be set as explained on the SBML Reaction page, in this tutorial we will set:

  • The ID: as explained before, this is a unique identifier for the SBML Document

  • For version 3:

    • The boolean setReversible: to indicate whether the reaction is reversible or not

    • The boolean setFast: indicates whether a reaction occurs on a vastly faster time scale than others in the system.

# Create reactions
reaction_ids = []
for edge in pathway.get_edges():
    rxn_id = get_valid_id(edge.get_id())

    r = model.createReaction()
    check(r, 'create reaction')
    check(r.setId(rxn_id), 'set reaction id')
    check(r.setMetaId(rxn_id),'set id {}'.format(rxn_id))
    check(r.setName(get_valid_id(edge.get_name())), 'set reaction name')
    # Mandatory on version 3
    # check(r.setReversible(False), 'set reaction reversibility flag')
    # check(r.setFast(False), 'set reaction "fast" attribute')

    # Substrates
    for reactant_node in edge.get_start_nodes():
        reactant = r.createReactant()
        check(reactant, 'create reactant')
        check(reactant.setSpecies(get_valid_id(reactant_node.get_id())), 'assign reactant species')
        # Mandatory on version 3
        # check(reactant.setConstant(True), 'set "constant" on reactant')

    # Products
    for product_node in edge.get_end_nodes():
        product = r.createProduct()
        check(product, 'create product')
        check(product.setSpecies(get_valid_id(product_node.get_id())), 'assign product species')
        # Mandatory on version 3
        # check(product.setConstant(True), 'set "constant" on product')

    # Rhea references
    for link in edge.get_reaction().get_rhea_references():
        cv = CVTerm()
        check(cv.setQualifierType(BIOLOGICAL_QUALIFIER), "Adding the type of qualifier")
        check(cv.setBiologicalQualifierType(BQB_IS_VERSION_OF), "Adding the biological qualifier type")
        check(cv.addResource(link), "Adding the resource to the CV")
        check(cpd.addCVTerm(cv), "Adding CV to the corresponding substance")

libsbml enables to save the generated document on .sbml file on a straight-forward manner, as shown below

filename = f'pathway_{get_valid_id(pathway.get_name())}.sbml'
writeSBMLToFile(document, filename)
1

After this tutorial we have been able to write a syntactically correct SBML file, this however does not mean that the SBML file is valid. We follow the validateSBML.py tutorial to ensure that no core errors exists in our SBML file.

sbmlDoc  = readSBML(filename)
errors   = sbmlDoc.getNumErrors()

seriousErrors = False
numReadErr  = 0
numReadWarn = 0
errMsgRead  = ""

if errors > 0:
    print(f"The SBML file contains {len(errors)} errors")
    for i in range(errors):
        severity = sbmlDoc.getError(i).getSeverity()
        if (severity == LIBSBML_SEV_ERROR) or (severity == LIBSBML_SEV_FATAL):
            seriousErrors = True
            numReadErr += 1
        else:
            numReadWarn += 1
        errMsgRead = sbmlDoc.getErrorLog().toString()
        for message in errMsgRead.split("\n\n"):
            print(message + "\n")

And finally we can check whether some minor warnings are found

failures = sbmlDoc.checkConsistency()

numCCErr  = 0
numCCWarn = 0
if failures > 0:
    isinvalid = False
    for i in range(failures):
        severity = sbmlDoc.getError(i).getSeverity()
        if (severity == LIBSBML_SEV_ERROR) or (severity == LIBSBML_SEV_FATAL):
            isinvalid = True
        else:
            numCCWarn += 1
        if isinvalid:
            self.numinvalid += 1
    
    errMsgCC = sbmlDoc.getErrorLog().toString()
    for message in errMsgCC.split("\n\n"):
        print(message + "\n")
        
line 5: (80501 [Warning]) As a principle of best modeling practice, the size of a <compartment> should be set to a value rather than be left undefined. Doing so improves the portability of models between different simulation and analysis systems, and helps make it easier to detect potential errors in models.
 The <compartment> with the id 'package_7932e576_03c7_4106_819d_fe80dc605b8a_pathway_b21b1d65_e0d1_4060_b890_45bf3713924a' does not have a 'size' attribute, nor is its initial value set by an <initialAssignment> or <assignmentRule>.

line 8: (80601 [Warning]) As a principle of best modeling practice, the <species> should set an initial value (amount or concentration) rather than be left undefined. Doing so improves the portability of models between different simulation and analysis systems, and helps make it easier to detect potential errors in models.
 The <species> with the id 'package_7932e576_03c7_4106_819d_fe80dc605b8a_pathway_b21b1d65_e0d1_4060_b890_45bf3713924a_node_97edb6b1_3dda_4c73_89a0_1a1d9f6d95c8' does not have an 'initialConcentration' or 'initialAmount' attribute, nor is its initial value set by an <initialAssignment> or <assignmentRule>.

line 91: (80601 [Warning]) As a principle of best modeling practice, the <species> should set an initial value (amount or concentration) rather than be left undefined. Doing so improves the portability of models between different simulation and analysis systems, and helps make it easier to detect potential errors in models.
 The <species> with the id 'package_7932e576_03c7_4106_819d_fe80dc605b8a_pathway_b21b1d65_e0d1_4060_b890_45bf3713924a_node_ca6c81fa_2c27_4650_8925_78b9a487ffe1' does not have an 'initialConcentration' or 'initialAmount' attribute, nor is its initial value set by an <initialAssignment> or <assignmentRule>.

line 174: (80601 [Warning]) As a principle of best modeling practice, the <species> should set an initial value (amount or concentration) rather than be left undefined. Doing so improves the portability of models between different simulation and analysis systems, and helps make it easier to detect potential errors in models.
 The <species> with the id 'package_7932e576_03c7_4106_819d_fe80dc605b8a_pathway_b21b1d65_e0d1_4060_b890_45bf3713924a_node_c41850f6_6826_45b6_ade8_c9bb90fe8e8b' does not have an 'initialConcentration' or 'initialAmount' attribute, nor is its initial value set by an <initialAssignment> or <assignmentRule>.

line 257: (80601 [Warning]) As a principle of best modeling practice, the <species> should set an initial value (amount or concentration) rather than be left undefined. Doing so improves the portability of models between different simulation and analysis systems, and helps make it easier to detect potential errors in models.
 The <species> with the id 'package_7932e576_03c7_4106_819d_fe80dc605b8a_pathway_b21b1d65_e0d1_4060_b890_45bf3713924a_node_0d36bada_1d91_4cad_8660_b5c8184da110' does not have an 'initialConcentration' or 'initialAmount' attribute, nor is its initial value set by an <initialAssignment> or <assignmentRule>.

line 340: (80601 [Warning]) As a principle of best modeling practice, the <species> should set an initial value (amount or concentration) rather than be left undefined. Doing so improves the portability of models between different simulation and analysis systems, and helps make it easier to detect potential errors in models.
 The <species> with the id 'package_7932e576_03c7_4106_819d_fe80dc605b8a_pathway_b21b1d65_e0d1_4060_b890_45bf3713924a_node_bd568328_003f_4645_ab9f_2fd4a0927ae0' does not have an 'initialConcentration' or 'initialAmount' attribute, nor is its initial value set by an <initialAssignment> or <assignmentRule>.

line 423: (80601 [Warning]) As a principle of best modeling practice, the <species> should set an initial value (amount or concentration) rather than be left undefined. Doing so improves the portability of models between different simulation and analysis systems, and helps make it easier to detect potential errors in models.
 The <species> with the id 'package_7932e576_03c7_4106_819d_fe80dc605b8a_pathway_b21b1d65_e0d1_4060_b890_45bf3713924a_node_8e895627_e58b_422a_8482_5a21098f007e' does not have an 'initialConcentration' or 'initialAmount' attribute, nor is its initial value set by an <initialAssignment> or <assignmentRule>.

line 506: (80601 [Warning]) As a principle of best modeling practice, the <species> should set an initial value (amount or concentration) rather than be left undefined. Doing so improves the portability of models between different simulation and analysis systems, and helps make it easier to detect potential errors in models.
 The <species> with the id 'package_7932e576_03c7_4106_819d_fe80dc605b8a_pathway_b21b1d65_e0d1_4060_b890_45bf3713924a_node_d7cccc60_6c42_4418_a0c8_60631a35b275' does not have an 'initialConcentration' or 'initialAmount' attribute, nor is its initial value set by an <initialAssignment> or <assignmentRule>.

line 816: (80601 [Warning]) As a principle of best modeling practice, the <species> should set an initial value (amount or concentration) rather than be left undefined. Doing so improves the portability of models between different simulation and analysis systems, and helps make it easier to detect potential errors in models.
 The <species> with the id 'package_7932e576_03c7_4106_819d_fe80dc605b8a_pathway_b21b1d65_e0d1_4060_b890_45bf3713924a_node_6a8f948e_fd6b_4cee_8fea_60d1204901e1' does not have an 'initialConcentration' or 'initialAmount' attribute, nor is its initial value set by an <initialAssignment> or <assignmentRule>.