Download a Pathway as SBML

Download a Pathway as SBML#

In this tutorial we will show how can to export an enviPath Pathway to SBML in a few lines of code. The pathway that we will download is the Deprenyl pathway from EAWAG-SLUDGE.

For this tutorial we will import libsbml, a Python API that facilitates the generation of SBML documents from various data formats.

In the following cell we will import this package as well as enviPath_python and we will define some constants that will help us with the forecoming code.

from libsbml import *
from enviPath_python import enviPath
from enviPath_python.objects import Pathway
from enviPath_python import enums

HOST_INSTANCE = "https://envipath.org/"
eP = enviPath(HOST_INSTANCE)
pwid = HOST_INSTANCE + 'package/7932e576-03c7-4106-819d-fe80dc605b8a/pathway/b21b1d65-e0d1-4060-b890-45bf3713924a' # Deprenyl sludge
package = 'SLUDGE'
pathway = Pathway(eP.requester, id=pwid)

We will be using some helper functions to ensure data integrity, while following the best practices applied on the tutorial createSimpleModel.py. The helper functions are briefly described as follows:

  • check(value, message) method: Ensures that the method used to update the SBML document executes successfully, otherwise message will be returned.

  • get_xml_from_scenarios(node) method: allows to automatically add the additional information contained in the scenarios of a given node as SBML Annotations.

  • is_float(value) method: Returns True if the passed value is casteable as a float.

  • get_valid_id(ID) and get_original_id(ID) methods: encode and decode an enviPath-shapped URL into a SBML-valid format, respectively.

def check(value, message):
    """If 'value' is None, prints an error message constructed using
    'message' and then exits with status code 1.  If 'value' is an integer,
    it assumes it is a libSBML return status code.  If the code value is
    LIBSBML_OPERATION_SUCCESS, returns without further action; if it is not,
    prints an error message constructed using 'message' along with text from
    libSBML explaining the meaning of the code, and exits with status code 1.
    """
    if value == None:
        raise SystemExit('LibSBML returned a null value trying to ' + message + '.')
    elif type(value) is int:
        if value == LIBSBML_OPERATION_SUCCESS:
            return
        else:
            err_msg = 'Error encountered trying to ' + message + '.' \
                      + 'LibSBML returned error code ' + str(value) + ': "' \
                      + OperationReturnValue_toString(value).strip() + '"'
            raise SystemExit(err_msg)
    else:
        return

def get_xml_from_scenarios(node):
    """Parses the scenarios contained in our nodes in XML syntax"""
    xml_string = ""
    for scenario in node.get_scenarios():
        valid_id = get_valid_id(scenario.get_id())
        tmp_ai_string = ""
        for ai in scenario.get_additional_information():
            tmp_string = ""
            for (key, value) in ai.params.items():
                if value is None or value == "":
                    continue
                elif not is_float(value):
                    value = get_valid_id(value)
                tmp_string += "<" + key + ">" + str(value) + "</" + key + ">"
            tmp_string = "<" + ai.name + ">" + tmp_string + "</" + ai.name + ">"
            tmp_ai_string += tmp_string
        xml_string += "<" + valid_id + ">" + tmp_ai_string + "</" + valid_id + ">"
    if xml_string != "":
        xml_string = "<scenarios>" + xml_string + "</scenarios>"
    return xml_string


def is_float(value):
    """Whether the value can be casted as float or not"""
    try:
        float(value)
        return True
    except ValueError:
        return False


def get_valid_id(ID):
    """Parses the ID to replace non-valid-SBML characters as the only valid
    special character '_' """
    ID = ID.replace(HOST_INSTANCE, "")
    valid_id = ""
    for char in ID:
        if (char.isdigit() or char.isalpha()):
            valid_id += char
        else:
            valid_id += "_"
    return valid_id


def get_original_id(ID):
    """Gets an enviPath URL that has been processed with 'get_valid_id' method
    and return the original URL"""
    original_id = ""
    random_chars = []
    for sequence in ID.split("_"):
        try:
            enums.Endpoint(sequence)
            random_chars_string = "-".join(random_chars)
            if len(random_chars) > 0:
                original_id += random_chars_string + "/" + sequence + "/"
            else:
                original_id += sequence + "/"
            random_chars = []
        except ValueError:
            random_chars.append(sequence)
    if len(random_chars) > 0:
        original_id += "-".join(random_chars)
    return HOST_INSTANCE + original_id

We first will create an SBMLDocument object passing the corresponding level and version, in this tutorial we will use 2 and 1, respectively. After that one must create a model and set its Id, which should be unique for the whole SBML Document.

Eventually the compartment must be created. A compartment can be understood as the matrix where the chemicals interact with each other. In our case we define this to be a the Deprenyl pathway, because all biodegradation reactions are happening under the same conditions. A compartment has 2 mandatory parameters:

  • An Id, which has to be unique for the whole SBML Document

  • A boolean named Constant, which defines whether the compartment changes over time

A more detailed explanation those objects can be found on the SBML website of Model and Compartment

document = SBMLDocument(2, 1) # (SBML level, version)

model = document.createModel()
check(model, 'create model')
check(model.setId("My_Pathway"), "name Model as 'My_Pathway'")

# Create compartment
c1 = model.createCompartment()
compartment_id = get_valid_id(pathway.get_id())
check(c1, 'create compartment')
check(c1.setId(compartment_id), 'set compartment id')
# Mandatory on version 3
# check(c1.setConstant(True), 'set compartment "constant"')

With a compartment created we can start adding species and reactions to it. With enviPath-python it is very easy to achieve this, we will loop over each node on a pathway using pathway.get_nodes() and we will extract the URL of the node, the name, the external references to PubChem, KEGG and ChEBI and the information stored on all the scenarios using the methods, node.get_id() (and the helper function get_valid_id(ID)), node.get_name(), structure.get_pubchem_references() and structure.get_external_references(), and the helper function get_xml_from_scenarios(node), respectively.

A SBML Species has a set of mandatory fields to be set, these are:

  • The ID: a unique identifier for the whole SBML Document

  • On SBML version 3:

    • The boolean HasOnlySubstanceUnits: which states whether there are only substances

    • The boolean setConstant and setBoundaryCondition: which determines whether and how the quantity of that species can vary during a simulation

More parameters are accepted by SBML and the methods to set them and their descriptions can be found on the SBML Species page

# Create species
for node in pathway.get_nodes():
    cpd = model.createSpecies()
    name = node.get_name()
    ID = get_valid_id(node.get_id())
    
    check(cpd, 'create compound {}'.format(name))
    check(cpd.setName(name), 'set name {}'.format(name))
    # Mandatory:
    check(cpd.setId(ID),'set id {}'.format(ID))
    check(cpd.setMetaId(ID),'set id {}'.format(ID))
    check(cpd.setCompartment(compartment_id), 'set compartment')
    # Mandatory on version 3
    # check(cpd.setBoundaryCondition(True),     'set "boundaryCondition"')
    # check(cpd.setConstant(True), 'set compartment "constant"')
    # check(cpd.setHasOnlySubstanceUnits(False), 'set "hasOnlySubstanceUnits"')

    structure = node.get_default_structure()

    for link in structure.get_pubchem_references():
        cv = CVTerm()
        check(cv.setQualifierType(BIOLOGICAL_QUALIFIER), "Adding the type of qualifier")
        check(cv.setBiologicalQualifierType(BQB_IS_VERSION_OF), "Adding the biological qualifier type")
        check(cv.addResource(link), "Adding the resource to the CV")
        check(cpd.addCVTerm(cv), "Adding CV to the corresponding substance")

    for links in structure.get_external_references().values():
        for link in links:
            cv = CVTerm()
            check(cv.setQualifierType(BIOLOGICAL_QUALIFIER), "Adding the type of qualifier")
            check(cv.setBiologicalQualifierType(BQB_IS_VERSION_OF), "Adding the biological qualifier type")
            check(cv.addResource(link), "Adding the resource to the CV")
            check(cpd.addCVTerm(cv), "Adding CV to the corresponding substance")


    xml_string = get_xml_from_scenarios(node)
    attribute_xml = XMLNode.convertStringToXMLNode(xml_string)
    check(cpd.appendAnnotation(attribute_xml), 'set annotation')

Hide code cell output

The following warnings appeared while parsing the scenario https://envipath.org/package/7932e576-03c7-4106-819d-fe80dc605b8a/scenario/1013997b-748c-49ba-8912-0329aa0f426b
Error when trying to parse AerationTypeAdditionalInformation, raised error The value shaking on circulatin shaking table at 160 rpm is not one of the allowed values ['stirring', 'shaking', 'bubbling air', 'bubbling air and stirring', 'other']
The following warnings appeared while parsing the scenario https://envipath.org/package/7932e576-03c7-4106-819d-fe80dc605b8a/scenario/ced968d3-1c01-43c2-b055-206fe9eff177
Error when trying to parse AerationTypeAdditionalInformation, raised error The value shaking on circulatin shaking table at 160 rpm is not one of the allowed values ['stirring', 'shaking', 'bubbling air', 'bubbling air and stirring', 'other']
The following warnings appeared while parsing the scenario https://envipath.org/package/7932e576-03c7-4106-819d-fe80dc605b8a/scenario/e240a6ee-f3e6-4c50-8739-d65f64beaf2c
Error when trying to parse AerationTypeAdditionalInformation, raised error The value shaking on circulatin shaking table at 160 rpm is not one of the allowed values ['stirring', 'shaking', 'bubbling air', 'bubbling air and stirring', 'other']
The following warnings appeared while parsing the scenario https://envipath.org/package/7932e576-03c7-4106-819d-fe80dc605b8a/scenario/56099608-4330-4fc5-9201-2d49e323e531
Error when trying to parse AerationTypeAdditionalInformation, raised error The value shaking on circulatin shaking table at 160 rpm is not one of the allowed values ['stirring', 'shaking', 'bubbling air', 'bubbling air and stirring', 'other']
The following warnings appeared while parsing the scenario https://envipath.org/package/7932e576-03c7-4106-819d-fe80dc605b8a/scenario/1a41af8a-9b94-48b5-b988-bb60b2bd2d8a
Error when trying to parse AerationTypeAdditionalInformation, raised error The value shaking on circulatin shaking table at 160 rpm is not one of the allowed values ['stirring', 'shaking', 'bubbling air', 'bubbling air and stirring', 'other']
The following warnings appeared while parsing the scenario https://envipath.org/package/7932e576-03c7-4106-819d-fe80dc605b8a/scenario/3efd4941-0b42-42ae-a312-62d6c3e8461a
Error when trying to parse AerationTypeAdditionalInformation, raised error The value shaking on circulatin shaking table at 160 rpm is not one of the allowed values ['stirring', 'shaking', 'bubbling air', 'bubbling air and stirring', 'other']
The following warnings appeared while parsing the scenario https://envipath.org/package/7932e576-03c7-4106-819d-fe80dc605b8a/scenario/57b2c6ba-e055-43f4-b3b9-b296a7137878
Error when trying to parse AerationTypeAdditionalInformation, raised error The value shaking on circulatin shaking table at 160 rpm is not one of the allowed values ['stirring', 'shaking', 'bubbling air', 'bubbling air and stirring', 'other']
The following warnings appeared while parsing the scenario https://envipath.org/package/7932e576-03c7-4106-819d-fe80dc605b8a/scenario/29a0ec3d-041f-428c-a155-e59911fc14f6
Error when trying to parse AerationTypeAdditionalInformation, raised error The value shaking on circulatin shaking table at 160 rpm is not one of the allowed values ['stirring', 'shaking', 'bubbling air', 'bubbling air and stirring', 'other']

Finally we add the reactions of the pathway using the method pathway.get_edges() and adding the information stored in there in SBML Reaction objects. There are a few mandatory fields for a Reaction and additional ones can be set as explained on the SBML Reaction page, in this tutorial we will set:

  • The ID: as explained before, this is a unique identifier for the SBML Document

  • For version 3:

    • The boolean setReversible: to indicate whether the reaction is reversible or not

    • The boolean setFast: indicates whether a reaction occurs on a vastly faster time scale than others in the system.

# Create reactions
reaction_ids = []
for edge in pathway.get_edges():
    rxn_id = get_valid_id(edge.get_id())

    r = model.createReaction()
    check(r, 'create reaction')
    check(r.setId(rxn_id), 'set reaction id')
    check(r.setMetaId(rxn_id),'set id {}'.format(rxn_id))
    check(r.setName(get_valid_id(edge.get_name())), 'set reaction name')
    # Mandatory on version 3
    # check(r.setReversible(False), 'set reaction reversibility flag')
    # check(r.setFast(False), 'set reaction "fast" attribute')

    # Substrates
    for reactant_node in edge.get_start_nodes():
        reactant = r.createReactant()
        check(reactant, 'create reactant')
        check(reactant.setSpecies(get_valid_id(reactant_node.get_id())), 'assign reactant species')
        # Mandatory on version 3
        # check(reactant.setConstant(True), 'set "constant" on reactant')

    # Products
    for product_node in edge.get_end_nodes():
        product = r.createProduct()
        check(product, 'create product')
        check(product.setSpecies(get_valid_id(product_node.get_id())), 'assign product species')
        # Mandatory on version 3
        # check(product.setConstant(True), 'set "constant" on product')

    # Rhea references
    for link in edge.get_reaction().get_rhea_references():
        cv = CVTerm()
        check(cv.setQualifierType(BIOLOGICAL_QUALIFIER), "Adding the type of qualifier")
        check(cv.setBiologicalQualifierType(BQB_IS_VERSION_OF), "Adding the biological qualifier type")
        check(cv.addResource(link), "Adding the resource to the CV")
        check(cpd.addCVTerm(cv), "Adding CV to the corresponding substance")
---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
Cell In[5], line 32
     27     check(product.setSpecies(get_valid_id(product_node.get_id())), 'assign product species')
     28     # Mandatory on version 3
     29     # check(product.setConstant(True), 'set "constant" on product')
     30 
     31 # Rhea references
---> 32 for link in edge.get_reaction().get_rhea_references():
     33     cv = CVTerm()
     34     check(cv.setQualifierType(BIOLOGICAL_QUALIFIER), "Adding the type of qualifier")

File ~/checkouts/readthedocs.org/user_builds/envipath-python/envs/main/lib/python3.10/site-packages/enviPath_python/objects.py:1343, in Reaction.get_rhea_references(self)
   1337 def get_rhea_references(self) -> List[str]:
   1338     """
   1339     Retrieves the links to Rhea for the given reaction
   1340 
   1341     :return: A list of links to rhea with similar reactions
   1342     """
-> 1343     return self._get('rheaReferences')

File ~/checkouts/readthedocs.org/user_builds/envipath-python/envs/main/lib/python3.10/site-packages/enviPath_python/objects.py:88, in enviPathObject._get(self, field)
     86         self.loaded = True
     87 if not hasattr(self, field):
---> 88     raise ValueError('{} has no property {}'.format(self.get_type(), field))
     90 return getattr(self, field)

ValueError: Reaction has no property rheaReferences

libsbml enables to save the generated document on .sbml file on a straight-forward manner, as shown below

filename = f'pathway_{get_valid_id(pathway.get_name())}.sbml'
writeSBMLToFile(document, filename)
1

After this tutorial we have been able to write a syntactically correct SBML file, this however does not mean that the SBML file is valid. We follow the validateSBML.py tutorial to ensure that no core errors exists in our SBML file.

sbmlDoc  = readSBML(filename)
errors   = sbmlDoc.getNumErrors()

seriousErrors = False
numReadErr  = 0
numReadWarn = 0
errMsgRead  = ""

if errors > 0:
    print(f"The SBML file contains {len(errors)} errors")
    for i in range(errors):
        severity = sbmlDoc.getError(i).getSeverity()
        if (severity == LIBSBML_SEV_ERROR) or (severity == LIBSBML_SEV_FATAL):
            seriousErrors = True
            numReadErr += 1
        else:
            numReadWarn += 1
        errMsgRead = sbmlDoc.getErrorLog().toString()
        for message in errMsgRead.split("\n\n"):
            print(message + "\n")

And finally we can check whether some minor warnings are found

failures = sbmlDoc.checkConsistency()

numCCErr  = 0
numCCWarn = 0
if failures > 0:
    isinvalid = False
    for i in range(failures):
        severity = sbmlDoc.getError(i).getSeverity()
        if (severity == LIBSBML_SEV_ERROR) or (severity == LIBSBML_SEV_FATAL):
            isinvalid = True
        else:
            numCCWarn += 1
        if isinvalid:
            self.numinvalid += 1
    
    errMsgCC = sbmlDoc.getErrorLog().toString()
    for message in errMsgCC.split("\n\n"):
        print(message + "\n")
        
line 5: (80501 [Warning]) As a principle of best modeling practice, the size of a <compartment> should be set to a value rather than be left undefined. Doing so improves the portability of models between different simulation and analysis systems, and helps make it easier to detect potential errors in models.
 The <compartment> with the id 'package_7932e576_03c7_4106_819d_fe80dc605b8a_pathway_b21b1d65_e0d1_4060_b890_45bf3713924a' does not have a 'size' attribute, nor is its initial value set by an <initialAssignment> or <assignmentRule>.

line 8: (80601 [Warning]) As a principle of best modeling practice, the <species> should set an initial value (amount or concentration) rather than be left undefined. Doing so improves the portability of models between different simulation and analysis systems, and helps make it easier to detect potential errors in models.
 The <species> with the id 'package_7932e576_03c7_4106_819d_fe80dc605b8a_pathway_b21b1d65_e0d1_4060_b890_45bf3713924a_node_97edb6b1_3dda_4c73_89a0_1a1d9f6d95c8' does not have an 'initialConcentration' or 'initialAmount' attribute, nor is its initial value set by an <initialAssignment> or <assignmentRule>.

line 91: (80601 [Warning]) As a principle of best modeling practice, the <species> should set an initial value (amount or concentration) rather than be left undefined. Doing so improves the portability of models between different simulation and analysis systems, and helps make it easier to detect potential errors in models.
 The <species> with the id 'package_7932e576_03c7_4106_819d_fe80dc605b8a_pathway_b21b1d65_e0d1_4060_b890_45bf3713924a_node_ca6c81fa_2c27_4650_8925_78b9a487ffe1' does not have an 'initialConcentration' or 'initialAmount' attribute, nor is its initial value set by an <initialAssignment> or <assignmentRule>.

line 174: (80601 [Warning]) As a principle of best modeling practice, the <species> should set an initial value (amount or concentration) rather than be left undefined. Doing so improves the portability of models between different simulation and analysis systems, and helps make it easier to detect potential errors in models.
 The <species> with the id 'package_7932e576_03c7_4106_819d_fe80dc605b8a_pathway_b21b1d65_e0d1_4060_b890_45bf3713924a_node_c41850f6_6826_45b6_ade8_c9bb90fe8e8b' does not have an 'initialConcentration' or 'initialAmount' attribute, nor is its initial value set by an <initialAssignment> or <assignmentRule>.

line 257: (80601 [Warning]) As a principle of best modeling practice, the <species> should set an initial value (amount or concentration) rather than be left undefined. Doing so improves the portability of models between different simulation and analysis systems, and helps make it easier to detect potential errors in models.
 The <species> with the id 'package_7932e576_03c7_4106_819d_fe80dc605b8a_pathway_b21b1d65_e0d1_4060_b890_45bf3713924a_node_0d36bada_1d91_4cad_8660_b5c8184da110' does not have an 'initialConcentration' or 'initialAmount' attribute, nor is its initial value set by an <initialAssignment> or <assignmentRule>.

line 340: (80601 [Warning]) As a principle of best modeling practice, the <species> should set an initial value (amount or concentration) rather than be left undefined. Doing so improves the portability of models between different simulation and analysis systems, and helps make it easier to detect potential errors in models.
 The <species> with the id 'package_7932e576_03c7_4106_819d_fe80dc605b8a_pathway_b21b1d65_e0d1_4060_b890_45bf3713924a_node_bd568328_003f_4645_ab9f_2fd4a0927ae0' does not have an 'initialConcentration' or 'initialAmount' attribute, nor is its initial value set by an <initialAssignment> or <assignmentRule>.

line 423: (80601 [Warning]) As a principle of best modeling practice, the <species> should set an initial value (amount or concentration) rather than be left undefined. Doing so improves the portability of models between different simulation and analysis systems, and helps make it easier to detect potential errors in models.
 The <species> with the id 'package_7932e576_03c7_4106_819d_fe80dc605b8a_pathway_b21b1d65_e0d1_4060_b890_45bf3713924a_node_8e895627_e58b_422a_8482_5a21098f007e' does not have an 'initialConcentration' or 'initialAmount' attribute, nor is its initial value set by an <initialAssignment> or <assignmentRule>.

line 506: (80601 [Warning]) As a principle of best modeling practice, the <species> should set an initial value (amount or concentration) rather than be left undefined. Doing so improves the portability of models between different simulation and analysis systems, and helps make it easier to detect potential errors in models.
 The <species> with the id 'package_7932e576_03c7_4106_819d_fe80dc605b8a_pathway_b21b1d65_e0d1_4060_b890_45bf3713924a_node_d7cccc60_6c42_4418_a0c8_60631a35b275' does not have an 'initialConcentration' or 'initialAmount' attribute, nor is its initial value set by an <initialAssignment> or <assignmentRule>.

line 816: (80601 [Warning]) As a principle of best modeling practice, the <species> should set an initial value (amount or concentration) rather than be left undefined. Doing so improves the portability of models between different simulation and analysis systems, and helps make it easier to detect potential errors in models.
 The <species> with the id 'package_7932e576_03c7_4106_819d_fe80dc605b8a_pathway_b21b1d65_e0d1_4060_b890_45bf3713924a_node_6a8f948e_fd6b_4cee_8fea_60d1204901e1' does not have an 'initialConcentration' or 'initialAmount' attribute, nor is its initial value set by an <initialAssignment> or <assignmentRule>.