Download a Pathway as SBML#
In this tutorial we will show how can to export an enviPath Pathway to SBML in a few lines of code. The pathway that we will download is the Deprenyl pathway from EAWAG-SLUDGE.
For this tutorial we will import libsbml, a Python API that facilitates the generation of SBML documents from various data formats.
In the following cell we will import this package as well as enviPath_python and we will define some constants that will help us with the forecoming code.
from libsbml import *
from enviPath_python import enviPath
from enviPath_python.objects import Pathway
from enviPath_python import enums
HOST_INSTANCE = "https://envipath.org/"
eP = enviPath(HOST_INSTANCE)
pwid = HOST_INSTANCE + 'package/7932e576-03c7-4106-819d-fe80dc605b8a/pathway/b21b1d65-e0d1-4060-b890-45bf3713924a' # Deprenyl sludge
package = 'SLUDGE'
pathway = Pathway(eP.requester, id=pwid)
We will be using some helper functions to ensure data integrity, while following the best practices applied on the tutorial createSimpleModel.py. The helper functions are briefly described as follows:
check(value, message)method: Ensures that the method used to update the SBML document executes successfully, otherwisemessagewill be returned.get_xml_from_scenarios(node)method: allows to automatically add the additional information contained in the scenarios of a givennodeas SBML Annotations.is_float(value)method: ReturnsTrueif the passedvalueis casteable as a float.get_valid_id(ID)andget_original_id(ID)methods: encode and decode an enviPath-shapped URL into a SBML-valid format, respectively.
def check(value, message):
"""If 'value' is None, prints an error message constructed using
'message' and then exits with status code 1. If 'value' is an integer,
it assumes it is a libSBML return status code. If the code value is
LIBSBML_OPERATION_SUCCESS, returns without further action; if it is not,
prints an error message constructed using 'message' along with text from
libSBML explaining the meaning of the code, and exits with status code 1.
"""
if value == None:
raise SystemExit('LibSBML returned a null value trying to ' + message + '.')
elif type(value) is int:
if value == LIBSBML_OPERATION_SUCCESS:
return
else:
err_msg = 'Error encountered trying to ' + message + '.' \
+ 'LibSBML returned error code ' + str(value) + ': "' \
+ OperationReturnValue_toString(value).strip() + '"'
raise SystemExit(err_msg)
else:
return
def get_xml_from_scenarios(node):
"""Parses the scenarios contained in our nodes in XML syntax"""
xml_string = ""
for scenario in node.get_scenarios():
valid_id = get_valid_id(scenario.get_id())
tmp_ai_string = ""
for ai in scenario.get_additional_information():
tmp_string = ""
for (key, value) in ai.params.items():
if value is None or value == "":
continue
elif not is_float(value):
value = get_valid_id(value)
tmp_string += "<" + key + ">" + str(value) + "</" + key + ">"
tmp_string = "<" + ai.name + ">" + tmp_string + "</" + ai.name + ">"
tmp_ai_string += tmp_string
xml_string += "<" + valid_id + ">" + tmp_ai_string + "</" + valid_id + ">"
if xml_string != "":
xml_string = "<scenarios>" + xml_string + "</scenarios>"
return xml_string
def is_float(value):
"""Whether the value can be casted as float or not"""
try:
float(value)
return True
except ValueError:
return False
def get_valid_id(ID):
"""Parses the ID to replace non-valid-SBML characters as the only valid
special character '_' """
ID = ID.replace(HOST_INSTANCE, "")
valid_id = ""
for char in ID:
if (char.isdigit() or char.isalpha()):
valid_id += char
else:
valid_id += "_"
return valid_id
def get_original_id(ID):
"""Gets an enviPath URL that has been processed with 'get_valid_id' method
and return the original URL"""
original_id = ""
random_chars = []
for sequence in ID.split("_"):
try:
enums.Endpoint(sequence)
random_chars_string = "-".join(random_chars)
if len(random_chars) > 0:
original_id += random_chars_string + "/" + sequence + "/"
else:
original_id += sequence + "/"
random_chars = []
except ValueError:
random_chars.append(sequence)
if len(random_chars) > 0:
original_id += "-".join(random_chars)
return HOST_INSTANCE + original_id
We first will create an SBMLDocument object passing the corresponding level and version, in this tutorial we will use 2 and 1, respectively. After that one must create a model and set its Id, which should be unique for the whole SBML Document.
Eventually the compartment must be created. A compartment can be understood as the matrix where the chemicals interact with each other. In our case we define this to be a the Deprenyl pathway, because all biodegradation reactions are happening under the same conditions. A compartment has 2 mandatory parameters:
An Id, which has to be unique for the whole SBML Document
A boolean named Constant, which defines whether the compartment changes over time
A more detailed explanation those objects can be found on the SBML website of Model and Compartment
document = SBMLDocument(2, 1) # (SBML level, version)
model = document.createModel()
check(model, 'create model')
check(model.setId("My_Pathway"), "name Model as 'My_Pathway'")
# Create compartment
c1 = model.createCompartment()
compartment_id = get_valid_id(pathway.get_id())
check(c1, 'create compartment')
check(c1.setId(compartment_id), 'set compartment id')
# Mandatory on version 3
# check(c1.setConstant(True), 'set compartment "constant"')
With a compartment created we can start adding species and reactions to it. With enviPath-python it is very easy to achieve this, we will loop over each node on a pathway using pathway.get_nodes() and we will extract the URL of the node, the name, the external references to PubChem, KEGG and ChEBI and the information stored on all the scenarios using the methods, node.get_id() (and the helper function get_valid_id(ID)), node.get_name(), structure.get_pubchem_references() and structure.get_external_references(), and the helper function get_xml_from_scenarios(node), respectively.
A SBML Species has a set of mandatory fields to be set, these are:
The ID: a unique identifier for the whole SBML Document
On SBML version 3:
The boolean HasOnlySubstanceUnits: which states whether there are only substances
The boolean setConstant and setBoundaryCondition: which determines whether and how the quantity of that species can vary during a simulation
More parameters are accepted by SBML and the methods to set them and their descriptions can be found on the SBML Species page
# Create species
for node in pathway.get_nodes():
cpd = model.createSpecies()
name = node.get_name()
ID = get_valid_id(node.get_id())
check(cpd, 'create compound {}'.format(name))
check(cpd.setName(name), 'set name {}'.format(name))
# Mandatory:
check(cpd.setId(ID),'set id {}'.format(ID))
check(cpd.setMetaId(ID),'set id {}'.format(ID))
check(cpd.setCompartment(compartment_id), 'set compartment')
# Mandatory on version 3
# check(cpd.setBoundaryCondition(True), 'set "boundaryCondition"')
# check(cpd.setConstant(True), 'set compartment "constant"')
# check(cpd.setHasOnlySubstanceUnits(False), 'set "hasOnlySubstanceUnits"')
structure = node.get_default_structure()
for link in structure.get_pubchem_references():
cv = CVTerm()
check(cv.setQualifierType(BIOLOGICAL_QUALIFIER), "Adding the type of qualifier")
check(cv.setBiologicalQualifierType(BQB_IS_VERSION_OF), "Adding the biological qualifier type")
check(cv.addResource(link), "Adding the resource to the CV")
check(cpd.addCVTerm(cv), "Adding CV to the corresponding substance")
for links in structure.get_external_references().values():
for link in links:
cv = CVTerm()
check(cv.setQualifierType(BIOLOGICAL_QUALIFIER), "Adding the type of qualifier")
check(cv.setBiologicalQualifierType(BQB_IS_VERSION_OF), "Adding the biological qualifier type")
check(cv.addResource(link), "Adding the resource to the CV")
check(cpd.addCVTerm(cv), "Adding CV to the corresponding substance")
xml_string = get_xml_from_scenarios(node)
attribute_xml = XMLNode.convertStringToXMLNode(xml_string)
check(cpd.appendAnnotation(attribute_xml), 'set annotation')
Finally we add the reactions of the pathway using the method pathway.get_edges() and adding the information stored in there in SBML Reaction objects. There are a few mandatory fields for a Reaction and additional ones can be set as explained on the SBML Reaction page, in this tutorial we will set:
The ID: as explained before, this is a unique identifier for the SBML Document
For version 3:
The boolean setReversible: to indicate whether the reaction is reversible or not
The boolean setFast: indicates whether a reaction occurs on a vastly faster time scale than others in the system.
# Create reactions
reaction_ids = []
for edge in pathway.get_edges():
rxn_id = get_valid_id(edge.get_id())
r = model.createReaction()
check(r, 'create reaction')
check(r.setId(rxn_id), 'set reaction id')
check(r.setMetaId(rxn_id),'set id {}'.format(rxn_id))
check(r.setName(get_valid_id(edge.get_name())), 'set reaction name')
# Mandatory on version 3
# check(r.setReversible(False), 'set reaction reversibility flag')
# check(r.setFast(False), 'set reaction "fast" attribute')
# Substrates
for reactant_node in edge.get_start_nodes():
reactant = r.createReactant()
check(reactant, 'create reactant')
check(reactant.setSpecies(get_valid_id(reactant_node.get_id())), 'assign reactant species')
# Mandatory on version 3
# check(reactant.setConstant(True), 'set "constant" on reactant')
# Products
for product_node in edge.get_end_nodes():
product = r.createProduct()
check(product, 'create product')
check(product.setSpecies(get_valid_id(product_node.get_id())), 'assign product species')
# Mandatory on version 3
# check(product.setConstant(True), 'set "constant" on product')
# Rhea references
for link in edge.get_reaction().get_rhea_references():
cv = CVTerm()
check(cv.setQualifierType(BIOLOGICAL_QUALIFIER), "Adding the type of qualifier")
check(cv.setBiologicalQualifierType(BQB_IS_VERSION_OF), "Adding the biological qualifier type")
check(cv.addResource(link), "Adding the resource to the CV")
check(cpd.addCVTerm(cv), "Adding CV to the corresponding substance")
libsbml enables to save the generated document on .sbml file on a straight-forward manner, as shown below
filename = f'pathway_{get_valid_id(pathway.get_name())}.sbml'
writeSBMLToFile(document, filename)
1
After this tutorial we have been able to write a syntactically correct SBML file, this however does not mean that the SBML file is valid. We follow the validateSBML.py tutorial to ensure that no core errors exists in our SBML file.
sbmlDoc = readSBML(filename)
errors = sbmlDoc.getNumErrors()
seriousErrors = False
numReadErr = 0
numReadWarn = 0
errMsgRead = ""
if errors > 0:
print(f"The SBML file contains {len(errors)} errors")
for i in range(errors):
severity = sbmlDoc.getError(i).getSeverity()
if (severity == LIBSBML_SEV_ERROR) or (severity == LIBSBML_SEV_FATAL):
seriousErrors = True
numReadErr += 1
else:
numReadWarn += 1
errMsgRead = sbmlDoc.getErrorLog().toString()
for message in errMsgRead.split("\n\n"):
print(message + "\n")
And finally we can check whether some minor warnings are found
failures = sbmlDoc.checkConsistency()
numCCErr = 0
numCCWarn = 0
if failures > 0:
isinvalid = False
for i in range(failures):
severity = sbmlDoc.getError(i).getSeverity()
if (severity == LIBSBML_SEV_ERROR) or (severity == LIBSBML_SEV_FATAL):
isinvalid = True
else:
numCCWarn += 1
if isinvalid:
self.numinvalid += 1
errMsgCC = sbmlDoc.getErrorLog().toString()
for message in errMsgCC.split("\n\n"):
print(message + "\n")
line 5: (80501 [Warning]) As a principle of best modeling practice, the size of a <compartment> should be set to a value rather than be left undefined. Doing so improves the portability of models between different simulation and analysis systems, and helps make it easier to detect potential errors in models.
The <compartment> with the id 'package_7932e576_03c7_4106_819d_fe80dc605b8a_pathway_b21b1d65_e0d1_4060_b890_45bf3713924a' does not have a 'size' attribute, nor is its initial value set by an <initialAssignment> or <assignmentRule>.
line 8: (80601 [Warning]) As a principle of best modeling practice, the <species> should set an initial value (amount or concentration) rather than be left undefined. Doing so improves the portability of models between different simulation and analysis systems, and helps make it easier to detect potential errors in models.
The <species> with the id 'package_7932e576_03c7_4106_819d_fe80dc605b8a_pathway_b21b1d65_e0d1_4060_b890_45bf3713924a_node_97edb6b1_3dda_4c73_89a0_1a1d9f6d95c8' does not have an 'initialConcentration' or 'initialAmount' attribute, nor is its initial value set by an <initialAssignment> or <assignmentRule>.
line 91: (80601 [Warning]) As a principle of best modeling practice, the <species> should set an initial value (amount or concentration) rather than be left undefined. Doing so improves the portability of models between different simulation and analysis systems, and helps make it easier to detect potential errors in models.
The <species> with the id 'package_7932e576_03c7_4106_819d_fe80dc605b8a_pathway_b21b1d65_e0d1_4060_b890_45bf3713924a_node_ca6c81fa_2c27_4650_8925_78b9a487ffe1' does not have an 'initialConcentration' or 'initialAmount' attribute, nor is its initial value set by an <initialAssignment> or <assignmentRule>.
line 174: (80601 [Warning]) As a principle of best modeling practice, the <species> should set an initial value (amount or concentration) rather than be left undefined. Doing so improves the portability of models between different simulation and analysis systems, and helps make it easier to detect potential errors in models.
The <species> with the id 'package_7932e576_03c7_4106_819d_fe80dc605b8a_pathway_b21b1d65_e0d1_4060_b890_45bf3713924a_node_c41850f6_6826_45b6_ade8_c9bb90fe8e8b' does not have an 'initialConcentration' or 'initialAmount' attribute, nor is its initial value set by an <initialAssignment> or <assignmentRule>.
line 257: (80601 [Warning]) As a principle of best modeling practice, the <species> should set an initial value (amount or concentration) rather than be left undefined. Doing so improves the portability of models between different simulation and analysis systems, and helps make it easier to detect potential errors in models.
The <species> with the id 'package_7932e576_03c7_4106_819d_fe80dc605b8a_pathway_b21b1d65_e0d1_4060_b890_45bf3713924a_node_0d36bada_1d91_4cad_8660_b5c8184da110' does not have an 'initialConcentration' or 'initialAmount' attribute, nor is its initial value set by an <initialAssignment> or <assignmentRule>.
line 340: (80601 [Warning]) As a principle of best modeling practice, the <species> should set an initial value (amount or concentration) rather than be left undefined. Doing so improves the portability of models between different simulation and analysis systems, and helps make it easier to detect potential errors in models.
The <species> with the id 'package_7932e576_03c7_4106_819d_fe80dc605b8a_pathway_b21b1d65_e0d1_4060_b890_45bf3713924a_node_bd568328_003f_4645_ab9f_2fd4a0927ae0' does not have an 'initialConcentration' or 'initialAmount' attribute, nor is its initial value set by an <initialAssignment> or <assignmentRule>.
line 423: (80601 [Warning]) As a principle of best modeling practice, the <species> should set an initial value (amount or concentration) rather than be left undefined. Doing so improves the portability of models between different simulation and analysis systems, and helps make it easier to detect potential errors in models.
The <species> with the id 'package_7932e576_03c7_4106_819d_fe80dc605b8a_pathway_b21b1d65_e0d1_4060_b890_45bf3713924a_node_8e895627_e58b_422a_8482_5a21098f007e' does not have an 'initialConcentration' or 'initialAmount' attribute, nor is its initial value set by an <initialAssignment> or <assignmentRule>.
line 506: (80601 [Warning]) As a principle of best modeling practice, the <species> should set an initial value (amount or concentration) rather than be left undefined. Doing so improves the portability of models between different simulation and analysis systems, and helps make it easier to detect potential errors in models.
The <species> with the id 'package_7932e576_03c7_4106_819d_fe80dc605b8a_pathway_b21b1d65_e0d1_4060_b890_45bf3713924a_node_d7cccc60_6c42_4418_a0c8_60631a35b275' does not have an 'initialConcentration' or 'initialAmount' attribute, nor is its initial value set by an <initialAssignment> or <assignmentRule>.
line 816: (80601 [Warning]) As a principle of best modeling practice, the <species> should set an initial value (amount or concentration) rather than be left undefined. Doing so improves the portability of models between different simulation and analysis systems, and helps make it easier to detect potential errors in models.
The <species> with the id 'package_7932e576_03c7_4106_819d_fe80dc605b8a_pathway_b21b1d65_e0d1_4060_b890_45bf3713924a_node_6a8f948e_fd6b_4cee_8fea_60d1204901e1' does not have an 'initialConcentration' or 'initialAmount' attribute, nor is its initial value set by an <initialAssignment> or <assignmentRule>.