Download Additional Information data
In this tutorial, we will show how to access the additional information data stored in scenarios within a given package. We will do this for the Atenolol Pathway as an example. We will first access one scenario associated with the compound Atenolol within the Atenolol pathway, and extract all the additional information (metadata) within the scenario. Afterwards we will show how to access the metadata for multiple scenarios within a package by extending the code in the provided example. Finally, we will explore how to analyze trends in the metadata using experimental location as an example.
We first import the relevant enviPath objects for this tutorial:
from enviPath_python.enviPath import enviPath
from enviPath_python.objects import *
import pandas as pd
As in other tutorials, we instantiate the host and the package we want to work with:
INSTANCE_HOST = "https://envipath.org/"
EAWAG_SLUDGE_DATA_PACKAGE = "https://envipath.org/package/7932e576-03c7-4106-819d-fe80dc605b8a"
eP = enviPath(INSTANCE_HOST)
pkg = Package(eP.requester, id=EAWAG_SLUDGE_DATA_PACKAGE)
As discussed, we will access the metadata contained in the Atenolol pathway and display it.
First, we search for the pathway:
atenolol_pathway = eP.search("Atenolol", pkg)["pathway"][0]
print(f"Pathway name: {atenolol_pathway.get_name()}")
# We're interested in Atenolol itself, so we fetch the root node at position 1
node = atenolol_pathway.get_nodes()[1]
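`eP.search` sends an HTTP request and parses the response body as JSON, so the call can raise a `JSONDecodeError` when the server answers with something other than JSON. The sketch below shows one way to guard the lookup. The helper `first_pathway` and the two stand-in search functions are hypothetical names introduced for illustration and are not part of enviPath-python; the sketch relies on the fact that both the standard library's and requests' `JSONDecodeError` derive from `ValueError`.

```python
import json


def first_pathway(search, term, package):
    """Return the first pathway hit for `term`, or None if the search fails or is empty."""
    try:
        hits = search(term, package)
    except ValueError as err:
        # JSONDecodeError is a ValueError subclass, so undecodable responses end up here
        print(f"Search for {term!r} failed: {err}")
        return None
    pathways = hits.get("pathway", [])
    return pathways[0] if pathways else None


# Stand-ins so the sketch runs offline (assumptions, not the real client)
def broken_search(term, package):
    raise json.JSONDecodeError("Expecting value", "\n\n<html>", 2)


def working_search(term, package):
    return {"pathway": ["Atenolol pathway (stand-in)"]}


print(first_pathway(broken_search, "Atenolol", None))
print(first_pathway(working_search, "Atenolol", None))
```

With the real client, the call would presumably be `first_pathway(eP.search, "Atenolol", pkg)`, and a `None` result signals that the rest of the tutorial cannot proceed until the search succeeds.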
We access a scenario from the list of scenarios that are attached to Atenolol:
scenarios = node.get_scenarios()
print(f"We have {len(scenarios)} scenarios for Atenolol")
scenario = scenarios[0]
print(f"In this first example, we will explore the metadata contained in {scenario.get_id()} , ")
print(f"with name {scenario.get_name()}")
print(f"Description: {scenario.get_description()}")
We have 19 scenarios for Atenolol
In this first example, we will explore the metadata contained in https://envipath.org/package/7932e576-03c7-4106-819d-fe80dc605b8a/scenario/10dccbcb-4ab6-4a3a-b653-77b909bc6675 ,
with name Helbling et al., 2012 (DOM3) (Related Scenario) - (00000)
Description: no description
Lastly, we extract all the additional information objects in that scenario:
additional_information_list = scenario.get_additional_information()
for ai in additional_information_list:
    print(f"\n{ai.name}")
    for param, value in ai.params.items():
        print(f"\t{param}: {value}")
acidity
    lowPh: 7.5
    highPh: 7.5
    acidityType:
    unit: pH
biologicaltreatmenttechnology
    biologicaltreatmenttechnology: nitrification & denitrification
    unit:
bioreactor
    bioreactortype: amber glass Schott bottles (loosely capped)
    bioreactorsize: 100.0
    unit: mL
finalcompoundconcentration
    finalcompoundconcentration: 100
    unit: μg/L
inoculumsource
    inoculumsource: activated sludge from biological aeration basin
    unit:
location
    location: Switzerland (DOM3)
    unit:
nitrogencontent
    nitrogencontentType: NH₄-N
    nitrogencontentInfluent: 24.9
    unit: mg/L
originalsludgeamount
    originalsludgeamount: 70
    unit: mL
oxygendemand
    oxygendemandType: Biological Oxygen Demand (BOD5)
    oxygendemandInfluent: 320.0
    oxygendemandEffluent:
    unit: mg/L
phosphoruscontent
    phosphoruscontentInfluent: 9.0
    phosphoruscontentEffluent:
    unit: mg/L
purposeofwwtp
    purposeofwwtp: municipal WW
    unit:
rateconstant
    rateconstantorder: First order
    rateconstantcorrected: sorption corrected & abiotic degradation corrected
    rateconstantlower: 15.62
    rateconstantupper: NaN
    rateconstantcomment: r2 = 0.9934
    unit: 1 / day
redox
    redoxType: aerob
    unit:
sludgeretentiontime
    sludgeretentiontimeType: sludge retention time
    sludgeretentiontime: 9.8
    unit: d
solventforcompoundsolution
    solventforcompoundsolution1: MeOH
    solventforcompoundsolution2: None
    solventforcompoundsolution3: None
    unit:
sourceofliquidmatrix
    sourceofliquidmatrix: none (sludge only)
    unit:
temperature
    temperatureMin: 20.0
    temperatureMax: 20.0
    unit: °C
tts
    ttsStart: 12.4
    ttsEnd: 12.4
    unit: g/L
typeofaddition
    typeofaddition: plating
    unit:
typeofaeration
    typeofaeration: shaking
    unit:
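Since each additional information object exposes a `name` and a `params` mapping, a single value can also be looked up directly instead of printing everything. The sketch below uses `SimpleNamespace` stand-ins carrying a few of the values shown in the output, so it runs without a network connection; with real data, `additional_information_list` would take the place of `stand_ins`. The helper name `index_by_name` is hypothetical, and the value types (floats here) are an assumption.

```python
from types import SimpleNamespace

# Stand-ins that mimic the `name` and `params` attributes (assumed shapes)
stand_ins = [
    SimpleNamespace(name="acidity", params={"lowPh": 7.5, "highPh": 7.5, "unit": "pH"}),
    SimpleNamespace(
        name="temperature",
        params={"temperatureMin": 20.0, "temperatureMax": 20.0, "unit": "°C"},
    ),
]


def index_by_name(additional_information):
    """Map each additional information name to a copy of its parameter dictionary."""
    return {ai.name: dict(ai.params) for ai in additional_information}


info = index_by_name(stand_ins)
low_ph = info["acidity"]["lowPh"]
print(f"Lower pH bound: {low_ph} {info['acidity']['unit']}")
```

If a scenario carried two objects with the same name, the later one would overwrite the earlier one in this dictionary, so the sketch only suits scenarios where names are unique.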
In the following lines of code, we generalize this process to extract all the metadata of a package. Some lines are commented out to reduce the number of requests and the computation time. The user can download this tutorial from the upper-right corner and run those lines if desired. The underlying logic can be described as follows:
Declare a data list where we will store all the information retrieved
Loop over each pathway in the package and over each node on that pathway
Extract all the scenarios
For each scenario, get all the experimental data (additional information) and store it in the data list together with its SMILES, node, scenario and pathway IDs and the scenario description
Create a pandas DataFrame and use it to generate a .csv file with all the extracted data
# data = []
# for path in pkg.get_pathways():
#     for node in path.get_nodes():
#         scenarios = node.get_scenarios()
#         for scenario in scenarios:
#             temp_data = {"smiles": node.get_smiles(), "node_id": node.get_id(),
#                          "scenario_id": scenario.get_id(), "scenario_description": scenario.get_description(),
#                          "pathway_id": path.get_id()}
#             temp_add_info = scenario.get_additional_information()
#             for ai in temp_add_info:
#                 add_info = {ai.name + "_" + key: value for (key, value) in ai.params.items()}
#                 temp_data.update(add_info)
#             data.append(temp_data)
# # save data
# raw_data = pd.DataFrame(data)
# raw_data.to_csv("../assets/additional_information_data.csv", sep='\t', index=False)
raw_data = pd.read_csv("../assets/additional_information_data.csv", sep="\t")
raw_data.head()
| | smiles | node_id | scenario_id | scenario_description | pathway_id | acidity_lowPh | acidity_highPh | acidity_acidityType | acidity_unit | biologicaltreatmenttechnology_biologicaltreatmenttechnology | ... | oxygendemand_oxygendemandType | oxygendemand_oxygendemandInfluent | oxygendemand_oxygendemandEffluent | oxygendemand_unit | dissolvedorganiccarbon_dissolvedorganiccarbonStart | dissolvedorganiccarbon_dissolvedorganiccarbonEnd | dissolvedorganiccarbon_unit | volatiletts_volatilettsStart | volatiletts_volatilettsEnd | volatiletts_unit |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | C1=CC(=C(C=C1)N2CCNCC2)Cl | https://envipath.org/package/7932e576-03c7-410... | https://envipath.org/package/7932e576-03c7-410... | no description | https://envipath.org/package/7932e576-03c7-410... | 8.1 | 8.1 | NaN | pH | nitrification & denitrification & biological p... | ... | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN |
| 1 | C1=CC(=C(C=C1)N2CCNCC2)Cl | https://envipath.org/package/7932e576-03c7-410... | https://envipath.org/package/7932e576-03c7-410... | no description | https://envipath.org/package/7932e576-03c7-410... | 6.3 | 6.3 | NaN | pH | nitrification & denitrification & biological p... | ... | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN |
| 2 | C1=CC(=C(C=C1)N2CCNCC2)Cl | https://envipath.org/package/7932e576-03c7-410... | https://envipath.org/package/7932e576-03c7-410... | no description | https://envipath.org/package/7932e576-03c7-410... | 7.1 | 7.1 | NaN | pH | nitrification & denitrification & biological p... | ... | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN |
| 3 | CC12CCC3C4=CC=C(C=C4CCC3C2CCC1=O)O | https://envipath.org/package/7932e576-03c7-410... | https://envipath.org/package/7932e576-03c7-410... | https://doi.org/10.1023/A:1014117329403 | https://envipath.org/package/7932e576-03c7-410... | NaN | NaN | NaN | NaN | NaN | ... | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN |
| 4 | CC12CCC3C4=CC=C(C=C4CCC3C2CCC1=O)O | https://envipath.org/package/7932e576-03c7-410... | https://envipath.org/package/7932e576-03c7-410... | https://doi.org/10.1023/A:1014117329403 | https://envipath.org/package/7932e576-03c7-410... | NaN | NaN | NaN | NaN | NaN | ... | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN |
5 rows × 93 columns
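The many NaN entries follow from how the table is assembled: each scenario contributes a dictionary with its own set of keys, and pandas builds the columns from the union of all keys, filling in NaN wherever a scenario did not report a parameter. The sketch below illustrates this with two hypothetical rows (the scenario IDs are placeholders and the values are not taken from the package) and shows one way to measure how often each parameter is reported.

```python
import pandas as pd

# Hypothetical flattened rows; each scenario reports a different subset of parameters
rows = [
    {"scenario_id": "scenario-A", "acidity_lowPh": 8.1, "acidity_unit": "pH"},
    {"scenario_id": "scenario-B", "temperature_temperatureMin": 20.0},
]

frame = pd.DataFrame(rows)

# The columns are the union of all keys; values a row lacks become NaN
print(frame)

# Fraction of rows that report each column, most complete first
coverage = frame.notna().mean().sort_values(ascending=False)
print(coverage)
```

Applied to `raw_data`, the same `notna().mean()` call would indicate which of the 93 columns are populated often enough to be worth analyzing.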
Finally, we use the extracted data to analyze the locations of each experiment in EAWAG-SLUDGE. To do this, we map similar locations to a common name, e.g. (Dübendorf, WWTP Duebendorf (ARA Neugut), Switzerland, …) -> Dübendorf, Switzerland.
We see that Dübendorf, Switzerland is the predominant location in our dataset. In the same way, one could analyze other relevant features, such as temperature, pH or half-lives.
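The location mapping described above could be sketched as follows. The input strings are hypothetical examples of spelling variants rather than counts from the package, and the matching rule in `normalize_location` (a hypothetical helper) is only illustrative. With the extracted table, the input would presumably be the `location_location` column of `raw_data`, assuming the column naming produced by the flattening code (`ai.name + "_" + key`).

```python
import pandas as pd

# Hypothetical raw values standing in for raw_data["location_location"]
locations = pd.Series([
    "Dübendorf",
    "WWTP Duebendorf (ARA Neugut), Switzerland",
    "Dübendorf",
    "Switzerland (DOM3)",
])


def normalize_location(raw):
    """Map spelling variants of one site to a common name (illustrative rule only)."""
    lowered = raw.lower()
    if "dübendorf" in lowered or "duebendorf" in lowered:
        return "Dübendorf, Switzerland"
    return raw


# na_action="ignore" leaves missing locations untouched instead of calling the function on NaN
counts = locations.map(normalize_location, na_action="ignore").value_counts()
print(counts)
```

A keyword rule like this is easy to audit but needs one branch per site; for many sites, a dictionary from raw string to common name may be simpler to maintain.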