Step 1: Nodes Collector Processor
You can use the NodesCollectorProcessor
class in both Python and the command line:
Available Nodes
Compound
BioAssay
Gene
Protein
Python Script Example
You can use the NodesCollectorProcessor
class within a Python script as follows:
from chemgraphbuilder.node_collector_processor import NodesCollectorProcessor
from chemgraphbuilder.setup_data_folder import SetupDataFolder
node_type = "Compound" # Change to "BioAssay", "Gene", or "Protein" as needed
enzyme_list = ['CYP2D6', 'CYP3A4']
# Initialize and setup the data directory before collecting any data
setup_folder = SetupDataFolder()
setup_folder.setup()
# Initialize the collector
collector = NodesCollectorProcessor(node_type=node_type, enzyme_list=enzyme_list, start_chunk=0)
# Collect and process the data
collector.collect_and_process_data()
# Close the connection
collector.close()
Command Line Interface (CLI) Example
You can use the NodesCollectorProcessor
class from the command line by executing the script with the necessary arguments:
setup-data-folder
collect-process-nodes --node_type Compound --enzyme_list CYP2D6,CYP3A4 --start_chunk 0 # the default start-chunk is 0
The node_type
argument can be one of "Compound"
, "BioAssay"
, "Gene"
, or "Protein"
, depending on the type of data you want to collect.
The enzyme_list
should be a comma-separated string of enzyme names.
The start_chunk
argument is optional and is used when collecting data for the Compound
type because the compound data is downloaded as chunks.
If you weren't able to download all the data at once, you can continue the download starting from the desired chunk.