Step 1: Nodes Collector Processor

You can use the NodesCollectorProcessor class from a Python script or from the command line. It supports the node types listed below.

Available Nodes

  • Compound
  • BioAssay
  • Gene
  • Protein

Python Script Example

You can use the NodesCollectorProcessor class within a Python script as follows:

from chemgraphbuilder.node_collector_processor import NodesCollectorProcessor
from chemgraphbuilder.setup_data_folder import SetupDataFolder


node_type = "Compound"  # Change to "BioAssay", "Gene", or "Protein" as needed
enzyme_list = ['CYP2D6', 'CYP3A4']

# Initialize and setup the data directory before collecting any data
setup_folder = SetupDataFolder()
setup_folder.setup()

# Initialize the collector
collector = NodesCollectorProcessor(node_type=node_type, enzyme_list=enzyme_list, start_chunk=0)

# Collect and process the data
collector.collect_and_process_data()

# Close the connection
collector.close()

Command Line Interface (CLI) Example

You can use the NodesCollectorProcessor class from the command line by executing the script with the necessary arguments:

setup-data-folder
collect-process-nodes --node_type Compound --enzyme_list CYP2D6,CYP3A4 --start_chunk 0  # start_chunk defaults to 0

The node_type argument must be one of "Compound", "BioAssay", "Gene", or "Protein", depending on the type of data you want to collect. The enzyme_list argument is a comma-separated string of enzyme names, for example CYP2D6,CYP3A4. The start_chunk argument is optional and defaults to 0. It is used when collecting Compound data, which is downloaded in chunks: if a download stops before all chunks are retrieved, rerun the command with start_chunk set to the chunk from which you want to resume.
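
To make the argument rules above concrete, here is a rough sketch of how such arguments could be parsed and validated with argparse. This is not the actual implementation of collect-process-nodes; build_parser, parse_enzyme_list, and VALID_NODE_TYPES are hypothetical names, and treating node_type and enzyme_list as required is an assumption. Only the flag names, the four node types, and the default of 0 for start_chunk come from the text above.

```python
import argparse

# The four node types named in this guide.
VALID_NODE_TYPES = ("Compound", "BioAssay", "Gene", "Protein")


def parse_enzyme_list(value):
    """Split 'CYP2D6,CYP3A4' into ['CYP2D6', 'CYP3A4'], dropping blank entries."""
    return [name.strip() for name in value.split(",") if name.strip()]


def build_parser():
    """Build an illustrative parser that mirrors the documented flags."""
    parser = argparse.ArgumentParser(prog="collect-process-nodes")
    # Assumption: both of these are required; the real CLI may differ.
    parser.add_argument("--node_type", required=True, choices=VALID_NODE_TYPES)
    parser.add_argument("--enzyme_list", required=True, type=parse_enzyme_list)
    # Optional; resuming a Compound download would pass a later chunk number.
    parser.add_argument("--start_chunk", type=int, default=0)
    return parser


if __name__ == "__main__":
    args = build_parser().parse_args(
        ["--node_type", "Compound", "--enzyme_list", "CYP2D6,CYP3A4"]
    )
    print(args.node_type, args.enzyme_list, args.start_chunk)
```

In this sketch an unsupported node type is rejected by argparse before any download starts, which is one reasonable way to surface a typo early.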