Execute a Manager
In this tutorial, you'll learn how to execute a QM manager and monitor it.
import os
import dotenv
from tqdm.auto import tqdm
import pandas as pd
from qcportal import PortalClient
_ = dotenv.load_dotenv("../../openfractal_test_secrets.env")
Launch a Manager¶
A manager (or worker) is responsible for receiving jobs from a backend QCFractal server, executing them, and reporting the results back to the server.
Local¶
A manager is a QM worker that will perform any QM calculations provided by an Openfractal instance.
You first need to create a YAML config file, manager_local.yml:
base_folder: /tmp/qcf_compute
cluster: manager_demo_local_1
loglevel: INFO
logfile: null
update_frequency: 30
server:
  fractal_uri: https://openfractal-test-pgzbs3yryq-uc.a.run.app
  username: YOUR_USERNAME
  password: YOUR_PASSWORD
  verify: false
environments:
  use_manager_environment: true
  conda: []
  apptainer: []
executors:
  local:
    type: local
    # Common to all executors.
    # Tags are used to filter the tasks that will be sent to the manager.
    queue_tags: ["demo_tutorial"]
    worker_init: []
    scratch_directory: null
    bind_address: null
    cores_per_worker: 16
    memory_per_worker: 16 # GB
    extra_executor_options: {}
    # Specific options for the local executor.
    max_workers: 4
Then you can start a manager with:
qcfractal-compute-manager --config manager_local.yml
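Before starting the manager, it can help to check that the configuration fits the host. The sketch below assumes that the local executor may run up to `max_workers` workers at once, each using `cores_per_worker` cores and `memory_per_worker` GB, so that the peak demand is the product of the two. That interpretation and the helper name `peak_demand` are assumptions made for illustration; they are not part of QCFractal.

```python
import os


def peak_demand(cores_per_worker: int, memory_per_worker: float, max_workers: int):
    """Return (total cores, total memory in GB) if every worker is busy.

    Hypothetical helper: assumes resources scale linearly with max_workers.
    """
    return cores_per_worker * max_workers, memory_per_worker * max_workers


# Values copied from manager_local.yml above.
cores, memory_gb = peak_demand(cores_per_worker=16, memory_per_worker=16, max_workers=4)
print(f"Peak demand: {cores} cores, {memory_gb} GB")

# os.cpu_count() may return None on some platforms, so guard the comparison.
host_cores = os.cpu_count()
if host_cores is not None and cores > host_cores:
    print(f"Warning: config asks for {cores} cores but the host has {host_cores}.")
```

With the values from the example config, this reports 64 cores and 64 GB, which may be more than a typical workstation offers; lowering `cores_per_worker` or `max_workers` is one way to adjust.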
SLURM¶
A manager is a QM worker that will perform any QM calculations provided by an Openfractal instance.
You first need to create a YAML config file, manager_slurm.yml:
base_folder: /tmp/qcf_compute
cluster: manager_demo_slurm_1
loglevel: INFO
logfile: null
update_frequency: 30
server:
  fractal_uri: https://openfractal-test-pgzbs3yryq-uc.a.run.app
  username: YOUR_USERNAME
  password: YOUR_PASSWORD
  verify: false
environments:
  use_manager_environment: true
  conda: []
  apptainer: []
executors:
  slurm:
    type: slurm
    # Common to all executors.
    queue_tags: ["demo_tutorial"]
    worker_init: []
    scratch_directory: null
    bind_address: 127.0.0.1
    cores_per_worker: 16
    memory_per_worker: 16 # GB
    extra_executor_options: {}
    # Specific options for the SLURM executor.
    walltime: "1:00:00"
    exclusive: false
    partition: null
    account: null
    workers_per_node: 7
    max_nodes: 1
    scheduler_options: []
Then you can start a manager with:
qcfractal-compute-manager --config manager_slurm.yml
Docker¶
With docker-compose¶
You can use the following docker-compose configuration:
version: "3"
services:
  opf_manager:
    image: ghcr.io/opendrugdiscovery/openfractal-client:main
    command: qcfractal-compute-manager
    environment:
      # General
      QCF_COMPUTE_BASE_FOLDER: /tmp/
      QCF_COMPUTE_CLUSTER: manager_demo_1
      QCF_COMPUTE_LOGLEVEL: INFO
      QCF_COMPUTE_UPDATE_FREQUENCY: 30
      # Server
      QCF_COMPUTE_SERVER: "{}" # an empty object; it appears to be required
      QCF_COMPUTE_SERVER_FRACTAL_URI: https://openfractal-test-pgzbs3yryq-uc.a.run.app
      QCF_COMPUTE_SERVER_USERNAME: YOUR_USERNAME
      QCF_COMPUTE_SERVER_PASSWORD: YOUR_PASSWORD
      # Booleans are quoted so that YAML passes them through as strings.
      QCF_COMPUTE_SERVER_VERIFY: "false"
      # Environment
      QCF_COMPUTE_ENVIRONMENTS_USE_MANAGER_ENVIRONMENT: "true"
      QCF_COMPUTE_ENVIRONMENTS_CONDA: "[]"
      QCF_COMPUTE_ENVIRONMENTS_APPTAINER: "[]"
      # Executors
      QCF_COMPUTE_EXECUTORS: '{"local": {"type": "local", "queue_tags": ["demo_tutorial"], "cores_per_worker": 16, "memory_per_worker": 16, "max_workers": 4}}'
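The `QCF_COMPUTE_EXECUTORS` value is a JSON string, which is easy to get wrong when edited by hand. As a minimal sketch, the string can be generated from a Python dictionary with the standard `json` module; the variable names here are illustrative only.

```python
import json

# Same executor settings as in the configuration above.
executors = {
    "local": {
        "type": "local",
        "queue_tags": ["demo_tutorial"],
        "cores_per_worker": 16,
        "memory_per_worker": 16,
        "max_workers": 4,
    }
}

# json.dumps produces valid JSON with double-quoted keys and strings.
executors_json = json.dumps(executors)

# Single quotes around the value keep the inner double quotes intact in YAML.
print(f"QCF_COMPUTE_EXECUTORS: '{executors_json}'")
```

The printed line can be pasted into the `environment` section; the same string also works as the value of the `-e QCF_COMPUTE_EXECUTORS=...` flag of `docker run`.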
Then execute the container with:
# Execute in the background
docker-compose up -d
# Check logs
docker-compose logs -f
# Shutdown the manager
docker-compose down
With docker¶
Using docker-compose is recommended and often more convenient. If you prefer to use docker directly:
docker run --rm -ti \
    -e QCF_COMPUTE_BASE_FOLDER="/tmp/" \
    -e QCF_COMPUTE_CLUSTER="manager_demo_1" \
    -e QCF_COMPUTE_LOGLEVEL="INFO" \
    -e QCF_COMPUTE_UPDATE_FREQUENCY="30" \
    -e QCF_COMPUTE_SERVER="{}" \
    -e QCF_COMPUTE_SERVER_FRACTAL_URI="https://openfractal-test-pgzbs3yryq-uc.a.run.app" \
    -e QCF_COMPUTE_SERVER_USERNAME="YOUR_USERNAME" \
    -e QCF_COMPUTE_SERVER_PASSWORD="YOUR_PASSWORD" \
    -e QCF_COMPUTE_SERVER_VERIFY="false" \
    -e QCF_COMPUTE_ENVIRONMENTS_USE_MANAGER_ENVIRONMENT="true" \
    -e QCF_COMPUTE_ENVIRONMENTS_CONDA="[]" \
    -e QCF_COMPUTE_ENVIRONMENTS_APPTAINER="[]" \
    -e QCF_COMPUTE_EXECUTORS='{"local": {"type": "local", "queue_tags": ["demo_tutorial"], "cores_per_worker": 16, "memory_per_worker": 16, "max_workers": 4}}' \
    ghcr.io/opendrugdiscovery/openfractal-client:main qcfractal-compute-manager
HuggingFace Space¶
Follow the instructions at https://huggingface.co/spaces/hadim/openfractal-client-space/blob/main/README.md.
Monitor the managers¶
client = PortalClient(
    address="https://openfractal-test-pgzbs3yryq-uc.a.run.app",
    username=os.environ["OPENFRACTAL_USER_5_USERNAME"],
    password=os.environ["OPENFRACTAL_USER_5_PASSWORD"],
)
client
PortalClient
- Server: openfractal-test
- Address: https://openfractal-test-pgzbs3yryq-uc.a.run.app/
- Username: monitor_default
# Check connected compute managers (workers)
managers = pd.DataFrame([m.dict() for m in client.query_managers()])
managers
| | id | name | cluster | hostname | username | tags | claimed | successes | failures | rejected | total_cpu_hours | active_tasks | active_cores | active_memory | status | created_on | modified_on | manager_version | programs | log_ |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
0 | 4 | manager_hadrien_local_1-boromir-eb6f7b1c-7db1-... | manager_hadrien_local_1 | boromir | compute_default | [demo_tutorial] | 8 | 2 | 0 | 0 | 0.498862 | 0 | 96 | 96.0 | ManagerStatusEnum.active | 2023-06-23 13:10:14.627308 | 2023-06-23 13:11:15.473101 | 0.50b12.post16+gee831184 | {'psi4': ['1.8'], 'rdkit': ['2023.3.2'], 'open... | None |
1 | 3 | manager_hadrien_local_1-boromir-a0282050-ab6d-... | manager_hadrien_local_1 | boromir | compute_default | [demo] | 6 | 6 | 0 | 0 | 0.052357 | 0 | 0 | 0.0 | ManagerStatusEnum.inactive | 2023-06-23 13:08:36.830162 | 2023-06-23 13:09:04.465528 | 0.50b12.post16+gee831184 | {'openmm': ['8.0.0'], 'rdkit': ['2023.3.2'], '... | None |
2 | 2 | manager_hadrien_local_1-gollum-226c03ff-f315-4... | manager_hadrien_local_1 | gollum | compute_default | [demo] | 24 | 23 | 1 | 0 | 0.184606 | 0 | 0 | 0.0 | ManagerStatusEnum.inactive | 2023-06-23 00:16:20.420013 | 2023-06-23 00:17:00.632159 | 0.50b12.post16+gee831184 | {'openmm': ['8.0.0'], 'rdkit': ['2023.3.1'], '... | None |
3 | 1 | manager_hadrien_local_1-gollum-ef38f2ac-b99a-4... | manager_hadrien_local_1 | gollum | compute_default | [demo] | 230 | 223 | 7 | 0 | 8.106712 | 0 | 0 | 0.0 | ManagerStatusEnum.inactive | 2023-06-23 00:02:43.885753 | 2023-06-23 00:15:11.932844 | 0.50b12.post16+gee831184 | {'openmm': ['8.0.0'], 'psi4': ['1.8'], 'rdkit'... | None |
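To focus on the managers that are currently connected, the rows can be filtered by status. The sketch below works on plain dictionaries so that it runs without a server; the helper names `status_name` and `summarize` are hypothetical, and the two sample rows reuse values from the table above. Note that a naive `endswith("active")` test would also match `inactive`, so the status is compared exactly.

```python
def status_name(status) -> str:
    """Return the bare status name, e.g. 'active' from 'ManagerStatusEnum.active'."""
    return str(status).split(".")[-1]


def summarize(rows):
    """Keep active managers and report successes per claimed task."""
    summary = []
    for row in rows:
        if status_name(row["status"]) != "active":
            continue
        claimed = row["claimed"]
        # Avoid dividing by zero for managers that have not claimed anything yet.
        rate = row["successes"] / claimed if claimed else None
        summary.append({"id": row["id"], "hostname": row["hostname"], "success_rate": rate})
    return summary


# Sample rows with values taken from the managers table.
sample_rows = [
    {"id": 4, "hostname": "boromir", "status": "ManagerStatusEnum.active",
     "claimed": 8, "successes": 2},
    {"id": 1, "hostname": "gollum", "status": "ManagerStatusEnum.inactive",
     "claimed": 230, "successes": 223},
]
print(summarize(sample_rows))
```

With the DataFrame built earlier, the same helper could be called as `summarize(managers.to_dict("records"))`. For an active manager, claimed tasks include those still running, so the ratio is a lower bound rather than a final success rate.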
Monitor your dataset¶
dataset_name = "dataset_demo_4321690179"
ds = client.get_dataset("singlepoint", dataset_name)
ds.dict()
SinglepointDataset(id=5, dataset_type='singlepoint', name='dataset_demo_4321690179', description='my great dataset!', tagline='', tags=['demo_tutorial'], group='default', visibility=True, provenance={}, default_tag='demo_tutorial', default_priority=<PriorityEnum.normal: 1>, owner_user='admin_default', owner_group=None, metadata={}, extras={}, entry_names_=[], specifications_={}, entries_={}, record_map_={}, contributed_values_=None, auto_fetch_missing=True)
Re-run the cell below regularly to refresh the status.
print(ds.status_table())
specification                 complete    running
--------------------------  ----------  ---------
simple_qm_calculation_demo           4          6
progress = True  # show tqdm progress bars
status = None  # optional record status filter; None fetches every record
fetch_error = True
fetch_wfn = True

records_list = []
for spec_name in tqdm(ds.specification_names, disable=not progress):
    record_iterator = ds.iterate_records(
        specification_names=spec_name,
        force_refetch=True,
        fetch_updated=True,
        status=status,
    )
    for _, _, record in tqdm(record_iterator, disable=not progress, leave=False):
        # Accessing these attributes is meant to make the client fetch them
        # so that they are included in record.dict() below.
        if fetch_error:
            record.error
        if fetch_wfn:
            record.wavefunction  # type: ignore
        record_dict = record.dict()
        record_dict["specification_name"] = spec_name
        records_list.append(record_dict)

# One row per record, ordered by record id.
records = pd.DataFrame(records_list)
records = records.sort_values("id")
records = records.reset_index(drop=True)
records
| | id | record_type | is_service | properties | extras | status | manager_name | created_on | modified_on | owner_user | owner_group | compute_history_ | task_ | service_ | comments_ | native_files_ | specification | molecule_id | molecule_ | wavefunction_ |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
0 | 15 | singlepoint | False | {'pe energy': 0.0, 'scf dipole': [-0.000133218... | {} | RecordStatusEnum.complete | manager_hadrien_local_1-gollum-ef38f2ac-b99a-4... | 2023-06-23 00:01:50.894704 | 2023-06-23 00:03:46.557663 | admin_default | None | [{'id': 15, 'record_id': 15, 'status': 'Record... | None | None | None | None | {'program': 'psi4', 'driver': 'SinglepointDriv... | 24 | None | {'compression_type': 'CompressionEnum.zstd', '... |
1 | 52 | singlepoint | False | {'pe energy': 0.0, 'scf dipole': [0.1806203110... | {} | RecordStatusEnum.complete | manager_hadrien_local_1-gollum-ef38f2ac-b99a-4... | 2023-06-23 00:01:50.894737 | 2023-06-23 00:06:23.031904 | admin_default | None | [{'id': 52, 'record_id': 52, 'status': 'Record... | None | None | None | None | {'program': 'psi4', 'driver': 'SinglepointDriv... | 37 | None | {'compression_type': 'CompressionEnum.zstd', '... |
2 | 269 | singlepoint | False | {'pe energy': 0.0, 'scf dipole': [-0.012353689... | {} | RecordStatusEnum.complete | manager_hadrien_local_1-boromir-eb6f7b1c-7db1-... | 2023-06-23 13:09:57.599294 | 2023-06-23 13:11:15.156556 | admin_default | None | [{'id': 261, 'record_id': 269, 'status': 'Reco... | None | None | None | None | {'program': 'psi4', 'driver': 'SinglepointDriv... | 131 | None | {'compression_type': 'CompressionEnum.zstd', '... |
3 | 270 | singlepoint | False | {'pe energy': 0.0, 'scf dipole': [-0.121177958... | {} | RecordStatusEnum.complete | manager_hadrien_local_1-boromir-eb6f7b1c-7db1-... | 2023-06-23 13:09:57.599299 | 2023-06-23 13:11:15.254221 | admin_default | None | [{'id': 262, 'record_id': 270, 'status': 'Reco... | None | None | None | None | {'program': 'psi4', 'driver': 'SinglepointDriv... | 127 | None | {'compression_type': 'CompressionEnum.zstd', '... |
4 | 271 | singlepoint | False | None | None | RecordStatusEnum.running | manager_hadrien_local_1-boromir-eb6f7b1c-7db1-... | 2023-06-23 13:09:57.599300 | 2023-06-23 13:10:14.774614 | admin_default | None | [] | None | None | None | None | {'program': 'psi4', 'driver': 'SinglepointDriv... | 130 | None | None |
5 | 272 | singlepoint | False | None | None | RecordStatusEnum.running | manager_hadrien_local_1-boromir-eb6f7b1c-7db1-... | 2023-06-23 13:09:57.599301 | 2023-06-23 13:10:14.774620 | admin_default | None | [] | None | None | None | None | {'program': 'psi4', 'driver': 'SinglepointDriv... | 133 | None | None |
6 | 273 | singlepoint | False | None | None | RecordStatusEnum.running | manager_hadrien_local_1-boromir-eb6f7b1c-7db1-... | 2023-06-23 13:09:57.599302 | 2023-06-23 13:10:14.774626 | admin_default | None | [] | None | None | None | None | {'program': 'psi4', 'driver': 'SinglepointDriv... | 129 | None | None |
7 | 274 | singlepoint | False | None | None | RecordStatusEnum.running | manager_hadrien_local_1-boromir-eb6f7b1c-7db1-... | 2023-06-23 13:09:57.599303 | 2023-06-23 13:10:14.774632 | admin_default | None | [] | None | None | None | None | {'program': 'psi4', 'driver': 'SinglepointDriv... | 126 | None | None |
8 | 275 | singlepoint | False | None | None | RecordStatusEnum.running | manager_hadrien_local_1-boromir-eb6f7b1c-7db1-... | 2023-06-23 13:09:57.599303 | 2023-06-23 13:10:14.774638 | admin_default | None | [] | None | None | None | None | {'program': 'psi4', 'driver': 'SinglepointDriv... | 132 | None | None |
9 | 276 | singlepoint | False | None | None | RecordStatusEnum.running | manager_hadrien_local_1-boromir-eb6f7b1c-7db1-... | 2023-06-23 13:09:57.599304 | 2023-06-23 13:10:14.774644 | admin_default | None | [] | None | None | None | None | {'program': 'psi4', 'driver': 'SinglepointDriv... | 128 | None | None |
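A quick way to track progress from such a table is to count records per status. The following is a minimal sketch using only the standard library; `status_counts` is a hypothetical helper, and the sample list mirrors the ten rows shown above (four complete, six running).

```python
from collections import Counter


def status_counts(statuses):
    """Count records per bare status name, e.g. 'complete' or 'running'."""
    return Counter(str(s).split(".")[-1] for s in statuses)


# Statuses as they appear in the records table.
statuses = ["RecordStatusEnum.complete"] * 4 + ["RecordStatusEnum.running"] * 6

counts = status_counts(statuses)
done = counts["complete"] / sum(counts.values())
print(dict(counts), f"{done:.0%} complete")
```

With the DataFrame above, the helper could be applied as `status_counts(records["status"])`; the counts should agree with `ds.status_table()`.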
row = records.iloc[0]
row["properties"].keys()
dict_keys(['pe energy', 'scf dipole', 'calcinfo_nmo', 'mbis charges', 'mbis dipoles', 'mayer indices', 'mayer_indices', 'return_energy', 'return_result', 'calcinfo_natom', 'calcinfo_nbeta', 'current dipole', 'current energy', 'lowdin charges', 'lowdin_charges', 'mbis octupoles', 'return_hessian', 'scf iterations', 'scf quadrupole', 'scf_iterations', 'calcinfo_nalpha', 'calcinfo_nbasis', 'hf total energy', 'hf virial ratio', 'return_gradient', 'current gradient', 'mbis quadrupoles', 'scf total energy', 'scf_total_energy', 'hf kinetic energy', 'hf total gradient', 'scf_dipole_moment', 'scf_total_hessian', 'scf total gradient', 'scf_total_gradient', 'dd solvation energy', 'hf potential energy', 'mbis valence widths', 'one-electron energy', 'two-electron energy', 'scf iteration energy', 'wiberg lowdin indices', 'wiberg_lowdin_indices', 'pcm polarization energy', 'scf_one_electron_energy', 'scf_two_electron_energy', 'current reference energy', 'nuclear repulsion energy', 'nuclear_repulsion_energy', 'mbis radial moments <r^2>', 'mbis radial moments <r^3>', 'mbis radial moments <r^4>'])
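Several property names in this output appear twice, once with spaces and once with underscores (for example `mayer indices` and `mayer_indices`). The sketch below groups such spellings so that they can be spotted at a glance. The helper `group_aliases` is hypothetical, and whether both spellings always hold the same value is an assumption that this code does not check.

```python
from collections import defaultdict


def group_aliases(keys):
    """Group keys that differ only by using spaces instead of underscores."""
    groups = defaultdict(list)
    for key in keys:
        groups[key.replace(" ", "_")].append(key)
    # Keep only the normalized names that have more than one spelling.
    return {norm: names for norm, names in groups.items() if len(names) > 1}


# A subset of the property keys printed above.
keys = [
    "mayer indices",
    "mayer_indices",
    "return_energy",
    "scf total energy",
    "scf_total_energy",
    "pe energy",
]
print(group_aliases(keys))
```

On a real record, the helper could be called as `group_aliases(row["properties"].keys())`.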
row["wavefunction_"].keys()
dict_keys(['compression_type', 'data_url_', 'compressed_data_', 'decompressed_data_'])