Send an LP file from command-line to WML using cpdctl

Xavier Nodet
6 min read · Sep 20, 2021


To facilitate the handling of Cloud Pak for Data (CPD) clusters, the CPD team created a command-line client named cpdctl. This program is available on Windows, Linux (on Intel, Power and Z hardware), and MacOS, and can be downloaded from https://github.com/IBM/cpdctl/releases/.

A number of examples are provided that describe various use cases, such as promoting and deploying a Python script or an ML model. The reason I decided to roll my own is that I wanted it to:

  • Connect to and use Cloud Pak for Data as a Service rather than a private CPD cluster;
  • Be a fully functional bash script, rather than a list of commands that require copy-pasting of intermediate information from one result to the next command;
  • Launch a Decision Optimization payload, to exercise the 'batch job' feature rather than 'online scoring'.

I won't always go into the details of what each cpdctl parameter means, but you can find this information in the reference documentation. So here it is…

The first step is to set your credentials and URLs so that cpdctl can connect to the right instance of CPDaaS. Indeed, there are several of them, running in different regions of the world. I'll use the one located in Germany, hence the 'eu-de' that appears in these URLs.

The only secret you need to connect to an instance of CPDaaS is an API key, which identifies the IBM Cloud user. One typically creates one key per application or service, so that each can be revoked individually if needed. To generate such a key, open https://cloud.ibm.com/iam/apikeys, and click the blue 'Create an IBM Cloud API key' button on the right. For this script, I stored the key in an environment variable.
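
For example, before running the script below (the value here is of course just a placeholder for your own key):

export WML_API_KEY='<your-api-key>'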

#!/bin/bash
set -e -x

cpdctl config user set the_user --apikey $WML_API_KEY \
    --username 'User Name'
cpdctl config profile set wml \
    --url https://api.eu-de.cloud.ibm.com
cpdctl config service set ws-de \
    --url https://api.eu-de.dataplatform.cloud.ibm.com
cpdctl config context set the_context --profile wml \
    --user the_user --watson-studio ws-de
cpdctl config context use the_context
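
As a quick sanity check before going further, cpdctl can list the contexts it knows about; at least in the versions I used, the active one is flagged in the output:

cpdctl config context list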

To save on the length of this story, I'm assuming that a deployment space already exists, named 'dowml-space', and that it contains a deployment that we can use. For details about how to create such objects, you can refer to the corresponding pieces of code in the dowml library (functions _find_or_create_space and _create_deployment).

We want to retrieve the id of the space. This is where the JSON parsing capabilities of cpdctl come in very handy… The default output of cpdctl is a table format that's very nice for humans to read, but not so much for tools…

With --output json, you get the raw JSON struct that comes from the service.

$ cpdctl space list --name dowml-space --output json
{
  "first": {
    "href": "https://api.eu-de.dataplatform.cloud.ibm.com/v2/spaces?name=dowml-space"
  },
  "limit": 100,
  "resources": [
    {
      "entity": { ... },
      "metadata": {
        "created_at": "2021-05-07T11:51:02.761Z",
        "creator_id": "IBMid-2700042VG2",
        "id": "4645be25-e08e-4a34-a475-ff3cd0dc9635",
        "updated_at": "2021-05-07T11:54:44.601Z",
        "url": "/v2/spaces/4645be25-e08e-4a34-a475-ff3cd0dc9635"
      }
    }
  ]
}

Using a JMES query, it is easy to isolate exactly what we want: the metadata.id value in the first element of the resources list. And we want the raw value, without the quotes around it.

$ cpdctl space list --name dowml-space --output json \
    -j "(resources[].metadata.id)[0]" --raw-output
4645be25-e08e-4a34-a475-ff3cd0dc9635
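
Incidentally, the same extraction could be done with jq instead of the built-in JMES query; the -j option just saves an extra dependency. A quick equivalent, assuming the JSON output shown above:

$ cpdctl space list --name dowml-space --output json \
    | jq --raw-output '.resources[0].metadata.id'
4645be25-e08e-4a34-a475-ff3cd0dc9635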

In our bash script, we can therefore include:

SPACE_NAME='dowml-space'
SPACE_ID=`cpdctl space list --name $SPACE_NAME \
    -j "(resources[].metadata.id)[0]" \
    --output json --raw-output`
if [ "$SPACE_ID" == "null" ]; then
    echo "No space exists with name $SPACE_NAME. Exiting..."
    exit 1
fi

The command to retrieve the deployment that we want is almost identical:

DEPLOYMENT_NAME='dowml-deployment-cplex-20.1-S'
DEPLOYMENT_ID=`cpdctl ml deployment list --name $DEPLOYMENT_NAME \
    -j "(resources[].metadata.id)[0]" \
    --space-id $SPACE_ID --output json --raw-output`
if [ "$DEPLOYMENT_ID" == "null" ]; then
    echo "No deployment exists with name $DEPLOYMENT_NAME. Exiting..."
    exit 1
fi

This deployment was built specifically to run a CPLEX model using version 20.1 of the engine, on a small node. We just need the LP (or MPS, or SAV…) instance that it will solve. Let's upload this file as a data asset if it doesn't already exist in the space.

ASSET_NAME=afiro.mps
ASSET_ID=`cpdctl asset search --query "asset.name:$ASSET_NAME" \
    --type-name data_asset \
    -j "(results[].metadata.asset_id)[0]" \
    --space-id $SPACE_ID --output json --raw-output`
if [ "$ASSET_ID" == "null" ]; then
    # There is no such asset, let's upload one
    ASSET_ID=`cpdctl asset data-asset upload --file $ASSET_NAME \
        -j "metadata.asset_id" --space-id $SPACE_ID \
        --output json --raw-output`
fi

We are now ready to create a deployment job that will solve this instance. The JSON payload specifies all the information that the Decision Optimization processor needs in order to read and solve it:

DO_JOB_BODY='{
    "solve_parameters": {
        "oaas.logAttachmentName": "log.txt",
        "oaas.logTailEnabled": "true",
        "oaas.includeInputData": "false",
        "oaas.resultsFormat": "JSON"
    },
    "input_data_references": [{
        "id": "'$ASSET_NAME'",
        "type": "data_asset",
        "location": {
            "href": "/v2/assets/'$ASSET_ID'?space_id='$SPACE_ID'"
        }
    }],
    "output_data": [{"id": ".*\\.*"}]
}'
DEPLOYMENT_JOB_ID=`cpdctl ml deployment-job create \
    --name 'cpdctl-deployment-job' \
    --deployment '{"id": "'$DEPLOYMENT_ID'"}' \
    --decision-optimization "$DO_JOB_BODY" \
    -j "metadata.id" --space-id $SPACE_ID \
    --output json --raw-output`

We now need two pieces of information from the deployment job: the platform job id and the run id. We could retrieve them with two separate calls to cpdctl, but we can also store the JSON that contains both, and parse it ourselves using jq, the command-line JSON processor.

OUTPUT=`cpdctl ml deployment-job get --job-id $DEPLOYMENT_JOB_ID \
    -j "entity.platform_job" --space-id $SPACE_ID \
    --output json --raw-output`
JOB_ID=`echo $OUTPUT | jq -r '.job_id'`
RUN_ID=`echo $OUTPUT | jq -r '.run_id'`

The job execution is asynchronous, and we need to wait for it to complete before we can get the results:

cpdctl job run wait --job-id $JOB_ID --run-id $RUN_ID \
    -j "entity.job_run.state" --space-id $SPACE_ID \
    --output json --raw-output
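
The wait command prints the final state of the run. If you'd rather have the script abort when the solve fails, you can capture that state and test it. Here's a sketch, assuming the terminal success state is reported as 'Completed' (check the value your runs actually report):

STATE=`cpdctl job run wait --job-id $JOB_ID --run-id $RUN_ID \
    -j "entity.job_run.state" --space-id $SPACE_ID \
    --output json --raw-output`
if [ "$STATE" != "Completed" ]; then
    echo "Job run ended in state $STATE. Exiting..."
    exit 1
fi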

Let's retrieve the CPLEX log. Because we used inline outputs in the DO job payload ("output_data": [{"id": ".*\\.*"}]), all the results of the job can be retrieved directly from the run logs. These logs contain a JSON structure that we'll parse, again using jq. Specifically, we want to fetch the content value of the entry in the decision_optimization.output_data list that has an id equal to 'log.txt', and base-64 decode that content. Before that, we remove the first line of the output from cpdctl, which is useless to us.

cpdctl job run logs --job-id $JOB_ID --run-id $RUN_ID \
    --space-id $SPACE_ID \
    | tail -n +2 \
    | jq --raw-output ".decision_optimization.output_data[] | \
        select(.id == \"log.txt\") | .content | \
        @base64d"
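
The very same pipeline can fetch the other inline outputs of the job. For instance, since we set oaas.resultsFormat to JSON, the solver results should come back in an output named solution.json; that name is an assumption on my part, so check the ids actually returned by your run:

cpdctl job run logs --job-id $JOB_ID --run-id $RUN_ID \
    --space-id $SPACE_ID \
    | tail -n +2 \
    | jq --raw-output ".decision_optimization.output_data[] | \
        select(.id == \"solution.json\") | .content | \
        @base64d"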

As you can see, jq uses a very neat syntax where the | character is used to feed the results of the previous filter into the next one. Also, it's able to select elements from a list based on field values. Very nice!
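
If you haven't used jq before, here is a tiny standalone illustration of these two idioms, independent of WML (the sample data is made up):

echo '[{"id": "a", "val": 1}, {"id": "b", "val": 2}]' \
    | jq --raw-output '.[] | select(.id == "b") | .val'
# prints: 2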

Running the script prints the familiar CPLEX engine log for the model.

The last step is to delete the deployment job, now that we don't need it any more.

cpdctl ml deployment-job delete --job-id $DEPLOYMENT_JOB_ID \
    --space-id $SPACE_ID --hard-delete true

The whole thing is of course much easier when you use dowml:

$ dowml -c 'solve afiro.mps' wait log delete

But dowml is around 1500 lines of code, built specifically to make this task (sending DO instances to CPD/WML) easy, so this is not surprising. The point of this script was to demonstrate how the lower-level, much more versatile cpdctl can do it…
