WML vocabulary, in the context of Decision Optimization

Xavier Nodet
Aug 20, 2021

A platform such as IBM Watson Machine Learning offers so many possibilities that it has to define a lot of concepts. One has to be familiar with these concepts to build a mental model of the platform and start using it effectively. So here's an overview of these concepts and how they fit together.

Watson Machine Learning (WML) is the deployment service in Cloud Pak for Data (CPD). It is accessed through its REST API or through client libraries such as ibm-watson-machine-learning in Python. It can also be used indirectly through Watson Studio, the development environment of CPD. WML can run jobs that solve Decision Optimization (DO) instances. Example Python code that does this is available at https://github.com/nodet/dowml.

Each job is defined by a deployment and some transactional data. A deployment belongs to a deployment space. It is defined by associating a model and its static data with a hardware specification, which sets the size and number of PODs that can run jobs in this deployment. Creating more than one POD makes it possible to run several jobs in parallel from the same deployment. The size of a POD is denoted by a label such as S, M or XL and refers to the hardware allocated to running the jobs.
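To make the relationship between a deployment, its model and its hardware specification more concrete, here is a minimal sketch of the kind of payload one might send to create a deployment. It only builds a dictionary and calls no service. The field names (asset, hardware_spec, num_nodes, batch) reflect my understanding of the WML v4 REST API and should be checked against the official documentation. The function name, the deployment name and the IDs are hypothetical placeholders.

```python
def deployment_payload(name, space_id, model_id, size="S", nodes=1):
    """Build a dict describing a deployment (illustrative sketch, not the official API)."""
    if nodes < 1:
        raise ValueError("a deployment needs at least one POD")
    return {
        "name": name,
        "space_id": space_id,        # the deployment space that owns the deployment
        "asset": {"id": model_id},   # the model (with its static data) being deployed
        "hardware_spec": {
            "name": size,            # POD size, e.g. "S", "M" or "XL"
            "num_nodes": nodes,      # number of PODs, hence of jobs that can run in parallel
        },
        "batch": {},                 # assumption: DO jobs run through batch deployments
    }


# Placeholder IDs: two medium PODs, so two jobs could run at the same time.
payload = deployment_payload("diet-deployment", "SPACE_ID", "MODEL_ID", size="M", nodes=2)
```

Keeping the size and the number of PODs as separate arguments mirrors the two independent choices described above: how much hardware each job gets, and how many jobs can run concurrently.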

The transactional data is the set of data that changes from one run of the job to another. It can be inline data, data assets in the deployment space, or references to Cloud Object Storage (COS). Inline input data are submitted in the payload that creates the job. Inline output data are returned in the response when asking for the details of the job. Data assets are files stored in the job’s deployment space, and both input and output transactional data can be references to such data assets. References to COS are pointers to objects in the Cloud Object Storage service that can be accessed through connections, and they can be used as either input or output of a job.
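The three kinds of transactional data can be sketched as three small builders that each produce one entry of a job payload. This is an illustration under assumptions, not a definitive implementation: the keys (input_data, input_data_references, output_data_references, data_asset, connection_asset, and the shape of location) follow my reading of the WML REST API for Decision Optimization jobs and should be verified against the official documentation. All helper names, file names and IDs are hypothetical.

```python
def inline_table(file_name, fields, rows):
    """Inline data: the table travels inside the job payload itself."""
    return {"id": file_name, "fields": fields, "values": rows}


def data_asset_ref(file_name, asset_id, space_id):
    """Reference to a file stored as a data asset in the deployment space."""
    return {
        "id": file_name,
        "type": "data_asset",
        "location": {"href": f"/v2/assets/{asset_id}?space_id={space_id}"},
    }


def cos_ref(file_name, connection_id, bucket, object_name):
    """Reference to a COS object, reached through a connection that holds the credentials."""
    return {
        "id": file_name,
        "type": "connection_asset",
        "connection": {"id": connection_id},
        "location": {"bucket": bucket, "file_name": object_name},
    }


def job_payload(space_id, deployment_id, inline=(), in_refs=(), out_refs=()):
    """Assemble a job: a deployment plus its transactional data."""
    return {
        "space_id": space_id,
        "deployment": {"id": deployment_id},
        "decision_optimization": {
            "input_data": list(inline),
            "input_data_references": list(in_refs),
            "output_data_references": list(out_refs),
        },
    }


# One job mixing all three kinds of transactional data (placeholder values).
job = job_payload(
    "SPACE_ID",
    "DEPLOYMENT_ID",
    inline=[inline_table("diet_food.csv", ["name", "unit_cost"], [["Hotdog", 0.31]])],
    in_refs=[data_asset_ref("diet_nutrients.csv", "ASSET_ID", "SPACE_ID")],
    out_refs=[cos_ref("solution.csv", "CONNECTION_ID", "my-bucket", "runs/solution.csv")],
)
```

Note that the same reference builders serve for both inputs and outputs, which matches the point above that data assets and COS references can be used on either side of a job.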

A deployment space holds references to a set of related assets: models, data assets, connections, deployments and jobs. Connections are objects that store credentials and can be used to refer to, for example, COS objects. A model combines a software specification, a model type and some static data. A software specification names the set of software that will be installed on the PODs that run the jobs. Examples include do_12.10 and do_20.1. The model type specifies the exact job processor that will run the job. For Decision Optimization, this job processor will run CPLEX, CP Optimizer, docplex or OPL. Static data is the set of all the data that belongs to the definition of the model and doesn’t change with each job. It’s often empty for DO jobs, unlike machine learning (ML) jobs, where it holds the data that actually define the trained model.
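As a small illustration of how a model ties a software specification to a model type, the sketch below builds the metadata for a DO model as a plain dictionary. It rests on assumptions to verify against the official documentation: that model types are written as a prefix such as do-docplex followed by the version, and that the payload keys are type and software_spec. The mapping, the function name and the IDs are hypothetical.

```python
# Assumed prefixes for the four DO job processors named above.
MODEL_TYPE_PREFIX = {
    "cplex": "do-cplex",      # CPLEX
    "cpo": "do-cpo",          # CP Optimizer
    "docplex": "do-docplex",  # Python models written with docplex
    "opl": "do-opl",          # OPL
}


def model_payload(name, space_id, engine, version="20.1"):
    """Build a dict describing a DO model (illustrative sketch, not the official API)."""
    if engine not in MODEL_TYPE_PREFIX:
        raise ValueError(f"unknown engine: {engine!r}")
    return {
        "name": name,
        "space_id": space_id,
        "type": f"{MODEL_TYPE_PREFIX[engine]}_{version}",  # which job processor runs the job
        "software_spec": {"name": f"do_{version}"},        # software installed on the PODs
    }


# A docplex model; its static data would be uploaded separately and is often empty for DO.
model = model_payload("diet-model", "SPACE_ID", "docplex")
```

Deriving both the model type and the software specification from the same version string is a simplification made here to keep the two consistent; nothing in the text above says they must always match.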

Xavier Nodet

Customer Advocate for Decision Optimization at IBM