Facilitating the sharing of SPM jobs across multiple machines
The goal is a system that lets the user build batteries of predictive models with Salford Predictive Modeler (SPM) on multiple machines in parallel, in an automated fashion, and then collect model settings, performance statistics, and similar data into a PostgreSQL database.
I have an example that runs six models on two machines residing in Azure and then sends the resulting performance statistics and model settings to a Postgres database residing on a third Azure machine. At present, this all happens under Linux, but in theory it could also happen on Windows, as well as on any other computing platform Minitab may choose to support in the future.
The instructions that follow are for Linux. Thus far, the distributions employed have been Fedora, Ubuntu, and OpenSUSE.
docs/Howto.md is a detailed document explaining how to set up a distributed
job using this system. The PDF version is docs/Howto.pdf. It will get
simpler.
- SPM 8.3 (non-GUI), available from Salford Systems.
- JobScheduler 1.12.9. You want the main program (jobscheduler_*) and the JOC Cockpit (joc_*).
- SPM Model Database
- JobScheduler Universal Agent
- The Java Development Kit, version 8 or higher. OpenJDK works fine and probably comes with your Linux distro.
- Perl
- An FTP server. I've been using vsftpd.
- A supported relational database system to work with JobScheduler. I have been using MariaDB, but PostgreSQL should work as well.
- A modern, standard web browser (I have been using Firefox).
- libcanberra-gtk (name will vary somewhat, depending on the distro).
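Before starting, it can help to confirm that the command-line prerequisites are actually on the PATH. The following is a minimal sketch; the helper name `missing_tools` is hypothetical, and the binary names in the usage line (`java`, `perl`, `psql`) are assumptions about how your distribution packages these tools.

```shell
#!/bin/sh
# missing_tools: print each named command that cannot be found on the PATH.
# Hypothetical helper for illustration; not part of SPM or JobScheduler.
missing_tools() {
    for tool in "$@"; do
        # command -v succeeds only if the shell can resolve the name
        command -v "$tool" >/dev/null 2>&1 || echo "$tool"
    done
}

# Usage: prints nothing when everything is installed.
#   missing_tools java perl psql
```

Server packages such as vsftpd often install outside a normal user's PATH, so a check like this is best limited to client tools.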
These instructions are Linux specific, but the principles will be similar under Windows. An X display is required to install JobScheduler and JOC Cockpit.
- If it is not already installed, install the JDK.
- Install and configure the MariaDB server. Run sudo mysql and enter the following commands:
> ALTER USER 'root'@'localhost' IDENTIFIED WITH mysql_native_password BY '<password>';
> FLUSH PRIVILEGES;
> set @wait_timeout = 31536000;
> CREATE DATABASE scheduler;
> USE scheduler;
> exit
<password> is the password for the root role, as previously entered when configuring the server.
- Install JobScheduler by extracting the archive and running setup.sh in jobscheduler.1.12.9. Do not run it as root. If the script fails because it cannot connect to the X server (a problem I have had repeatedly), run Java directly. The command will be displayed by the script and will look something like this:
sudo "java" -jar "./jobscheduler_linux-x64.1.12.9.jar"
After entering your password, the GUI will appear and you will be able to install. On step 8, change the value for Allowed Host to 0.0.0.0. On step 13, set the host to 127.0.0.1 (localhost), the user to "root" and the password to the previously set password for the root role.
- Enable JobScheduler to run as a service. The startup script is /opt/sos-berlin.com/jobscheduler/<hostname>_40444/bin/jobscheduler.sh. Copy it into /etc/init.d, removing the .sh extension. If one is running SysVInit (traditional Linux init), then one can simply add the usual links to the usual rc?.d directories. If instead one is running systemd (as most modern Linux distributions now do), then one will need to add it as a service there. Under Debian and such derivatives as Ubuntu, one can run the following command from /etc/init.d:
sudo update-rc.d jobscheduler defaults
Otherwise, one can run the following command:
sudo systemctl start jobscheduler
And systemd will automatically incorporate the new service before launching it.
- Install the JOC Cockpit in the same manner as the JobScheduler. On step 6, set Host to 127.0.0.1, Database to "scheduler", and username and password the same as for JobScheduler.
- Reboot the machine.
- Assuming all has gone well, you should be able to point your browser to http://localhost:4446 and log in as root (lame default password is "root"). Change the password to something reasonable and while you're at it, create a non-privileged account to do work from (and give it a decent password too).
- On each slave (agent) machine, install all software required by SPM Model Database, except for the PostgreSQL server (install the client instead). Install addgrv to a directory in the path, such as /usr/local/bin.
- Install the JDK.
- Unpack the agent archive. Move the directory jobscheduler_agent to the home directory of the account from which the agent will be run and rename it to jos.
- Copy sos_jos.sh (in the root of this repository) to /etc/init.d. Modify the script, as necessary, by setting USER to the name of the unprivileged account that will run the agent.
- Make sos_jos a service in the same manner described for the JobScheduler server.
- Reboot the machine.
- Assuming all has gone well, the agent will be running. To check (assuming you are running systemd), type:
systemctl status sos_jos
If it isn't, try starting it manually, like so:
sudo systemctl start sos_jos
Then check again.
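The check-and-start steps above can be sketched as a small shell helper that works for both jobscheduler on the master and sos_jos on the agents. This is only a sketch under assumptions: the function name `ensure_running` is hypothetical, the init script is assumed to be in /etc/init.d already, and the machine is assumed to run systemd.

```shell
#!/bin/sh
# ensure_running: start a service if systemd does not report it as active.
# Hypothetical helper; assumes systemd and an init script already in /etc/init.d.
ensure_running() {
    svc="$1"
    if systemctl is-active --quiet "$svc"; then
        echo "$svc is running"
    else
        echo "$svc is not running; starting it"
        # Re-read unit files so a newly copied init script is picked up,
        # then start the service and check it again.
        sudo systemctl daemon-reload &&
            sudo systemctl start "$svc" &&
            systemctl is-active --quiet "$svc"
    fi
}

# Usage:
#   ensure_running sos_jos
```

The function's exit status reflects the final check, so it can be used in a script that should stop if the agent cannot be started.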
Normally, the PostgreSQL database server will be the same machine as the master, but it may be separate. The slave machines need to be able to write to it, which may not be possible if the master is on your local network behind a firewall and the slave machines are not.
- Install the PostgreSQL server. The exact package will vary depending on the Linux distribution employed.
- As the PostgreSQL user (usually postgres), create the role that will own the database (john in our example). This can be done with the createuser utility, or in the psql interpreter with the CREATE USER command.
- In postgresql.conf, set listen_addresses to the addresses on which the server should listen for database connections. By default, PostgreSQL will only listen for connections originating on the local machine.
- If the username on the slave machines does not match the name of the Postgres account that owns the database, a mapping will need to be created in pg_ident.conf. In our current example, we have:
# MAPNAME SYSTEM-USERNAME PG-USERNAME
agent jobscheduler john
- Each client machine must be entered into pg_hba.conf. Here are mine:
host all all 52.162.218.151/32 ident map=agent
host all all 168.62.104.141/32 ident map=agent
I chose to use ident as the authentication protocol, but there are other options that might work better for you. If you use it, then make sure that an ident service is installed and running on each client machine. See the Postgres documentation for configuration details. The raw IP addresses are used because it appears that Azure does not support reverse DNS.
- As the user created to own the model database, create it (spm in our example). This can be done with the createdb utility or in psql with the CREATE DATABASE command.
- Test your configuration by trying to log in to the database from each of the slave machines, using the psql utility. Something like the following should work:
psql -h <hostname> -U <username> -d spm
If you get a prompt without error messages, it works. Exit with the \q command.
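The pg_hba.conf entries and the login test above can also be scripted. The sketch below is illustrative only: the helper names `pg_hba_lines` and `check_db` are hypothetical, the generated lines simply follow the ident-with-map form shown earlier, and `check_db` assumes the psql client is installed on the slave machine.

```shell
#!/bin/sh
# pg_hba_lines: print one pg_hba.conf entry per client IP address,
# in the same "ident map=..." form used in the example above.
# $1 is the map name from pg_ident.conf; the rest are IPv4 addresses.
pg_hba_lines() {
    map="$1"
    shift
    for ip in "$@"; do
        printf 'host all all %s/32 ident map=%s\n' "$ip" "$map"
    done
}

# check_db: run a trivial query instead of opening an interactive prompt.
# $1 = database host, $2 = role name, $3 = database name.
# -w makes psql fail instead of prompting if a password is required.
check_db() {
    if psql -w -h "$1" -U "$2" -d "$3" -c 'SELECT 1' >/dev/null 2>&1; then
        echo "connection to $3 on $1 works"
    else
        echo "connection to $3 on $1 FAILED" >&2
        return 1
    fi
}

# Usage:
#   pg_hba_lines agent 52.162.218.151 168.62.104.141   # on the database server
#   check_db <hostname> <username> spm                 # on each slave machine
```

Review the generated lines before appending them to pg_hba.conf, and reload PostgreSQL afterwards so the new entries take effect.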
Copy the contents of Example/automate in this repository to your home directory on the master machine.
Copy the contents of Example/JOS-Config/automate_example_stream to /opt/sos-berlin.com/jobscheduler/<hostname>_40444/config/live.
Edit the following configuration files, replacing the existing hostnames with yours:
In Example/JOS-Config:
- agent1.process_class.xml
- agent2.process_class.xml
In both cases, change the value of remote_scheduler to the name or IP address of the appropriate slave machine.
In Example/JOS-Config/automate_example:
- transfer_cmd1.job.xml
- transfer_cmd2.job.xml
- transfer_data1.job.xml
- transfer_data2.job.xml
In all four cases, change the value of target_host to the name or IP address of the appropriate slave machine. Use the name specified in agent1.process_class.xml in transfer_cmd1.job.xml and transfer_data1.job.xml. Use the name specified in agent2.process_class.xml in transfer_cmd2.job.xml and transfer_data2.job.xml. Also, change the usernames and passwords to the ones you are actually using.
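Replacing the hostnames by hand in six files is error-prone, so the edit can be sketched with sed. This assumes the example files contain the old hostname as plain text and that hostnames consist only of letters, digits, dots, and hyphens; `replace_host` is a hypothetical helper name, and the hostnames in the usage lines are placeholders.

```shell
#!/bin/sh
# replace_host: substitute one hostname for another in the given files,
# keeping a .bak copy of each original.
# $1 = hostname currently in the files, $2 = your hostname, rest = files.
replace_host() {
    old="$1"
    new="$2"
    shift 2
    # Escape dots so "a.b" does not also match "axb".
    old_re=$(printf '%s' "$old" | sed 's/\./\\./g')
    for f in "$@"; do
        sed -i.bak "s/$old_re/$new/g" "$f"
    done
}

# Usage (old-agent1 and my-agent1 are placeholders):
#   replace_host old-agent1 my-agent1 agent1.process_class.xml
#   grep -n -E 'remote_scheduler|target_host' *.xml   # review the result
```

The usernames and passwords still need to be changed by hand, since they are specific to your setup.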
With your web browser, log into http://localhost:4446. Assuming all is well, you will see something like this:
Click on "JOB STREAMS" and you will see:
Now, click on "automate_example_stream". You will see:
Now, click on "Import Job Stream".
Click on "CHOOSE FILES TO UPLOAD".
Select Example/JOS-Config/jobstream.json.
Check "Job Name", causing all boxes to be checked.
Click on Import, and you should see the diagram below:
Each box represents a job and the arrows show the dependencies. The jobs are as follows:
- BosTN2 starts the sequence off. At present, all it does is wait for a second.
- transfer_data1 copies the input dataset to the first agent. transfer_data2 does so to the second agent.
- transfer_cmd1 and transfer_cmd2 copy the requisite command files to the respective agent servers. The contents of projects/automate/cmd1 are copied to the first agent and the contents of projects/automate/cmd2 to the second.
- build1 and build2 build the models on the respective agent servers, move the command files to the archives directory, and send the model information to the PostgreSQL database.
- modstats creates a report on the models built thus far.
To start the stream, click on the ellipsis next to the initial job (BosTN2) and select "Start Job Now".
The blocks for the jobs running will turn green, like so:
When all of the blocks are again yellow, the job is finished. Assuming everything ran successfully, the task history will look something like this:
In the current example, the output file will be $HOME/projects/automate/bostn2__stream_perf.csv.
In LibreOffice, it looks like this:
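To confirm from the shell that the run produced output, without opening a spreadsheet, something like the following sketch can be used. The helper name `summarize_csv` is hypothetical, and it assumes only that the file has a header row followed by one line per record.

```shell
#!/bin/sh
# summarize_csv: print the header line and the number of data rows.
# Assumes the first line is a header and every other line is one record.
summarize_csv() {
    if [ ! -s "$1" ]; then
        echo "missing or empty: $1" >&2
        return 1
    fi
    echo "columns: $(head -n 1 "$1")"
    echo "rows: $(( $(wc -l < "$1") - 1 ))"
}

# Usage:
#   summarize_csv "$HOME/projects/automate/bostn2__stream_perf.csv"
```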











