System Migration: Single Node To Multi Node

This guide explains the steps necessary to migrate from a single-node to a multi-node Platform installation. Areas which require particular attention include job scheduling, caching behavior and file management.

Installation and Configuration

Install an empty Platform on each node to be added to the cluster. To do so, follow the instructions contained in the System Installation guide for your Platform version.

Make sure that the Platform version and any extensions installed on the additional nodes are identical to the software on the original node.

Also make sure that the load balancer (e.g. Apache, WebSphere) is configured for sticky sessions, so that all requests belonging to one user session are routed to the same node. Session affinity is typically based on a session cookie or on the user's IP address and port. This is required because each node holds session state for the users it serves, i.e. the nodes are stateful rather than stateless.
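As an illustration only, sticky sessions in Apache HTTP Server can be configured with mod_proxy_balancer roughly as follows. The node addresses, route names, and cookie name are placeholders; consult your load balancer's documentation for the exact setup:

```apache
# Hypothetical Apache httpd fragment: route each session to the same node via
# the JSESSIONID cookie (mod_proxy and mod_proxy_balancer must be loaded).
<Proxy "balancer://platformcluster">
    BalancerMember "http://node1.example.com:8080" route=node1
    BalancerMember "http://node2.example.com:8080" route=node2
    ProxySet stickysession=JSESSIONID
</Proxy>
ProxyPass        "/" "balancer://platformcluster/"
ProxyPassReverse "/" "balancer://platformcluster/"
```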

Adapt App for Cluster Setting

Most apps require changes to work in an environment with multiple nodes. This chapter outlines which aspects you should pay attention to and how to best adapt your app depending on your needs:

  1. Locally stored files may need to be made available to the whole cluster.
  2. Scheduled jobs run on all nodes by default, but it is possible to run them on one node only, or to split a job among all nodes.
  3. The Platform allows you to cache data either locally on each node or in a cluster-wide cache available to all nodes.

Local and Cluster Files

Processes often involve saving and accessing files specific to your solution, such as PDF documents or images. In versions of Appway prior to 6.0, these files were stored locally within FNZ Studio's Data Home folder ("[Local file system]/data/"). From 6.0 onwards, files can either still be stored locally on one node, or they can be made available to all nodes.

When switching to a multi-node system, decide which files need to be available throughout the cluster, and which ones can remain local:

  • Local files: Temporary files which only live during one session do not need to be distributed. For example, these include files used to create a PDF document, to convert an image, or to upload a file to a database. Temporary files saved on one node are available during the user's entire HTTP session, thanks to sticky sessions from the load balancer.

  • Distributed files: Files which need to be stored for longer than a single user session, e.g. until the end of a process instance, must be distributed. There are two options to make such files available cluster-wide:

    1. Adapt your app to use the Cluster File Service. Cluster Files are automatically distributed over all nodes. They are also automatically replicated to guarantee high availability.
      Advantage: Recommended approach to implement distributed files. No specific infrastructure set-up is required.
      Disadvantage: Requires adapting the existing app model by replacing file calls with "Cluster File" objects.
    2. Move the files to a shared directory. Implement the directory containing files to be distributed as a shared directory. For example, you can use a mounted network directory.
      Advantage: No changes to the app model required.
      Disadvantage: The infrastructure set-up is more complex, with an additional shared folder to manage.

    See the following subsections for more details on implementing these options.

Adapting your App to Use the Cluster File Service

Follow these steps to replace references to files in your app with Cluster File objects:

  1. Run a global search to find all occurrences of functions which affect files (6.x: Administration > System > Global Search, 7.x: Solution Design > Search). These functions can be in Camel case (AppendToTextFile) or upper case (APPENDTOTEXTFILE).
    • AppendToTextFile
    • ReadTextFile
    • ReadTextFileLine
    • WriteProcessMessageFile
    • WritePropertiesFile
    • WriteTextFile
    • CopyFile
    • MoveFile
    • CompressFile
    • DeleteFile
    • CreateDirectory
    • MoveDirectory
    • CopyDirectory
    • DeleteDirectory
    • Unzip
    • WriteBinaryData
    • java.io.File
  2. Alternatively, you can stop the Platform and explore the local file system to identify files created by your app. Non-system files are normally located in a sub-directory of the Data Home, for example: <Local file system>/data/myfiles/… Then, in Studio Composition, run a global search for each file to find out where it is used in your app. For example, search for /data/myfiles/myExampleFile.pdf.
  3. Refactor all locations where files are created or accessed, using the Cluster File Primitive Type to represent files. For more information, see Data Management: Files.
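The global search in step 1 can also be approximated outside Studio with a case-insensitive text search over exported app sources. A minimal shell sketch (the directory name and sample file are placeholders, and the pattern covers only a subset of the functions listed above):

```shell
# Hypothetical example: approximate the global search with grep over exported
# app sources. 'app-export' is a placeholder for wherever your sources live.
mkdir -p app-export
printf 'result := WriteTextFile("out.txt", $text);\n' > app-export/sample.txt

# Case-insensitive (-i) recursive (-r) search for file-related functions:
grep -rniE 'AppendToTextFile|ReadTextFile|WriteTextFile|CopyFile|MoveFile|CompressFile|DeleteFile|Unzip|WriteBinaryData|java\.io\.File' app-export/
```

Each match points to a location that needs refactoring in step 3.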

Moving Files to a Shared Directory

Follow these steps to move local app files to a shared directory:

  1. Stop the Platform and explore its Data Home on the local file system (<Local file system>/data/). Identify all local directories currently containing files created by your app. Non-system files are normally located in a sub-directory of the Data Home, for example: <Local file system>/data/myfiles/…
  2. Create a shared top-level directory. Example: <Local file system>/data/nfs/
  3. Create one sub-directory for each directory containing local app files (those identified in Step 1). Examples: <Local file system>/data/nfs/myfiles/ and <Local file system>/data/nfs/foobar/
  4. Create one Data Home for each node. Example: <Local file system>/data/
  5. In each node's Data Home, mount every shared sub-directory (see Step 3) in the appropriate place.
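The steps above can be sketched as follows. The paths mirror the examples in the text but are otherwise placeholders, and the actual mount commands are shown only as comments because they require root access and a real network share:

```shell
# Sketch of the shared-directory layout described in the steps above.
# './data' stands in for a node's Data Home; 'myfiles' and 'foobar' mirror
# the example directories from the text.
DATA_HOME=./data
mkdir -p "$DATA_HOME/nfs/myfiles" "$DATA_HOME/nfs/foobar"

# On each node, the shared sub-directories would then be mounted into the
# node's Data Home, e.g. (illustrative only; requires root and an NFS export):
#   mount -t nfs fileserver:/data/nfs/myfiles /data/myfiles
#   mount -t nfs fileserver:/data/nfs/foobar  /data/foobar
ls "$DATA_HOME/nfs"
```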

Job Scheduling

By default, scheduled jobs run on all nodes in a multi-node system. You can however configure a job to run on one node only, or split it among all nodes.

Scheduling Behavior

For each app-specific job, verify whether it can be run on all nodes, or whether it should be run on a single node only.

By default, scheduled jobs run on all nodes. The Script Execution Job extension allows you to configure jobs to run on one node only:

  1. In Studio Composition, go to Extensions > ScriptExecutionJob (right-click) > Edit Configuration.
  2. Add the property scriptjob.[job#].onenodeonly with the value true. For example: scriptjob.1.onenodeonly = true. Here, the number '1' refers to a specific job; the property can be configured for each scheduled job.
Note on Custom Extensions (SDK)
Each job implemented by a custom extension is also run on all nodes by default. To implement a job that is executed on one node only in an extension, use the following service method:
`boolean JobExecutionService.shouldExecute(String jobName);`
A job using this method is still started on all nodes. However, depending on the result of the method call, the execution is stopped before any real work is performed: if the method returns `true`, the job is actually executed; if it returns `false`, the execution is aborted immediately.
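The guard pattern can be sketched in Java as follows. The real `JobExecutionService` is provided by the SDK; the nested interface below is only a stand-in with the same signature so the example is self-contained, and the job name and work are illustrative:

```java
// Sketch of the one-node-only guard pattern from the note above.
// The nested interface stands in for the SDK's JobExecutionService.
public class OneNodeJobSketch {

    interface JobExecutionService {
        boolean shouldExecute(String jobName);
    }

    static String doWork() {
        return "work done";
    }

    // The job body runs on every node, but bails out early unless this node
    // was elected to execute the named job.
    static String runJob(JobExecutionService service, String jobName) {
        if (!service.shouldExecute(jobName)) {
            return "skipped"; // another node performs the work
        }
        return doWork();
    }

    public static void main(String[] args) {
        JobExecutionService elected = jobName -> true;      // this node executes
        JobExecutionService notElected = jobName -> false;  // another node does
        System.out.println(runJob(elected, "nightlyCleanup"));
        System.out.println(runJob(notElected, "nightlyCleanup"));
    }
}
```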

Task Division

Some jobs need to perform a lot of work. In this case, it is advisable to split the work into smaller pieces and execute each piece on a different node.

The concept of "locally owned data" enables you to create such a task division. You can create a job which runs on all nodes, but only performs work on the data owned by the current node.

For example, you can implement a job which loops over all Process Instances, but only works on the Process Instances owned by the current node.

Use the following function in the job's script to get only the Process Instances owned by the node it runs on:

CONTEXT().getWorkflowInstanceService().getLocalWorkflowInstanceIds() 
//or
CONTEXT().getWorkflowInstanceService().getLocalWorkflowInstanceIds(predicate)

Example:

Indexed String $localPids := ToIndexed(CONTEXT().getWorkflowInstanceService().getLocalWorkflowInstanceIds(), String);
Integer $count := 0;
ForEach String $pid In $localPids Do
   PRINTLN('Do something useful with process instance ' & $pid & ' here...');
   $count := $count + 1;
End 
PRINTLN('Number of process instances on this node: ' & $count);

Example job script dividing a task among nodes

Every process instance is assigned to a Hazelcast partition. The assignment is performed by Hazelcast while loading the data and is based on a hash algorithm. The default number of partitions is 271. Each partition, in turn, is assigned to a cluster node. Every cluster node is therefore responsible for, i.e. "owns", a set of Process Instances.

Since this job is executed on every node and every node processes all of the Process Instances it owns, in the end all Process Instances are processed.
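The ownership scheme can be illustrated with a toy model. The hash function and the round-robin partition-to-node assignment below are simplifications, not Hazelcast's actual algorithm; the point is only that every instance belongs to exactly one node, so running the same job on all nodes covers everything exactly once:

```python
# Toy model of partition-based ownership (not the real Hazelcast algorithm).
PARTITION_COUNT = 271  # Hazelcast's default number of partitions

def partition_of(key: str) -> int:
    # Stand-in hash; Hazelcast uses its own hashing internally.
    return sum(key.encode()) % PARTITION_COUNT

def owner_node(key: str, node_count: int) -> int:
    # Partitions are spread round-robin over nodes in this sketch.
    return partition_of(key) % node_count

def run_job_on_node(node: int, node_count: int, all_pids: list) -> list:
    # The same job runs on every node but only touches locally owned instances.
    return [pid for pid in all_pids if owner_node(pid, node_count) == node]

pids = [f"process-{i}" for i in range(20)]
nodes = 3
processed = [run_job_on_node(n, nodes, pids) for n in range(nodes)]

# Together, the nodes cover every instance exactly once.
flat = [pid for chunk in processed for pid in chunk]
assert sorted(flat) == sorted(pids)
```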

Caching Behavior

The Platform Local Application Cache allows you to cache values that are expensive to compute. A Cluster Cache is also available for values that should only be computed once for the whole cluster instead of once for every node.

To reduce the effort in adapting your app, you can generally keep using the Local Application Cache. However, you must switch to the Cluster Cache for values that are not allowed to be computed more than once.

Local Application Cache

  • Description: Every node computes its own cached value and stores it in its Local Application Cache. Values in the Local Application Cache can only be looked up from the current node. Lookup is slightly cheaper than accessing the Cluster Cache.
  • Use when the cached value can be computed multiple times and is not too expensive to compute.
  • Relevant built-in functions: CacheStore, CacheLookup, CacheNames, CacheRemove, CacheClear

Cluster Cache

  • Description: Only one node computes the value and stores the result in the Cluster Cache. Values in the Cluster Cache are available on all nodes.
  • Use when the cached value cannot be computed more than once (e.g. because of service call restrictions) or is very expensive to compute.
  • Relevant built-in functions: ClusterCacheStore, ClusterCacheLookup, ClusterCacheNames, ClusterCacheRemove, ClusterCacheClear

The following examples illustrate the difference between both types of caches.

Local Application Cache:

  1. Call the following expression on Node 1 of your system: CacheStore('ExpensiveToComputeA', 1 + 1)

  2. Then, call the following expression on Node 2: CacheStore('ExpensiveToComputeB', 2 + 2)

  3. Finally, look up both cached values by calling this expression on Node 1:

    PRINTLN('A: ' & TOSTRING(CacheLookup('ExpensiveToComputeA')));
    PRINTLN('B: ' & TOSTRING(CacheLookup('ExpensiveToComputeB')));

    This only returns value A (2), since value B (4) is cached on Node 2 only.

Cluster Cache:

  1. Call the following expression on Node 1: ClusterCacheStore('VeryExpensiveToComputeA', 1 + 2)

  2. Call the following expression on Node 2: ClusterCacheStore('VeryExpensiveToComputeB', 1 + 3)

  3. Look up both cached values on Node 1:

    PRINTLN('A: ' & TOSTRING(ClusterCacheLookup('VeryExpensiveToComputeA')));
    PRINTLN('B: ' & TOSTRING(ClusterCacheLookup('VeryExpensiveToComputeB')));

    This returns both value A (3) and value B (4), since both are stored in the Cluster Cache and thus accessible from all nodes.
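The two walkthroughs can be condensed into a toy model: each node's Local Application Cache behaves like a private dictionary, while the Cluster Cache behaves like a single dictionary shared by all nodes. The node names and keys below mirror the examples above and are purely illustrative:

```python
# Toy model of the two cache types: one private dict per node stands in for
# the Local Application Cache, one shared dict for the Cluster Cache.
local_caches = {"node1": {}, "node2": {}}
cluster_cache = {}

# Local Application Cache: values stored on one node are invisible to others.
local_caches["node1"]["ExpensiveToComputeA"] = 1 + 1
local_caches["node2"]["ExpensiveToComputeB"] = 2 + 2
print(local_caches["node1"].get("ExpensiveToComputeA"))  # 2
print(local_caches["node1"].get("ExpensiveToComputeB"))  # None: only on node2

# Cluster Cache: values stored by any node are visible to every node.
cluster_cache["VeryExpensiveToComputeA"] = 1 + 2
cluster_cache["VeryExpensiveToComputeB"] = 1 + 3
print(cluster_cache.get("VeryExpensiveToComputeA"))  # 3
print(cluster_cache.get("VeryExpensiveToComputeB"))  # 4
```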