Cluster Shutdown

This document illustrates a safe procedure to shut down large clusters.

There are two fundamentally different ways to stop an FNZ Studio cluster:

(a) One node after the other
(b) All nodes at the same time (highly recommended)

Each approach has different advantages and disadvantages, outlined below. We highly recommend the second approach (shutting all nodes down at the same time), using one of the three options covered in section (b) below.

(a) [Not recommended] Shutting down a large cluster: One node after the other

While this approach is possible, it is not recommended due to the following issues:

Duration — It takes a long time, as you need to wait for a node to stop before continuing with the next node, and so on.
Memory — It requires rebalancing data in memory several times, namely once per node, except for the last node.
Race conditions — This approach is prone to race conditions: one of the still-running nodes might trigger a job which could interfere with the stopping node.

(b) [Recommended] Shutting down a large cluster: All nodes at the same time

This second approach is a lot cleaner and faster because the in-memory data layer is disconnected simultaneously on all nodes. No rebalancing of the data in memory is therefore required.

Furthermore, larger installations may require approach (b). With approach (a), the data is rebalanced onto fewer and fewer nodes, so the last running FNZ Studio node must be able to hold all the data in the distributed maps.
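The memory argument can be illustrated with a small calculation. This is an idealized sketch only: it assumes the data is spread evenly over the running nodes and ignores backup copies and any overhead during rebalancing. The function name and the figures are hypothetical.

```python
def per_node_load(total_gb: float, nodes: int) -> list[float]:
    """Data each running node holds as nodes are stopped one at a time.

    ASSUMPTION: perfectly even distribution, no backup copies.
    """
    # Start with all nodes running, then 1 node fewer, down to a single node.
    return [total_gb / remaining for remaining in range(nodes, 0, -1)]


# 120 GB of distributed map data on a 4-node cluster:
print(per_node_load(120, 4))  # [30.0, 40.0, 60.0, 120.0]
```

In this model the last node ends up holding the full 120 GB, which is why approach (a) stops being viable once the data no longer fits on one node.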

The one disadvantage of this approach is that it requires explicitly triggering an FNZ Studio cluster shutdown before stopping the application servers.

Triggering a cluster shutdown: All nodes at the same time

Publishing a shutdown sends a shutdown command to all cluster nodes to initiate an FNZ Studio-internal shutdown procedure on every node. HTTP requests sent to FNZ Studio after this moment are blocked.

Each node stops its services and then prints the following message in the log files: "FNZ Studio will soon be stopped, but the Application Server might continue running..."

Note that the JVM shuts down by default when the FNZ Studio cluster is shut down. You can disable this default JVM shutdown behavior by setting the nm.cluster.shutdown.jvm.stop configuration property to false.

There are three options available to publish a shutdown:

1. Use the FNZ Studio REST service to publish a shutdown. To do so, call ${baseURL}/rest/cluster/shutdown on any node. More information can be found in the REST interface documentation.
2. Use the ClusterServiceInfo JMX bean to publish a shutdown.
- Connect to the JMX management server running on any node
- Go to the ClusterServiceInfo JMX bean under com.nm/Cluster
- Call publishShutdown()
3. Use an FNZ Studio script to publish a shutdown. Use the Interactive Script Editor to call the following Script Function: System:ShutdownCluster()
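As an illustration of the REST option, the following Python sketch builds the shutdown URL and calls it on one node. Only the /rest/cluster/shutdown path comes from this document. The HTTP method (POST here), the example base URL, and the absence of authentication headers are assumptions, so check the REST interface documentation before relying on them. The function names are hypothetical.

```python
import urllib.request


def build_shutdown_url(base_url: str) -> str:
    """Return the cluster shutdown endpoint for the given base URL."""
    # Ensure exactly one slash between the base URL and the REST path.
    return base_url.rstrip("/") + "/rest/cluster/shutdown"


def publish_shutdown(base_url: str, method: str = "POST", timeout: float = 30.0) -> int:
    """Call the shutdown endpoint on one node and return the HTTP status code.

    ASSUMPTIONS: the HTTP method and the lack of authentication are guesses;
    adapt them to what the REST interface documentation specifies.
    """
    request = urllib.request.Request(build_shutdown_url(base_url), method=method)
    # urlopen raises urllib.error.HTTPError for 4xx/5xx responses.
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return response.status


# Example (hypothetical host; not executed here):
# publish_shutdown("https://node1.example.com/studio")
```

Because the shutdown is published to all cluster nodes, a single call against any one node should be enough.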

One automation already used in practice implements approach (b) with option 2 (calling the JMX bean):

1. A newly created UNIX script wraps the application server stop script.
2. This new script first calls JMX to publish a shutdown.
3. The script then waits until the debug log on all nodes shows the shutdown message described above ("FNZ Studio will soon be stopped, but the Application Server might continue running...").
4. Once this log message has been produced on every node of the cluster, the script calls the application server stop script.
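The wrapper logic above can be sketched in Python as follows. This is not the script used in practice, only a minimal outline under several assumptions: the debug logs of all nodes are readable from the machine running the wrapper (for example via a shared file system), the publish step is passed in as a callable (a JMX or REST call you supply), and the stop script is given as a command list. All function names are hypothetical. The marker is a substring of the log message, so it does not depend on the product name printed at the start of the line.

```python
import os
import subprocess
import time

# Substring of the shutdown log message quoted in this document.
SHUTDOWN_MARKER = "will soon be stopped, but the Application Server might continue running"


def log_sizes(log_paths):
    """Record each log's current size so messages from earlier shutdowns are ignored."""
    return {p: os.path.getsize(p) if os.path.exists(p) else 0 for p in log_paths}


def node_has_stopped(path, offset, marker=SHUTDOWN_MARKER):
    """Return True if the marker appears in the log after the given byte offset."""
    if not os.path.exists(path):
        return False
    # If the log was rotated (file is now shorter), scan it from the start.
    start = offset if os.path.getsize(path) >= offset else 0
    with open(path, "rb") as log:
        log.seek(start)
        return marker.encode("utf-8") in log.read()


def wait_for_all_nodes(offsets, timeout=300.0, poll_interval=2.0):
    """Poll until every log shows the marker; return False if the timeout expires."""
    deadline = time.monotonic() + timeout
    while True:
        if all(node_has_stopped(p, off) for p, off in offsets.items()):
            return True
        if time.monotonic() >= deadline:
            return False
        time.sleep(poll_interval)


def shutdown_then_stop(publish, log_paths, stop_command, timeout=300.0):
    """Publish the shutdown, wait for every node, then run the stop script."""
    offsets = log_sizes(log_paths)  # taken before publishing
    publish()                       # caller-supplied JMX or REST call
    if not wait_for_all_nodes(offsets, timeout=timeout):
        raise TimeoutError("not all nodes logged the shutdown message in time")
    # check=True raises CalledProcessError if the stop script fails.
    return subprocess.run(stop_command, check=True).returncode
```

Recording the log sizes before publishing keeps a message left over from a previous shutdown from being mistaken for the current one. The timeout prevents the wrapper from hanging indefinitely if one node never logs the message.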