ClusterTools Extension

The ClusterTools extension provides system administrators with insights into the internals of FNZ Studio Cluster and Hazelcast.

When the extension is installed, it adds a new section to FNZ Studio Composition under System Maintenance > Cluster Tools, providing access to the Hazelcast Tools described below. Most pages allow downloading data tables (through the Download option).

The ClusterTools extension does not require any special configuration steps.

Terminology

The terms 'member' and 'node' are usually interchangeable:

Hazelcast uses the term 'member' to refer to nodes connected to a Hazelcast cluster. Every member of a Hazelcast cluster is identified by a 'member UUID'.
FNZ Studio uses the term 'node' to refer to nodes connected to an FNZ Studio cluster. Every node of an FNZ Studio cluster is identified by a 'node name'.

Hazelcast Tools

Hazelcast Members

The Hazelcast Members table displays a list of all members of the Hazelcast cluster.

The following columns are available:

Node Name – Name of the FNZ Studio cluster node.
Member UUID – Unique identifier of the Hazelcast member. Changes every time the node is restarted.
Network Address – IP address and port of the Hazelcast member.
System Time – Time on the system where the Hazelcast member is running.
Node Uptime – Uptime of the FNZ Studio cluster node.

Use this tool to answer the following questions:

Are all cluster nodes up and running?
When did a particular node join the cluster?
Do all cluster nodes have the same system time?
What is the Hazelcast member UUID of the node with a particular FNZ Studio node name (or vice versa)?

Hazelcast Events

The extension registers a listener in Hazelcast and collects information about events like members joining or leaving the cluster. Events are stored only in memory. If the extension is stopped, all events are lost.

The Hazelcast Events table shows all events recorded on all cluster nodes, not only the events recorded on the current node.

The following columns are available:

Time – System time when the event was recorded.
Type – Type of event. See the list of event types below for details.
Text – Type-specific information about the event.
Node Name – Name of the FNZ Studio node on which the event was recorded.
Member UUID – UUID of the Hazelcast member on which this event was recorded.

Additional information:

The table shows multiple entries for the same event if the event was recorded by multiple cluster nodes.
Use the page context menu action Clear Events to clear all recorded events on all cluster nodes.
The extension stores a maximum of 1000 events (per cluster node). If this limit is reached and a new event is recorded, the oldest event is discarded. Use the extension configuration property events.limit to adjust this limit.
If a Hazelcast member has left the cluster, the events recorded by the ClusterTools extension on that member are no longer available. The only visible entry is the Member Removed event for that member.
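
The retention rule described above (at most events.limit events per cluster node, with the oldest event discarded first) behaves like a bounded first-in, first-out buffer. The following Java sketch illustrates that behavior only. The class BoundedEventLog and its methods are hypothetical names chosen for this example and are not part of the ClusterTools extension.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;

// Hypothetical illustration of the per-node retention rule; not the extension's actual code.
final class BoundedEventLog {
    private final int limit; // plays the role of the events.limit property
    private final Deque<String> events = new ArrayDeque<>();

    BoundedEventLog(int limit) {
        if (limit < 1) {
            throw new IllegalArgumentException("limit must be at least 1");
        }
        this.limit = limit;
    }

    // Records a new event; if the buffer is full, the oldest event is discarded first.
    synchronized void record(String event) {
        if (events.size() == limit) {
            events.removeFirst();
        }
        events.addLast(event);
    }

    // Returns a copy of the retained events, oldest first.
    synchronized List<String> snapshot() {
        return new ArrayList<>(events);
    }

    public static void main(String[] args) {
        BoundedEventLog log = new BoundedEventLog(3);
        String[] incoming = {"Extension Started", "Member Added", "Migration Started", "Migration Completed"};
        for (String event : incoming) {
            log.record(event);
        }
        // prints "[Member Added, Migration Started, Migration Completed]"
        System.out.println(log.snapshot());
    }
}
```

With a limit of 3, recording a fourth event drops "Extension Started", which mirrors how the oldest of 1000 events would be discarded on a busy node.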

The following event types are tracked in the Type column:

Extension Started – Used to track when the ClusterTools extension has been started.
Member Added – A new Hazelcast member has joined the cluster. The text contains the UUID and network address of the new member.
Member Removed – A Hazelcast member has left the cluster. The text contains the UUID and network address of the member.
Client Connected – A Hazelcast client has established a connection to the cluster. The text contains the UUID and network address of the client.
Client Disconnected – A Hazelcast client has closed its connection to the cluster. The text contains the UUID and network address of the client.
Migration Started – A Hazelcast partition migration has been started. The text contains the partition ID and the UUIDs of the old and new members that own this partition.
Migration Completed – A Hazelcast partition migration has been completed. The text contains the partition ID and the UUIDs of the old and new members that own this partition.
Migration Failed – A Hazelcast partition migration has failed.
Events Cleared – Used to track when a system administrator has cleared all recorded events.

Use this tool to answer the following question:

Has a node been added to or removed from the cluster since the last startup?

Hazelcast Maps

Three tabs providing detailed information on distributed maps are available in the Hazelcast Maps window:

Hazelcast Map States (see below)
Hazelcast Map Operations (see below)
Hazelcast Map Configurations (see below)

Hazelcast Map States

The Hazelcast Map States table displays a list of all distributed maps currently registered on the local Hazelcast cluster node and provides information on their respective states.

The following columns are available:

Map – Human-readable name of the Hazelcast map.
Hazelcast Name – Technical name of the map in Hazelcast.
Node – The name of the node on which the listed map partition is stored. In addition to specific partitions of single maps on single nodes, the list also contains aggregate rows. These rows list the sums (over nodes and maps) of each statistic and each statistic averaged over all nodes.
Owned Entries (Count) – Number of objects stored in this distributed map.
Owned Entries (Size) – Amount of memory used by objects in this distributed map.
Backup Entries (Count) – Number of backup objects in this distributed map.
Backup Entries (Size) – Amount of memory used by backup objects in this distributed map.
Dirty Entries (Count) – Number of objects which have been modified in memory and need to be persisted.
Locked Entries (Count) – Number of currently locked locally owned keys.
Near Cache Entries (Count) – Number of objects in the near cache of this distributed map. This value is usually between 0 and the total number of objects (cluster-wide) in the distributed map. The value [disabled] means that the near cache has been disabled for this map.
Near Cache Hit Rate – The percentage of near-cache hits (of all near-cache access operations) for the listed map on the listed node. In parentheses, the numbers of near-cache hits and near-cache misses are listed.
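
As a worked example of the Near Cache Hit Rate column, the following Java sketch computes the percentage from the hit and miss counts. The class and method names are hypothetical, the output format only approximates the layout described above, and reporting 0% when there have been no accesses is an assumption of this sketch.

```java
import java.util.Locale;

// Hypothetical helper that reproduces the hit-rate arithmetic; not part of the extension.
final class NearCacheHitRate {
    // Percentage of hits among all near-cache access operations (hits + misses).
    static double hitRatePercent(long hits, long misses) {
        long total = hits + misses;
        if (total == 0) {
            return 0.0; // assumption: no accesses yet is reported as 0%
        }
        return 100.0 * hits / total;
    }

    // Formats the rate followed by the raw hit and miss counts in parentheses.
    static String format(long hits, long misses) {
        return String.format(Locale.ROOT, "%.1f%% (%d hits, %d misses)",
                hitRatePercent(hits, misses), hits, misses);
    }

    public static void main(String[] args) {
        System.out.println(format(750, 250)); // prints "75.0% (750 hits, 250 misses)"
    }
}
```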

Additional information:

Backups are created only if there is more than one cluster node. On single-node installations, the column Backup Entries (Count) is always 0.
Use the context menu action Clear Near Cache available for each row to flush the corresponding near cache:
- For rows that list a specific map and a specific node, this action clears the near cache of that map on that node.
- For rows showing aggregate data, this action clears all near caches contributing to the data shown by that row.
Alternatively, use the page context menu action Clear all Near Caches to flush all near caches of all distributed maps.

Important! Use the Clear Near Cache / Clear all Near Caches features with caution! Clearing a heavily used near cache may have a negative impact on system performance.

Use this tool to answer the following questions:

How many objects does a distributed map contain?
How much memory does a distributed map need? (not including near cache)
Does the distributed map contain dirty entries which need to be persisted (saved to disk or database)?

Hazelcast Map Operations

The Hazelcast Map Operations table displays a list of all distributed maps currently registered and provides information on the operations performed on them.

The following columns are available:

Map – Human-readable name of the Hazelcast map.
Node – The name of the node on which the listed map partition is stored. In addition to specific partitions of single maps on single nodes, the list also contains aggregate rows. These rows list the sums (over nodes and maps) of each statistic.
Sum – The total number of Put, Get, Remove and other operations.
Puts – The number of Put operations.
Gets – The number of Get operations.
Removes – The number of Remove operations.
Other Operations – The number of other operations (Map Size operation, Contains operation etc.).
Events – The number of events received (Near Cache Invalidation, Partition Lost etc.).
Hits – The number of hits (reads) of the locally owned entries.
Avg Put Latency – The average latency of Put operations.
Avg Get Latency – The average latency of Get operations.
Avg Remove Latency – The average latency of Remove operations.

Use this tool to answer the following questions:

Do specific operations take particularly long on a certain distributed map (check the Avg Latency columns)?
Is there an unexpectedly high number of operations recorded for a distributed map that needs to be investigated?

Hazelcast Map Configurations

The Hazelcast Map Configurations table shows the eviction and near cache configurations for every distributed map.

The following columns are available:

Map – Human-readable name of the Hazelcast map.
Persisted – Indicates if the entries in this distributed map are persisted to Cluster Storage.
Eviction Enabled – Indicates whether eviction is enabled on the map.
TTL – Time to live for the entries in the distributed map. 'Time to live' is the time since the last write access was performed. The time is expressed in seconds.
Max Idle – Max idle time for the entries in the distributed map. 'Idle time' is the time since the last read or write access was performed. The time is expressed in seconds.
Max Size – Max number of entries in the distributed map per node.
Near Cache Enabled – Indicates if a near cache is enabled on the map.
Near Cache TTL – Time to live for the entries in the near cache. 'Time to live' is the time since the last write access was performed. The time is expressed in seconds.
Near Cache Max Idle – Max idle time of entries in the near cache. 'Idle time' is the time since the last read or write access was performed. The time is expressed in seconds.
Near Cache Max Size – Max number of entries in the near cache per node.
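
To make the difference between TTL and Max Idle concrete, the following Java sketch checks whether an entry would be considered expired under both settings. It is a simplified illustration, not Hazelcast's implementation: the class and method names are hypothetical, and it assumes that a value of 0 means the respective limit is disabled.

```java
import java.time.Duration;
import java.time.Instant;

// Hypothetical illustration of TTL versus max idle; not Hazelcast's actual expiry logic.
final class ExpirySketch {
    // lastWrite:  time of the last write access (drives TTL)
    // lastAccess: time of the last read or write access (drives max idle)
    // A limit of 0 is treated as "disabled" (assumption of this sketch).
    static boolean isExpired(Instant lastWrite, Instant lastAccess, Instant now,
                             long ttlSeconds, long maxIdleSeconds) {
        boolean ttlExceeded = ttlSeconds > 0
                && Duration.between(lastWrite, now).getSeconds() >= ttlSeconds;
        boolean idleExceeded = maxIdleSeconds > 0
                && Duration.between(lastAccess, now).getSeconds() >= maxIdleSeconds;
        return ttlExceeded || idleExceeded;
    }

    public static void main(String[] args) {
        Instant written = Instant.parse("2024-01-01T00:00:00Z");
        Instant lastRead = written.plusSeconds(50); // entry was read 50 s after the write
        Instant now = written.plusSeconds(70);      // 70 s after the write, 20 s after the read

        System.out.println(isExpired(written, lastRead, now, 60, 0)); // prints "true": TTL of 60 s exceeded
        System.out.println(isExpired(written, lastRead, now, 0, 30)); // prints "false": idle for only 20 s
        System.out.println(isExpired(written, lastRead, now, 0, 10)); // prints "true": idle longer than 10 s
    }
}
```

The read at 50 seconds resets the idle time but not the time to live, which is why the first call reports the entry as expired even though it was accessed recently.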

Hazelcast Locks

Two tabs providing detailed information on distributed locks are available in the Hazelcast Locks window:

Hazelcast CP Locks (see below)
Hazelcast Map Locks (see below)

The following columns are available for both tabs:

Node – Name of the FNZ Studio node where the distributed lock object is managed, or where the backup of this lock is stored.
Holder Node – Node name where this lock has been acquired and is currently held.
Holder Thread ID – ID of the Java thread which has acquired this lock.
Acquire Time – Time at which this lock has been acquired.
Expiration Time – Time at which this lock will expire.
Age – Age of the lock (duration since it was acquired).
Lock Count – Number of lock operations on the same lock object by the thread currently holding this lock.

Consider this additional information:

Locks always appear twice in this list:
- One entry is created by the node that is managing the distributed lock object.
- A second entry is created by the node that is keeping the backup copy of the distributed lock.
  
  Note: In a cluster with more than two nodes, the thread that acquires the lock may be running on a node that manages neither the primary nor the backup copy of the lock object.
To force the release of a Hazelcast lock, use the context menu action Force Unlock.

Important! Use the Force Unlock feature with caution! Releasing a lock may lead to lost updates.

Use this tool to answer the following questions:

Are there any long-lasting distributed locks which could lead to performance problems?
Which thread on which cluster node is currently holding a distributed lock?

Hazelcast CP Locks

The Hazelcast CP Locks table contains information about locks in the Hazelcast CP Subsystem. CP stands for 'consistent and partition tolerant' in the CAP theorem. This type of system provides strong consistency guarantees in case of network partitioning, but sacrifices availability.

Additional columns specific to CP Locks are:

Lock Name – Technical name of the lock in Hazelcast.
Remaining Lease Time – Time until this lock is automatically released.

Hazelcast Map Locks

The Hazelcast Map Locks table contains information about locks attached to a specific key of a specific distributed map.

Additional columns specific to Map Locks are:

Map – Human-readable name of the Hazelcast map.
Hazelcast Name – Internal name of the Hazelcast map.
Lock Key – Key to which the lock is attached in a map.
Copy – Shows whether the lock is a primary copy or a backup copy.

Long-lasting Locks

If a lock is held for a long time, perform the following steps to understand what the thread is currently doing:

Write down the Holder Node and Holder Thread ID.
Go to Solution Maintenance > Processes > Java Threads and filter by the holder node and holder thread ID in the Node Name and Id columns.
Right-click on the thread and select View Stack Trace from the context menu to get more information about this thread.

If the thread no longer exists, or if its stack trace shows that it is outside any code section where it could hold the lock, it is very likely that the thread did not release the lock or that the release operation failed (for example, due to a temporary network problem). Consider using Force Unlock to release this lock.
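
The same check (does the holder thread still exist, and what is it doing?) can be expressed with the standard java.lang.management API. The sketch below is an illustration only. It is meaningful only when executed inside the JVM of the holder node, the class name HolderThreadCheck is hypothetical, and it assumes that the Holder Thread ID shown in the table is the ID that Java itself assigns to the thread.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.ThreadInfo;
import java.lang.management.ThreadMXBean;

// Hypothetical helper; must run in the JVM of the holder node to be meaningful.
final class HolderThreadCheck {
    // Describes the thread with the given ID in the current JVM,
    // or reports that no live thread with that ID exists.
    static String describe(long threadId) {
        ThreadMXBean threads = ManagementFactory.getThreadMXBean();
        // getThreadInfo returns null if the thread is not alive or does not exist.
        ThreadInfo info = threads.getThreadInfo(threadId, Integer.MAX_VALUE);
        if (info == null) {
            return "Thread " + threadId + " is not alive in this JVM";
        }
        StringBuilder text = new StringBuilder();
        text.append(info.getThreadName()).append(" [").append(info.getThreadState()).append("]");
        for (StackTraceElement frame : info.getStackTrace()) {
            text.append(System.lineSeparator()).append("    at ").append(frame);
        }
        return text.toString();
    }

    public static void main(String[] args) {
        // Demonstration with the current thread; in practice, pass the Holder Thread ID.
        System.out.println(describe(Thread.currentThread().getId()));
    }
}
```

A result of "is not alive" corresponds to the case described above in which the lock outlived its holder thread.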

JMX Monitoring

FNZ Studio installs several JMX MBeans specific to Hazelcast maps. Every MBean publishes information about a single distributed Hazelcast map. The MBean name has the form com.nm:type=Cluster,context=HazelcastMaps,name=[MapName].
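
As an illustration of the naming scheme, the following Java sketch lists all MBeans that match the documented name form and prints whatever attributes each one exposes. It assumes that the MBeans are registered in the platform MBean server of the JVM in which the code runs. For a remote JVM, a JMX connector would be needed instead. The class name HazelcastMapMBeans is hypothetical, and no attribute names are assumed because they are not listed here.

```java
import java.lang.management.ManagementFactory;
import java.util.Set;
import javax.management.MBeanAttributeInfo;
import javax.management.MBeanServer;
import javax.management.MalformedObjectNameException;
import javax.management.ObjectName;

// Hypothetical helper that discovers the per-map MBeans by name pattern.
final class HazelcastMapMBeans {
    // Documented name form, with '*' standing in for [MapName].
    static final String PATTERN = "com.nm:type=Cluster,context=HazelcastMaps,name=*";

    static ObjectName pattern() {
        try {
            return new ObjectName(PATTERN);
        } catch (MalformedObjectNameException e) {
            throw new IllegalStateException("Invalid MBean name pattern: " + PATTERN, e);
        }
    }

    // Returns the names of all registered MBeans that match the pattern.
    static Set<ObjectName> find(MBeanServer server) {
        return server.queryNames(pattern(), null);
    }

    public static void main(String[] args) throws Exception {
        // Assumption: this runs in the same JVM in which the MBeans are registered.
        MBeanServer server = ManagementFactory.getPlatformMBeanServer();
        for (ObjectName name : find(server)) {
            System.out.println(name.getKeyProperty("name"));
            // Discover the attributes at runtime instead of assuming their names.
            for (MBeanAttributeInfo attribute : server.getMBeanInfo(name).getAttributes()) {
                Object value = server.getAttribute(name, attribute.getName());
                System.out.println("  " + attribute.getName() + " = " + value);
            }
        }
    }
}
```

Run outside FNZ Studio, the query simply returns no names, so the program prints nothing.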