Troubleshooting v4
Authorization file not found. Is the local agent running?
If you invoke a Failover Manager cluster management command and Failover Manager isn't running on the node, the `efm` command displays the error `Authorization file not found. Is the local agent running?`
Not authorized to run this command. User '<os user>' is not a member of the `efm` group.
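You must have special privileges to invoke some of the `efm` commands documented in Using the efm utility. If a user who isn't authorized to run these commands invokes them, the `efm` command displays the error in this section's heading, which names the OS user and reports that the user isn't a member of the `efm` group.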
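To confirm whether a user belongs to the `efm` group, you can inspect the group list that the standard `id` utility reports. The following is a minimal sketch, not part of Failover Manager. The `has_group` helper is a hypothetical name introduced here for illustration.

```shell
#!/bin/sh
# has_group GROUP_LIST GROUP (hypothetical helper): succeed if GROUP appears
# as a whole word in the space-separated GROUP_LIST, which is the format
# printed by `id -nG`.
has_group() {
    for g in $1; do
        [ "$g" = "$2" ] && return 0
    done
    return 1
}

# Example: report on the user running this script.
if has_group "$(id -nG)" efm; then
    echo "current user is in the efm group"
else
    echo "current user is NOT in the efm group"
fi
```

Matching whole words avoids a false positive for a group whose name only contains `efm`, such as a hypothetical `efmadmin`.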
Notification; Unexpected error message
If you receive a notification about an unexpected error message, check the Failover Manager log file for an `OutOfMemory` message. Failover Manager runs with a default memory value that's set by a property in the cluster properties file.
If you're running with less than 128 megabytes allocated, increase the value and restart the Failover Manager agent.
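The check described above can be scripted. The following is a rough sketch that assumes the memory limit is passed to the JVM as a standard `-Xmx` flag. The property name `jvm.options`, the helper names `has_oom` and `xmx_mb`, and the placeholder paths are assumptions made for illustration, so verify them against your own installation.

```shell
#!/bin/sh
# has_oom LOGFILE (hypothetical helper): succeed if the log mentions an
# OutOfMemory condition.
has_oom() {
    grep -q 'OutOfMemory' "$1"
}

# xmx_mb OPTIONS (hypothetical helper): print the -Xmx heap limit found in a
# JVM options string, converted to whole megabytes. Prints nothing and
# returns 1 if no -Xmx flag is present.
xmx_mb() {
    val=$(printf '%s\n' "$1" |
        sed -n 's/.*-Xmx\([0-9][0-9]*\)\([kKmMgG]\{0,1\}\).*/\1 \2/p')
    [ -n "$val" ] || return 1
    set -- $val
    case "${2:-}" in
        k|K) echo $(( $1 / 1024 )) ;;
        m|M) echo "$1" ;;
        g|G) echo $(( $1 * 1024 )) ;;
        *)   echo $(( $1 / 1024 / 1024 )) ;;   # no suffix means bytes
    esac
}

# Example (placeholder paths; substitute your own files):
#   has_oom /path/to/cluster.log && echo "agent ran out of memory"
#   mb=$(xmx_mb "$(grep '^jvm.options=' /path/to/cluster.properties)")
#   [ "${mb:-0}" -lt 128 ] && echo "heap limit is below 128 MB"
```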
Confirming the OpenJDK version
Failover Manager is tested with OpenJDK, and we strongly recommend using it. To check the type of your Java installation, run `java -version` and look for `openjdk` in the output.
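If you want to script this check, a sketch like the following wraps `java -version` and looks for the string `openjdk` in its output. The `is_openjdk` helper is a hypothetical name, and the string match is a heuristic based on how OpenJDK builds typically identify themselves.

```shell
#!/bin/sh
# is_openjdk (hypothetical helper): read `java -version` output on stdin and
# succeed if it identifies the runtime as OpenJDK.
is_openjdk() {
    grep -qi 'openjdk'
}

# `java -version` writes to stderr, so redirect it into the pipe.
# Guarded so that the sketch does nothing on hosts without java on the PATH.
if command -v java >/dev/null 2>&1; then
    if java -version 2>&1 | is_openjdk; then
        echo "OpenJDK detected"
    else
        echo "Java is installed, but it doesn't report itself as OpenJDK"
    fi
fi
```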
Note
There's a temporary issue with OpenJDK version 11 on RHEL and its derivatives. When starting Failover Manager, you might see an error like the following:
java.lang.Error: java.io.FileNotFoundException: /usr/lib/jvm/java-11-openjdk-11.0.20.0.8-2.el8.x86_64/lib/tzdb.dat (No such file or directory)
If you see this message, the workaround is to manually install the missing package using the command `sudo dnf install tzdata-java`.
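Before installing the package, you can check whether the file named in the error is actually missing. The following is a small sketch that assumes the `lib/tzdb.dat` layout shown in the error message. The `tzdb_present` helper is a hypothetical name, and the JVM directory in the comment is a placeholder.

```shell
#!/bin/sh
# tzdb_present JVM_DIR (hypothetical helper): succeed if the time zone
# database referred to in the error message exists under the given JVM
# directory.
tzdb_present() {
    [ -f "$1/lib/tzdb.dat" ]
}

# Example (placeholder path; use the JVM directory from your error message):
#   tzdb_present /usr/lib/jvm/<your-jvm-dir> || sudo dnf install tzdata-java
```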
Unexpected connection attempts from outside the cluster
If an external process tries to connect to an agent on the `bind.address` port, Failover Manager logs a warning containing the source of the connection attempt. These warnings don't affect the Failover Manager cluster. However, you can use the source address to find the outside process and either stop it or configure it so that it no longer tries to connect to a Failover Manager agent. The following is an example of the message that appears when something outside of the cluster attempts to connect to the agent process from `<source_address>`:
If you're running an agent with an address that used to be part of a different cluster, the original cluster might still be trying to connect to this address to re-form the cluster. In this example, the cluster `oldcluster` is still trying to connect to an address that's now part of `newcluster`:
You can use the cluster name and `<source_address>` information to find the original cluster. Running the `efm reset-members` command for that cluster should clear the address from its cache.