Overview
You are experiencing high CPU usage on the Exinda and see "Error retrieving data" messages. More than 100 tables are missing from MySQL, and reports are not generating.
You may also see the following symptoms in the logs: report configuration failures, a full server lockout with Watchdog restarting the server, mysqld failing to terminate, and signs of database corruption.
Report configuration failures:
Jun 21 17:49:11 Exinda configd: [configd.ERR]: mdc_get_binding_children_ex(), mdc_main.c:493, build 1: Error code 14001 (unexpected NULL) returned
Jun 21 17:49:11 Exinda configd: [configd.ERR]: CPUUtilization(), alarm/AlarmManager.cpp:52, build 1: Error code 14001 (unexpected NULL) returned
Jun 21 17:49:25 Exinda httpd: [Mon Jun 21 17:49:24 2021] [warn] [client <IP address>] Timeout waiting for output from CGI script /opt/tms/lib/web/cgi-bin/launch, referer: https://<ip address>/admin/launch?script=rh&template=report-config&tabsel=pdfreport
Jun 21 18:12:49 Exinda mgmtd:[mgmtd.NOTICE]: Async: timed out getting external response for type query_request sess
The server is fully locked out and Watchdog restarts it. The mysqld process fails to terminate:
Jun 21 21:51:49 Exinda httpd[number]: [Mon Jun 21 21:51:49 2021] [notice] caught SIGTERM, shutting down
Jun 21 21:51:49 Exinda pm[number]: [pm.NOTICE] : Terminating process jboss (JBoss server responsible for serving up the AMS pages)
Jun 21 21:51:51 Exinda hardwared[number]: [hardwared.NOTICE]: SEL event : 1 | 06/22/2021 | 02:51:44 | Watchdog2 #0xca | Timer interrupt () | Asserted
Jun 21 21:51:51 Exinda hardwared[number]: [hardwared.NOTICE]: SEL event : 2 | 06/22/2021 | 02:51:44 | Watchdog2 #0xca | Time expired () | Asserted
Jun 21 21:52:28 Exinda pm[number]: [pm.NOTICE]: Terminating process mysqld
Jun 21 21:52:28 Exinda pm[number]: [pm.WARNING]: Failed to kill process mysqld (pid 2722) with SIGTERM, trying SIGKILL next
Jun 21 21:52:28 Exinda pm[number]: [pm.NOTICE] Process mysqld (pid 2722) terminated from signal 9 (SIGKILL)
Jun 21 21:52:28 Exinda pm[number]: [pm.NOTICE] Terminating process statsd (Statistics Daemon)
There are signs of corruption in the database:
Jun 21 21:49:15 Exinda mysqld_init[number]: [mysqld_init.NOTICE]: Database is currently version 27, want to get to version 27
Jun 21 21:49:15 Exinda mysqld_init[number]: [mysqld_init.NOTICE]: Database has table with name 'DATABASECHANGELOG'
Jun 21 21:49:15 Exinda mysqld_init[number]: [mysqld_init.NOTICE]: Database has table with name 'DATABASECHANGELOGLOCK'
Jun 21 21:49:15 Exinda mysqld_init[number]: [mysqld_init.WARNING]: An error occurred launching liquidbase. Launch result: 0 Process result: 65280 Output: Unexpected error running Liquibase: Error executing SQL CREATE INDEX host_stats_timestamp_idx ON monitor.host_stats(timestamp): Duplicate key name 'host_stats_timestamp_idx'#012#012#012
Jun 21 21:49:15 Exinda mysqld_init[number]: [mysqld_init.NOTICE]: Database is now online
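To check an exported log file for the three symptom groups above, a small script can help. The following is a minimal sketch, not an Exinda tool: the category names and the `classify_log_lines` function are hypothetical, and the patterns are simply substrings taken from the log excerpts in this article, so they may need adjusting for other firmware versions.

```python
import re
from collections import defaultdict

# Patterns are substrings copied from the log excerpts in this article.
# The category names are illustrative labels, not Exinda terminology.
SIGNATURES = {
    "report_config_failure": re.compile(
        r"Error code 14001|Timeout waiting for output from CGI script"
    ),
    "watchdog_restart": re.compile(r"Watchdog2 .*Asserted"),
    "mysqld_kill_failure": re.compile(r"Failed to kill process mysqld"),
    "database_corruption": re.compile(
        r"Duplicate key name|Unexpected error running Liquibase"
    ),
}


def classify_log_lines(lines):
    """Group log lines by the symptom category whose pattern they match."""
    hits = defaultdict(list)
    for line in lines:
        for category, pattern in SIGNATURES.items():
            if pattern.search(line):
                hits[category].append(line.rstrip("\n"))
    return dict(hits)
```

Passing the lines of a saved log file (for example, `classify_log_lines(open(path))`, where `path` is wherever the log was saved) returns a dictionary with one entry per symptom group that was found.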
Solution
- Remove the Exinda from the network
- Reboot the appliance
- Log in to the Web UI and navigate to Configuration > Reboot/Shutdown > Reboot
- If the UI is difficult to reach, connect over SSH and restart by following the article Exinda CLI: Reboot and Shutdown
- Contact Exinda support to obtain a temporary restricted shell license.
- <supportagent>
Please see the Shell Unlock Key article to generate a customer-specific, temporary restricted shell license.
</supportagent>
- Rebuild the database by following the article The Monitoring Database has Stopped or Become Corrupted
- Wait for MySQL to stabilize
- Restart JBoss by following the Restarting Jboss article.
- Confirm that the missing tables are restored (the total should be roughly 280 or more) by running the following commands in the CLI:
en
_shell
<license key>
mysql
clear
connect monitor
show tables;
- Check that the various monitor graphs no longer produce errors and display correctly
- Wait a few minutes before reconnecting the Exinda to the network
- Wait 5-10 minutes for the CPU to settle from the traffic spike.
- Check CPU usage to confirm that the load is no longer high.
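For the table-count check in the steps above, counting rows by eye is error-prone. The sketch below assumes the `show tables` output has been copied from the terminal in the usual mysql client box format (`| name |` rows between `+---+` borders). The function names and the `EXPECTED_MINIMUM` constant are hypothetical; the 280 figure is the approximate total quoted in this article, not an exact specification.

```python
EXPECTED_MINIMUM = 280  # approximate total from this article, not an exact value


def count_tables(show_tables_output):
    """Count table rows in text printed by the mysql client for `show tables`."""
    rows = []
    for line in show_tables_output.splitlines():
        line = line.strip()
        # Header and data rows look like "| name |"; border rows start with "+".
        if line.startswith("|") and line.endswith("|"):
            rows.append(line.strip("|").strip())
    # The first "|" row is the column header (for example, Tables_in_monitor).
    return len(rows[1:])


def tables_restored(show_tables_output, minimum=EXPECTED_MINIMUM):
    """Return True when the pasted output lists at least `minimum` tables."""
    return count_tables(show_tables_output) >= minimum
```

The mysql client also prints a `N rows in set` footer after the table, which gives the same count directly. The script is mainly useful when the output has been saved to a file for later review.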