MariaDB galera cluster fail during dump or optimize

gijs007
New Forum User
Posts: 4
Joined: Mon Jan 11, 2016 9:00 am
Location: Netherlands

I'm experiencing an issue with a MariaDB cluster when dumping or optimizing databases.
When I run mysqlcheck or mysqldump on my MariaDB 10.1 database server (which runs in a Galera cluster with two other servers), the task stops after a short time and shows no further progress. The entire cluster seems to stall.

For example, mysqldump stops after creating a 14.0 MB (14,760,912 bytes) dump file and doesn't proceed; even after several minutes nothing happens.
The mysqlcheck to repair and optimize tables also hangs after checking a few tables.

In both situations the cluster starts to have issues, and the only way to get it working normally again is to take the server that executed the job offline along with one other server. I then bring them back online one by one, and the cluster works normally again.

I'm not sure what is causing these problems. I haven't found any errors in the syslog, although during the shutdown of the servers I noticed the following:

Jan 10 20:43:46 france mysqld[1015]: 2016-01-10 20:43:46 140096330258176 [Warning] WSREP: TO isolation failed for: 3, schema: mysql, sql: OPTIMIZE TABLE proc. Check wsrep connection state and retry the query.

Jan 10 21:58:47 france mysqld[1034]: 2016-01-10 21:58:47 139691511322368 [Warning] WSREP: TO isolation failed for: 3, schema: smf, sql: OPTIMIZE TABLE smf_categories. Check wsrep connection state and retry the query.
Jan 10 21:58:47 france mysqld[1034]: 2016-01-10 21:58:47 139691511322368 [Warning] Aborted connection 24 to db: 'smf' user: 'maintenance' host: 'localhost' (Unknown error)
Jan 10 21:58:47 france mysqld[1034]: 2016-01-10 21:58:47 139691509827328 [Warning] WSREP: TO isolation failed for: 3, schema: (null), sql: SELECT 1 FROM mysql.user LIMIT 1. Check wsrep connection state and retry the query.


Any idea what could be the cause of this?
scott
Atomicorp Staff - Site Admin
Posts: 8355
Joined: Wed Dec 31, 1969 8:00 pm
Location: earth

I haven't seen that one before. A few things to look into: is that node in a failed state for any other reason? For example, are any tables crashed, or something along those lines?

Another might be some big blocking job (like an optimize) running from another application talking to the DB. Or even a filesystem error on that node blocking a read.

The last thing to try would be a different SST method.
gijs007

Thank you for the quick reply.

The node works fine in production simulations until I run the dump or optimize, so I don't think there are any crashed tables. Otherwise they would show up in the log files and the server wouldn't handle normal requests correctly.

There aren't any running applications which run an optimize job.

I'm thinking along the lines of a performance issue or a deadlock bug.

My Galera config looks like:

Code:

[galera]
# Mandatory settings
wsrep_on=ON
wsrep_provider=/usr/lib/galera/libgalera_smm.so
wsrep_cluster_name="cluster_name"
wsrep_cluster_address="gcomm://IP's"
binlog_format=row
default_storage_engine=InnoDB
innodb_autoinc_lock_mode=2

#
# Allow server to accept connections on all interfaces.
#
bind-address=0.0.0.0
#
# Optional setting
wsrep_slave_threads=16
wsrep_restart_slave=1
wsrep_replicate_myisam=1
wsrep_gtid_domain_id=1
wsrep_gtid_mode=1
innodb_flush_log_at_trx_commit=1
innodb_flush_neighbors=1

wsrep_provider_options = gcache.size = 32G;evs.keepalive_period = PT3S;evs.suspect_timeout = PT30S;evs.inactive_timeout = PT1M;evs.install_timeout = PT1M;evs.send_window = 512;evs.user_send_window = 512


# Galera Synchronization Configuration
wsrep_sst_method=rsync
#wsrep_sst_auth=user:pass

# Galera Node Configuration
wsrep_node_address="IP"
wsrep_node_name="A name"
I've removed the IPs and names for security reasons ;)

As soon as I turn wsrep_on=ON into wsrep_on=OFF and restart the server the optimize and dump job run fine.

I'll try with the mysqldump as SST now and I'll edit my post once I have the results.
prupert
Forum Regular
Posts: 573
Joined: Tue Aug 01, 2006 2:45 pm
Location: Netherlands

gijs007 wrote: wsrep_replicate_myisam=1
Are you sure? :shock:

gijs007 wrote: As soon as I turn wsrep_on=ON into wsrep_on=OFF and restart the server the optimize and dump job run fine.
That's most likely because OPTIMIZE statements will also get replicated over the cluster, thus causing potential significant performance degradation, in your case resulting in a stalling cluster. Additionally, mysqldump will lock tables, which can cause all kinds of issues when the node is actively used by your application. Check your process list for table names when this happens, perhaps you see a pattern there.
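
If table locking during the dump turns out to be part of the problem, one option is to let mysqldump read from a consistent snapshot instead of locking tables. A minimal sketch, assuming the dump is run on the node itself and picks up the default option files. single-transaction is a standard mysqldump option, but it only gives a consistent view of InnoDB/XtraDB tables, so any MyISAM tables (wsrep_replicate_myisam=1 suggests there are some) would not be covered:

```ini
[mysqldump]
# Dump inside a single transaction instead of locking tables.
# Only InnoDB/XtraDB tables get a consistent snapshot this way.
single-transaction
```

The same effect should be available on the command line as mysqldump --single-transaction. Whether this avoids the stall described in this thread is untested.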

gijs007 wrote: I'll try with the mysqldump as SST now and I'll edit my post once I have the results.
Why not use xtrabackup for state transfers? Rsync and mysqldump are slow and blocking.
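
For reference, switching the SST method is a small change in the [galera] section. A sketch, assuming Percona XtraBackup is installed on every node and that user:pass (the placeholder from the commented-out line in the posted config) is replaced with an account allowed to run the backup. Compatibility with the other wsrep settings in use should be checked against the MariaDB documentation:

```ini
[galera]
# Galera Synchronization Configuration
# Assumption: Percona XtraBackup is installed on all nodes.
wsrep_sst_method=xtrabackup-v2
# Placeholder credentials; the xtrabackup-based methods need a real account here.
wsrep_sst_auth=user:pass
```

Note that an SST only runs when a node joins and needs a full copy of the data, so changing the method would not be expected to affect an OPTIMIZE or a dump on an already synced node.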
gijs007

prupert wrote (about wsrep_replicate_myisam=1): Are you sure? :shock:
Yup, most of my database is XtraDB/InnoDB. I have a few MyISAM tables, which aren't edited very often.

prupert wrote: That's most likely because OPTIMIZE statements will also get replicated over the cluster, thus causing potential significant performance degradation, in your case resulting in a stalling cluster. Additionally, mysqldump will lock tables, which can cause all kinds of issues when the node is actively used by your application. Check your process list for table names when this happens, perhaps you see a pattern there.
I was pointed in the right direction by Tom on the MariaDB KB: https://mariadb.com/kb/en/mariadb/galer ... mment_1911
The issue appears to be caused by flow control. I've resolved it by tuning the flow control settings.
I did this by adding the following wsrep_provider_options: gcs.fc_limit=500; gcs.fc_master_slave=YES; gcs.fc_factor=1.0
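
For anyone applying the same change: wsrep_provider_options is a single semicolon-separated string, and in an option file a repeated option normally replaces the earlier one, so the three gcs.fc_* settings would be appended to the existing line rather than placed on a second wsrep_provider_options line. A sketch that combines the values from the config posted earlier in this thread with the new ones. Roughly, gcs.fc_limit is the receive-queue length at which flow control kicks in and gcs.fc_factor is the fraction of that limit at which replication resumes; the values are the ones quoted above, not a general recommendation:

```ini
[galera]
# Existing provider options from the posted config, with the flow control
# settings (gcs.fc_*) appended to the same line.
wsrep_provider_options = gcache.size = 32G;evs.keepalive_period = PT3S;evs.suspect_timeout = PT30S;evs.inactive_timeout = PT1M;evs.install_timeout = PT1M;evs.send_window = 512;evs.user_send_window = 512;gcs.fc_limit = 500;gcs.fc_master_slave = YES;gcs.fc_factor = 1.0
```
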

prupert wrote: Why not use xtrabackup for state transfers? Rsync and mysqldump are slow and blocking.
Xtrabackup doesn't support "wsrep_gtid_mode=1" according to the MariaDB blog regarding Galera SST modes.
gijs007

I've noticed that the cluster still goes down after completing the backup with the new flow control settings.