MariaDB Galera

MariaDB Galera was fine to begin with, but now that I’m reinstalling backend03 it’s not going great.

Sep 28 20:30:49 backend03.incandescent.tech docker[186863]: WSREP_SST: [INFO] Stale sst_in_progress file: /var/lib/mysql/sst_in_progress (20230928 18:30:49.649)
Sep 28 20:30:49 backend03.incandescent.tech docker[186863]: WSREP_SST: [INFO] previous SST is not completed, waiting for it to exit (20230928 18:30:49.720)
Sep 28 20:30:50 backend03.incandescent.tech docker[186863]: WSREP_SST: [INFO] previous SST is not completed, waiting for it to exit (20230928 18:30:50.746)
Sep 28 20:30:50 backend03.incandescent.tech docker[186863]: 2023-09-28 18:30:50 0 [Warning] WSREP: last inactive check more than PT6S ago (PT6.00067S), skipping check
Sep 28 20:30:51 backend03.incandescent.tech docker[186863]: WSREP_SST: [INFO] previous SST is not completed, waiting for it to exit (20230928 18:30:51.786)
Sep 28 20:30:52 backend03.incandescent.tech docker[186863]: WSREP_SST: [INFO] previous SST is not completed, waiting for it to exit (20230928 18:30:52.819)
Sep 28 20:30:53 backend03.incandescent.tech docker[186863]: WSREP_SST: [INFO] previous SST is not completed, waiting for it to exit (20230928 18:30:53.840)
Sep 28 20:30:54 backend03.incandescent.tech docker[186863]: WSREP_SST: [INFO] previous SST is not completed, waiting for it to exit (20230928 18:30:54.864)
Sep 28 20:30:55 backend03.incandescent.tech docker[186863]: WSREP_SST: [INFO] previous SST is not completed, waiting for it to exit (20230928 18:30:55.893)
Sep 28 20:30:56 backend03.incandescent.tech docker[186863]: WSREP_SST: [INFO] previous SST is not completed, waiting for it to exit (20230928 18:30:56.938)
Sep 28 20:30:57 backend03.incandescent.tech docker[186863]: WSREP_SST: [INFO] previous SST is not completed, waiting for it to exit (20230928 18:30:57.978)
Sep 28 20:30:59 backend03.incandescent.tech docker[186863]: WSREP_SST: [INFO] previous SST is not completed, waiting for it to exit (20230928 18:30:59.003)
Sep 28 20:30:59 backend03.incandescent.tech docker[186863]: WSREP_SST: [ERROR] previous SST script still running. (20230928 18:30:59.009)
Sep 28 20:30:59 backend03.incandescent.tech docker[186863]: 2023-09-28 18:30:59 0 [ERROR] WSREP: Failed to read 'ready <addr>' from: wsrep_sst_mariabackup --role 'joiner' --address '192.168.2.83' --datadir '/var/lib/mysql/' --parent 1 --progress 0 --binlog 'backend03-bin' --binlog-index 'backend03-bin.index'
Sep 28 20:30:59 backend03.incandescent.tech docker[186863]:         Read: '(null)'
Sep 28 20:30:59 backend03.incandescent.tech docker[186863]: 2023-09-28 18:30:59 0 [ERROR] WSREP: Process completed with error: wsrep_sst_mariabackup --role 'joiner' --address '192.168.2.83' --datadir '/var/lib/mysql/' --parent 1 --progress 0 --binlog 'backend03-bin' --binlog-index 'backend03-bin.index': 114 (Operation already in progress)
Sep 28 20:30:59 backend03.incandescent.tech docker[186863]: 2023-09-28 18:30:59 2 [ERROR] WSREP: Failed to prepare for 'mariabackup' SST. Unrecoverable.
Sep 28 20:30:59 backend03.incandescent.tech docker[186863]: 2023-09-28 18:30:59 2 [ERROR] WSREP: SST request callback failed. This is unrecoverable, restart required.
Sep 28 20:30:59 backend03.incandescent.tech docker[186863]: 2023-09-28 18:30:59 2 [Note] WSREP: ReplicatorSMM::abort()
Sep 28 20:30:59 backend03.incandescent.tech docker[186863]: 2023-09-28 18:30:59 2 [Note] WSREP: Closing send monitor...
Sep 28 20:30:59 backend03.incandescent.tech docker[186863]: 2023-09-28 18:30:59 2 [Note] WSREP: Closed send monitor.
Sep 28 20:30:59 backend03.incandescent.tech docker[186863]: 2023-09-28 18:30:59 2 [Note] WSREP: gcomm: terminating thread
Sep 28 20:30:59 backend03.incandescent.tech docker[186863]: 2023-09-28 18:30:59 2 [Note] WSREP: gcomm: joining thread
Sep 28 20:30:59 backend03.incandescent.tech docker[186863]: 2023-09-28 18:30:59 2 [Note] WSREP: gcomm: closing backend
Sep 28 20:30:59 backend03.incandescent.tech docker[186863]: 2023-09-28 18:30:59 2 [Note] WSREP: view(view_id(NON_PRIM,52cd15fb-9b29,87) memb {
Sep 28 20:30:59 backend03.incandescent.tech docker[186863]:         be038c1b-8061,0
Sep 28 20:30:59 backend03.incandescent.tech docker[186863]: } joined {
Sep 28 20:30:59 backend03.incandescent.tech docker[186863]: } left {
Sep 28 20:30:59 backend03.incandescent.tech docker[186863]: } partitioned {
Sep 28 20:30:59 backend03.incandescent.tech docker[186863]:         52cd15fb-9b29,0
Sep 28 20:30:59 backend03.incandescent.tech docker[186863]:         82c6ead4-bf0f,0
Sep 28 20:30:59 backend03.incandescent.tech docker[186863]: })
Sep 28 20:30:59 backend03.incandescent.tech docker[186863]: 2023-09-28 18:30:59 2 [Note] WSREP: PC protocol downgrade 1 -> 0
Sep 28 20:30:59 backend03.incandescent.tech docker[186863]: 2023-09-28 18:30:59 2 [Note] WSREP: view((empty))
Sep 28 20:30:59 backend03.incandescent.tech docker[186863]: 2023-09-28 18:30:59 2 [Note] WSREP: gcomm: closed
Sep 28 20:30:59 backend03.incandescent.tech docker[186863]: 2023-09-28 18:30:59 0 [Note] WSREP: New COMPONENT: primary = no, bootstrap = no, my_idx = 0, memb_num = 1
Sep 28 20:30:59 backend03.incandescent.tech docker[186863]: 2023-09-28 18:30:59 0 [Note] WSREP: Flow-control interval: [16, 16]
Sep 28 20:30:59 backend03.incandescent.tech docker[186863]: 2023-09-28 18:30:59 0 [Note] WSREP: Received NON-PRIMARY.
Sep 28 20:30:59 backend03.incandescent.tech docker[186863]: 2023-09-28 18:30:59 0 [Note] WSREP: Shifting PRIMARY -> OPEN (TO: 8042075)
Sep 28 20:30:59 backend03.incandescent.tech docker[186863]: 2023-09-28 18:30:59 0 [Note] WSREP: New SELF-LEAVE.
Sep 28 20:30:59 backend03.incandescent.tech docker[186863]: 2023-09-28 18:30:59 0 [Note] WSREP: Flow-control interval: [0, 0]
Sep 28 20:30:59 backend03.incandescent.tech docker[186863]: 2023-09-28 18:30:59 0 [Note] WSREP: Received SELF-LEAVE. Closing connection.
Sep 28 20:30:59 backend03.incandescent.tech docker[186863]: 2023-09-28 18:30:59 0 [Note] WSREP: Shifting OPEN -> CLOSED (TO: 8042075)
Sep 28 20:30:59 backend03.incandescent.tech docker[186863]: 2023-09-28 18:30:59 0 [Note] WSREP: RECV thread exiting 0: Success
Sep 28 20:30:59 backend03.incandescent.tech docker[186863]: 2023-09-28 18:30:59 2 [Note] WSREP: recv_thread() joined.
Sep 28 20:30:59 backend03.incandescent.tech docker[186863]: 2023-09-28 18:30:59 2 [Note] WSREP: Closing replication queue.
Sep 28 20:30:59 backend03.incandescent.tech docker[186863]: 2023-09-28 18:30:59 2 [Note] WSREP: Closing slave action queue.
Sep 28 20:30:59 backend03.incandescent.tech docker[186863]: 2023-09-28 18:30:59 2 [Note] WSREP: mariadbd: Terminated.
Sep 28 20:30:59 backend03.incandescent.tech docker[186863]: 230928 18:30:59 [ERROR] mysqld got signal 11 ;
Sep 28 20:30:59 backend03.incandescent.tech docker[186863]: This could be because you hit a bug. It is also possible that this binary
Sep 28 20:30:59 backend03.incandescent.tech docker[186863]: or one of the libraries it was linked against is corrupt, improperly built,
Sep 28 20:30:59 backend03.incandescent.tech docker[186863]: or misconfigured. This error can also be caused by malfunctioning hardware.
Sep 28 20:30:59 backend03.incandescent.tech docker[186863]: To report this bug, see https://mariadb.com/kb/en/reporting-bugs
Sep 28 20:30:59 backend03.incandescent.tech docker[186863]: We will try our best to scrape up some info that will hopefully help
Sep 28 20:30:59 backend03.incandescent.tech docker[186863]: diagnose the problem, but since we have already crashed,
Sep 28 20:30:59 backend03.incandescent.tech docker[186863]: something is definitely wrong and this may fail.
Sep 28 20:30:59 backend03.incandescent.tech docker[186863]: Server version: 11.1.2-MariaDB-1:11.1.2+maria~ubu2204-log source revision: 9bc25d98209df6810f7a7d5e7dd3ae677a313ab5
Sep 28 20:30:59 backend03.incandescent.tech docker[186863]: key_buffer_size=0
Sep 28 20:30:59 backend03.incandescent.tech docker[186863]: read_buffer_size=131072
Sep 28 20:30:59 backend03.incandescent.tech docker[186863]: max_used_connections=0
Sep 28 20:30:59 backend03.incandescent.tech docker[186863]: max_threads=1002
Sep 28 20:30:59 backend03.incandescent.tech docker[186863]: thread_count=3
Sep 28 20:30:59 backend03.incandescent.tech docker[186863]: It is possible that mysqld could use up to
Sep 28 20:30:59 backend03.incandescent.tech docker[186863]: key_buffer_size + (read_buffer_size + sort_buffer_size)*max_threads = 2206964 K  bytes of memory
Sep 28 20:30:59 backend03.incandescent.tech docker[186863]: Hope that's ok; if not, decrease some variables in the equation.
Sep 28 20:30:59 backend03.incandescent.tech docker[186863]: Thread pointer: 0x7f6acc000c68
Sep 28 20:30:59 backend03.incandescent.tech docker[186863]: Attempting backtrace. You can use the following information to find out
Sep 28 20:30:59 backend03.incandescent.tech docker[186863]: where mysqld died. If you see no messages after this, something went
Sep 28 20:30:59 backend03.incandescent.tech docker[186863]: terribly wrong...
Sep 28 20:30:59 backend03.incandescent.tech docker[186863]: stack_bottom = 0x7f6ae8fb0c68 thread_stack 0x49000
Sep 28 20:30:59 backend03.incandescent.tech docker[186863]: Printing to addr2line failed
Sep 28 20:30:59 backend03.incandescent.tech docker[186863]: mariadbd(my_print_stacktrace+0x32)[0x5575c97de7c2]
Sep 28 20:30:59 backend03.incandescent.tech docker[186863]: mariadbd(handle_fatal_signal+0x488)[0x5575c92b7cf8]
Sep 28 20:30:59 backend03.incandescent.tech docker[186863]: /lib/x86_64-linux-gnu/libc.so.6(+0x42520)[0x7f6af380b520]
Sep 28 20:30:59 backend03.incandescent.tech docker[186863]: /lib/x86_64-linux-gnu/libc.so.6(abort+0x178)[0x7f6af37f1898]
Sep 28 20:30:59 backend03.incandescent.tech docker[186863]: /usr/lib/libgalera_smm.so(+0x156812)[0x7f6aeb1fb812]
Sep 28 20:30:59 backend03.incandescent.tech docker[186863]: /usr/lib/libgalera_smm.so(+0x6f151)[0x7f6aeb114151]
Sep 28 20:30:59 backend03.incandescent.tech docker[186863]: /usr/lib/libgalera_smm.so(+0x6bdb4)[0x7f6aeb110db4]
Sep 28 20:30:59 backend03.incandescent.tech docker[186863]: /usr/lib/libgalera_smm.so(+0x8a5b1)[0x7f6aeb12f5b1]
Sep 28 20:30:59 backend03.incandescent.tech docker[186863]: /usr/lib/libgalera_smm.so(+0x5f690)[0x7f6aeb104690]
Sep 28 20:30:59 backend03.incandescent.tech docker[186863]: /usr/lib/libgalera_smm.so(+0x47611)[0x7f6aeb0ec611]
Sep 28 20:30:59 backend03.incandescent.tech docker[186863]: mariadbd(_ZN5wsrep18wsrep_provider_v2611run_applierEPNS_21high_priority_serviceE+0x12)[0x5575c989d3a2]
Sep 28 20:30:59 backend03.incandescent.tech docker[186863]: mariadbd(+0xd5f191)[0x5575c9571191]
Sep 28 20:30:59 backend03.incandescent.tech docker[186863]: mariadbd(_Z15start_wsrep_THDPv+0x26b)[0x5575c955f15b]
Sep 28 20:30:59 backend03.incandescent.tech docker[186863]: mariadbd(+0xcd1906)[0x5575c94e3906]
Sep 28 20:30:59 backend03.incandescent.tech docker[186863]: /lib/x86_64-linux-gnu/libc.so.6(+0x94b43)[0x7f6af385db43]
Sep 28 20:30:59 backend03.incandescent.tech docker[186863]: /lib/x86_64-linux-gnu/libc.so.6(clone+0x44)[0x7f6af38eebb4]
Sep 28 20:30:59 backend03.incandescent.tech docker[186863]: Trying to get some variables.
Sep 28 20:30:59 backend03.incandescent.tech docker[186863]: Some pointers may be invalid and cause the dump to abort.
Sep 28 20:30:59 backend03.incandescent.tech docker[186863]: Query (0x0): (null)
Sep 28 20:30:59 backend03.incandescent.tech docker[186863]: Connection ID (thread ID): 2
Sep 28 20:30:59 backend03.incandescent.tech docker[186863]: Status: NOT_KILLED
Sep 28 20:30:59 backend03.incandescent.tech docker[186863]: Optimizer switch: index_merge=on,index_merge_union=on,index_merge_sort_union=on,index_merge_intersection=on,index_merge_sort_intersection=off,engine_condition_pushdown=off,index_condition_pushdown=on,derived_merge=on,derived_with_keys=on,firstmatch=on,loosescan=on,materialization=on,in_to_exists=on,semijoin=on,partial_match_rowid_merge=on,partial_match_table_scan=on,subquery_cache=on,mrr=off,mrr_cost_based=off,mrr_sort_keys=off,outer_join_with_cache=on,semijoin_with_cache=on,join_cache_incremental=on,join_cache_hashed=on,join_cache_bka=on,optimize_join_buffer_size=on,table_elimination=on,extended_keys=on,exists_to_in=on,orderby_uses_equalities=on,condition_pushdown_for_derived=on,split_materialized=on,condition_pushdown_for_subquery=on,rowid_filter=on,condition_pushdown_from_having=on,not_null_range_scan=off,hash_join_cardinality=on
Sep 28 20:30:59 backend03.incandescent.tech docker[186863]: The manual page at https://mariadb.com/kb/en/how-to-produce-a-full-stack-trace-for-mysqld/ contains
Sep 28 20:30:59 backend03.incandescent.tech docker[186863]: information that should help you find out what is causing the crash.
Sep 28 20:30:59 backend03.incandescent.tech docker[186863]: We think the query pointer is invalid, but we will try to print it anyway.
Sep 28 20:30:59 backend03.incandescent.tech docker[186863]: Query:
Sep 28 20:30:59 backend03.incandescent.tech docker[186863]: Writing a core file...
Sep 28 20:30:59 backend03.incandescent.tech docker[186863]: Working directory at /var/lib/mysql
Sep 28 20:30:59 backend03.incandescent.tech docker[186863]: Resource Limits:
Sep 28 20:30:59 backend03.incandescent.tech docker[186863]: Limit                     Soft Limit           Hard Limit           Units
Sep 28 20:30:59 backend03.incandescent.tech docker[186863]: Max cpu time              unlimited            unlimited            seconds
Sep 28 20:30:59 backend03.incandescent.tech docker[186863]: Max file size             unlimited            unlimited            bytes
Sep 28 20:30:59 backend03.incandescent.tech docker[186863]: Max data size             unlimited            unlimited            bytes
Sep 28 20:30:59 backend03.incandescent.tech docker[186863]: Max stack size            8388608              unlimited            bytes
Sep 28 20:30:59 backend03.incandescent.tech docker[186863]: Max core file size        0                    0                    bytes
Sep 28 20:30:59 backend03.incandescent.tech docker[186863]: Max resident set          unlimited            unlimited            bytes
Sep 28 20:30:59 backend03.incandescent.tech docker[186863]: Max processes             unlimited            unlimited            processes
Sep 28 20:30:59 backend03.incandescent.tech docker[186863]: Max open files            1073741816           1073741816           files
Sep 28 20:30:59 backend03.incandescent.tech docker[186863]: Max locked memory         8388608              8388608              bytes
Sep 28 20:30:59 backend03.incandescent.tech docker[186863]: Max address space         unlimited            unlimited            bytes
Sep 28 20:30:59 backend03.incandescent.tech docker[186863]: Max file locks            unlimited            unlimited            locks
Sep 28 20:30:59 backend03.incandescent.tech docker[186863]: Max pending signals       14479                14479                signals
Sep 28 20:30:59 backend03.incandescent.tech docker[186863]: Max msgqueue size         819200               819200               bytes
Sep 28 20:30:59 backend03.incandescent.tech docker[186863]: Max nice priority         0                    0
Sep 28 20:30:59 backend03.incandescent.tech docker[186863]: Max realtime priority     0                    0
Sep 28 20:30:59 backend03.incandescent.tech docker[186863]: Max realtime timeout      unlimited            unlimited            us
Sep 28 20:30:59 backend03.incandescent.tech docker[186863]: Core pattern: |/usr/lib/systemd/systemd-coredump %P %u %g %s %t %c %h
Sep 28 20:30:59 backend03.incandescent.tech docker[186863]: Kernel version: Linux version 5.14.0-284.11.1.el9_2.x86_64 (mockbuild@x64-builder01.almalinux.org) (gcc (GCC) 11.3.1 20221121 (Red Hat 11.3.1-4), GNU ld version 2.35.2-37.el9) #1 SMP PREEMPT_DYNAMIC Tue May 9 05:49:00 EDT 2023
Sep 28 20:30:59 backend03.incandescent.tech systemd[1]: mariadbgalera.service: Main process exited, code=exited, status=139/n/a
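
Most of the output above is shutdown noise and crash-handler boilerplate; the lines that matter carry an `[ERROR]` tag. A minimal sketch of a filter for them follows. `wsrep_errors` is a hypothetical helper name, and the usage line assumes the journal is read with `journalctl` and that the unit is called `mariadbgalera`, as the last log line suggests.

```shell
# Hypothetical helper: keep only the lines tagged [ERROR] from a log on stdin.
# grep -F treats the pattern as a fixed string, so the brackets are literal.
wsrep_errors() {
    grep -F '[ERROR]'
}

# Example (assumed unit name):
#   journalctl -u mariadbgalera --since "20:30" | wsrep_errors
```

Both the `WSREP_SST: [ERROR]` line from the SST script and the server's own `[ERROR] WSREP:` lines pass this filter, which makes the order of the failures easier to follow.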

backend03 runs with the following config:

[root@backend03 ~]# cat /etc/containers/mariadbgalera/config/*
[mariadb]
log-bin                    = ON
server-id                  = 3
log-basename               = backend03

wsrep_cluster_address      = gcomm://backend01.incandescent.tech,backend02.incandescent.tech,backend03.incandescent.tech

wsrep_cluster_name         = svealiden
binlog-format              = ROW
default_storage_engine     = InnoDB
innodb_autoinc_lock_mode   = 2
wsrep_on                   = ON
wsrep_log_conflicts        = ON
wsrep_node_address         = 192.168.2.83
wsrep_sst_receive_address  = 192.168.2.83
wsrep_provider             = /usr/lib/libgalera_smm.so
wsrep_provider_options     = ist.recv_addr=192.168.2.83;ist.recv_bind=0.0.0.0;evs.inactive_check_period=PT2S;evs.view_forget_timeout=P15M
wsrep_sst_method           = mariabackup
[mysqld]
skip-external-locking
bind-address                    = 0.0.0.0
expire_logs_days                = 4
gtid-domain-id                  = 10
character-set-server            = utf8mb4
collation-server                = utf8mb4_general_ci
innodb_buffer_pool_size         = 1G
innodb_compression_algorithm    = zlib
innodb_compression_default      = ON
performance_schema              = 1
max_connect_errors              = 1000
max_connections                 = 1000
max_user_connections            = 50

I’m also using a custom SST script with the timeout raised to one hour:

    impts=$(parse_cnf sst inno-move-opts "")
    stimeout=$(parse_cnf sst sst-initial-timeout 3600)
    ssyslog=$(parse_cnf sst sst-syslog 0)
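
Judging from the lines above, `parse_cnf` takes a config section, an option name and a fallback value, so the `3600` is the default used when `sst-initial-timeout` is not set under `[sst]`. To confirm that the patched copy is the one actually mounted into the container, the fallback can be read back out of the script. This is only a sketch: `sst_timeout_default` is a hypothetical helper, and it assumes the line keeps exactly the shape shown above.

```shell
# Hypothetical helper: print the fallback value passed to
# "parse_cnf sst sst-initial-timeout <N>" in the given SST script.
sst_timeout_default() {
    sed -n 's/.*parse_cnf sst sst-initial-timeout \([0-9][0-9]*\)).*/\1/p' "$1"
}

# Example, using the host path the script is copied to:
#   sst_timeout_default /etc/incandescent/containers/mariadbgalera/config/wsrep_sst_mariabackup
```

If the function prints nothing, the line was not found in that form and the script should be inspected by hand.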

The script is deployed with Ansible:

 - name: Copy wsrep script file
   ansible.builtin.copy:
     src: wsrep_sst_mariabackup
     dest: /etc/incandescent/containers/mariadbgalera/config/wsrep_sst_mariabackup
     mode: '755'

The service is started like this:

cjp@workstation:~/incandescent.tech/roles$ cat mariadbcluster/templates/mariadbgalera.service.j2
[Unit]
Description=MariaDB Galera

[Service]
TimeoutStartSec=3600
RestartSec=20
Restart=always
ExecStartPre=-/usr/bin/docker stop mariadbgalera
ExecStartPre=-/usr/bin/docker rm mariadbgalera
ExecStart=/usr/bin/docker run --name mariadbgalera -p {{publicport}}:{{privateport}} -p {{publicportgalera}}:{{privateportgalera}} -p {{publicportist}}:{{privateportist}} -p {{publicportsst}}:{{privateportsst}} -v /etc/incandescent/containers/mariadbgalera/config/wsrep_sst_mariabackup:/usr/bin/wsrep_sst_mariabackup -v /etc/incandescent/containers/mariadbgalera/config:/etc/mysql/conf.d -v /srv/storage/mariadb/data:/var/lib/mysql  -v /srv/storage/mariadbbackups:/backup --env-file /etc/incandescent/containers/mariadbgalera/environment/mariadbgalera.env --cpu-quota=100000 --memory={{memlimit}}m "{{registryhost}}:{{registryport}}/{{image_basename}}:{{image_tagname}}" {% if bootstrap != 0 %}-- --wsrep-new-cluster{% endif %}

[Install]
WantedBy=multi-user.target

Weird. Now I’m seeing lsof taking all CPU:

top - 20:43:54 up  8:04,  1 user,  load average: 1.09, 1.31, 1.25
Tasks: 207 total,   2 running, 205 sleeping,   0 stopped,   0 zombie
%Cpu(s): 10.4 us, 13.3 sy,  0.0 ni, 71.2 id,  0.0 wa,  0.6 hi,  0.6 si,  3.9 st
MiB Mem :   3661.7 total,   1561.0 free,   1003.4 used,   1362.7 buff/cache
MiB Swap:   3584.0 total,   3583.2 free,      0.8 used.   2658.3 avail Mem

    PID USER      PR  NI    VIRT    RES    SHR S  %CPU  %MEM     TIME+ COMMAND
 188514 systemd+  20   0    5424   1616   1372 R  92.7   0.0  10:49.62 lsof -Pnl -i :4444

Steiner’s attack was an order! OK, I’ll try to calm down…

So I went back to the rsync SST method, and yes, lsof somehow stalls there as well. I changed the wsrep_sst_rsync script so that it doesn’t run lsof, and now we get past that part.
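
One possible explanation, offered as an assumption rather than something verified here: the crash dump earlier lists `Max open files 1073741816` for the container, and some `lsof` builds walk the whole file-descriptor range at startup, which would take a very long time with a limit that high. A small sketch for flagging such a limit follows; `nofile_is_excessive` and the 1048576 threshold are illustrative choices, not anything taken from MariaDB or Docker.

```shell
# Hypothetical helper: succeed (exit 0) when an open-files limit looks
# unreasonably high. $1 is the limit, $2 an optional threshold.
nofile_is_excessive() {
    limit=$1
    threshold=${2:-1048576}
    if [ "$limit" = "unlimited" ]; then
        return 0
    fi
    [ "$limit" -gt "$threshold" ]
}

# Example, run inside the container:
#   nofile_is_excessive "$(ulimit -n)" && echo "nofile limit is very high"
```

If that turns out to be the cause, `docker run` accepts `--ulimit nofile=<soft>:<hard>`, which might be a less invasive fix than editing lsof out of the SST script. That has not been tested in this setup.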

top - 21:36:40 up  8:57,  1 user,  load average: 1.04, 0.93, 1.16
Tasks: 206 total,   1 running, 205 sleeping,   0 stopped,   0 zombie
%Cpu(s):  0.9 us,  1.7 sy,  0.0 ni, 84.1 id,  0.0 wa,  0.9 hi,  6.0 si,  6.4 st
MiB Mem :   3661.7 total,    127.2 free,   1017.9 used,   2787.6 buff/cache
MiB Swap:   3584.0 total,   3583.0 free,      1.0 used.   2643.8 avail Mem

    PID USER      PR  NI    VIRT    RES    SHR S  %CPU  %MEM     TIME+ COMMAND
 194726 systemd+  20   0   10912   3344   2048 S   3.3   0.1   0:35.21 rsync --daemon --no-detach --port 4444 --config /var/lib/mysql/rsync_sst.conf
  26799 root      20   0 1619348  83060  51416 S   1.3   2.2   9:16.91 /usr/sbin/promtail-linux-amd64 -config.file /etc/promtail/promtail.yaml
 194291 root      20   0  719844  17620   7096 S   1.3   0.5   0:03.42 /usr/bin/containerd-shim-runc-v2 -namespace moby -id e4ccec842b4c4f20225b2daf876f55613032eb49a72e5+
     28 root      20   0       0      0      0 S   0.7   0.0   0:55.24 [ksoftirqd/2]

It’s not done yet, so it might still crash like before. Weird that strace shows a lot of SQL statements, given that rsync transfers binary files. Well, tarballs are pretty great.

Uhm, I’m starting to think rsync got stuck. These are the table sizes on backend02:

4.0K    /srv/storage/mariadb/data/zabbix/history_log.frm
28K     /srv/storage/mariadb/data/zabbix/history_log.ibd
4.0K    /srv/storage/mariadb/data/zabbix/history_str.frm
2.1M    /srv/storage/mariadb/data/zabbix/history_str.ibd
4.0K    /srv/storage/mariadb/data/zabbix/history_text.frm
1.3G    /srv/storage/mariadb/data/zabbix/history_text.ibd
4.0K    /srv/storage/mariadb/data/zabbix/history_uint.frm
328M    /srv/storage/mariadb/data/zabbix/history_uint.ibd

If we go to backend03:

4.0K    /srv/storage/mariadb/data/zabbix/history_log.frm
64K     /srv/storage/mariadb/data/zabbix/history_log.ibd
4.0K    /srv/storage/mariadb/data/zabbix/history_str.frm
15M     /srv/storage/mariadb/data/zabbix/history_str.ibd
4.0K    /srv/storage/mariadb/data/zabbix/history_text.frm
8.0G    /srv/storage/mariadb/data/zabbix/history_text.ibd
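
When comparing listings like these, it can help to look at both the apparent size and the space actually allocated on disk, since `du` reports the latter by default. With `innodb_compression_default = ON`, page-compressed tablespaces depend on the filesystem punching holes in the file, so a copy that does not preserve sparseness could come out much larger; whether that is what happened here is an assumption, not something confirmed. A minimal sketch, assuming GNU `stat` and using the hypothetical helper name `size_report`:

```shell
# Hypothetical helper: print "apparent_bytes allocated_bytes path" per file.
# Assumes GNU stat: %s = size in bytes, %b = allocated blocks, %B = block size.
size_report() {
    for f in "$@"; do
        apparent=$(stat -c '%s' -- "$f")
        blocks=$(stat -c '%b' -- "$f")
        blocksize=$(stat -c '%B' -- "$f")
        echo "$apparent $((blocks * blocksize)) $f"
    done
}

# Example, run on both nodes:
#   size_report /srv/storage/mariadb/data/zabbix/history_text.ibd
```

If the apparent size matches on both nodes while the allocated size differs, the data is the same and only the holes were lost in the copy.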

Uhm, why is history_text 8G on backend03? No compression? It’s the same binary running on both systems. history_str is also larger by a wide margin. rsync seems to have completed but then we get the crash:

Sep 28 21:58:40 backend03.incandescent.tech docker[194177]: 2023-09-28 19:58:40 0 [Note] WSREP: 0.0 (5e5dd1122766): State transfer to 2.0 (e4ccec842b4c) complete.
Sep 28 21:58:40 backend03.incandescent.tech docker[194177]: WSREP_SST: [INFO] Extracting binlog files: (20230928 19:58:40.822)
Sep 28 21:58:40 backend03.incandescent.tech docker[194177]: backend01-bin.000018
Sep 28 21:58:40 backend03.incandescent.tech docker[194177]: WSREP_SST: [INFO] Galera co-ords from recovery: 52d278a6-50ad-11ee-8431-2ab12cc70be8:8066709 0 (20230928 19:58:40.857)
Sep 28 21:58:40 backend03.incandescent.tech docker[194177]: WSREP_SST: [INFO] rsync SST completed on joiner (20230928 19:58:40.866)
Sep 28 21:58:40 backend03.incandescent.tech docker[194177]: WSREP_SST: [INFO] Joiner cleanup: rsync PID=263, stunnel PID=0 (20230928 19:58:40.874)
Sep 28 21:58:41 backend03.incandescent.tech docker[194177]: WSREP_SST: [INFO] Joiner cleanup done. (20230928 19:58:41.415)
Sep 28 21:58:41 backend03.incandescent.tech docker[194177]: 2023-09-28 19:58:41 3 [Note] WSREP: SST received
Sep 28 21:58:41 backend03.incandescent.tech docker[194177]: 2023-09-28 19:58:41 3 [Note] WSREP: Server status change joiner -> initializing
Sep 28 21:58:41 backend03.incandescent.tech docker[194177]: 2023-09-28 19:58:41 3 [Note] WSREP: wsrep_notify_cmd is not defined, skipping notification.
Sep 28 21:58:41 backend03.incandescent.tech docker[194177]: 2023-09-28 19:58:41 0 [Note] mariadbd: Aria engine: starting recovery
Sep 28 21:58:41 backend03.incandescent.tech docker[194177]: tables to flush: 1 0 (0.0 seconds);
Sep 28 21:58:41 backend03.incandescent.tech docker[194177]: 2023-09-28 19:58:41 0 [Note] mariadbd: Aria engine: recovery done
Sep 28 21:58:41 backend03.incandescent.tech docker[194177]: 2023-09-28 19:58:41 0 [Note] InnoDB: Compressed tables use zlib 1.2.11
Sep 28 21:58:41 backend03.incandescent.tech docker[194177]: 2023-09-28 19:58:41 0 [Note] InnoDB: Number of transaction pools: 1
Sep 28 21:58:41 backend03.incandescent.tech docker[194177]: 2023-09-28 19:58:41 0 [Note] InnoDB: Using SSE4.2 crc32 instructions
Sep 28 21:58:41 backend03.incandescent.tech docker[194177]: 2023-09-28 19:58:41 0 [Note] InnoDB: Initializing buffer pool, total size = 1.000GiB, chunk size = 16.000MiB
Sep 28 21:58:41 backend03.incandescent.tech docker[194177]: 2023-09-28 19:58:41 0 [Note] InnoDB: Completed initialization of buffer pool
Sep 28 21:58:41 backend03.incandescent.tech docker[194177]: 2023-09-28 19:58:41 0 [Note] InnoDB: File system buffers for log disabled (block size=512 bytes)
Sep 28 21:58:41 backend03.incandescent.tech docker[194177]: 2023-09-28 19:58:41 0 [ERROR] InnoDB: Upgrade after a crash is not supported. The redo log was created with MariaDB 10.6.15. You must start up and shut down MariaDB 10.7 or earlier.
Sep 28 21:58:41 backend03.incandescent.tech docker[194177]: 2023-09-28 19:58:41 0 [ERROR] InnoDB: Plugin initialization aborted with error Generic error
Sep 28 21:58:41 backend03.incandescent.tech docker[194177]: 2023-09-28 19:58:41 0 [Note] InnoDB: Starting shutdown...
Sep 28 21:58:41 backend03.incandescent.tech docker[194177]: 2023-09-28 19:58:41 0 [ERROR] Plugin 'InnoDB' registration as a STORAGE ENGINE failed.
Sep 28 21:58:41 backend03.incandescent.tech docker[194177]: 2023-09-28 19:58:41 0 [Note] Plugin 'FEEDBACK' is disabled.
Sep 28 21:58:41 backend03.incandescent.tech docker[194177]: 2023-09-28 19:58:41 0 [ERROR] Unknown/unsupported storage engine: InnoDB
Sep 28 21:58:41 backend03.incandescent.tech docker[194177]: 2023-09-28 19:58:41 0 [ERROR] Aborting
Sep 28 21:58:41 backend03.incandescent.tech docker[194177]: 2023-09-28 19:58:41 3 [ERROR] WSREP: sst_received failed: State wait was interrupted
Sep 28 21:58:41 backend03.incandescent.tech docker[194177]: 2023-09-28 19:58:41 1 [ERROR] WSREP: Application received wrong state:
Sep 28 21:58:41 backend03.incandescent.tech docker[194177]:         Received: 00000000-0000-0000-0000-000000000000
Sep 28 21:58:41 backend03.incandescent.tech docker[194177]:         Required: 52d278a6-50ad-11ee-8431-2ab12cc70be8
Sep 28 21:58:41 backend03.incandescent.tech docker[194177]: 2023-09-28 19:58:41 1 [ERROR] WSREP: Application state transfer failed. This is unrecoverable condition, restart required.
Sep 28 21:58:41 backend03.incandescent.tech docker[194177]: 230928 19:58:41 [ERROR] mysqld got signal 11 ;
Sep 28 21:58:41 backend03.incandescent.tech docker[194177]: This could be because you hit a bug. It is also possible that this binary
Sep 28 21:58:41 backend03.incandescent.tech docker[194177]: or one of the libraries it was linked against is corrupt, improperly built,
Sep 28 21:58:41 backend03.incandescent.tech docker[194177]: or misconfigured. This error can also be caused by malfunctioning hardware.
Sep 28 21:58:41 backend03.incandescent.tech docker[194177]: To report this bug, see https://mariadb.com/kb/en/reporting-bugs
Sep 28 21:58:41 backend03.incandescent.tech docker[194177]: We will try our best to scrape up some info that will hopefully help
Sep 28 21:58:41 backend03.incandescent.tech docker[194177]: diagnose the problem, but since we have already crashed,
Sep 28 21:58:41 backend03.incandescent.tech docker[194177]: something is definitely wrong and this may fail.
Sep 28 21:58:41 backend03.incandescent.tech docker[194177]: Server version: 11.1.2-MariaDB-1:11.1.2+maria~ubu2204-log source revision: 9bc25d98209df6810f7a7d5e7dd3ae677a313ab5
Sep 28 21:58:41 backend03.incandescent.tech docker[194177]: key_buffer_size=134217728
Sep 28 21:58:41 backend03.incandescent.tech docker[194177]: read_buffer_size=131072
Sep 28 21:58:41 backend03.incandescent.tech docker[194177]: max_used_connections=0
Sep 28 21:58:41 backend03.incandescent.tech docker[194177]: max_threads=1002
Sep 28 21:58:41 backend03.incandescent.tech docker[194177]: thread_count=2
Sep 28 21:58:41 backend03.incandescent.tech docker[194177]: It is possible that mysqld could use up to
Sep 28 21:58:41 backend03.incandescent.tech docker[194177]: key_buffer_size + (read_buffer_size + sort_buffer_size)*max_threads = 2338036 K  bytes of memory
Sep 28 21:58:41 backend03.incandescent.tech docker[194177]: Hope that's ok; if not, decrease some variables in the equation.
Sep 28 21:58:41 backend03.incandescent.tech docker[194177]: Thread pointer: 0x7fe694000c68
Sep 28 21:58:41 backend03.incandescent.tech docker[194177]: Attempting backtrace. You can use the following information to find out
Sep 28 21:58:41 backend03.incandescent.tech docker[194177]: where mysqld died. If you see no messages after this, something went
Sep 28 21:58:41 backend03.incandescent.tech docker[194177]: terribly wrong...
Sep 28 21:58:41 backend03.incandescent.tech docker[194177]: stack_bottom = 0x7fe6ac602c68 thread_stack 0x49000
Sep 28 21:58:41 backend03.incandescent.tech docker[194177]: Printing to addr2line failed
Sep 28 21:58:41 backend03.incandescent.tech docker[194177]: mariadbd(my_print_stacktrace+0x32)[0x5610db8167c2]
Sep 28 21:58:41 backend03.incandescent.tech docker[194177]: mariadbd(handle_fatal_signal+0x488)[0x5610db2efcf8]
Sep 28 21:58:41 backend03.incandescent.tech docker[194177]: /lib/x86_64-linux-gnu/libc.so.6(+0x42520)[0x7fe6b665c520]
Sep 28 21:58:41 backend03.incandescent.tech docker[194177]: /lib/x86_64-linux-gnu/libc.so.6(abort+0x178)[0x7fe6b6642898]
Sep 28 21:58:41 backend03.incandescent.tech docker[194177]: /usr/lib/libgalera_smm.so(+0x156812)[0x7fe6ae04c812]
Sep 28 21:58:41 backend03.incandescent.tech docker[194177]: /usr/lib/libgalera_smm.so(+0x6f151)[0x7fe6adf65151]
Sep 28 21:58:41 backend03.incandescent.tech docker[194177]: /usr/lib/libgalera_smm.so(+0x6cde1)[0x7fe6adf62de1]
Sep 28 21:58:41 backend03.incandescent.tech docker[194177]: /usr/lib/libgalera_smm.so(+0x8a5b1)[0x7fe6adf805b1]
Sep 28 21:58:41 backend03.incandescent.tech docker[194177]: /usr/lib/libgalera_smm.so(+0x5f690)[0x7fe6adf55690]
Sep 28 21:58:41 backend03.incandescent.tech docker[194177]: /usr/lib/libgalera_smm.so(+0x47611)[0x7fe6adf3d611]
Sep 28 21:58:41 backend03.incandescent.tech docker[194177]: mariadbd(_ZN5wsrep18wsrep_provider_v2611run_applierEPNS_21high_priority_serviceE+0x12)[0x5610db8d53a2]
Sep 28 21:58:41 backend03.incandescent.tech docker[194177]: mariadbd(+0xd5f191)[0x5610db5a9191]
Sep 28 21:58:41 backend03.incandescent.tech docker[194177]: mariadbd(_Z15start_wsrep_THDPv+0x26b)[0x5610db59715b]
Sep 28 21:58:41 backend03.incandescent.tech docker[194177]: mariadbd(+0xcd1906)[0x5610db51b906]
Sep 28 21:58:41 backend03.incandescent.tech docker[194177]: /lib/x86_64-linux-gnu/libc.so.6(+0x94b43)[0x7fe6b66aeb43]
Sep 28 21:58:41 backend03.incandescent.tech docker[194177]: /lib/x86_64-linux-gnu/libc.so.6(clone+0x44)[0x7fe6b673fbb4]
Sep 28 21:58:41 backend03.incandescent.tech docker[194177]: Trying to get some variables.
Sep 28 21:58:41 backend03.incandescent.tech docker[194177]: Some pointers may be invalid and cause the dump to abort.
Sep 28 21:58:41 backend03.incandescent.tech docker[194177]: Query (0x0): (null)
Sep 28 21:58:41 backend03.incandescent.tech docker[194177]: Connection ID (thread ID): 1
Sep 28 21:58:41 backend03.incandescent.tech docker[194177]: Status: NOT_KILLED
Sep 28 21:58:41 backend03.incandescent.tech docker[194177]: Optimizer switch: index_merge=on,index_merge_union=on,index_merge_sort_union=on,index_merge_intersection=on,index_merge_sort_intersection=off,engine_condition_pushdown=off,index_condition_pushdown=on,derived_merge=on,derived_with_keys=on,firstmatch=on,loosescan=on,materialization=on,in_to_exists=on,semijoin=on,partial_match_rowid_merge=on,partial_match_table_scan=on,subquery_cache=on,mrr=off,mrr_cost_based=off,mrr_sort_keys=off,outer_join_with_cache=on,semijoin_with_cache=on,join_cache_incremental=on,join_cache_hashed=on,join_cache_bka=on,optimize_join_buffer_size=on,table_elimination=on,extended_keys=on,exists_to_in=on,orderby_uses_equalities=on,condition_pushdown_for_derived=on,split_materialized=on,condition_pushdown_for_subquery=on,rowid_filter=on,condition_pushdown_from_having=on,not_null_range_scan=off,hash_join_cardinality=on
Sep 28 21:58:41 backend03.incandescent.tech docker[194177]: The manual page at https://mariadb.com/kb/en/how-to-produce-a-full-stack-trace-for-mysqld/ contains
Sep 28 21:58:41 backend03.incandescent.tech docker[194177]: information that should help you find out what is causing the crash.
Sep 28 21:58:41 backend03.incandescent.tech docker[194177]: We think the query pointer is invalid, but we will try to print it anyway.
Sep 28 21:58:41 backend03.incandescent.tech docker[194177]: Query:
Sep 28 21:58:41 backend03.incandescent.tech docker[194177]: Writing a core file...
Sep 28 21:58:41 backend03.incandescent.tech docker[194177]: Working directory at /var/lib/mysql
Sep 28 21:58:41 backend03.incandescent.tech docker[194177]: Resource Limits:
Sep 28 21:58:41 backend03.incandescent.tech docker[194177]: Limit                     Soft Limit           Hard Limit           Units
Sep 28 21:58:41 backend03.incandescent.tech docker[194177]: Max cpu time              unlimited            unlimited            seconds
Sep 28 21:58:41 backend03.incandescent.tech docker[194177]: Max file size             unlimited            unlimited            bytes
Sep 28 21:58:41 backend03.incandescent.tech docker[194177]: Max data size             unlimited            unlimited            bytes
Sep 28 21:58:41 backend03.incandescent.tech docker[194177]: Max stack size            8388608              unlimited            bytes
Sep 28 21:58:41 backend03.incandescent.tech docker[194177]: Max core file size        0                    0                    bytes
Sep 28 21:58:41 backend03.incandescent.tech docker[194177]: Max resident set          unlimited            unlimited            bytes
Sep 28 21:58:41 backend03.incandescent.tech docker[194177]: Max processes             unlimited            unlimited            processes
Sep 28 21:58:41 backend03.incandescent.tech docker[194177]: Max open files            1073741816           1073741816           files
Sep 28 21:58:41 backend03.incandescent.tech docker[194177]: Max locked memory         8388608              8388608              bytes
Sep 28 21:58:41 backend03.incandescent.tech docker[194177]: Max address space         unlimited            unlimited            bytes
Sep 28 21:58:41 backend03.incandescent.tech docker[194177]: Max file locks            unlimited            unlimited            locks
Sep 28 21:58:41 backend03.incandescent.tech docker[194177]: Max pending signals       14479                14479                signals
Sep 28 21:58:41 backend03.incandescent.tech docker[194177]: Max msgqueue size         819200               819200               bytes
Sep 28 21:58:41 backend03.incandescent.tech docker[194177]: Max nice priority         0                    0
Sep 28 21:58:41 backend03.incandescent.tech docker[194177]: Max realtime priority     0                    0
Sep 28 21:58:41 backend03.incandescent.tech docker[194177]: Max realtime timeout      unlimited            unlimited            us
Sep 28 21:58:41 backend03.incandescent.tech docker[194177]: Core pattern: |/usr/lib/systemd/systemd-coredump %P %u %g %s %t %c %h
Sep 28 21:58:41 backend03.incandescent.tech docker[194177]: Kernel version: Linux version 5.14.0-284.11.1.el9_2.x86_64 (mockbuild@x64-builder01.almalinux.org) (gcc (GCC) 11.3.1 20221121 (Red Hat 11.3.1-4), GNU ld version 2.35.2-37.el9) #1 SMP PREEMPT_DYNAMIC Tue May 9 05:49:00 EDT 2023
Sep 28 21:58:42 backend03.incandescent.tech systemd[1]: mariadbgalera.service: Main process exited, code=exited, status=139/n/a
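
A side note on that last line: status=139 is most likely 128 + 11, the usual convention for a process killed by signal 11 (SIGSEGV), which fits the stack trace above. Here is a minimal Python sketch of that arithmetic, assuming the 128 + signal convention applies to what Docker reports here. The function name is made up for illustration.

```python
import signal


def describe_exit_status(status: int) -> str:
    """Interpret a shell-style exit status.

    Assumption: values above 128 mean "killed by signal (status - 128)",
    which is the common shell/Docker convention, not something the log states.
    """
    if status > 128:
        signum = status - 128
        try:
            name = signal.Signals(signum).name
        except ValueError:
            # Signal number unknown on this platform
            name = f"signal {signum}"
        return f"killed by {name}"
    return f"exited with code {status}"


if __name__ == "__main__":
    print(describe_exit_status(139))
```

On Linux, signal 11 maps to SIGSEGV, so this is a segfault in mariadbd rather than a clean shutdown.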

I’ll try again tomorrow using Ubuntu 22.04, which is what my backend cluster currently runs, and we’ll see what happens. But I hope Scylla behaves better. I’m a-ok with replace-node and all that, but these crashes? A clustered RDBMS is a pain.

Addendum 1:

Scylla is nice.

root@backend02:~# scylla nodetool status
Datacenter: svealiden
=====================
Status=Up/Down
|/ State=Normal/Leaving/Joining/Moving
--  Address       Load       Tokens       Owns    Host ID                               Rack
UN  192.168.2.81  4.64 MB    1            ?       6be20c62-a7f6-41b9-8924-1de608b5fb49  one
UN  192.168.2.82  4.67 MB    1            ?       c4e0631b-a282-4bd8-866b-aa2f15877f1c  one
UN  192.168.2.83  4.64 MB    1            ?       5a902584-96a2-4a0e-9e79-77ddcaa1f62f  one

From backend03:

[root@backend03 ~]# scylla nodetool status
Datacenter: svealiden
=====================
Status=Up/Down
|/ State=Normal/Leaving/Joining/Moving
--  Address       Load       Tokens       Owns    Host ID                               Rack
UN  192.168.2.81  ?          1            ?       6be20c62-a7f6-41b9-8924-1de608b5fb49  one
UN  192.168.2.82  ?          1            ?       c4e0631b-a282-4bd8-866b-aa2f15877f1c  one
UN  192.168.2.83  6.03 MB    1            ?       365a7f90-1626-46ff-86e2-0d2ebdf3d762  one

Note: Non-system keyspaces don't have the same replication settings, effective ownership information is meaningless
[root@backend03 ~]# scylla nodetool status
Datacenter: svealiden
=====================
Status=Up/Down
|/ State=Normal/Leaving/Joining/Moving
--  Address       Load       Tokens       Owns    Host ID                               Rack
UN  192.168.2.81  ?          1            ?       6be20c62-a7f6-41b9-8924-1de608b5fb49  one
UN  192.168.2.82  4.71 MB    1            ?       c4e0631b-a282-4bd8-866b-aa2f15877f1c  one
UN  192.168.2.83  6.03 MB    1            ?       365a7f90-1626-46ff-86e2-0d2ebdf3d762  one

Note: Non-system keyspaces don't have the same replication settings, effective ownership information is meaningless
[root@backend03 ~]# scylla nodetool status
Datacenter: svealiden
=====================
Status=Up/Down
|/ State=Normal/Leaving/Joining/Moving
--  Address       Load       Tokens       Owns    Host ID                               Rack
UN  192.168.2.81  4.68 MB    1            ?       6be20c62-a7f6-41b9-8924-1de608b5fb49  one
UN  192.168.2.82  4.71 MB    1            ?       c4e0631b-a282-4bd8-866b-aa2f15877f1c  one
UN  192.168.2.83  6.03 MB    1            ?       365a7f90-1626-46ff-86e2-0d2ebdf3d762  one

Note: Non-system keyspaces don't have the same replication settings, effective ownership information is meaningless

Addendum 2:

I’m running version 10.11.5 in Docker now and it works if I use rsync for SST. The line below, however, causes a crash just like the one I saw on 11.1.2, so this line has to go from the config:

wsrep_sst_method           = mariabackup

In the meantime I tried running Zabbix with PostgreSQL and CockroachDB. But Zabbix needs some features that CockroachDB hasn’t implemented, so no dice there. If Galera is too unpredictable I can always fall back to standard replication: a single master and two slaves, with my scripts for switching between them.

Addendum 3:

I think I might have come across a good combo. It passes my exceptional “can be fixed while I’m drunk” test! Sure, it still requires me to patch the wsrep_sst_rsync script, but I did that part while I was sober. Getting the Galera cluster from 1 to 3 nodes passed the test. Now for some music!