MariaDB Galera was OK to start with, but now that I’m reinstalling backend03 it’s not going great.
Sep 28 20:30:49 backend03.incandescent.tech docker[186863]: WSREP_SST: [INFO] Stale sst_in_progress file: /var/lib/mysql/sst_in_progress (20230928 18:30:49.649)
Sep 28 20:30:49 backend03.incandescent.tech docker[186863]: WSREP_SST: [INFO] previous SST is not completed, waiting for it to exit (20230928 18:30:49.720)
Sep 28 20:30:50 backend03.incandescent.tech docker[186863]: WSREP_SST: [INFO] previous SST is not completed, waiting for it to exit (20230928 18:30:50.746)
Sep 28 20:30:50 backend03.incandescent.tech docker[186863]: 2023-09-28 18:30:50 0 [Warning] WSREP: last inactive check more than PT6S ago (PT6.00067S), skipping check
Sep 28 20:30:51 backend03.incandescent.tech docker[186863]: WSREP_SST: [INFO] previous SST is not completed, waiting for it to exit (20230928 18:30:51.786)
Sep 28 20:30:52 backend03.incandescent.tech docker[186863]: WSREP_SST: [INFO] previous SST is not completed, waiting for it to exit (20230928 18:30:52.819)
Sep 28 20:30:53 backend03.incandescent.tech docker[186863]: WSREP_SST: [INFO] previous SST is not completed, waiting for it to exit (20230928 18:30:53.840)
Sep 28 20:30:54 backend03.incandescent.tech docker[186863]: WSREP_SST: [INFO] previous SST is not completed, waiting for it to exit (20230928 18:30:54.864)
Sep 28 20:30:55 backend03.incandescent.tech docker[186863]: WSREP_SST: [INFO] previous SST is not completed, waiting for it to exit (20230928 18:30:55.893)
Sep 28 20:30:56 backend03.incandescent.tech docker[186863]: WSREP_SST: [INFO] previous SST is not completed, waiting for it to exit (20230928 18:30:56.938)
Sep 28 20:30:57 backend03.incandescent.tech docker[186863]: WSREP_SST: [INFO] previous SST is not completed, waiting for it to exit (20230928 18:30:57.978)
Sep 28 20:30:59 backend03.incandescent.tech docker[186863]: WSREP_SST: [INFO] previous SST is not completed, waiting for it to exit (20230928 18:30:59.003)
Sep 28 20:30:59 backend03.incandescent.tech docker[186863]: WSREP_SST: [ERROR] previous SST script still running. (20230928 18:30:59.009)
Sep 28 20:30:59 backend03.incandescent.tech docker[186863]: 2023-09-28 18:30:59 0 [ERROR] WSREP: Failed to read 'ready <addr>' from: wsrep_sst_mariabackup --role 'joiner' --address '192.168.2.83' --datadir '/var/lib/mysql/' --parent 1 --progress 0 --binlog 'backend03-bin' --binlog-index 'backend03-bin.index'
Sep 28 20:30:59 backend03.incandescent.tech docker[186863]: Read: '(null)'
Sep 28 20:30:59 backend03.incandescent.tech docker[186863]: 2023-09-28 18:30:59 0 [ERROR] WSREP: Process completed with error: wsrep_sst_mariabackup --role 'joiner' --address '192.168.2.83' --datadir '/var/lib/mysql/' --parent 1 --progress 0 --binlog 'backend03-bin' --binlog-index 'backend03-bin.index': 114 (Operation already in progress)
Sep 28 20:30:59 backend03.incandescent.tech docker[186863]: 2023-09-28 18:30:59 2 [ERROR] WSREP: Failed to prepare for 'mariabackup' SST. Unrecoverable.
Sep 28 20:30:59 backend03.incandescent.tech docker[186863]: 2023-09-28 18:30:59 2 [ERROR] WSREP: SST request callback failed. This is unrecoverable, restart required.
Sep 28 20:30:59 backend03.incandescent.tech docker[186863]: 2023-09-28 18:30:59 2 [Note] WSREP: ReplicatorSMM::abort()
Sep 28 20:30:59 backend03.incandescent.tech docker[186863]: 2023-09-28 18:30:59 2 [Note] WSREP: Closing send monitor...
Sep 28 20:30:59 backend03.incandescent.tech docker[186863]: 2023-09-28 18:30:59 2 [Note] WSREP: Closed send monitor.
Sep 28 20:30:59 backend03.incandescent.tech docker[186863]: 2023-09-28 18:30:59 2 [Note] WSREP: gcomm: terminating thread
Sep 28 20:30:59 backend03.incandescent.tech docker[186863]: 2023-09-28 18:30:59 2 [Note] WSREP: gcomm: joining thread
Sep 28 20:30:59 backend03.incandescent.tech docker[186863]: 2023-09-28 18:30:59 2 [Note] WSREP: gcomm: closing backend
Sep 28 20:30:59 backend03.incandescent.tech docker[186863]: 2023-09-28 18:30:59 2 [Note] WSREP: view(view_id(NON_PRIM,52cd15fb-9b29,87) memb {
Sep 28 20:30:59 backend03.incandescent.tech docker[186863]: be038c1b-8061,0
Sep 28 20:30:59 backend03.incandescent.tech docker[186863]: } joined {
Sep 28 20:30:59 backend03.incandescent.tech docker[186863]: } left {
Sep 28 20:30:59 backend03.incandescent.tech docker[186863]: } partitioned {
Sep 28 20:30:59 backend03.incandescent.tech docker[186863]: 52cd15fb-9b29,0
Sep 28 20:30:59 backend03.incandescent.tech docker[186863]: 82c6ead4-bf0f,0
Sep 28 20:30:59 backend03.incandescent.tech docker[186863]: })
Sep 28 20:30:59 backend03.incandescent.tech docker[186863]: 2023-09-28 18:30:59 2 [Note] WSREP: PC protocol downgrade 1 -> 0
Sep 28 20:30:59 backend03.incandescent.tech docker[186863]: 2023-09-28 18:30:59 2 [Note] WSREP: view((empty))
Sep 28 20:30:59 backend03.incandescent.tech docker[186863]: 2023-09-28 18:30:59 2 [Note] WSREP: gcomm: closed
Sep 28 20:30:59 backend03.incandescent.tech docker[186863]: 2023-09-28 18:30:59 0 [Note] WSREP: New COMPONENT: primary = no, bootstrap = no, my_idx = 0, memb_num = 1
Sep 28 20:30:59 backend03.incandescent.tech docker[186863]: 2023-09-28 18:30:59 0 [Note] WSREP: Flow-control interval: [16, 16]
Sep 28 20:30:59 backend03.incandescent.tech docker[186863]: 2023-09-28 18:30:59 0 [Note] WSREP: Received NON-PRIMARY.
Sep 28 20:30:59 backend03.incandescent.tech docker[186863]: 2023-09-28 18:30:59 0 [Note] WSREP: Shifting PRIMARY -> OPEN (TO: 8042075)
Sep 28 20:30:59 backend03.incandescent.tech docker[186863]: 2023-09-28 18:30:59 0 [Note] WSREP: New SELF-LEAVE.
Sep 28 20:30:59 backend03.incandescent.tech docker[186863]: 2023-09-28 18:30:59 0 [Note] WSREP: Flow-control interval: [0, 0]
Sep 28 20:30:59 backend03.incandescent.tech docker[186863]: 2023-09-28 18:30:59 0 [Note] WSREP: Received SELF-LEAVE. Closing connection.
Sep 28 20:30:59 backend03.incandescent.tech docker[186863]: 2023-09-28 18:30:59 0 [Note] WSREP: Shifting OPEN -> CLOSED (TO: 8042075)
Sep 28 20:30:59 backend03.incandescent.tech docker[186863]: 2023-09-28 18:30:59 0 [Note] WSREP: RECV thread exiting 0: Success
Sep 28 20:30:59 backend03.incandescent.tech docker[186863]: 2023-09-28 18:30:59 2 [Note] WSREP: recv_thread() joined.
Sep 28 20:30:59 backend03.incandescent.tech docker[186863]: 2023-09-28 18:30:59 2 [Note] WSREP: Closing replication queue.
Sep 28 20:30:59 backend03.incandescent.tech docker[186863]: 2023-09-28 18:30:59 2 [Note] WSREP: Closing slave action queue.
Sep 28 20:30:59 backend03.incandescent.tech docker[186863]: 2023-09-28 18:30:59 2 [Note] WSREP: mariadbd: Terminated.
Sep 28 20:30:59 backend03.incandescent.tech docker[186863]: 230928 18:30:59 [ERROR] mysqld got signal 11 ;
Sep 28 20:30:59 backend03.incandescent.tech docker[186863]: This could be because you hit a bug. It is also possible that this binary
Sep 28 20:30:59 backend03.incandescent.tech docker[186863]: or one of the libraries it was linked against is corrupt, improperly built,
Sep 28 20:30:59 backend03.incandescent.tech docker[186863]: or misconfigured. This error can also be caused by malfunctioning hardware.
Sep 28 20:30:59 backend03.incandescent.tech docker[186863]: To report this bug, see https://mariadb.com/kb/en/reporting-bugs
Sep 28 20:30:59 backend03.incandescent.tech docker[186863]: We will try our best to scrape up some info that will hopefully help
Sep 28 20:30:59 backend03.incandescent.tech docker[186863]: diagnose the problem, but since we have already crashed,
Sep 28 20:30:59 backend03.incandescent.tech docker[186863]: something is definitely wrong and this may fail.
Sep 28 20:30:59 backend03.incandescent.tech docker[186863]: Server version: 11.1.2-MariaDB-1:11.1.2+maria~ubu2204-log source revision: 9bc25d98209df6810f7a7d5e7dd3ae677a313ab5
Sep 28 20:30:59 backend03.incandescent.tech docker[186863]: key_buffer_size=0
Sep 28 20:30:59 backend03.incandescent.tech docker[186863]: read_buffer_size=131072
Sep 28 20:30:59 backend03.incandescent.tech docker[186863]: max_used_connections=0
Sep 28 20:30:59 backend03.incandescent.tech docker[186863]: max_threads=1002
Sep 28 20:30:59 backend03.incandescent.tech docker[186863]: thread_count=3
Sep 28 20:30:59 backend03.incandescent.tech docker[186863]: It is possible that mysqld could use up to
Sep 28 20:30:59 backend03.incandescent.tech docker[186863]: key_buffer_size + (read_buffer_size + sort_buffer_size)*max_threads = 2206964 K bytes of memory
Sep 28 20:30:59 backend03.incandescent.tech docker[186863]: Hope that's ok; if not, decrease some variables in the equation.
Sep 28 20:30:59 backend03.incandescent.tech docker[186863]: Thread pointer: 0x7f6acc000c68
Sep 28 20:30:59 backend03.incandescent.tech docker[186863]: Attempting backtrace. You can use the following information to find out
Sep 28 20:30:59 backend03.incandescent.tech docker[186863]: where mysqld died. If you see no messages after this, something went
Sep 28 20:30:59 backend03.incandescent.tech docker[186863]: terribly wrong...
Sep 28 20:30:59 backend03.incandescent.tech docker[186863]: stack_bottom = 0x7f6ae8fb0c68 thread_stack 0x49000
Sep 28 20:30:59 backend03.incandescent.tech docker[186863]: Printing to addr2line failed
Sep 28 20:30:59 backend03.incandescent.tech docker[186863]: mariadbd(my_print_stacktrace+0x32)[0x5575c97de7c2]
Sep 28 20:30:59 backend03.incandescent.tech docker[186863]: mariadbd(handle_fatal_signal+0x488)[0x5575c92b7cf8]
Sep 28 20:30:59 backend03.incandescent.tech docker[186863]: /lib/x86_64-linux-gnu/libc.so.6(+0x42520)[0x7f6af380b520]
Sep 28 20:30:59 backend03.incandescent.tech docker[186863]: /lib/x86_64-linux-gnu/libc.so.6(abort+0x178)[0x7f6af37f1898]
Sep 28 20:30:59 backend03.incandescent.tech docker[186863]: /usr/lib/libgalera_smm.so(+0x156812)[0x7f6aeb1fb812]
Sep 28 20:30:59 backend03.incandescent.tech docker[186863]: /usr/lib/libgalera_smm.so(+0x6f151)[0x7f6aeb114151]
Sep 28 20:30:59 backend03.incandescent.tech docker[186863]: /usr/lib/libgalera_smm.so(+0x6bdb4)[0x7f6aeb110db4]
Sep 28 20:30:59 backend03.incandescent.tech docker[186863]: /usr/lib/libgalera_smm.so(+0x8a5b1)[0x7f6aeb12f5b1]
Sep 28 20:30:59 backend03.incandescent.tech docker[186863]: /usr/lib/libgalera_smm.so(+0x5f690)[0x7f6aeb104690]
Sep 28 20:30:59 backend03.incandescent.tech docker[186863]: /usr/lib/libgalera_smm.so(+0x47611)[0x7f6aeb0ec611]
Sep 28 20:30:59 backend03.incandescent.tech docker[186863]: mariadbd(_ZN5wsrep18wsrep_provider_v2611run_applierEPNS_21high_priority_serviceE+0x12)[0x5575c989d3a2]
Sep 28 20:30:59 backend03.incandescent.tech docker[186863]: mariadbd(+0xd5f191)[0x5575c9571191]
Sep 28 20:30:59 backend03.incandescent.tech docker[186863]: mariadbd(_Z15start_wsrep_THDPv+0x26b)[0x5575c955f15b]
Sep 28 20:30:59 backend03.incandescent.tech docker[186863]: mariadbd(+0xcd1906)[0x5575c94e3906]
Sep 28 20:30:59 backend03.incandescent.tech docker[186863]: /lib/x86_64-linux-gnu/libc.so.6(+0x94b43)[0x7f6af385db43]
Sep 28 20:30:59 backend03.incandescent.tech docker[186863]: /lib/x86_64-linux-gnu/libc.so.6(clone+0x44)[0x7f6af38eebb4]
Sep 28 20:30:59 backend03.incandescent.tech docker[186863]: Trying to get some variables.
Sep 28 20:30:59 backend03.incandescent.tech docker[186863]: Some pointers may be invalid and cause the dump to abort.
Sep 28 20:30:59 backend03.incandescent.tech docker[186863]: Query (0x0): (null)
Sep 28 20:30:59 backend03.incandescent.tech docker[186863]: Connection ID (thread ID): 2
Sep 28 20:30:59 backend03.incandescent.tech docker[186863]: Status: NOT_KILLED
Sep 28 20:30:59 backend03.incandescent.tech docker[186863]: Optimizer switch: index_merge=on,index_merge_union=on,index_merge_sort_union=on,index_merge_intersection=on,index_merge_sort_intersection=off,engine_condition_pushdown=off,index_condition_pushdown=on,derived_merge=on,derived_with_keys=on,firstmatch=on,loosescan=on,materialization=on,in_to_exists=on,semijoin=on,partial_match_rowid_merge=on,partial_match_table_scan=on,subquery_cache=on,mrr=off,mrr_cost_based=off,mrr_sort_keys=off,outer_join_with_cache=on,semijoin_with_cache=on,join_cache_incremental=on,join_cache_hashed=on,join_cache_bka=on,optimize_join_buffer_size=on,table_elimination=on,extended_keys=on,exists_to_in=on,orderby_uses_equalities=on,condition_pushdown_for_derived=on,split_materialized=on,condition_pushdown_for_subquery=on,rowid_filter=on,condition_pushdown_from_having=on,not_null_range_scan=off,hash_join_cardinality=on
Sep 28 20:30:59 backend03.incandescent.tech docker[186863]: The manual page at https://mariadb.com/kb/en/how-to-produce-a-full-stack-trace-for-mysqld/ contains
Sep 28 20:30:59 backend03.incandescent.tech docker[186863]: information that should help you find out what is causing the crash.
Sep 28 20:30:59 backend03.incandescent.tech docker[186863]: We think the query pointer is invalid, but we will try to print it anyway.
Sep 28 20:30:59 backend03.incandescent.tech docker[186863]: Query:
Sep 28 20:30:59 backend03.incandescent.tech docker[186863]: Writing a core file...
Sep 28 20:30:59 backend03.incandescent.tech docker[186863]: Working directory at /var/lib/mysql
Sep 28 20:30:59 backend03.incandescent.tech docker[186863]: Resource Limits:
Sep 28 20:30:59 backend03.incandescent.tech docker[186863]: Limit Soft Limit Hard Limit Units
Sep 28 20:30:59 backend03.incandescent.tech docker[186863]: Max cpu time unlimited unlimited seconds
Sep 28 20:30:59 backend03.incandescent.tech docker[186863]: Max file size unlimited unlimited bytes
Sep 28 20:30:59 backend03.incandescent.tech docker[186863]: Max data size unlimited unlimited bytes
Sep 28 20:30:59 backend03.incandescent.tech docker[186863]: Max stack size 8388608 unlimited bytes
Sep 28 20:30:59 backend03.incandescent.tech docker[186863]: Max core file size 0 0 bytes
Sep 28 20:30:59 backend03.incandescent.tech docker[186863]: Max resident set unlimited unlimited bytes
Sep 28 20:30:59 backend03.incandescent.tech docker[186863]: Max processes unlimited unlimited processes
Sep 28 20:30:59 backend03.incandescent.tech docker[186863]: Max open files 1073741816 1073741816 files
Sep 28 20:30:59 backend03.incandescent.tech docker[186863]: Max locked memory 8388608 8388608 bytes
Sep 28 20:30:59 backend03.incandescent.tech docker[186863]: Max address space unlimited unlimited bytes
Sep 28 20:30:59 backend03.incandescent.tech docker[186863]: Max file locks unlimited unlimited locks
Sep 28 20:30:59 backend03.incandescent.tech docker[186863]: Max pending signals 14479 14479 signals
Sep 28 20:30:59 backend03.incandescent.tech docker[186863]: Max msgqueue size 819200 819200 bytes
Sep 28 20:30:59 backend03.incandescent.tech docker[186863]: Max nice priority 0 0
Sep 28 20:30:59 backend03.incandescent.tech docker[186863]: Max realtime priority 0 0
Sep 28 20:30:59 backend03.incandescent.tech docker[186863]: Max realtime timeout unlimited unlimited us
Sep 28 20:30:59 backend03.incandescent.tech docker[186863]: Core pattern: |/usr/lib/systemd/systemd-coredump %P %u %g %s %t %c %h
Sep 28 20:30:59 backend03.incandescent.tech docker[186863]: Kernel version: Linux version 5.14.0-284.11.1.el9_2.x86_64 (mockbuild@x64-builder01.almalinux.org) (gcc (GCC) 11.3.1 20221121 (Red Hat 11.3.1-4), GNU ld version 2.35.2-37.el9) #1 SMP PREEMPT_DYNAMIC Tue May 9 05:49:00 EDT 2023
Sep 28 20:30:59 backend03.incandescent.tech systemd[1]: mariadbgalera.service: Main process exited, code=exited, status=139/n/a
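Most of a journal dump like this is noise, so it can help to filter out just the failures. The following is a minimal sketch, not anything MariaDB ships: the helper name `wsrep_errors` is made up, and it assumes every failure carries the literal `[ERROR]` marker seen in the log above.

```python
import re

# Matches the "[ERROR]" marker used by both mariadbd and the WSREP_SST script.
ERROR_MARKER = re.compile(r"\[ERROR\]")

def wsrep_errors(lines):
    """Return only the log lines that carry an [ERROR] marker (hypothetical helper)."""
    return [line.rstrip("\n") for line in lines if ERROR_MARKER.search(line)]

# Three lines copied from the log above, without the journald prefix.
sample = [
    "WSREP_SST: [INFO] previous SST is not completed, waiting for it to exit (20230928 18:30:59.003)",
    "WSREP_SST: [ERROR] previous SST script still running. (20230928 18:30:59.009)",
    "2023-09-28 18:30:59 2 [ERROR] WSREP: Failed to prepare for 'mariabackup' SST. Unrecoverable.",
]

for line in wsrep_errors(sample):
    print(line)
```

Passing `sys.stdin` instead of `sample` would let it read the output of `journalctl` piped in from the shell.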
This is with the following config:
[root@backend03 ~]# cat /etc/containers/mariadbgalera/config/*
[mariadb]
log-bin = ON
server-id = 3
log-basename = backend03
wsrep_cluster_address = gcomm://backend01.incandescent.tech,backend02.incandescent.tech,backend03.incandescent.tech
wsrep_cluster_name = svealiden
binlog-format = ROW
default_storage_engine = InnoDB
innodb_autoinc_lock_mode = 2
wsrep_on = ON
wsrep_log_conflicts = ON
wsrep_node_address = 192.168.2.83
wsrep_sst_receive_address = 192.168.2.83
wsrep_provider = /usr/lib/libgalera_smm.so
wsrep_provider_options = ist.recv_addr=192.168.2.83;ist.recv_bind=0.0.0.0;evs.inactive_check_period=PT2S;evs.view_forget_timeout=P15M
wsrep_sst_method = mariabackup
[mysqld]
skip-external-locking
bind-address = 0.0.0.0
expire_logs_days = 4
gtid-domain-id = 10
character-set-server = utf8mb4
collation-server = utf8mb4_general_ci
innodb_buffer_pool_size = 1G
innodb_compression_algorithm = zlib
innodb_compression_default = ON
performance_schema = 1
max_connect_errors = 1000
max_connections = 1000
max_user_connections = 50
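The `wsrep_provider_options` value packs several Galera settings into one semicolon-separated string, which is easy to misread. As a rough sketch (the function name `parse_provider_options` is hypothetical and this is not how Galera itself parses the string), it can be split into key/value pairs like this:

```python
def parse_provider_options(raw):
    """Split a wsrep_provider_options string into a dict (hypothetical helper)."""
    options = {}
    for item in raw.split(";"):
        item = item.strip()
        if not item:
            continue  # tolerate a trailing semicolon
        key, _, value = item.partition("=")
        options[key.strip()] = value.strip()
    return options

# The value from the config above.
raw = ("ist.recv_addr=192.168.2.83;ist.recv_bind=0.0.0.0;"
       "evs.inactive_check_period=PT2S;evs.view_forget_timeout=P15M")
opts = parse_provider_options(raw)

for key, value in opts.items():
    print(f"{key} = {value}")
```

One thing this makes visible: in ISO 8601 duration notation `P15M` reads as 15 months, while 15 minutes would be written `PT15M`. Whether Galera interprets the value that strictly is an assumption here, but it may be worth checking which one was intended.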
I’m also using a custom SST script with the initial timeout raised to one hour:
impts=$(parse_cnf sst inno-move-opts "")
stimeout=$(parse_cnf sst sst-initial-timeout 3600)
ssyslog=$(parse_cnf sst sst-syslog 0)
This is done with ansible.
- name: Copy wsrep script file
  ansible.builtin.copy:
    src: wsrep_sst_mariabackup
    dest: /etc/incandescent/containers/mariadbgalera/config/wsrep_sst_mariabackup
    mode: '755'
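In the script excerpt above, the last argument to `parse_cnf` appears to be the fallback used when the option is not set in the config, which is why changing it to 3600 takes effect without an `[sst]` section. Here is a small Python sketch of that lookup-with-default idea. The function `read_option` is a made-up stand-in and not the real shell helper, which may differ in details such as how it treats dashes and underscores.

```python
import configparser

def read_option(cnf_text, section, key, default):
    """Look up key in [section]; fall back to default when it is absent.

    Hypothetical stand-in for what the shell helper parse_cnf appears to do.
    """
    parser = configparser.ConfigParser(allow_no_value=True, strict=False,
                                       interpolation=None)
    parser.read_string(cnf_text)
    if parser.has_section(section) and parser.has_option(section, key):
        return parser.get(section, key)
    return default

# Trimmed-down version of the config above; note there is no [sst] section.
cnf = """
[mariadb]
wsrep_sst_method = mariabackup

[mysqld]
skip-external-locking
"""

# Falls back to the default because sst-initial-timeout is not configured.
print(read_option(cnf, "sst", "sst-initial-timeout", "3600"))
```

Setting `sst-initial-timeout` under `[sst]` in the config would be the alternative to patching the default inside the script.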
The service is started like this:
cjp@workstation:~/incandescent.tech/roles$ cat mariadbcluster/templates/mariadbgalera.service.j2
[Unit]
Description=MariaDB Galera
[Service]
TimeoutStartSec=3600
RestartSec=20
Restart=always
ExecStartPre=-/usr/bin/docker stop mariadbgalera
ExecStartPre=-/usr/bin/docker rm mariadbgalera
ExecStart=/usr/bin/docker run --name mariadbgalera -p {{publicport}}:{{privateport}} -p {{publicportgalera}}:{{privateportgalera}} -p {{publicportist}}:{{privateportist}} -p {{publicportsst}}:{{privateportsst}} -v /etc/incandescent/containers/mariadbgalera/config/wsrep_sst_mariabackup:/usr/bin/wsrep_sst_mariabackup -v /etc/incandescent/containers/mariadbgalera/config:/etc/mysql/conf.d -v /srv/storage/mariadb/data:/var/lib/mysql -v /srv/storage/mariadbbackups:/backup --env-file /etc/incandescent/containers/mariadbgalera/environment/mariadbgalera.env --cpu-quota=100000 --memory={{memlimit}}m "{{registryhost}}:{{registryport}}/{{image_basename}}:{{image_tagname}}" {% if bootstrap != 0 %}-- --wsrep-new-cluster{% endif %}
[Install]
WantedBy=multi-user.target
Weird. Now I’m seeing lsof taking all CPU:
top - 20:43:54 up 8:04, 1 user, load average: 1.09, 1.31, 1.25
Tasks: 207 total, 2 running, 205 sleeping, 0 stopped, 0 zombie
%Cpu(s): 10.4 us, 13.3 sy, 0.0 ni, 71.2 id, 0.0 wa, 0.6 hi, 0.6 si, 3.9 st
MiB Mem : 3661.7 total, 1561.0 free, 1003.4 used, 1362.7 buff/cache
MiB Swap: 3584.0 total, 3583.2 free, 0.8 used. 2658.3 avail Mem
PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
188514 systemd+ 20 0 5424 1616 1372 R 92.7 0.0 10:49.62 lsof -Pnl -i :4444
Der Angriff Steiner war ein Befehl! ("Steiner's attack was an order!") OK, I’ll try to calm down…
So I went back to rsync for SST, and yes, lsof stalls somehow. I changed the wsrep_sst_rsync script so that it doesn’t run lsof, and now we get past that part.
top - 21:36:40 up 8:57, 1 user, load average: 1.04, 0.93, 1.16
Tasks: 206 total, 1 running, 205 sleeping, 0 stopped, 0 zombie
%Cpu(s): 0.9 us, 1.7 sy, 0.0 ni, 84.1 id, 0.0 wa, 0.9 hi, 6.0 si, 6.4 st
MiB Mem : 3661.7 total, 127.2 free, 1017.9 used, 2787.6 buff/cache
MiB Swap: 3584.0 total, 3583.0 free, 1.0 used. 2643.8 avail Mem
PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
194726 systemd+ 20 0 10912 3344 2048 S 3.3 0.1 0:35.21 rsync --daemon --no-detach --port 4444 --config /var/lib/mysql/rsync_sst.conf
26799 root 20 0 1619348 83060 51416 S 1.3 2.2 9:16.91 /usr/sbin/promtail-linux-amd64 -config.file /etc/promtail/promtail.yaml
194291 root 20 0 719844 17620 7096 S 1.3 0.5 0:03.42 /usr/bin/containerd-shim-runc-v2 -namespace moby -id e4ccec842b4c4f20225b2daf876f55613032eb49a72e5+
28 root 20 0 0 0 0 S 0.7 0.0 0:55.24 [ksoftirqd/2]
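The `lsof -Pnl -i :4444` call that stalled looks like a check for whether something is listening on the SST port. If that is all it is needed for, a plain TCP connect is one way to get the same answer without lsof. This is only a sketch under that assumption: `port_is_listening` is a hypothetical helper, and unlike lsof it opens a real connection and says nothing about which process owns the socket.

```python
import socket

def port_is_listening(host, port, timeout=1.0):
    """Return True if a TCP connection to host:port succeeds (hypothetical helper)."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
        sock.settimeout(timeout)
        # connect_ex returns 0 on success instead of raising an exception.
        return sock.connect_ex((host, port)) == 0

if __name__ == "__main__":
    # 4444 is the SST port seen in the lsof and rsync command lines above.
    print(port_is_listening("127.0.0.1", 4444))
```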
It’s not done yet, so it might still crash like before. Weird how strace shows a lot of SQL statements; with rsync, binary files are transferred. Well, tarballs are pretty great.
Uhm, I’m starting to think rsync got stuck. This is the size of tables on backend02:
4.0K /srv/storage/mariadb/data/zabbix/history_log.frm
28K /srv/storage/mariadb/data/zabbix/history_log.ibd
4.0K /srv/storage/mariadb/data/zabbix/history_str.frm
2.1M /srv/storage/mariadb/data/zabbix/history_str.ibd
4.0K /srv/storage/mariadb/data/zabbix/history_text.frm
1.3G /srv/storage/mariadb/data/zabbix/history_text.ibd
4.0K /srv/storage/mariadb/data/zabbix/history_uint.frm
328M /srv/storage/mariadb/data/zabbix/history_uint.ibd
If we go to backend03:
4.0K /srv/storage/mariadb/data/zabbix/history_log.frm
64K /srv/storage/mariadb/data/zabbix/history_log.ibd
4.0K /srv/storage/mariadb/data/zabbix/history_str.frm
15M /srv/storage/mariadb/data/zabbix/history_str.ibd
4.0K /srv/storage/mariadb/data/zabbix/history_text.frm
8.0G /srv/storage/mariadb/data/zabbix/history_text.ibd
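To put numbers on the difference between the two listings, the sizes can be converted to bytes and compared per file. A minimal sketch follows, assuming the listings are `du -h` style output with binary units. The helper names `to_bytes` and `parse_du` are made up, and the rounded human-readable sizes make the ratios approximate.

```python
UNITS = {"K": 1024, "M": 1024**2, "G": 1024**3, "T": 1024**4}

def to_bytes(size):
    """Convert a du -h style size such as '1.3G' or '28K' to bytes (approximate)."""
    if size[-1] in UNITS:
        return int(float(size[:-1]) * UNITS[size[-1]])
    return int(float(size))

def parse_du(text):
    """Map file name (last path component) to its size in bytes."""
    sizes = {}
    for line in text.strip().splitlines():
        size, path = line.split(None, 1)
        sizes[path.rsplit("/", 1)[-1]] = to_bytes(size)
    return sizes

# The .ibd entries that appear in both listings above.
backend02 = parse_du("""
28K /srv/storage/mariadb/data/zabbix/history_log.ibd
2.1M /srv/storage/mariadb/data/zabbix/history_str.ibd
1.3G /srv/storage/mariadb/data/zabbix/history_text.ibd
""")
backend03 = parse_du("""
64K /srv/storage/mariadb/data/zabbix/history_log.ibd
15M /srv/storage/mariadb/data/zabbix/history_str.ibd
8.0G /srv/storage/mariadb/data/zabbix/history_text.ibd
""")

for name in sorted(backend02):
    ratio = backend03[name] / backend02[name]
    print(f"{name}: {ratio:.1f}x the size on backend03")
```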
Uhm, why is history_text 8G on backend03? No compression? It’s the same binary running on both systems. history_str is also larger by a wide margin. rsync seems to have completed but then we get the crash:
Sep 28 21:58:40 backend03.incandescent.tech docker[194177]: 2023-09-28 19:58:40 0 [Note] WSREP: 0.0 (5e5dd1122766): State transfer to 2.0 (e4ccec842b4c) complete.
Sep 28 21:58:40 backend03.incandescent.tech docker[194177]: WSREP_SST: [INFO] Extracting binlog files: (20230928 19:58:40.822)
Sep 28 21:58:40 backend03.incandescent.tech docker[194177]: backend01-bin.000018
Sep 28 21:58:40 backend03.incandescent.tech docker[194177]: WSREP_SST: [INFO] Galera co-ords from recovery: 52d278a6-50ad-11ee-8431-2ab12cc70be8:8066709 0 (20230928 19:58:40.857)
Sep 28 21:58:40 backend03.incandescent.tech docker[194177]: WSREP_SST: [INFO] rsync SST completed on joiner (20230928 19:58:40.866)
Sep 28 21:58:40 backend03.incandescent.tech docker[194177]: WSREP_SST: [INFO] Joiner cleanup: rsync PID=263, stunnel PID=0 (20230928 19:58:40.874)
Sep 28 21:58:41 backend03.incandescent.tech docker[194177]: WSREP_SST: [INFO] Joiner cleanup done. (20230928 19:58:41.415)
Sep 28 21:58:41 backend03.incandescent.tech docker[194177]: 2023-09-28 19:58:41 3 [Note] WSREP: SST received
Sep 28 21:58:41 backend03.incandescent.tech docker[194177]: 2023-09-28 19:58:41 3 [Note] WSREP: Server status change joiner -> initializing
Sep 28 21:58:41 backend03.incandescent.tech docker[194177]: 2023-09-28 19:58:41 3 [Note] WSREP: wsrep_notify_cmd is not defined, skipping notification.
Sep 28 21:58:41 backend03.incandescent.tech docker[194177]: 2023-09-28 19:58:41 0 [Note] mariadbd: Aria engine: starting recovery
Sep 28 21:58:41 backend03.incandescent.tech docker[194177]: tables to flush: 1 0 (0.0 seconds);
Sep 28 21:58:41 backend03.incandescent.tech docker[194177]: 2023-09-28 19:58:41 0 [Note] mariadbd: Aria engine: recovery done
Sep 28 21:58:41 backend03.incandescent.tech docker[194177]: 2023-09-28 19:58:41 0 [Note] InnoDB: Compressed tables use zlib 1.2.11
Sep 28 21:58:41 backend03.incandescent.tech docker[194177]: 2023-09-28 19:58:41 0 [Note] InnoDB: Number of transaction pools: 1
Sep 28 21:58:41 backend03.incandescent.tech docker[194177]: 2023-09-28 19:58:41 0 [Note] InnoDB: Using SSE4.2 crc32 instructions
Sep 28 21:58:41 backend03.incandescent.tech docker[194177]: 2023-09-28 19:58:41 0 [Note] InnoDB: Initializing buffer pool, total size = 1.000GiB, chunk size = 16.000MiB
Sep 28 21:58:41 backend03.incandescent.tech docker[194177]: 2023-09-28 19:58:41 0 [Note] InnoDB: Completed initialization of buffer pool
Sep 28 21:58:41 backend03.incandescent.tech docker[194177]: 2023-09-28 19:58:41 0 [Note] InnoDB: File system buffers for log disabled (block size=512 bytes)
Sep 28 21:58:41 backend03.incandescent.tech docker[194177]: 2023-09-28 19:58:41 0 [ERROR] InnoDB: Upgrade after a crash is not supported. The redo log was created with MariaDB 10.6.15. You must start up and shut down MariaDB 10.7 or earlier.
Sep 28 21:58:41 backend03.incandescent.tech docker[194177]: 2023-09-28 19:58:41 0 [ERROR] InnoDB: Plugin initialization aborted with error Generic error
Sep 28 21:58:41 backend03.incandescent.tech docker[194177]: 2023-09-28 19:58:41 0 [Note] InnoDB: Starting shutdown...
Sep 28 21:58:41 backend03.incandescent.tech docker[194177]: 2023-09-28 19:58:41 0 [ERROR] Plugin 'InnoDB' registration as a STORAGE ENGINE failed.
Sep 28 21:58:41 backend03.incandescent.tech docker[194177]: 2023-09-28 19:58:41 0 [Note] Plugin 'FEEDBACK' is disabled.
Sep 28 21:58:41 backend03.incandescent.tech docker[194177]: 2023-09-28 19:58:41 0 [ERROR] Unknown/unsupported storage engine: InnoDB
Sep 28 21:58:41 backend03.incandescent.tech docker[194177]: 2023-09-28 19:58:41 0 [ERROR] Aborting
Sep 28 21:58:41 backend03.incandescent.tech docker[194177]: 2023-09-28 19:58:41 3 [ERROR] WSREP: sst_received failed: State wait was interrupted
Sep 28 21:58:41 backend03.incandescent.tech docker[194177]: 2023-09-28 19:58:41 1 [ERROR] WSREP: Application received wrong state:
Sep 28 21:58:41 backend03.incandescent.tech docker[194177]: Received: 00000000-0000-0000-0000-000000000000
Sep 28 21:58:41 backend03.incandescent.tech docker[194177]: Required: 52d278a6-50ad-11ee-8431-2ab12cc70be8
Sep 28 21:58:41 backend03.incandescent.tech docker[194177]: 2023-09-28 19:58:41 1 [ERROR] WSREP: Application state transfer failed. This is unrecoverable condition, restart required.
Sep 28 21:58:41 backend03.incandescent.tech docker[194177]: 230928 19:58:41 [ERROR] mysqld got signal 11 ;
Sep 28 21:58:41 backend03.incandescent.tech docker[194177]: This could be because you hit a bug. It is also possible that this binary
Sep 28 21:58:41 backend03.incandescent.tech docker[194177]: or one of the libraries it was linked against is corrupt, improperly built,
Sep 28 21:58:41 backend03.incandescent.tech docker[194177]: or misconfigured. This error can also be caused by malfunctioning hardware.
Sep 28 21:58:41 backend03.incandescent.tech docker[194177]: To report this bug, see https://mariadb.com/kb/en/reporting-bugs
Sep 28 21:58:41 backend03.incandescent.tech docker[194177]: We will try our best to scrape up some info that will hopefully help
Sep 28 21:58:41 backend03.incandescent.tech docker[194177]: diagnose the problem, but since we have already crashed,
Sep 28 21:58:41 backend03.incandescent.tech docker[194177]: something is definitely wrong and this may fail.
Sep 28 21:58:41 backend03.incandescent.tech docker[194177]: Server version: 11.1.2-MariaDB-1:11.1.2+maria~ubu2204-log source revision: 9bc25d98209df6810f7a7d5e7dd3ae677a313ab5 Sep 28 21:58:41 backend03.incandescent.tech docker[194177]: key_buffer_size=134217728 Sep 28 21:58:41 backend03.incandescent.tech docker[194177]: read_buffer_size=131072 Sep 28 21:58:41 backend03.incandescent.tech docker[194177]: max_used_connections=0 Sep 28 21:58:41 backend03.incandescent.tech docker[194177]: max_threads=1002 Sep 28 21:58:41 backend03.incandescent.tech docker[194177]: thread_count=2 Sep 28 21:58:41 backend03.incandescent.tech docker[194177]: It is possible that mysqld could use up to Sep 28 21:58:41 backend03.incandescent.tech docker[194177]: key_buffer_size + (read_buffer_size + sort_buffer_size)*max_threads = 2338036 K bytes of memory Sep 28 21:58:41 backend03.incandescent.tech docker[194177]: Hope that's ok; if not, decrease some variables in the equation. Sep 28 21:58:41 backend03.incandescent.tech docker[194177]: Thread pointer: 0x7fe694000c68 Sep 28 21:58:41 backend03.incandescent.tech docker[194177]: Attempting backtrace. You can use the following information to find out Sep 28 21:58:41 backend03.incandescent.tech docker[194177]: where mysqld died. If you see no messages after this, something went Sep 28 21:58:41 backend03.incandescent.tech docker[194177]: terribly wrong... 
Sep 28 21:58:41 backend03.incandescent.tech docker[194177]: stack_bottom = 0x7fe6ac602c68 thread_stack 0x49000 Sep 28 21:58:41 backend03.incandescent.tech docker[194177]: Printing to addr2line failed Sep 28 21:58:41 backend03.incandescent.tech docker[194177]: mariadbd(my_print_stacktrace+0x32)[0x5610db8167c2] Sep 28 21:58:41 backend03.incandescent.tech docker[194177]: mariadbd(handle_fatal_signal+0x488)[0x5610db2efcf8] Sep 28 21:58:41 backend03.incandescent.tech docker[194177]: /lib/x86_64-linux-gnu/libc.so.6(+0x42520)[0x7fe6b665c520] Sep 28 21:58:41 backend03.incandescent.tech docker[194177]: /lib/x86_64-linux-gnu/libc.so.6(abort+0x178)[0x7fe6b6642898] Sep 28 21:58:41 backend03.incandescent.tech docker[194177]: /usr/lib/libgalera_smm.so(+0x156812)[0x7fe6ae04c812] Sep 28 21:58:41 backend03.incandescent.tech docker[194177]: /usr/lib/libgalera_smm.so(+0x6f151)[0x7fe6adf65151] Sep 28 21:58:41 backend03.incandescent.tech docker[194177]: /usr/lib/libgalera_smm.so(+0x6cde1)[0x7fe6adf62de1] Sep 28 21:58:41 backend03.incandescent.tech docker[194177]: /usr/lib/libgalera_smm.so(+0x8a5b1)[0x7fe6adf805b1] Sep 28 21:58:41 backend03.incandescent.tech docker[194177]: /usr/lib/libgalera_smm.so(+0x5f690)[0x7fe6adf55690] Sep 28 21:58:41 backend03.incandescent.tech docker[194177]: /usr/lib/libgalera_smm.so(+0x47611)[0x7fe6adf3d611] Sep 28 21:58:41 backend03.incandescent.tech docker[194177]: mariadbd(_ZN5wsrep18wsrep_provider_v2611run_applierEPNS_21high_priority_serviceE+0x12)[0x5610db8d53a2] Sep 28 21:58:41 backend03.incandescent.tech docker[194177]: mariadbd(+0xd5f191)[0x5610db5a9191] Sep 28 21:58:41 backend03.incandescent.tech docker[194177]: mariadbd(_Z15start_wsrep_THDPv+0x26b)[0x5610db59715b] Sep 28 21:58:41 backend03.incandescent.tech docker[194177]: mariadbd(+0xcd1906)[0x5610db51b906] Sep 28 21:58:41 backend03.incandescent.tech docker[194177]: /lib/x86_64-linux-gnu/libc.so.6(+0x94b43)[0x7fe6b66aeb43] Sep 28 21:58:41 backend03.incandescent.tech docker[194177]: 
/lib/x86_64-linux-gnu/libc.so.6(clone+0x44)[0x7fe6b673fbb4]
Sep 28 21:58:41 backend03.incandescent.tech docker[194177]: Trying to get some variables.
Sep 28 21:58:41 backend03.incandescent.tech docker[194177]: Some pointers may be invalid and cause the dump to abort.
Sep 28 21:58:41 backend03.incandescent.tech docker[194177]: Query (0x0): (null)
Sep 28 21:58:41 backend03.incandescent.tech docker[194177]: Connection ID (thread ID): 1
Sep 28 21:58:41 backend03.incandescent.tech docker[194177]: Status: NOT_KILLED
Sep 28 21:58:41 backend03.incandescent.tech docker[194177]: Optimizer switch: index_merge=on,index_merge_union=on,index_merge_sort_union=on,index_merge_intersection=on,index_merge_sort_intersection=off,engine_condition_pushdown=off,index_condition_pushdown=on,derived_merge=on,derived_with_keys=on,firstmatch=on,loosescan=on,materialization=on,in_to_exists=on,semijoin=on,partial_match_rowid_merge=on,partial_match_table_scan=on,subquery_cache=on,mrr=off,mrr_cost_based=off,mrr_sort_keys=off,outer_join_with_cache=on,semijoin_with_cache=on,join_cache_incremental=on,join_cache_hashed=on,join_cache_bka=on,optimize_join_buffer_size=on,table_elimination=on,extended_keys=on,exists_to_in=on,orderby_uses_equalities=on,condition_pushdown_for_derived=on,split_materialized=on,condition_pushdown_for_subquery=on,rowid_filter=on,condition_pushdown_from_having=on,not_null_range_scan=off,hash_join_cardinality=on
Sep 28 21:58:41 backend03.incandescent.tech docker[194177]: The manual page at https://mariadb.com/kb/en/how-to-produce-a-full-stack-trace-for-mysqld/ contains
Sep 28 21:58:41 backend03.incandescent.tech docker[194177]: information that should help you find out what is causing the crash.
Sep 28 21:58:41 backend03.incandescent.tech docker[194177]: We think the query pointer is invalid, but we will try to print it anyway.
Sep 28 21:58:41 backend03.incandescent.tech docker[194177]: Query:
Sep 28 21:58:41 backend03.incandescent.tech docker[194177]: Writing a core file...
Sep 28 21:58:41 backend03.incandescent.tech docker[194177]: Working directory at /var/lib/mysql
Sep 28 21:58:41 backend03.incandescent.tech docker[194177]: Resource Limits:
Sep 28 21:58:41 backend03.incandescent.tech docker[194177]: Limit                  Soft Limit  Hard Limit  Units
Sep 28 21:58:41 backend03.incandescent.tech docker[194177]: Max cpu time           unlimited   unlimited   seconds
Sep 28 21:58:41 backend03.incandescent.tech docker[194177]: Max file size          unlimited   unlimited   bytes
Sep 28 21:58:41 backend03.incandescent.tech docker[194177]: Max data size          unlimited   unlimited   bytes
Sep 28 21:58:41 backend03.incandescent.tech docker[194177]: Max stack size         8388608     unlimited   bytes
Sep 28 21:58:41 backend03.incandescent.tech docker[194177]: Max core file size     0           0           bytes
Sep 28 21:58:41 backend03.incandescent.tech docker[194177]: Max resident set       unlimited   unlimited   bytes
Sep 28 21:58:41 backend03.incandescent.tech docker[194177]: Max processes          unlimited   unlimited   processes
Sep 28 21:58:41 backend03.incandescent.tech docker[194177]: Max open files         1073741816  1073741816  files
Sep 28 21:58:41 backend03.incandescent.tech docker[194177]: Max locked memory      8388608     8388608     bytes
Sep 28 21:58:41 backend03.incandescent.tech docker[194177]: Max address space      unlimited   unlimited   bytes
Sep 28 21:58:41 backend03.incandescent.tech docker[194177]: Max file locks         unlimited   unlimited   locks
Sep 28 21:58:41 backend03.incandescent.tech docker[194177]: Max pending signals    14479       14479       signals
Sep 28 21:58:41 backend03.incandescent.tech docker[194177]: Max msgqueue size      819200      819200      bytes
Sep 28 21:58:41 backend03.incandescent.tech docker[194177]: Max nice priority      0           0
Sep 28 21:58:41 backend03.incandescent.tech docker[194177]: Max realtime priority  0           0
Sep 28 21:58:41 backend03.incandescent.tech docker[194177]: Max realtime timeout   unlimited   unlimited   us
Sep 28 21:58:41 backend03.incandescent.tech docker[194177]: Core pattern: |/usr/lib/systemd/systemd-coredump %P %u %g %s %t %c %h
Sep 28 21:58:41 backend03.incandescent.tech docker[194177]: Kernel version: Linux version 5.14.0-284.11.1.el9_2.x86_64 (mockbuild@x64-builder01.almalinux.org) (gcc (GCC) 11.3.1 20221121 (Red Hat 11.3.1-4), GNU ld version 2.35.2-37.el9) #1 SMP PREEMPT_DYNAMIC Tue May 9 05:49:00 EDT 2023
Sep 28 21:58:42 backend03.incandescent.tech systemd[1]: mariadbgalera.service: Main process exited, code=exited, status=139/n/a
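A note on that last systemd line: status=139 most likely follows the common "128 + signal number" convention for a process that died from a signal. That would make it signal 11 (SIGSEGV), which fits the stack trace. Here is a minimal Python sketch of that arithmetic. The function name is my own, and it assumes the exit status really was produced under that convention:

```python
import signal


def describe_exit_status(status: int) -> str:
    """Describe an exit status, assuming the 128 + signal-number convention."""
    if status > 128:
        signum = status - 128
        try:
            # signal.Signals maps a signal number to its symbolic name.
            name = signal.Signals(signum).name
        except ValueError:
            return f"status {status}: no known signal {signum}"
        return f"killed by {name} (signal {signum})"
    return f"exited with status {status}"


print(describe_exit_status(139))
```

The same log also shows "Max core file size 0", so despite the "Writing a core file..." message I would not expect a usable core dump from inside the container.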
I’ll try again tomorrow using Ubuntu 22.04, which is what my backend cluster currently runs, and we’ll see what happens. But I hope Scylla behaves better. I’m a-ok with replace-node and all that, but these crashes? A clustered RDBMS is a pain.
Addendum 1:
Scylla is nice.
root@backend02:~# scylla nodetool status
Datacenter: svealiden
=====================
Status=Up/Down
|/ State=Normal/Leaving/Joining/Moving
--  Address       Load     Tokens  Owns  Host ID                               Rack
UN  192.168.2.81  4.64 MB  1       ?     6be20c62-a7f6-41b9-8924-1de608b5fb49  one
UN  192.168.2.82  4.67 MB  1       ?     c4e0631b-a282-4bd8-866b-aa2f15877f1c  one
UN  192.168.2.83  4.64 MB  1       ?     5a902584-96a2-4a0e-9e79-77ddcaa1f62f  one
From backend03:
[root@backend03 ~]# scylla nodetool status
Datacenter: svealiden
=====================
Status=Up/Down
|/ State=Normal/Leaving/Joining/Moving
--  Address       Load     Tokens  Owns  Host ID                               Rack
UN  192.168.2.81  ?        1       ?     6be20c62-a7f6-41b9-8924-1de608b5fb49  one
UN  192.168.2.82  ?        1       ?     c4e0631b-a282-4bd8-866b-aa2f15877f1c  one
UN  192.168.2.83  6.03 MB  1       ?     365a7f90-1626-46ff-86e2-0d2ebdf3d762  one
Note: Non-system keyspaces don't have the same replication settings, effective ownership information is meaningless
[root@backend03 ~]# scylla nodetool status
Datacenter: svealiden
=====================
Status=Up/Down
|/ State=Normal/Leaving/Joining/Moving
--  Address       Load     Tokens  Owns  Host ID                               Rack
UN  192.168.2.81  ?        1       ?     6be20c62-a7f6-41b9-8924-1de608b5fb49  one
UN  192.168.2.82  4.71 MB  1       ?     c4e0631b-a282-4bd8-866b-aa2f15877f1c  one
UN  192.168.2.83  6.03 MB  1       ?     365a7f90-1626-46ff-86e2-0d2ebdf3d762  one
Note: Non-system keyspaces don't have the same replication settings, effective ownership information is meaningless
[root@backend03 ~]# scylla nodetool status
Datacenter: svealiden
=====================
Status=Up/Down
|/ State=Normal/Leaving/Joining/Moving
--  Address       Load     Tokens  Owns  Host ID                               Rack
UN  192.168.2.81  4.68 MB  1       ?     6be20c62-a7f6-41b9-8924-1de608b5fb49  one
UN  192.168.2.82  4.71 MB  1       ?     c4e0631b-a282-4bd8-866b-aa2f15877f1c  one
UN  192.168.2.83  6.03 MB  1       ?     365a7f90-1626-46ff-86e2-0d2ebdf3d762  one
Note: Non-system keyspaces don't have the same replication settings, effective ownership information is meaningless
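What I’m looking for in those three runs is every node showing UN and the ? in the Load column filling in as backend03 learns about its peers. Note also that 192.168.2.83 comes back with a new Host ID after the reinstall. Here is a rough Python sketch for checking this automatically instead of eyeballing it. The names `parse_status` and `settled` are made up for illustration, and it assumes the column layout shown in the output above:

```python
import re
from typing import List, NamedTuple


class Node(NamedTuple):
    state: str
    address: str
    load: str
    host_id: str
    rack: str


# Node rows start with a two-letter status/state code and an IPv4 address.
ROW = re.compile(r"^[UD][NLJM]\s+\d+\.\d+\.\d+\.\d+\s")


def parse_status(text: str) -> List[Node]:
    """Extract the node rows from `nodetool status` output."""
    nodes = []
    for line in text.splitlines():
        if not ROW.match(line):
            continue  # skip headers, separators and the trailing note
        parts = line.split()
        # Load is either "?" or "<number> <unit>", so take the fixed columns
        # from both ends and treat what remains in the middle as the load.
        load = " ".join(parts[2:-4])
        nodes.append(Node(parts[0], parts[1], load, parts[-2], parts[-1]))
    return nodes


def settled(nodes: List[Node]) -> bool:
    """True when every node is Up/Normal and reports a known load."""
    return bool(nodes) and all(n.state == "UN" and n.load != "?" for n in nodes)


# Node rows copied from the second run on backend03 above.
SAMPLE = """\
UN 192.168.2.81 ? 1 ? 6be20c62-a7f6-41b9-8924-1de608b5fb49 one
UN 192.168.2.82 4.71 MB 1 ? c4e0631b-a282-4bd8-866b-aa2f15877f1c one
UN 192.168.2.83 6.03 MB 1 ? 365a7f90-1626-46ff-86e2-0d2ebdf3d762 one
"""

for node in parse_status(SAMPLE):
    print(node.address, node.state, node.load)
print("settled:", settled(parse_status(SAMPLE)))
```

Taking the fixed columns from both ends of the row is deliberate: the load value contains a space when it is known and is a bare ? when it is not, so a plain positional split would misalign the remaining fields.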
Addendum 2:
I’m running the 10.11.5 Docker image now and it works if I use rsync for SST, but the setting below causes a crash just like the one I saw on 11.1.2, so this line has to go from the config:
wsrep_sst_method = mariabackup
In the meantime I tried running Zabbix with PostgreSQL and CockroachDB. But Zabbix needs some features that CockroachDB hasn’t implemented, so no dice there. If Galera is too unpredictable I can always use a single master and two slaves and use my scripts for switching between them. Standard replication.
Addendum 3:
I think I might have come across a good combo. It passes my exceptional “can be fixed while I’m drunk” test! Sure, it still requires me to patch the wsrep_sst_rsync script, but I did that part while I was sober. Getting the Galera cluster from 1 to 3 nodes passed the test. Now for some music!