Kubernetes vs Nomad+Consul

So I haven’t been super-pleased with Kubernetes historically. I spent the past… five weeks(?) on setting up Nomad and Consul instead. It’s been frustrating but manageably so. Podman support is okay but not great, and of course I had to double down on using Podman instead of Docker because why not?

Having spent… a day on Kubernetes? Did I start this Kubernetes project today? No, last night according to the snapshots for the virtual machines. Okay, but it still feels like it’s been a week. And this has helped me put words to things. Feedback in Kubernetes is very flimsy. I had to turn on debug logging to be told that I hadn’t provided a configuration for the load balancer! Turns out the people who made the Helm chart for metallb don’t coordinate things entirely with the people who make metallb, so the docs don’t line up with the chart.

Normally that would be a minor problem since a tool like Nomad complains very well when things are wrong. Something as major as “You provided no config ‘metallb’” would have popped up in like three seconds if I tried that with Nomad. But with Kubernetes? It took three hours of debugging flannel (which was also not working, it turned out), kube-router (which didn’t work initially because flannel CNI files were left behind), iptables (because who the hell knows?) and namespaces (because I don’t exactly understand how well insulated things are in different namespaces in Kubernetes) to solve the issue. The root cause: in the official docs the ConfigMap is named “config”, in the Helm chart it’s “metallb”.
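For what it’s worth, the three-second complaint I was missing can be approximated by hand. Below is a minimal Python sketch, not anything metallb or Helm ships: it takes the JSON that `kubectl get configmaps -n <namespace> -o json` prints and reports which expected ConfigMap names are absent. The function name `missing_configmaps`, the sample data and the `metallb-system` namespace are assumptions for illustration; use whatever namespace your chart actually installs into.

```python
import json


def missing_configmaps(kubectl_json: str, expected: list[str]) -> list[str]:
    """Return the expected ConfigMap names that are absent from a kubectl listing.

    `kubectl_json` is the text printed by
    `kubectl get configmaps -n <namespace> -o json` (a List object with "items").
    """
    listing = json.loads(kubectl_json)
    present = {item["metadata"]["name"] for item in listing.get("items", [])}
    return [name for name in expected if name not in present]


# Hypothetical sample: the ConfigMap was created under the name from the
# docs ("config") while the chart looks for "metallb".
sample = json.dumps({
    "apiVersion": "v1",
    "kind": "List",
    "items": [
        {"kind": "ConfigMap",
         "metadata": {"name": "config", "namespace": "metallb-system"}},
    ],
})

for name in missing_configmaps(sample, ["metallb"]):
    print(f"no ConfigMap named {name!r} found, the load balancer has no config")
```

In real use you would pipe the kubectl output into this instead of the sample string. It doesn’t fix anything, it just says out loud what Kubernetes only admitted at debug level.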

I thought that I would be more okay with Kubernetes YAML files now that I’ve worked with Nomad and Consul, which use similar concepts of services, ports, container images and so on, but no… still awful… Oh, this new Kubernetes setup consists of three flimsy master nodes and three beefier worker nodes, all running as VMs on my main workstation which now has an uptime of…

00:19:53 up 67 days, 1:25, 16 users, load average: 1,33, 1,91, 2,05

That’s right! (I say, expecting people to look impressed)

So anyway… The point is that I have snapshots of the VMs for easy restore and these things aren’t used for anything important or even “I kind of like having X available”. That’s running on my actual servers with Rocky Linux, HAproxy, Consul, Nomad, Minio etc. Solid as a rock. So now Kubernetes can fail all it wants and I can debug it in my own good time. Of course, by the looks of it the limits of my natural life seem to be a bigger restriction in that regard than I would like. Even if I live to be 100 I’m not sure how many Kubernetes configuration issues I can resolve in that time. 7? 8?

If I can get Kubernetes to run the same things I run on Nomad/Consul then I will try to keep it going and make changes to it alongside the “production” setup to see how things fare. I’m going to push Loki+Promtail and Cortex+Prometheus next I think along with Cadvisor and stuff. Hopefully this will resolve some of the feedback issues I’ve been having by centralizing all logs. I guess I’ll switch everything to debug-logging initially to be on the safe side.

Another thing that’s very nice about Nomad and Consul is that they are easy to use incrementally. You don’t need 300 lines of YAML to get the Nomad dashboard up and running…

root@k8s-master01:~/kubernetes_dev/dashboard# wc -l *
17 dashboard_ingress.yaml
314 dashboard_multi.yaml
304 dashboard_multi_default.yaml
20 dashboard_svc.yaml
306 recommended.yaml
961 total

It’s just there automatically… Same with Consul. So at this stage I would recommend anyone who wants to learn Kubernetes to start with Nomad and Consul to get a feel for the concepts. Maybe you will find that Nomad and Consul are all you need (that’s where I am right now); if not, it will be a good starting point.

Microsoft 365

JFK gave a speech on the space program saying “we do these things, not because they are easy but because they are hard!” And I like that approach. That’s how you learn things. I’m going to throw myself into the madness that is Kubernetes again soon, having made Nomad and Consul do dependably what K8s thus far has failed to do. So, with that mountain climbed, let’s go back to the steeper one!

But these past two days I’ve moved four email accounts from Microsoft 365 to an unaffiliated hosting provider. I got the customer’s admin password for Microsoft’s control panel. That didn’t help much because apparently adding some Microsoft App Validator-thing is now part of the onboarding process. I thought that would be the hard part…

I spent two hours trying to figure out how to get an IMAP client to connect to the accounts… Answer: you have to choose OAuth2 as the authentication method which (surprise, surprise) isn’t exactly the default. Tools like IMAPsync, for instance, do not have it (yet). So why not generate an app password? Because you can only do that if you enable multi-factor authentication. But the OAuth2 implementation that Microsoft uses is virtually MFA already, since the email client needs to connect to Microsoft’s webpage and do a little extra login dance.
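For the curious, the IMAP side of this is small once you have a token. Here is a hedged Python sketch of the XOAUTH2 login step using the standard library’s `imaplib`. It assumes you already got an access token out of the login dance (that part is not shown, and it is the hard part). The helper names `build_xoauth2` and `login_with_oauth2` are made up for illustration, and `outlook.office365.com` is my assumption for the IMAP host.

```python
import imaplib


def build_xoauth2(user: str, access_token: str) -> bytes:
    """Build the raw SASL XOAUTH2 string; imaplib base64-encodes it for us."""
    return f"user={user}\x01auth=Bearer {access_token}\x01\x01".encode("utf-8")


def login_with_oauth2(user: str, access_token: str,
                      host: str = "outlook.office365.com") -> imaplib.IMAP4_SSL:
    """Open an IMAP-over-TLS connection and authenticate with XOAUTH2.

    The host default is an assumption; the access token must be obtained
    separately through the provider's OAuth2 flow.
    """
    conn = imaplib.IMAP4_SSL(host)
    # imaplib calls the callback with the server challenge and sends back
    # whatever bytes we return, base64-encoded.
    conn.authenticate("XOAUTH2", lambda _challenge: build_xoauth2(user, access_token))
    return conn
```

Usage would be something like `conn = login_with_oauth2("someone@example.com", token)` followed by `conn.select("INBOX")`. A plain username and password has no place in that flow, which is why password-only tools are stuck.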

I got most of the emails over using the Microsoft Outlook email client. It took a long time but at least I didn’t have to supervise it by jumping around between Microsoft 365, Office 365, Exchange, Exchange Online Policy or Azure Directory during the process. I thought, after an hour and a half that Azure Directory was the answer to why IMAP with username and password wouldn’t work. Silly me…

I saw some interesting stuff in there while I was losing my mind, making one illogical jump after another from one control panel to another that is now deprecated, so please visit this new one, but oh no, we haven’t actually implemented all functionality in that one. No, wait… That’s cPanel API documentation I’m thinking of now.

Anyway, interesting stuff. Like some function called litigation hold. What a wonderfully American idea, to have a button to set your email account to “I’m being sued”. Also there are lots of policy things that you can tweak. If I haven’t made this clear yet: I wouldn’t want to use Microsoft 365 for my own applications. And the reason I wouldn’t want to do that isn’t that it’s expensive or unnecessary. I run a three physical server Ceph-cluster with redundant network switches in STP-mode for my own NAS – unnecessary is my middle name. No, I don’t want to use Microsoft 365 because it is too complex.

That’s why I brought up the whole Nomad-Consul-Ceph-Proxmox-STP-cluster-redundancy thing, because I can handle that! But Microsoft 365? It’s like staring into the abyss. You can spend hours just to figure out that the control panel you’ve been fiddling with doesn’t even do anything because a single switch in a completely different control panel is set to “Off”.

Now, if I for some reason had to manage a large corporation’s computer operations: I think I would choose Microsoft 365. I’m not actually sure it would be that secure or dependable. But I would have lots of buttons to easily indicate who should be able to do what. If someone asked “Can people send work-documents to their own personal gmail accounts?” I could pull up the Azure Directory-EOP-Mail Flow-Connector-rule thing that says “Nope, no work documents to outside domains” and everyone would be happy. It wouldn’t surprise me one bit if sending a zip-file containing work documents attached with a changed file-extension so that it didn’t look like a zip-file would circumvent that kind of stuff but no one would ask about that…

Anyway, we’re now going to tell customers who want to move from Microsoft 365 to our hosting that they are free to do so but that there is no practical way of getting their data over to us. Interesting how Microsoft have such great tools for moving data into their systems from IMAP-servers but won’t provide their customers with the same functionality if they should want to move data out. Where I work we have wacky systems that work the same whether customers are moving in or out. Silly.

The Ukrainian situation

It’s going to be hard writing this without swearing but I’ll give it a shot.

Now, what the hell is Russia thinking? I don’t mean that rhetorically, I honestly want to know what they are thinking. Are they actually worried about NATO encroachment? They are… (counting) like 15 years too late there. What kind of chess-player does nothing until his opponent has captured the entire board except one or two squares?

Also, it was kind of a long shot that any NATO country would move out offensive weapons from Eastern Europe but now that Russia has invaded a country in Eastern Europe, whatever possibilities there were for disarmament are gone. Let’s also consider how long it will take Sweden, Finland and Austria to apply for NATO membership. Clearly Russia takes whatever land NATO doesn’t protect. Genius piece of strategy this!

Maybe they never really cared about NATO encroachment, maybe that was just a cover-story? That would explain a few things, but what is the blinking crikey point of spinning such yarns for the West? To the Russian people the story now is something about protecting Russians in eastern Ukraine, not NATO threats. This choice of justification is a good thing because invading one of the few non-NATO countries in Eastern Europe on the grounds of worrying about NATO makes precious little sense.

Note how I don’t field arguments about rights and morality. We’re talking about sovereign nations. Morality is not on the table. But we can usually conjure up some logic that at least makes sense to the actor, even though we don’t necessarily think it’s very nice. Propping up the Belarusian government? Yeah, I suppose it’s nice to have one ally left in Eastern Europe. Not sure if it’s a solid long-term strategy though. Look at the former Warsaw Pact and see how friendly the population is to Russia? Oh, they’re not? Maybe Russia imposing decades of dictatorship on them wasn’t a good way of building bridges? So how happy are the Belarusian people going to be when the regime finally falls?

This is kind of what we’re seeing in the Ukraine as well. Russia hasn’t exactly endeared itself to the Ukrainian people. Less so of course when annexing the Crimea! They really shot themselves in the foot with that. The Ukraine is never going to let Russia keep its presence in the Crimea. I see Russia having a naval base there now, sure. But fifteen years from now?

Maybe Russia intends to keep it by force? Then at least they aren’t stupid by way of disregarding the diplomatic relationship with the Ukraine and the opinions of their population. But then it’s stupid in another way, namely that they get themselves bogged down in a guerrilla war with the Ukraine for the Donetsk and Luhansk regions and the Crimea. Russia covers like 12 time zones, why risk getting caught up in a Vietnam-style war for another few hundred square kilometers of territory that makes Soviet Russia look like an opulent metropolis of wealth and prosperity?

No, this makes sense in exactly zero ways. Maybe I’m missing something, but it made way more sense when Saddam Hussein invaded Kuwait in 1990, which I think we all remember not working out great. But at least we could understand what made the move attractive to Iraq. Lots of oil, coast-line and a slightly iffy ability to cancel a lot of debt. No, it didn’t work out but the best case scenario made a lot of sense at least. What’s the best case scenario here? Massive economic growth by way of the Luhansk coal fields? Huge boost in tourism in Crimea? Detente with NATO after threatening to use nuclear weapons?

Maybe Putin has bought shares in some Russian weapons producers? That would at least make some sense… Is it immoral to throw Russia into war and sanctions to boost your personal stack of ill-gotten gains? Well, yeah! But that’s not on the table here. Reasoning about what actions of world leaders are moral or immoral is like arguing about “why” the laws of thermodynamics are the way they are. You can argue that until you’re blue in the face, but there’s no real answer. You bring a question(“why?”) into a domain where intent isn’t present. Similarly, condemning Putin for not being nice to the Ukraine is a perfectly valid thought-experiment about how the world should be but in reality it isn’t applicable. Nations are operated based on the crudest of base instincts. “What is best for my country?”

That’s why this annoys me, this isn’t even good for Russia! It’s like the whole Vietnam war or the long-term occupation of Afghanistan (pick any of them you want but I was thinking of the most recent one, not the Russian occupation 1979-1989 or the British one before that or the one before that. You get my point…). Those didn’t even help the West. Vietnam fell to communism and America absolutely trashed the general public’s trust in the government. Well done! And it’s not like it wasn’t obvious from the outset that it wasn’t going to work. It took JFK’s advisors two months tagging along with the South Vietnamese army to figure out why the US couldn’t win. Stopping the spread of communism between North and South Vietnam? Nope. Stopping it on the border between Cambodia and Thailand? Easy as pie. The people of Thailand wouldn’t even consider going along with an idea that came from Cambodia, even if it were the most brilliant idea ever, just out of spite.

And I trust I don’t need to explain why occupying Afghanistan to make it into a modern democracy was a bad idea from day 1? The US didn’t even have the excuse of having forgotten how bad it is at those things. They tried it in Somalia and noped out before they so much as got a sun tan. Why would you 10 years later think Afghanistan would be any easier to set right? They have only ever succeeded in one thing, to oppose modernity. We mostly talk about their successful attempts at stopping modernity brought in by foreign powers but even during their long stints of autonomy they have a 100% success rate in staying in the Dark Ages.

So it’s not just Russia that’s shooting itself in the foot, but I fail to see how that helps. It’s not like the US debacles drove a bunch of countries to join the Warsaw pact, the way Russia is now acting as the number 1 salesman for NATO membership, however unwittingly. Not that Europe has done particularly well in this by making itself dependent on Russian natural gas. I guess it’s better than buying electricity produced by Russian nuclear power plants but that is also true of do-it-yourself kidney surgery… A blind man in Papua New Guinea saw this issue coming a mile away!

It’s not like we won’t get things back on track but I doubt a Putin-led Russia will be allowed to play a role in Europe. So we have to wait for him to retire first and hope that the next guy has some modicum of sense. I don’t expect Russia to be altruistic but not being self-destructive seems like a reasonable expectation. As noted earlier, the Ukraine will not let Russia hold the Crimea after this and the EU and NATO are probably not going to be willing to shake hands with Russia as long as the Ukraine cries foul. The EU and NATO don’t actually care about the Ukraine of course, but this is turning out to be a perfect fulcrum with which to lift Russia into a Sarlacc pit.

So new Russian leadership, Russia leaving Ukrainian territory and maybe no more propping up Belarus? Then Russia can be brought back into the industrialized world. Which is how things are supposed to go. The developed world needs Russia to stop being obtuse.

By the way, what’s China going to do here? Because their participation in international politics is almost exclusively about making sure no one interferes with what China considers “internal matters”. Not sure how happy they are going to be that Russia now gives independence to separatists in a sovereign nation… They hate it when NATO does that (as do I, nationalism hasn’t served Europe super-well) so I don’t think “Russians in other countries will be supported in any struggle to take whatever land they happen to live on” is helping Sino-Russian relations. Then there’s also the whole “Russia invading a neighbor country”-thing which China, as a neighbor country, might not be super-cool with. What a bunch of God-damn brain surgeons…

But is this the end of the great European post-war peace, as NATO and EU spokesmen like to opine? There has been nothing of the sort! Soviet invasions of Hungary and Czechoslovakia, innumerable wars and ethnic cleansing in the Balkans and NATO bombing of Serbia over Kosovo spring to mind off the top of my head… Oh, then we have the Turkish invasion of Cyprus in the 70’s! Forgot about that one… No, this is nothing particularly new as such. It’s just weird to sabotage the trading relationships that Russia depends on for solvency in order to capture territory that NATO looks at and says “Nah, not interested.” Only to demonstrate that:

  • Russia doesn’t like the entire Warsaw Pact joining NATO
  • Every country that joined NATO saved itself from being invaded by the Russian Federation
  • Any European country not already in NATO should join it quickly, or develop nuclear weapons
  • Russia is very keen on ethnic separatism

Now the first point is fair enough. No one was under the impression that Russia liked that all the countries that Russia held hostage by way of the Soviet Union ended up joining NATO (save the Ukraine and Belarus, countries that spent a large portion of the post-Soviet era being run by Russian puppets). So it is if anything a point that doesn’t need making. The other ones, well, I think it’s pretty clear why the other three points harm the overall Russian agenda… NATO can fire their entire PR department now. Russia has done their work for them.

And now they’re not even succeeding at the simplest part of the plan: capturing Ukraine. That was supposed to be a 72-hour thing! Now they might be getting desperate. How long before they use thermobaric bombs against Kyiv? That would make even China overtly denounce them for fear of ending up as collateral damage in the marvelous global sanctioning craze that has seized the world. But it’s the kind of idiocy that we must now expect from them at this stage. Yeah, burn more bridges…

Update 2022-03-03:

The Russian president has told his French counterpart that Russia will successfully demilitarise Ukraine and render it neutral, which he said were his goals there.

BBC News report on a call between France and Russia 2022-03-03

This is good news! Not that it is in any way a correct representation of what Russia actually had as its aims, but that’s hardly the point. We’re now seeing Russia start to set up a goal-post. If Russia had said it had set out to capture and occupy Ukraine then it could hardly save face when pulling out after X weeks. But by saying that the goal is to “demilitarise” Ukraine it will be much easier for Russia to pull out and claim victory.

If they choose their words carefully they might not even have to lie about achieving “demilitarisation”. Not that they are squeamish about bald-faced lies, but if you can mix in some technically true statements in your lies that usually helps. Like if they claim to have “decimated Ukraine’s military hardware”. The word “decimated” can be interpreted in many ways (“reduction by 10%” or “massive loss” are common uses of the word) and it is true that Ukraine has lost equipment. That will be replaced very shortly by the EU and NATO just as they are sending more and more guns to Ukraine at this very moment, but for a brief moment Russia will have achieved a reduction in Ukrainian military equipment.

This nonsense is much better than the (also nonsense) article published accidentally by a Russian media outlet a few days ago which lauded the Russian victory (ahem…) and their leadership’s decision to raise Russia up again to its former glory of a world superpower and sideline the West. If Russia had publicly got behind that framing of the war then it would have had little choice but to continue fighting it until Ukraine was at least conquered, if not also occupied.

“Render Ukraine neutral” of course is… uhm… not going to happen, to put it mildly. It is right now the steel-tipped shoe kicking Russia in the shins with active support from the West. The EU and NATO are seizing the opportunity to test and demonstrate technology which hasn’t previously been used against the military forces of an industrialised nation. This will continue and Ukraine will probably be the most dedicated and outspoken opponent to Russia for the coming decades. It may in fact be that the EU needs to get Ukraine to chill a bit. Neutrality is the one thing we can carefully exclude as a possibility in Ukraine’s future.

I think we should be prepared for Russia stationing nuclear weapons in Belarus in the near future however. It’s exceedingly cheap since the nukes already exist and moving them a few hundred kilometers west is easy enough. Based on the NATO response to the Russian invasion of Ukraine they will have to try very hard to make it look like they aren’t in a worse spot than they were before. It’s not like Russia will have a lot of spare cash to build up their military any time soon. It doesn’t really matter where the nukes are placed.

If anything Belarus is a less strategically advantageous placement. Imagine in ten years if there is a coup in Belarus and the new government invites Ukrainian forces in to suppress military units loyal to the old regime. Russian nukes could end up being seized in the process. Russia would certainly invade Belarus to get them back at that point but it is usually the case that countries try to make their nukes as difficult to swipe as possible. Stationing some of them in the last European Warsaw Pact member still aligned to Russia should be seen as quite precarious. Russia will probably still do it just to make it seem like they are bravely countering NATO’s every move, even though as a practical matter it actually weakens them.

Update 2022-03-04:

President Putin has warned those opposing Russia’s actions in Ukraine “not to exacerbate the situation” by imposing more restrictions on his country.

BBC News feed 2022-03-04

I laughed long and hard at that one. Not going too well, huh?

If he has any sense he will soon proclaim victory and an end to the “special military operation” which of course has gone better than planned. Based on these latest statements I think we can expect to hear demands from Russia to roll back sanctions within days. Some countries will agree, like China and India who think they can get away with it and are almost certainly correct that they can. But South Korea, Taiwan, North America in its entirety and Europe (barring Belarus of course) will say “Uhm, why don’t you come back to us when you’ve left Crimea and the Donbass?”

When Russia leaves the Crimean peninsula I think Ukraine will relinquish the Donbass and some sanctions will be lifted but until reparations are paid sanctions will persist. The last sanctions will probably only be lifted once Russia engages in some serious detente. Nuclear disarmament wouldn’t be bad but the US would have to follow suit. Free and fair elections would also make a significant difference I think. But of course just the “get Russia out of Crimea”-thing is going to take sooo long.

By the way, since the Russian invasion of Ukraine three former Warsaw Pact members have applied for EU membership: Ukraine, Georgia and Moldova. Ukraine is going to make it in once it reaches a deal with Russia on where the borders should be. Georgia is not since it is too far away. Georgia joining NATO might happen but not the EU. Moldova is so dirt-poor they had to introduce legislation to stop people selling their kidneys. So EU membership is not really on the cards for Moldova in my lifetime.

Also NATO announced Sweden’s and Finland’s participation in ongoing talks regarding the current situation. Countering NATO expansion seems to be going real well.

China, Cuba, or Venezuela can usually be relied upon to back Russia – this time they abstained.

BBC News feed 2022-03-04 about UN vote

Meanwhile in Russia the BBC, Deutsche Welle, Twitter and Facebook are blocked by the regime. I was about to propose that Facebook shut Russia out from their end but okay, if Russia wants to swing the axe that’s fine too. When’s the gas-tap going to be shut off and who will it be that makes the move? Customer or supplier?

2022-03-11

[China’s representative to the UN Zhang Jun] says it would encourage any country that has not yet destroyed their stockpiles of chemical weapons to do so as soon as possible.

BBC News feed 2022-03-11

I can sort of imagine how this played at the Kremlin:

– President Putin, the Chinese just gave a speech at the UN in our favor!
– Let me read that! … Uhm, so they said all countries with chemical weapons should destroy them as soon as possible?
– Yes, sir! Referring to Ukraine!
– But… We in Russia actually have chemical weapons. Ukraine doesn’t. That’s just a yarn we’ve spun to justify our invasion. So any call to destroy the world’s chemical weapons is going to have zero effect on Ukraine and a significant impact on Russia. I fail to see the silver lining here…
– Oh…

I’m not too impressed by the US refusal to furnish Ukraine with MiG jets provided by Poland because it makes the US look involved… The EU is sending tonnes of weapons with the express intention for them to be used against Russian forces. The world’s only real remaining superpower can probably afford to be seen acting as a middleman in the supply of jets from Poland to Ukraine. I’m not accusing them of being unsupportive, merely overly cautious. What’s Russia going to do? Invade Poland? They seem to have their hands full with Ukraine, I don’t think they’re going to fare much better against a member of NATO and the EU with… shall we say “a lengthy history of opposition to Russia” to be diplomatic? Russia might at most send a few cruise missiles at Polish air bases.

At that point of course it is pretty much open season on the Russian military. We might actually see a NATO or EU-enforced no-fly zone over Ukraine and a blockade of Russian ports. I’m not saying that wouldn’t be an escalation, but it would be an escalation to Russia’s detriment. That an attack by Russia against a NATO member doesn’t lead to a full-blown NATO retaliatory response but merely a “Russia is currently grounded. Aircraft or ships leaving Russia will be seized or destroyed.” sort of tactic would signal that Russia isn’t taken all that seriously.

And Russia shouldn’t be seen as a particularly significant threat to the rest of the world. It’s down to one ally hostage in Europe, Belarus, a country that one day hopes it could reach the stratospheric economic success of East Germany… It’s one coup away from joining the “we who are sort of upset with Russia for its lengthy imposition of dictatorships on Eastern Europe”-group. If it weren’t for Russia still having nuclear weapons they would be a complete non-issue. But nuclear weapons aren’t credible weapons for anything other than a retaliation against a full-blown enemy invasion of your homeland. Russia trying to rattle their nuclear saber if they don’t get to add Ukraine to their dwindling list of hostages implies that not even they think their conventional armed forces are very threatening…

2022-03-21

Russia’s Foreign Minister Sergei Lavrov has accused the US of restraining Kyiv from agreeing to Russian demands but did not appear to provide evidence

BBC News Feed

Right…

2022-03-24

While Biden spoke to the press Russia issued criticism against the western countries giving weapons to Ukraine. That support prolongs and intensifies the conflict in Ukraine, the Russian ministry of foreign affairs said in a statement.

SVT news article (in Swedish)

Well, every war would be short and peaceful if only one side had any weapons. The shortest possible war between Russia and Ukraine however would have been the one that Russia chose not to start. And that ship hasn’t just sailed – it has sailed, reached the East Indies and returned to port laden with spice!

Would Putin just please declare victory – the most complete and glorious victory throughout human history! – and pull back out. Stay in Crimea and Donbass of course! It will be the excuse for sanctions for many years to come.

If the West would just stop buying their god damn natural gas the Russians would soon run out of money. With any luck the latest demand by Putin that natural gas from “hostile countries” be paid for in roubles will be denied by western countries, thus causing an end to the sale that way. Obviously that will lead to significant issues for Western Europe but if you make yourself dependent on Russia for critical energy supplies you gamble big. And then bad things happen. Countries like the Netherlands and Germany have benefited greatly from furnishing Russia with lots of money with which to build Tsar Putin’s new Russia in exchange for cheap energy and now they need to suffer a commensurate hardship.

Then maybe we can stop this nonsense of scaling down our own production of necessary fuels on the grounds that as long as nothing goes wrong we can typically get what we need.

2022-03-27

Would someone please get the doddering old man away from the microphone? I don’t disagree with Biden’s observation that Putin can’t be allowed to stay in power but being president in a democratic country doesn’t mean you get to dream up foreign policy like some freestyle rap battle. Not that the rest of the administration is that much better. What you do in these cases is double down on calling for Putin’s removal, rather than let everyone know that the US president is a poorly controlled puppet. It’s like in that movie The Sum of All Fears where the Russian president claims to have ordered the use of chemical weapons, even though it was done by a general in violation of orders, because it’s better to be seen as a monster than someone who isn’t in control.

Zelensky also needs to do some more thinking before he talks. He is in no position to demand anything of NATO. He shouldn’t confuse the help he’s getting with anyone doing it out of some obligation or even with other countries caring about Ukraine. He gets precisely the equipment that helps Europe when it lands in Ukraine. Troops from EU countries in Ukraine or NATO-operated aircraft imposing a no-fly-zone? Nope, that doesn’t help Europe or NATO so none of that.

Maybe this has been explained to him because he now claims to be willing to discuss the status of the Donbass region. It’s not tenable to keep it as part of Ukraine no matter what Russia does. It’s full of Russians! Crimea though can’t be ceded or Ukraine loses lots of natural gas in the Black Sea. It was idiotic of Putin to annex it in the first place. It depends on water from a canal diverting water from a river in Ukraine. Want to guess how much water Ukraine has been letting through since 2014?

Now, I see no problem with leaving the issue of the Crimean peninsula entirely out of a cease-fire agreement. Ukraine doesn’t need to use military force to retake Crimea, they just need to let sanctions brew for a while. So no point in forcing Russia to relinquish it now. Just don’t address the issue right now.

I wonder how the West will manage the sanctions at that point. Obviously the first sanctions to go will be the ones that are causing problems for Europe, like the oil embargo. But what about releasing the enormous mountains of foreign currency belonging to the Russian Central Bank? If the West refuses to release that money I wouldn’t be surprised if Russia announces their intention to re-open hostilities with Ukraine at which point Ukraine might also call for the money to be released. Tricky situation. I’d say “keep the money frozen”. “Europe and America don’t impose the sanctions on behalf of Ukraine and don’t remove them because a country threatens to violate international law.” would be a nice way of putting it.

For these reasons a cease-fire in Ukraine would be a very good period of time during which to bolster Ukraine’s armed forces. Russia is also going to regroup and rearm and a continuation of the war is far from impossible. More fighter jets, plenty of armored vehicles, artillery and well-trained infantry along with plenty of bunkers hidden in the vast plains. Ukraine did well this time around despite Europe reacting quite late to Russian intentions to invade, probably thanks to Russia acting out the script to a Three Stooges movie and not a well-crafted military plan… But we shouldn’t assume Russia won’t correct their mistake.

Hopefully stage 2 of the war won’t be on the cards for at least a year. In that time sanctions can curtail what Russia considers to be feasible. As Germany moves away from Russian gas it’s going to sting quite a lot at the Kremlin even if other sanctions are removed.

Nice of India to double their purchase of coal from Russia by the way. I find myself hoping for one of those riots the Twitterverse carries out whenever they hear someone with a conflicting opinion. Because India makes it quite clear that it chooses domestic expediency over the sovereignty of other nations. A boycott of Indian goods and services would be most appropriate. Then maybe India won’t need to buy so much coal from Russia because their factories will not have enough customers to warrant a significant use of electricity. A man can dream…

2022-03-31

Dang it. Starting to look like Russia isn’t going to follow through on their threat to cut off the gas-supply to Europe. Instead they will accept payment in euros and dollars as per usual and then buy roubles with that money themselves. Which they could have done all along… As it stands, nothing will put more pressure on Russia now than an end to gas exports, which is why it was a surprising development that they started talking tough about the currency used.

Because we see very clearly that central and eastern Europe isn’t going to stop buying Russian gas for a good couple of years, no matter what Russia does. Maybe if Russia attacked a NATO or EU country? Maybe then the flow of gas going west and the flow of money going east would stop? Maaaaybe… I can’t help but think of all the criticism lodged by those same countries against Switzerland for doing business with Germany during the second world war. Seems just vaguely hypocritical.

2022-04-02

Could the British and the Americans please stop telling China to not help Russia? Is that even a remote possibility? It’s like they’re trying to make China side with Russia… China doesn’t respond well to these kinds of threats. They’re more likely than not to do something that isn’t in their best interests just to show the world who’s the boss of China.

Praise China for their dedication to a peaceful resolution to the Ukraine crisis! Not because they’re actually helping of course… But China is unlikely to change course and support Russia just to vilify themselves. If however they have to choose between looking weak and looking villainous, they’ll choose being a villain.

2022-04-05

The image from 19 March, first reported by the New York Times and confirmed by the BBC, directly contradicts Russian Foreign Minister Sergei Lavrov’s claim that footage of bodies in Bucha, that has emerged in recent days, was “staged” after the Russians withdrew.

https://www.bbc.com/news/60981238

Are you serious? If Russia had said “Terrible thing what happened to those Ukrainian civilians, typical Ukrainian nationalists murdering innocent people” I would have reserved judgement on who was behind it. Not so much because the Russian argument would have made any sense but the fog of war makes it very hard to know who did what and when. If Russia now claims that the dead civilians – in Russian occupied territories, who ended up in a mass grave with close-contact gunshot wounds – were not in fact actually murdered by anyone then even the most generous interpretation of events makes the Russian military look very guilty.

And again we find ourselves asking Why? Not some deep philosophical Why but a very pragmatic Why. Why kill civilians intentionally? Why try to dismiss accusations in a way that makes you look more guilty than if you had said “We must investigate these claims to ascertain if Russian forces were involved”? I.e. you look more guilty this way than if you openly entertained the notion that Russia was guilty of those acts.

Any plans to make the Russian military look unstoppable or even just competent failed pretty early on. Already by the second week people were starting to wonder what the hell the Russians were doing. But couldn’t they at least have maintained some air of… not being Nazi Einsatzgruppen filling up mass-graves in eastern Europe with civilians? Or are they going to use the same argument now as when they invaded Ukraine initially? “Well, the US invaded Iraq, so we can invade Ukraine! Also, the Nazis massacred civilians in Ukraine, and so can we.” Neither of which seems overly convincing when it comes to making the rest of Europe say “Oh, well then that’s all right! For a moment there we almost thought we had good reason to worry about our own safety”.

Russian sphere of influence

Is it just me or is Russia really bad at playing the game of international politics? Like, they want to keep NATO from placing military forces in countries that border on Russia. Cool beans! I see why they want that. But why then did they invade the Ukraine in 2014 and occupy the Crimean peninsula? Under what conceivable projections did they think invading a neighboring country would lead to anything other than a major influx of Russia-bordering countries to the NATO application procedure? Heck, there were talks about Georgia joining NATO for a while! And now they might be invading the Ukraine again, because they don’t like NATO expansion? It makes precious little sense.

Maybe this is just a side-show? Russia really isn’t threatened by NATO forces on their border. Or more accurately: NATO forces pose no greater threat to Russia on the Latvian-Russian border than they do at the French-German border. Europe is a very small place when war between industrialized nations is on the cards. It’s a shorter distance from Poland to the Belarusian-Russian border than from the Belarusian-Russian border to Moscow. I’m arguing that these distances aren’t very relevant for ground-forces but of course jet-bombers and ICBMs barely notice those distances so they are even less relevant there.

So why kick up a fuss? They have a strong foothold in the Eastern Ukraine. Maybe they are under pressure? Why care about the Donbass? It’s not exactly full of gold mines and semiconductor factories… Coal and Soviet-era heavy industry is all. I thought Russia had enough of that already? Access to Crimea? No, that doesn’t track either, it doesn’t stretch far enough west. There’s always internal politics to keep in mind of course. Slavic unity, independence from the West… But people can’t eat those things and picking a fight with the West threatens trade relations that do support Russia economically.

So what are they up to?

MariaDB master/slave monitor

I have some scripts to do switchover and failover between two MariaDB-instances that I use for everything from PowerDNS backend, Grafana configuration data, Zabbix, my own backup program and some more things I can’t remember off the top of my head. I used to use Pacemaker for that but encountered some strange behavior and chose to make my own so that I could follow the logic of the process, something which is difficult in Pacemaker.

The logic here is simple. mutex01 is the intended master and mutex02 is the intended slave. mutex02 runs a script for failover that fences mutex01 if it stops working correctly and then takes over the role of master. mutex01 does not run a similar script or we could end up in a fencing bonanza where both nodes keep killing each other. This could be handled by using three nodes and voting but that adds complexity that I don’t need.

(The names come from mutual exclusion which is abbreviated mutex. In a MariaDB master/slave setup there can be only one master at any one time. So mutual exclusion must be maintained. This is different from multimaster-systems like MariaDB Galera(which I used to run and yes, these are nested parentheses) and things like Elasticsearch and MongoDB which I run on three servers called multimaster01, multimaster02 and multimaster03.)

The same set of scripts allows for switchover, using some files as flags to indicate that a change is being processed to keep the failover script for instance from going bananas. But this is kind of… not very easy to keep track of. I frequently forget to reset the flags which keeps the failover script from working. Enter my Python/Flask app that exports information about MariaDB and the flags of the scripts for each node and a React-frontend to view the data:

Each panel is its own React component which is given a hostname and intended role as arguments:

Based on this information the data can be highlighted based on how it conforms to the expected state of each server. For instance, after a failover has occurred we see that mutex02 is no longer in read-only mode while mutex01 is in read-only mode and is replicating data from mutex02. This is entirely correct after failover has occurred but it’s still a matter of the system being in a degraded state that I should inspect and fix.

Hmm, I should probably set red markings on Slave IO Running = No and Slave SQL Running = No for the slave node. (Making mental note that I will soon forget)
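The comparison against the expected state boils down to something like the Python sketch below. This is not the actual frontend code, just the logic as I would describe it. The function name check_node, the role labels "master" and "slave" and the max_lag_seconds threshold are made up for illustration; the field names are the ones the Flask app returns.

```python
def check_node(status, intended_role, max_lag_seconds=30):
    """Return (field, problem) pairs for values that deviate from the intended role.

    status: dict shaped like the JSON returned by the Flask app.
    intended_role: "master" or "slave" (hypothetical labels).
    max_lag_seconds: hypothetical threshold for acceptable replication lag.
    """
    problems = []

    # Flags left behind after a switchover or failover keep the failover script idle.
    if status.get("failover_flag"):
        problems.append(("failover_flag", "set, failover script is not monitoring"))
    if status.get("maintenance_flag"):
        problems.append(("maintenance_flag", "set"))

    if intended_role == "master":
        if status.get("read_only") != 0:
            problems.append(("read_only", "intended master is read-only"))
    elif intended_role == "slave":
        if status.get("read_only") != 1:
            problems.append(("read_only", "intended slave is writable"))
        for field in ("slave_io_running", "slave_sql_running"):
            if status.get(field) != "Yes":
                problems.append((field, "replication thread not running"))
        lag = status.get("seconds_behind_master")
        if lag is None or lag > max_lag_seconds:
            problems.append(("seconds_behind_master", "unknown or too high"))
    else:
        raise ValueError("unknown role: %r" % (intended_role,))

    return problems


# mutex02 right after a failover: it has taken over as master, so it deviates
# from its intended slave role in several ways.
mutex02_after_failover = {
    "failover_flag": 1,
    "maintenance_flag": 0,
    "read_only": 0,
    "slave_io_running": "No",
    "slave_sql_running": "No",
    "seconds_behind_master": None,
}
for field, problem in check_node(mutex02_after_failover, "slave"):
    print(field, "->", problem)
```

Anything in the returned list would be a candidate for red highlighting in the panel for that node.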

Anyway, we see that the failover-flag has been set on mutex02 to prevent it from doing any monitoring of mutex01, fencing or even writing to the failover-log(as indicated by the age of the log at the bottom of the mutex02-column). I reset the failover flag on mutex02 and check that seconds_behind_master was OK(it is marked as red if it is greater than the time the switchover-script is willing to wait for the nodes to be in sync before giving up) and then ran the switchover_to_here.sh-script on mutex01.

After clearing the maintenance flag on both nodes failover_log_age dropped and stayed low(the failover script runs for like 50 seconds starting once a minute and keeps outputting data so the timestamp of the failover-log is typically no more than 10 seconds old during normal operations).

We can see how mutex01(green) stopped processing queries and how mutex02(yellow) took over and then how the switchover restored things in the Grafana graphs of Prometheus data:

All in all I’m pretty pleased with how this worked out. I may publish the failover/switchover scripts and possibly also the Python/Flask-stuff. The React app however is way too hacky. Example:

class ServerStatus extends React.Component {
  constructor(props) {
    super(props)
    this.state = {
      error: null,
      isLoaded: false,
      failover_flag: null,
      gtid_binlog_pos: null,
      exec_master_log_pos: null,
      gtid_position: null,
      gtid_slave_pos: null,
      last_io_errno: null,
      last_sql_errno: null,
      maintenance_flag: null,
      position_type: null,
      read_only: null,
      relay_log_space: null,
      seconds_behind_master: null,
      slave_io_running: null,
      slave_io_state: null,
      slave_sql_running: null,
      slave_transactional_groups: null,
      failover_log_age: null
    }
  }
// {"exec_master_log_pos":5274471,"failover_flag":0,"gtid_binlog_pos":"1-11-40976726","gtid_position":"1-12-38358929",
// "gtid_slave_pos":"1-12-38358929","last_io_errno":0,"last_sql_errno":0,"maintenance_flag":0,"position_type":"Current_Pos",
// "read_only":0,"relay_log_space":309452,"seconds_behind_master":null,"slave_io_running":"No","slave_io_state":"",
// "slave_sql_running":"No","slave_transactional_groups":455}
  fetchData = () => {
    fetch('http://' + this.props.servername + '.svealiden.se:5000/')
      .then((res) => res.json())
      .then(
        (result) => {
          //alert("result:" + JSON.stringify(result));
          this.setState({
            isLoaded: true,
            failover_flag: result['failover_flag'],
            gtid_binlog_pos: result['gtid_binlog_pos'],
            exec_master_log_pos: result['exec_master_log_pos'],
            gtid_position: result['gtid_position'],
            gtid_slave_pos: result['gtid_slave_pos'],
            last_io_errno: result['last_io_errno'],
            last_sql_errno: result['last_sql_errno'],
            maintenance_flag: result['maintenance_flag'],
            position_type: result['position_type'],

I’m pretty sure this isn’t how you’re supposed to do it… But it beats not having continuously updated information on the state of a MariaDB pair with halfway complex rules on what is correct and what isn’t correct.

Note that since I started using this script for my actual “workloads” I’ve had like four or five failures that required me to resynchronize nodes after failover and even switchover! I struggled for hours to deal with extraneous transactions that messed up the GTID-sequences only to finally learn that if you have tables with Engine=MEMORY you always get “DELETE FROM tablename” added at startup of MariaDB. That adds an extra local operation that gets master and slave out of sync. So I’m not using MEMORY tables any more. They were only there to avoid write-tests straining my hard drives which was kind of silly anyway.

But now things seem to have settled down and not even yesterday’s failover required me to resynchronize nodes. That’s otherwise something you should expect, that failover leaves you with the old master node being out of sync with the slave that has now taken over the master-role temporarily. Switchover is different as we are just moving the master role between two functioning systems so we can bail out of the process if something doesn’t work out correctly, keeping the master as master. Example from my script for failover:

  echo "Starting failover from $OTHERNODE to $THISNODE. $(date +%s)"
  echo "1" > "/root/failovermode"
  # Need to demote master if possible
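  # NOTE: this line is cut off in the original excerpt. Presumably it tried to
  # set read_only=1 on the other node, along these lines (an assumption):
  mysql --connect-timeout=2 -h "$OTHERNODE" -e "SET GLOBAL read_only=1;"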
  if OTHERNODE_RO_MODE=$(mysql --connect-timeout=2 -N -s -B -h "$OTHERNODE" -B -N -e "SELECT @@GLOBAL.read_only;");
  then
    # If that worked we can check the return value
    if [ "$OTHERNODE_RO_MODE" = "1" ];
    then
      # We can become master without an issue in an emergency, 
      # but we should ideally wait for this node to catch up.
      echo "Other node is read_only now. Waiting for catchup. $(date +%s)"
      wait_for_catchup
      # Don't really care how we got out of wait for catchup, it's time to become master.
      promote_local
    elif [ "$OTHERNODE_RO_MODE" = "0" ];
    then
      echo "Failed to set read_only=1 on $OTHERNODE. Fencing!"
      if fence_other_node;
      then
        # Can't catch up since we don't know the master GTID
        echo "Fenced other node successfully. $(date +%s)"
        promote_local
        exit 0;
      else
        echo "Failed to fence master. Can't proceed. $(date +%s)"
        exit 1;
      fi # End fence_other_node check
    else
      echo "We received neither read_only=1 _or_ read_only=0. Shouldn't be possible."
    fi # End second OTHERNODE check run after sending read_only=1
  else
    echo "Couldn't check if $OTHERNODE is read_only. Must fence! $(date +%s)"
    if fence_other_node;
    then
      # Can't catch up since we don't know the master GTID
      echo "Fenced other node successfully. $(date +%s)"
      promote_local
      exit 0;
    else
      echo "Failed to fence master. Can't proceed. $(date +%s)"
      exit 1;
    fi # End fence_other_node check
  fi # End of check of return status from other node read_only-status
  exit 1

As you can see in the first nested if-statements we can wait for some time for the slave to catch up to the old master before promoting itself but we’re not going to wait indefinitely. We only fail over if the master is slightly wonky so we can’t assume that the slave will always catch up(maybe the master sent back a bad GTID?). Same thing if we can’t even talk to the old master when failing over: we have no choice but to fence it. We wouldn’t know which GTID is the latest so saying “Let the old slave wait until it reaches GTID X before becoming master” makes no sense.
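The "wait, but not indefinitely" part can be sketched in Python like this. It is not a translation of my wait_for_catchup shell function, just the same idea under some assumptions: get_lag is a hypothetical callable that returns seconds_behind_master (or None if it is unknown), and the timeout and poll interval are made-up defaults.

```python
import time


def wait_for_catchup(get_lag, timeout_seconds=30, poll_seconds=1.0,
                     sleep=time.sleep, clock=time.monotonic):
    """Poll replication lag until it reaches 0 or the timeout expires.

    Returns True if the slave caught up, False if we gave up waiting.
    The caller promotes the slave either way, as in the shell script.
    """
    deadline = clock() + timeout_seconds
    while clock() < deadline:
        if get_lag() == 0:
            return True
        sleep(poll_seconds)
    return False


def failover_action(other_read_only):
    """Map the result of querying @@GLOBAL.read_only on the old master to an action.

    other_read_only is "1", "0", or None when the query itself failed.
    """
    if other_read_only == "1":
        return "wait_then_promote"   # old master is demoted, catching up is possible
    if other_read_only in ("0", None):
        return "fence_then_promote"  # cannot trust or reach the old master
    return "abort"                   # neither 0 nor 1, should not happen
```

The point of returning True/False from the wait instead of raising is that promotion happens in both cases; the return value is only useful for logging how we got there.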

This has been a test of the audience’s patience.

Ceph quorum shenanigans

God I love the word “shenanigans”. I can’t tell you how many files on my work laptop include that word.

Anyway… So I moved some VMs around to upgrade pve2 to get rid of an annoying issue with UEFI where the console won’t work. But I overloaded pve1 which promptly went and died. Well, rebooted… And then mon.pve1 couldn’t join the cluster again. I figured I could take that opportunity to upgrade Ceph and the whole pve1-server actually.

No dice. Okey, delete mon.pve1 and add again? No dice. Proxmox’ Ceph tools don’t like things being wonky. After (looking at the time) roughly an hour and a half I’ve deleted and re-added mon.pve1 several times and finally got it to work by also running “ceph mon add pve1 192.168.1.21” on one of the quorate nodes.

That might seem obvious but can you find that in the guide for adding and removing mons? https://docs.ceph.com/en/latest/rados/operations/add-or-rm-mons/

I can’t. Also, why is it ceph mon add pve1 192.168.1.21 and not ceph mon add mon.pve1 192.168.1.21? The mon-part is included everywhere else. But this is why I run Ceph at home. To learn this stuff when it’s only me getting annoyed. My bosses are running Ceph in production the poor dears. Obviously not on my rinky-dink setup crammed into a cupboard but still… Nerve-wracking stuff.

Well, I guess I’d better get to upgrading pve2 then… Slightly behind schedule.

MariaDB slave lag

Just a few notes on MariaDB replication lag. My own backup program is an interesting generator of database traffic as we can see below:

But the slaves catch up in a very jerky fashion:

On the face of it both nodes suddenly fell 1800 seconds behind in a matter of 60 seconds. I argue this would only be possible if 1800 seconds of updates were suddenly sent to or acknowledged by the slaves. The sending theory isn’t entirely unreasonable based on this graph:

Commits on the master are relatively evenly spaced:

And Inserts spread out over the whole intensive period:

I suspect this sudden lag increase is a result of changes being grouped together in “replication transactions”:

Global transaction ID introduces a new event attached to each event group in the binlog. (An event group is a collection of events that are always applied as a unit. They are best thought of as a “transaction”,[…]

Let’s check the relay log on mutex02 to see if this intuition is correct. Beginning of relevant segment:

#211215  2:31:06 server id 11  end_log_pos 674282324 CRC32 0xddf8eb3a   GTID 1-11-35599776 trans
/*!100001 SET @@session.gtid_seq_no=35599776*//*!*/;
START TRANSACTION
/*!*/;
# at 674282625
#211215  2:01:54 server id 11  end_log_pos 674282356 CRC32 0x8e673045   Intvar
SET INSERT_ID=22263313/*!*/;
# at 674282657
#211215  2:01:54 server id 11  end_log_pos 674282679 CRC32 0x9c098efd   Query   thread_id=517313        exec_time=0     error_code=0    xid=0
use `backuptool`/*!*/;
SET TIMESTAMP=1639530114/*!*/;
SET @@session.sql_mode=1411383304/*!*/;
/*!\C utf8mb4 *//*!*/;
SET @@session.character_set_client=224,@@session.collation_connection=224,@@session.collation_server=8/*!*/;
insert into FileObservation (hashsum, indexJob_id, mtime, path, size) values ('e182c2a36d73098ca92aed5a39206de151190a047befb14d2eb9e7992ea8e324', 284, '2018-06-08 22:21:16.638', '/srv/storage/Backup/2018-06-08-20-img-win7-laptop/Info-dmi.txt', 21828)

Ending with:

SET INSERT_ID=22458203/*!*/;
# at 761931263
#211215  2:31:05 server id 11  end_log_pos 761931294 CRC32 0x54704ba3   Query   thread_id=517313        exec_time=0     error_code=0    xid=0
SET TIMESTAMP=1639531865/*!*/;
insert into FileObservation (hashsum, indexJob_id, mtime, path, size) values ('e9b3dc7dac6e9f8098444a5a57cb55ac9e97b20162924cda9d292b10e6949482', 284, '2021-12-14 08:28:00.23', '/srv/storage/Backup/Lenovo/Path/LENOVO/Configuration/Catalog1.edb', 23076864)
/*!*/;
# at 761931595
#211215  2:31:05 server id 11  end_log_pos 761931326 CRC32 0x584a7652   Intvar
SET INSERT_ID=22458204/*!*/;
# at 761931627
#211215  2:31:05 server id 11  end_log_pos 761931659 CRC32 0x6a9c8f8a   Query   thread_id=517313        exec_time=0     error_code=0    xid=0
SET TIMESTAMP=1639531865/*!*/;
insert into FileObservation (hashsum, indexJob_id, mtime, path, size) values ('84be690c4ff5aaa07adc052b15e814598ba4aad57ff819f58f34ee2e8d61b8a5', 284, '2021-12-14 08:30:58.372', '/srv/storage/Backup/Lenovo/Path/LENOVO/Configuration/Catalog2.edb', 23076864)
/*!*/;
# at 761931960
#211215  2:31:06 server id 11  end_log_pos 761931690 CRC32 0x98e12680   Xid = 27234912
COMMIT/*!*/;
# at 761931991
#211215  2:31:06 server id 11  end_log_pos 761931734 CRC32 0x90f792f6   GTID 1-11-35599777 cid=27722058 trans
/*!100001 SET @@session.gtid_seq_no=35599777*//*!*/;

So it seems like 1-11-35599776 stretches from 02:01:54 to 2:31:06 and it’s somewhat reasonable for mutex02 to suddenly report a lag of 30 minutes. I wonder what that means for actual data transfer. Could I query intermediate results from 1-11-35599776 before 02:31? :thinking_face:
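To put a number on that, here is a small Python sketch that pulls the timestamps out of the event headers in mysqlbinlog output and reports how much wall-clock time they span. The function name event_time_span is made up, and the regular expression only assumes the "#YYMMDD H:MM:SS server id" header layout seen in the excerpt above. The sample lines are the first and last headers from that excerpt.

```python
import re
from datetime import datetime

# Matches event headers such as "#211215  2:31:06 server id 11  end_log_pos ..."
HEADER = re.compile(r"^#(\d{6})\s+(\d{1,2}:\d{2}:\d{2})\s+server id")


def event_time_span(lines):
    """Return (earliest, latest, seconds_between) for the event headers in lines."""
    stamps = []
    for line in lines:
        match = HEADER.match(line)
        if match:
            text = match.group(1) + " " + match.group(2)
            stamps.append(datetime.strptime(text, "%y%m%d %H:%M:%S"))
    if not stamps:
        raise ValueError("no event headers found")
    first, last = min(stamps), max(stamps)
    return first, last, int((last - first).total_seconds())


sample = [
    "#211215  2:31:06 server id 11  end_log_pos 674282324 CRC32 0xddf8eb3a   GTID 1-11-35599776 trans",
    "#211215  2:01:54 server id 11  end_log_pos 674282356 CRC32 0x8e673045   Intvar",
    "#211215  2:31:05 server id 11  end_log_pos 761931326 CRC32 0x584a7652   Intvar",
    "#211215  2:31:06 server id 11  end_log_pos 761931690 CRC32 0x98e12680   Xid = 27234912",
]
first, last, seconds = event_time_span(sample)
print(first.time(), last.time(), seconds)  # 02:01:54 02:31:06 1752
```

1752 seconds is 29 minutes and 12 seconds, which is in the same ballpark as the roughly 1800-second jump in the lag graph.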

Bonus:

The tiny slave lag caused on the localbackup node when this is run:

 mysql -e "STOP SLAVE;" && sleep 8 && cd $SCRIPT_PATH && source bin/activate && python snapshots.py hourly >> hourly.log && mysql -e "START SLAVE;"

It’s a really hacky way to let the localbackup node finish any processing of the relay log before making a Btrfs snapshot. Seems to work. Technically you can make snapshots while MariaDB is running full tilt but this seems a bit nicer. Have had some very rare lockups of unknown origin on these kinds of Btrfs snapshot-nodes for database backups.
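One weakness of chaining everything with && is that replication stays stopped if the snapshot step fails. A slightly more careful variant could look like the Python sketch below. This is not what I actually run; the function name, the take_snapshot callable and the injectable run_sql are made up for illustration, and the default run_sql simply shells out to mysql -e the same way the one-liner does.

```python
import subprocess
import time


def snapshot_with_paused_replication(take_snapshot, run_sql=None,
                                     settle_seconds=8, sleep=time.sleep):
    """Stop the slave, let it settle, take the snapshot, then always restart the slave."""
    if run_sql is None:
        def run_sql(sql):
            # Same mechanism as the shell one-liner: mysql -e "<statement>"
            subprocess.run(["mysql", "-e", sql], check=True)

    run_sql("STOP SLAVE;")
    try:
        sleep(settle_seconds)  # same 8 second pause as the one-liner
        take_snapshot()
    finally:
        # Runs even if take_snapshot() raised, unlike the && chain.
        run_sql("START SLAVE;")
```

Whether restarting replication after a failed snapshot is actually what you want is debatable. A stopped slave is at least a very visible sign that something went wrong.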

ISP issues

Bahnhof have had a bad couple of weeks around here. Two multi-hour outages and now packet loss has gone crazy.

I suspect however that it’s not their fault. They don’t own the fiber links between each property and the switching stations. Well, I don’t mind packet loss that much when I’m not working. If this keeps up I’ll have to switch over to the 4G backup manually before I start my shift on telephone support. YouTube is very PL-tolerant but VoIP? Not so much. It’s hard enough understanding what people are saying without syllables going missing…

Higher ping on 4G of course but not so high that it interferes with phone calls.

Pacemaker failure

Dang it… Pacemaker wigged out during pacemaker03’s Btrfs snapshot run. Well, it was probably the cleanup job that clears out old snapshots that did it. Yeah, I know. “Don’t run OLTP workloads on CoW-filesystems”

But I’m running like 30 transactions per second with short bursts of a few hundred per second. CoW works fine. But Pacemaker wigged out for some reason and seems to have fiddled with the GTID:

Nov 29 04:01:05.616 pacemaker03.svealiden.se pacemaker-based     [1201] (cib_process_request)   info: Completed cib_modify operation for section status: OK (rc=0, origin=pacemaker03.svealiden.se/crmd/49, version=0.173.120)
Nov 29 04:01:05.619 pacemaker03.svealiden.se pacemaker-controld  [1206] (process_lrm_event)     info: Result of monitor operation for mariadb_server on pacemaker03.svealiden.se: Cancelled | call=41 key=mariadb_server_monitor_20000 confirmed=true
Nov 29 04:01:10.624 pacemaker03.svealiden.se pacemaker-based     [1201] (cib_process_ping)      info: Reporting our current digest to pacemaker01.svealiden.se: bfbe319be87955b6424bb9b041600d5e for 0.173.120 (0x5646a9a5eac0 0)
Nov 29 04:01:27.325 pacemaker03.svealiden.se pacemaker-controld  [1206] (throttle_check_thresholds)     notice: High CPU load detected: 4.050000
Nov 29 04:01:27.325 pacemaker03.svealiden.se pacemaker-controld  [1206] (throttle_send_command)         info: New throttle mode: high load (was medium)
Nov 29 04:01:57.327 pacemaker03.svealiden.se pacemaker-controld  [1206] (throttle_check_thresholds)     notice: High CPU load detected: 4.060000
Nov 29 04:02:03  mariadb(mariadb_server)[1456545]:    INFO: MySQL stopped
Nov 29 04:02:03.343 pacemaker03.svealiden.se pacemaker-execd     [1203] (log_finished)  info: mariadb_server stop (call 44, PID 1456545) exited with status 0 (execution time 57728ms, queue time 0ms)
Nov 29 04:02:03.346 pacemaker03.svealiden.se pacemaker-controld  [1206] (process_lrm_event)     notice: Result of stop operation for mariadb_server on pacemaker03.svealiden.se: ok | rc=0 call=44 key=mariadb_server_stop_0 confirmed=true cib-update=50
Nov 29 04:02:03.346 pacemaker03.svealiden.se pacemaker-based     [1201] (cib_process_request)   info: Forwarding cib_modify operation for section status to all (origin=local/crmd/50)
Nov 29 04:02:03.351 pacemaker03.svealiden.se pacemaker-based     [1201] (cib_perform_op)        info: Diff: --- 0.173.120 2
Nov 29 04:02:03.351 pacemaker03.svealiden.se pacemaker-based     [1201] (cib_perform_op)        info: Diff: +++ 0.173.121 (null)
Nov 29 04:02:03.351 pacemaker03.svealiden.se pacemaker-based     [1201] (cib_perform_op)        info: +  /cib:  @num_updates=121
Nov 29 04:02:03.351 pacemaker03.svealiden.se pacemaker-based     [1201] (cib_perform_op)        info: +  /cib/status/node_state[@id='3']/lrm[@id='3']/lrm_resources/lrm_resource[@id='mariadb_server']/lrm_rsc_op[@id='mariadb_server_last_0']:  @transition-magic=0:0;5:3308:0:7c487611-27b4-49ce-b931-c548d64ecc98, @call-id=44, @rc-code=0, @op-status=0, @exec-time=57728
Nov 29 04:02:03.352 pacemaker03.svealiden.se pacemaker-based     [1201] (cib_process_request)   info: Completed cib_modify operation for section status: OK (rc=0, origin=pacemaker03.svealiden.se/crmd/50, version=0.173.121)
Nov 29 04:02:03.890 pacemaker03.svealiden.se pacemaker-controld  [1206] (do_lrm_rsc_op)         notice: Requesting local execution of start operation for mariadb_server on pacemaker03.svealiden.se | transition_key=16:3308:0:7c487611-27b4-49ce-b931-c548d64ecc98 op_key=mariadb_server_start_0
Nov 29 04:02:03.891 pacemaker03.svealiden.se pacemaker-execd     [1203] (log_execute)   info: executing - rsc:mariadb_server action:start call_id:45
Nov 29 04:02:03.891 pacemaker03.svealiden.se pacemaker-based     [1201] (cib_process_request)   info: Forwarding cib_modify operation for section status to all (origin=local/crmd/51)
Nov 29 04:02:03.893 pacemaker03.svealiden.se pacemaker-based     [1201] (cib_perform_op)        info: Diff: --- 0.173.121 2
Nov 29 04:02:03.894 pacemaker03.svealiden.se pacemaker-based     [1201] (cib_perform_op)        info: Diff: +++ 0.173.122 (null)
Nov 29 04:02:03.894 pacemaker03.svealiden.se pacemaker-based     [1201] (cib_perform_op)        info: +  /cib:  @num_updates=122
Nov 29 04:02:03.894 pacemaker03.svealiden.se pacemaker-based     [1201] (cib_perform_op)        info: +  /cib/status/node_state[@id='3']/lrm[@id='3']/lrm_resources/lrm_resource[@id='mariadb_server']/lrm_rsc_op[@id='mariadb_server_last_0']:  @operation_key=mariadb_server_start_0, @operation=start, @transition-key=16:3308:0:7c487611-27b4-49ce-b931-c548d64ecc98, @transition-magic=-1:193;16:3308:0:7c487611-27b4-49ce-b931-c548d64ecc98, @call-id=-1, @rc-code=193, @op-status=-1, @last-rc-change=1638154923, @exec-time=0
Nov 29 04:02:03.894 pacemaker03.svealiden.se pacemaker-based     [1201] (cib_process_request)   info: Completed cib_modify operation for section status: OK (rc=0, origin=pacemaker03.svealiden.se/crmd/51, version=0.173.122)
Nov 29 04:02:04  mariadb(mariadb_server)[1456733]:    INFO: MySQL is not running
Nov 29 04:02:05  mariadb(mariadb_server)[1456733]:    INFO: MySQL is not running
Nov 29 04:02:08.906 pacemaker03.svealiden.se pacemaker-based     [1201] (cib_process_ping)      info: Reporting our current digest to pacemaker01.svealiden.se: 506f4f6824d1cd4857592724a902db4b for 0.173.122 (0x5646a9a5eac0 0)
Nov 29 04:02:09  mariadb(mariadb_server)[1456733]:    INFO: Changing MariaDB configuration to replicate from pacemaker02.svealiden.se.
Nov 29 04:02:10  mariadb(mariadb_server)[1456733]:    ERROR: MariaDB slave io has failed (1236): Got fatal error 1236 from master when reading data from binary log: 'Error: connecting slave requested to start from GTID 1-3-30418952, which is not in the master's binlog. Since the master's binlog contains GTIDs with higher sequence numbers, it probably means that the slave has diverged due to executing extra erroneous transactions'
Nov 29 04:02:10.888 pacemaker03.svealiden.se pacemaker-execd     [1203] (log_op_output)         notice: mariadb_server_start_0[1456733] error output [ ocf-exit-reason:MariaDB slave io has failed (1236): Got fatal error 1236 from master when reading data from binary log: 'Error: connecting slave requested to start from GTID 1-3-30418952, which is not in the master's binlog. Since the master's binlog contains GTIDs with higher sequence numbers, it probably means that the slave has diverged due to executing extra erroneous transactions' ]

So it seems Pacemaker stopped MariaDB and then started it in a wonky state. Haven’t seen that before. But it’s not the first thing to make me go “Is this setup really solid?”

’cause I ran Galera without a hitch for like a year and a half at least. Sure, when quorum is lost you’re in a world of hurt but there are recovery methods that work in artificially generated worst-case scenarios at least: https://deref.se/2019/10/29/percona-xtradb-cluster-again/

I’m thinking of writing my own master-slave runner. It sounds almost as bad as writing your own encryption algorithm but I see benefits in creating something that is written with the sole purpose of dealing with MariaDB master-slave setups. Also something that I can debug. That’s a big plus. Now I’m not quite crazy enough to try to implement my own consistency protocol. Obviously corosync or etcd will have to serve as a coordinator for any code I write. I’ll dive into that part right now so I know if this is even halfway workable.

Addendum:

Okey, so that seemed less than ideal. I think I’m going to go with keepalived. It’s not super-safe but if the slave doesn’t get a heartbeat from the master then it can fence the master and start in master mode. If the master won’t fence the slave(why would I give it that ability? If we’re concerned about the slave assuming the master role while the master is still running, the slave fencing the master “solves” that) then we can avoid mutual killing in case of a glitch. The worst case scenario should be that the slave kills the master unnecessarily.

Kube-router failure

This is just darling:


kube-system      kube-router-7v944                               0/1     CrashLoopBackOff   10         11h   192.168.1.172   kube02.svealiden.se   <none>           <none>
default          grafana-67d6bc9f96-lp2fk                        0/1     Running            3          11h   10.32.1.90      kube03.svealiden.se   <none>           <none>
default          pdnsadmin-deployment-b65c568dd-kd7x4            0/1     Running            8          31d   10.32.0.92      kube02.svealiden.se   <none>           <none>
kube-system      kube-router-nrz6v                               0/1     CrashLoopBackOff   10         11h   192.168.1.173   kube03.svealiden.se   <none>           <none>
kube-system      kube-router-9mmfc                               0/1     CrashLoopBackOff   10         11h   192.168.1.171   kube01.svealiden.se   <none>           <none>
default          zbxserver-b58857598-njf26                       0/1     Running            5          23d   10.32.0.90      kube02.svealiden.se   <none>           <none>
default          pdnsadmin-deployment-b65c568dd-rdtft            0/1     Running            11         11h   10.32.2.113     kube01.svealiden.se   <none>           <none>
default          pdnsadmin-deployment-b65c568dd-s2w4n            0/1     Running            5          11d   10.32.1.93      kube03.svealiden.se   <none>           <none>
default          grafana-67d6bc9f96-ws7dw                        0/1     Running            6          27d   10.32.0.89      kube02.svealiden.se   <none>           <none>

Kube-router is the connection-fabric for pods. So all instances being down is suboptimal. Turns out the file that kube-router needs to connect to Kubernetes couldn’t be found:

[root@kube01 ~]# mkctl logs -f kube-router-lrtxp -n kube-system
I1126 07:44:26.337591       1 version.go:21] Running /usr/local/bin/kube-router version v1.3.2, built on 2021-11-03T18:24:15+0000, go1.16.7
Failed to parse kube-router config: Failed to build configuration from CLI: stat /var/lib/kube-router/client.config: no such file or directory

This was a surprise to me since I hadn’t changed any config. I know because I was asleep! None of this is critical stuff so it’s no biggie but I get kind of curious. Was this a microk8s-thing or a Kubernetes-thing happening? I suspect it’s a microk8s-thing having to do with the path mounted to /var/lib/kube-router/ referencing a specific snap-version of microk8s. Not that I upgraded it while asleep – admittedly – but seems more likely than Kubernetes fiddling with a deployment configuration randomly.

Anyway… Think I’m going to get myself acquainted with Nomad and Consul for a while…

Addendum: Kubernetes is back up and running by the way. I just had to run mkctl edit ds kube-router -n kube-system a couple of times and fiddle some values back and forth.