Seems .world and .ee federation are broken

Edited 3 months ago


Blaze (he/him)@feddit.org · 3 months ago

https://grafana.lem.rocks/d/bdid38k9p0t1cf/federation-health-single-instance-overview?orgId=1&var-instance=lemmy.world&var-remote_instance=lemm.ee

r00ty@kbin.life · 3 months ago

Wait, how do they get that data remotely? I was looking at my instance vs world and I can see the +1 hour from a week or so ago, when I upgraded to the latest mbin lol.

I guess they’re looking at common activities and when they appear on each?

Nothing4You@programming.dev · 3 months ago

lemmy has a public api that shows the federation queue state for all linked instances.

it provides the internal numeric id of the last activity that was successfully sent to an instance, as well as the timestamp of the activity that was sent, and also when it was sent. it also includes data like how many times sending was unsuccessful since the last successful send. each instance only knows about its own outbound federation, but you can just collect this information from both sides to get the full picture.

there is also https://phiresky.github.io/lemmy-federation-state/site to look at the details provided by a specific instance.
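
A rough sketch of how that queue state could be read, in case it helps. The response shape and field names here (`federated_instances`, `linked`, `federation_state`, `last_successful_id`, `last_successful_published_time`, `fail_count`) are assumptions about what something like `GET /api/v3/federated_instances` returns, so check them against your instance's actual output. The sample payload is made up and no network request is made.

```python
import json
from datetime import datetime, timezone

# Made-up sample payload; the field names are assumptions, not confirmed here.
SAMPLE = json.loads("""
{
  "federated_instances": {
    "linked": [
      {
        "domain": "lemm.ee",
        "federation_state": {
          "last_successful_id": 1000,
          "last_successful_published_time": "2024-01-01T11:00:00Z",
          "fail_count": 3
        }
      }
    ]
  }
}
""")


def parse_ts(value):
    """Parse an ISO 8601 timestamp that ends in 'Z' into an aware datetime."""
    return datetime.fromisoformat(value.replace("Z", "+00:00"))


def queue_state(payload, remote_domain, now):
    """Summarise the outbound queue state towards one remote instance."""
    for instance in payload["federated_instances"]["linked"]:
        if instance.get("domain") != remote_domain:
            continue
        state = instance.get("federation_state")
        if state is None:
            return None
        published = parse_ts(state["last_successful_published_time"])
        return {
            "last_id": state["last_successful_id"],
            # How old the last successfully sent activity is, in seconds.
            "lag_seconds": (now - published).total_seconds(),
            "fail_count": state["fail_count"],
        }
    return None


now = datetime(2024, 1, 1, 12, 0, 0, tzinfo=timezone.utc)
state = queue_state(SAMPLE, "lemm.ee", now)
print(state)
```

Running the same function against the payload from the other instance gives the opposite direction, which is the "collect from both sides" part.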

seang96 · 3 months ago

This is a nice tool, though I think using it got my IP flagged by Cloudflare while I was trying to fix an issue between my instance and lemmy.ml.

Nothing4You@programming.dev · 3 months ago

lemmy.ml doesn’t use cloudflare, that’s strange.

i’ve also never had issues with this when looking at instances that do use cloudflare.

seang96 · 3 months ago

After commenting I had a theory, and it may be right. I have dual WAN for redundancy, and I set up a routing policy so that traffic to ml and world goes through the WAN that is not behind CGNAT, on the assumption that the shared public IP on CGNAT sometimes gets blocked. The main symptom was that images would break when federating, and since making this change it seems to be working better.

That being said, it all started after I used the Lemmy state checker, and I assume that because it queries the endpoint for the selected site on an interval, I got flagged by something.

r00ty@kbin.life · 3 months ago

That makes sense. So it’s showing me world’s federation with me and not the other way around (since I’m not sure such info is available on mbin).

Nothing4You@programming.dev · 3 months ago

pretty much, yeah. lemmy has a persistent federation queue instead of fire-and-forget requests when activities get generated. this means activities can be retried if they fail, which allows for (theoretically) lossless federation even if an instance is down for maintenance or other reasons. if mbin has a similar system maybe they could expose that as well, but unless it represents this data in a fairly similar way, it will be challenging to integrate it into a view like this without creating a dedicated mbin dashboard.
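
To illustrate the difference from fire-and-forget, here is a toy sketch of a persistent per-instance queue. This is not lemmy's actual implementation: `OutboundQueue`, `deliver_pending`, and the `send` callback are hypothetical names, and a plain list stands in for persistent storage. The idea is only that activities stay stored and a pointer to the last successful id decides where sending resumes.

```python
class OutboundQueue:
    """Toy per-instance queue: activities stay stored until the remote accepts them."""

    def __init__(self):
        self.activities = []         # stand-in for persistent storage
        self.last_successful_id = 0  # number of activities delivered so far
        self.fail_count = 0          # failures since the last successful send

    def enqueue(self, activity):
        self.activities.append(activity)

    def deliver_pending(self, send):
        """Send everything after last_successful_id in order; stop at the first failure."""
        while self.last_successful_id < len(self.activities):
            activity = self.activities[self.last_successful_id]
            if not send(activity):
                self.fail_count += 1
                return False
            self.last_successful_id += 1
            self.fail_count = 0
        return True


remote = {"up": False}  # simulated remote instance, down for maintenance
received = []


def send(activity):
    """Pretend to POST the activity; succeeds only while the remote is up."""
    if remote["up"]:
        received.append(activity)
    return remote["up"]


queue = OutboundQueue()
for name in ["post-1", "comment-2", "vote-3"]:
    queue.enqueue(name)

queue.deliver_pending(send)  # remote is down: nothing is lost, just not sent yet
remote["up"] = True
queue.deliver_pending(send)  # remote is back: delivery resumes where it stopped
print(received, queue.last_successful_id)
```

With fire-and-forget, the three activities generated while the remote was down would simply have been dropped.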

r00ty@kbin.life · 3 months ago

We can see it ourselves. We use rabbitmq for incoming (and maybe outgoing, it’s been a while since I looked at how it works) federation, so you can see the queues there. For incoming (from rabbitmq) and outgoing there are also symfony messenger queues, which handle failures and can be configured and queried.

After the upgrade I just took the default configuration again (because it seems the queue names changed). But I used to have various rules set up in rabbitmq for retries, and it took a fair few tries before the messages ended up in the proper “failed” queue (which needs manual action to retry). Some items you eventually need to clear (instances that just shut down, or instances that lost their domain, for example). They will never complete.

But it’s not exposed in any way to my knowledge. Well, unless people have their rabbitmq web interface open without a login, of course.
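
For anyone wondering what the retry-then-“failed” behaviour looks like in general, here is a minimal sketch. It is not mbin's, rabbitmq's, or symfony messenger's actual code: `MAX_RETRIES`, `process`, and the `*.example` domains are hypothetical, and real setups usually add a delay between retries.

```python
from collections import deque

MAX_RETRIES = 3  # hypothetical limit; real setups make this configurable


def process(messages, handler, max_retries=MAX_RETRIES):
    """Retry each failing message up to max_retries times, then park it as failed."""
    pending = deque((message, 0) for message in messages)
    delivered, failed = [], []
    while pending:
        message, retries = pending.popleft()
        if handler(message):
            delivered.append(message)
        elif retries >= max_retries:
            failed.append(message)  # needs manual action from here on
        else:
            pending.append((message, retries + 1))
    return delivered, failed


DEAD_DOMAINS = {"gone.example"}  # e.g. an instance that shut down for good
attempts = {}


def handler(message):
    """Pretend delivery: always fails for domains that no longer exist."""
    attempts[message["to"]] = attempts.get(message["to"], 0) + 1
    return message["to"] not in DEAD_DOMAINS


delivered, failed = process(
    [{"to": "alive.example"}, {"to": "gone.example"}], handler
)
print(delivered, failed, attempts)
```

The message for the dead domain is tried once plus three retries and then lands in the failed list, where it stays until someone clears it.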