• 0 Posts
  • 4 Comments
Joined 2 years ago
Cake day: July 20th, 2023

  • Certainly! The line we don’t cross is that we don’t directly edit data. Every record in our database must be generated by the system itself. But, we can re-trigger behaviour, or select different flows, or tweak properties around the edges as much as we want.

    For example:

    • Reflows - we store every message that enters or leaves our system in a table. We can then reflow a message either back into our system or out to our downstreams. So if there was a transient error, or the code has changed since we received the message, we can replay it without having to involve anyone else.
    • Triggers - we can ask the system to regenerate its output from its inputs. This is useful when a bug is only hit in certain situations.
    • Migration - we have lots of different flows, and some are enabled only on certain accounts. We have scripts that let us turn migration on or off and then automatically reflow all the affected messages.
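    To make the reflow idea concrete, here is a minimal sketch assuming an SQLite message log. The table layout and the function names (`open_store`, `record`, `reflow`) are hypothetical and not taken from the system described above; the point is only that once every message is persisted, replaying one is a lookup plus a call to the normal handler.

```python
import sqlite3
from typing import Callable


def open_store() -> sqlite3.Connection:
    # Hypothetical schema: one row per message that entered ('in') or left ('out') the system.
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE messages ("
        " id INTEGER PRIMARY KEY,"
        " direction TEXT NOT NULL CHECK (direction IN ('in', 'out')),"
        " payload TEXT NOT NULL)"
    )
    return conn


def record(conn: sqlite3.Connection, direction: str, payload: str) -> int:
    """Store a message as it enters or leaves; returns its id for later reflows."""
    cur = conn.execute(
        "INSERT INTO messages (direction, payload) VALUES (?, ?)",
        (direction, payload),
    )
    return cur.lastrowid


def reflow(
    conn: sqlite3.Connection,
    message_id: int,
    handle_inbound: Callable[[str], None],
    send_downstream: Callable[[str], None],
) -> None:
    """Replay a stored message without asking anyone to re-send it.

    Inbound messages go back through our own handler (so they pick up any code
    fix deployed since they first arrived); outbound ones are re-sent downstream.
    """
    row = conn.execute(
        "SELECT direction, payload FROM messages WHERE id = ?", (message_id,)
    ).fetchone()
    if row is None:
        raise KeyError(f"no stored message with id {message_id}")
    direction, payload = row
    if direction == "in":
        handle_inbound(payload)
    else:
        send_downstream(payload)
```

    In a real system `handle_inbound` would be the same entry point that live traffic uses, which is what lets a replay benefit from a code change made after the message first arrived.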

  • I run a Prosody server and have a couple of users who use Monal, and notifications work reliably for us!

    I made sure to follow the considerations for server admins and it’s been ok.

    Regarding the push service: unless you build and deploy your own version of the app, it's not possible to self-host the push service. The flow looks like this:

    XMPP server -> Monal pushserver -> Apple pushserver -> Device

    Apple only lets an app's developer send notifications to that app through Apple's push server. They enforce this by giving the developer a key that is specific to their app.

    Monal sets up the link between the XMPP server and the Monal pushserver: when it connects to the XMPP server, it tells the server to notify the Monal pushserver of new messages while the app is offline.
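    Assuming this linkage uses the standard XEP-0357 (Push Notifications) mechanism, the "instruction" is an IQ stanza the client sends to its own server, naming the push service's JID and a node that identifies the device there. Below is a rough sketch of building that stanza; the JID and node values are placeholders, not Monal's real pushserver address, and the function name is hypothetical.

```python
import xml.etree.ElementTree as ET

PUSH_NS = "urn:xmpp:push:0"  # namespace defined by XEP-0357


def build_enable_push(iq_id: str, push_jid: str, node: str) -> str:
    """Build the IQ a client sends to its own XMPP server to enable push.

    push_jid: address of the app's push service (placeholder here).
    node: opaque id the push service uses to find this device (placeholder here).
    """
    iq = ET.Element("iq", {"type": "set", "id": iq_id})
    # Setting xmlns as a plain attribute keeps the namespace on <enable> only,
    # so the surrounding <iq> stays in the default client namespace.
    ET.SubElement(iq, "enable", {"xmlns": PUSH_NS, "jid": push_jid, "node": node})
    return ET.tostring(iq, encoding="unicode")
```

    After the server accepts this, it is the XMPP server (not the phone) that contacts the push service when a message arrives for an offline client, which is why the app developer's push service sits in the middle of the chain.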


  • Idempotence / self-healing: build the system so that it always tries to reach the correct end state, even if the current state is wrong. For instance, every time our system gets an update, it re-evaluates the calculation from first principles instead of applying a diff to what was there before. This prevents bad data from snowballing into a catastrophe.
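    A small illustration of "recompute from first principles" versus "apply a diff", using a hypothetical account-balance example (the `Event` type and function names are made up for this sketch, not taken from the system above). Because the stored value is overwritten from the full event history on every update, a corrupted stored balance gets repaired on the next update instead of being carried forward.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Event:
    account: str
    amount: int  # positive = credit, negative = debit


def recompute_balance(events: list[Event], account: str) -> int:
    """Derive the balance from every event for the account, ignoring any stored value."""
    return sum(e.amount for e in events if e.account == account)


def on_update(store: dict[str, int], events: list[Event], new_event: Event) -> None:
    events.append(new_event)
    # Overwrite instead of `store[account] += amount`, so a wrong stored value
    # cannot snowball: the result depends only on the inputs.
    store[new_event.account] = recompute_balance(events, new_event.account)
```

    The trade-off is cost: recomputing from scratch is more work per update than a diff, so in practice you would scope the recomputation to the entity that changed, as `recompute_balance` does per account.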

    Giving yourself knobs to twiddle in production: at work we have ways of triggering functionality in the system on request, basically calling a method directly on the running process. This is so, so useful during prod issues, especially when combined with the above: we can tell the system “reprocess this action/command/message” at any time, and it will redo it from first principles.
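    One possible shape for such knobs is a registry of named operator commands inside the process. This is a sketch only: the class, the `reprocess` command and its body are hypothetical, and how the command line reaches the process (an admin HTTP endpoint, a local socket, a REPL) is deliberately left out.

```python
from typing import Callable


class AdminCommands:
    """Registry of named operations an operator can invoke on a running process."""

    def __init__(self) -> None:
        self._commands: dict[str, Callable[..., str]] = {}

    def register(self, name: str) -> Callable[[Callable[..., str]], Callable[..., str]]:
        def decorator(fn: Callable[..., str]) -> Callable[..., str]:
            self._commands[name] = fn
            return fn
        return decorator

    def dispatch(self, line: str) -> str:
        parts = line.split()
        if not parts:
            return "empty command"
        name, *args = parts
        if name not in self._commands:
            return f"unknown command: {name}"
        return self._commands[name](*args)


admin = AdminCommands()
processed: list[str] = []


@admin.register("reprocess")
def reprocess(message_id: str) -> str:
    # Hypothetical: a real handler would load the stored message and run the
    # normal processing path on it from first principles.
    processed.append(message_id)
    return f"reprocessed {message_id}"
```

    Keeping the commands as thin wrappers around the normal processing path is what makes them safe to combine with the self-healing approach: "reprocess" just runs the usual code again.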

    Debugging: I always start by finding a way to reproduce the bug quickly. Then I simplify it one tiny step at a time until it's small enough that I can understand it in one go. I never combine multiple steps in one re-run, and I always check whether the bug is still there at every single stage. This can be quite a slow approach, but it means I am always making progress towards the answer, instead of coming up with theories (which are often wrong) and getting lost in the process.
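    The same one-change-per-run discipline can be automated when the failing input is a list of things (lines of a file, records, config entries). This is a hypothetical helper, a simpler cousin of delta debugging, and not the manual process described above: it removes one element per run, re-checks the bug every time, and keeps the removal only if the bug survives.

```python
from typing import Callable, Sequence, TypeVar

T = TypeVar("T")


def shrink_one_step_at_a_time(
    items: Sequence[T], still_buggy: Callable[[list[T]], bool]
) -> list[T]:
    """Return a smaller input that still triggers the bug, changing one thing per run."""
    current = list(items)
    if not still_buggy(current):
        raise ValueError("cannot reproduce the bug with the starting input")
    i = 0
    while i < len(current):
        candidate = current[:i] + current[i + 1:]
        if still_buggy(candidate):  # verify at every single stage
            current = candidate     # keep the smaller repro, retry the same index
        else:
            i += 1                  # this element is needed for the bug; move on
    return current
```

    It is slow (one re-run per element, sometimes more), but like the manual approach every step either shrinks the repro or proves an element matters, so it never goes backwards.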