I often try to describe the basic building blocks of our national domain registry to the people around me. Yet (or maybe for that very reason), the .CZ domain is still perceived as something that simply works. It is just like getting in your car every morning to take your children to school. You expect the journey to take the usual 10 minutes (or 15 if you need to refuel) and you don’t expect any trouble. Even though you know that you need to change the oil regularly, check and replace worn parts, or repair defects caused by everyday use, most of you leave these “out of order” cases to service professionals, or at least to a handy neighbor, and spare yourselves from scrubbing automotive grease off your hands or remembering the required type of brake pads. Modern cars can tell you when maintenance is due, and all you have to do is dial the correct phone number. Even if you don’t fully understand the person at the other end of the line, they manage to get through to you because you have a basic idea of how a car works.
That is why, in today’s attempt to describe the not-so-common yet significant operation that our service guys carried out on the infrastructure of the .CZ domain last week, I will compare it to servicing a car. We can certainly agree that one of the things a car needs in order to run is fuel. The fuel goes from the tank to the engine cylinder, where it burns, and the energy is transferred through the piston and crankshaft to the axle of the vehicle, allowing us to move. So how does it work in a domain registry? Let’s say the fuel is the contents of the zone, and to ensure smooth and problem-free operation (of the DNS and therefore of the Internet) we need to transfer it from the registry database to the individual instances of the DNS anycast. These are queried by the average internet user, or rather by their web browser, to find out what they need to know: where to find the content for the domain that the user wants to view on their computer. Just as the fuel flows from the tank to the engine all the time, the zone is transferred from the registry to the DNS servers without interruption (in the case of the .CZ domain, it is updated once every 30 minutes), because its content changes very dynamically.
Up to this point, the comparison has been somewhat stretched, but we can still see some similarities between the registry system and the car. However, as soon as we proceed to replace an important component of the fuel system or of the zone distribution system, my comparison begins to limp on at least one leg. If you need to replace the fuel pump and a fuel filter, you will surely have to accept that you will not drive for at least a few hours, sometimes days. You will take your car to a repair shop; in the best case you will drive a courtesy car, and in the worst you will take a bus. Either way, your car stays off the road during the repair.
However, if you need to replace the zone generator and its signing system, you expect the system as a whole to keep working without any visible change. The work of a repair shop serviceman is very different from that of a CZ.NIC team member! 🙂
Yes, last week we carried out (or more precisely, finished) the replacement of the system for zone generation and DNSSEC signing, and nobody but a few insiders noticed this fundamental change. Ordinary users probably won’t thank our administrators; the only “wow” of admiration they will hear comes from their colleagues, bosses or a handful of experts who can appreciate their work. Being the way they are, they are much happier with that kind of “celebration” than with a home office swarming with reporters who want to find out what they missed in the process, because “it’s not working”. The risks of breaking the system during such operations usually weigh much more heavily than the obvious benefits. You can skip the following lines if you want; in them I will try to describe very briefly the basic elements of the change, including the chosen procedure and the benefits.
We generate and sign the zone on the so-called hidden master (HM) server. For a very long time, we had been using the BIND DNS daemon for generating the zone and our own fine-tuned scripts for signing it (because no other way was possible when we first launched DNSSEC). If it works, don’t touch it; everyone knows this rule. But when you have a developer of the Knot DNS daemon on your team, you can’t resist and eventually try out their child for this part of the DNS infrastructure. All the more so when the child is the “fastest in its class”, has proved itself on the anycast DNS servers, and can also replace the somewhat dusty signing scripts (thanks, Marian!) with well-documented and maintained code.
“Trust, but verify” is twice as true for such an operation. That’s why our administrators first simulated the change of HM in the test environment and then broke the change in the production environment into several stages: first they sent test domains to the new HM system, then our second-order production domains in order of importance, then the ENUM domain and finally the .CZ domain. Since we have all registry servers (as well as the rest of the infrastructure) in three geographically remote locations, we were able to arrange for one part of the traffic to flow through the original HM and the other part through the new ones. We took full advantage of this when switching to Knot DNS.
During testing, we found that anycast DNS instances running the NSD daemon take longer to accept zone changes from the new location after the HM change. Knot DNS and BIND, by contrast, respond without delay. NSD can be forced to accept the change faster by restarting the daemon, but the restart causes an outage of DNS service lasting several minutes.
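A delay like the one observed with NSD could be spotted by comparing the SOA serial served by each anycast node with the serial on the hidden master. The following is only a minimal sketch, not our actual monitoring: the node names and serial values are invented for illustration, and the comparison follows the serial number arithmetic defined in RFC 1982, which DNS uses so that serials can wrap around.

```python
from typing import Dict, List

SERIAL_BITS = 32                      # SOA serials are 32-bit unsigned integers
MODULUS = 2 ** SERIAL_BITS
HALF = 2 ** (SERIAL_BITS - 1)


def serial_newer(a: int, b: int) -> bool:
    """Return True if serial a is newer than serial b (RFC 1982 arithmetic)."""
    # a is newer when the forward distance from b to a is non-zero
    # and smaller than half of the serial number space.
    return a != b and ((a - b) % MODULUS) < HALF


def lagging_nodes(master_serial: int, node_serials: Dict[str, int]) -> List[str]:
    """List the nodes that still serve a zone older than the hidden master's."""
    return sorted(
        node for node, serial in node_serials.items()
        if serial_newer(master_serial, serial)
    )


if __name__ == "__main__":
    # Hypothetical node names and serials, for illustration only.
    observed = {
        "node-knot": 2024010102,
        "node-bind": 2024010102,
        "node-nsd": 2024010101,
    }
    print(lagging_nodes(2024010102, observed))
```

With the invented values above, only the node named node-nsd is reported as lagging. A plain integer comparison would work for most serials, but it would misjudge the moment when a serial wraps around, which is why the sketch uses the modular comparison.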
Because restarting NSD interrupts DNS service, we had to stop announcing our prefixes at the affected locations during the transition (i.e. make them invisible to users), restart NSD, wait for the zone to be downloaded from the new location, and only then make the location available from the outside again. If we had not done this, individual DNS anycast nodes in different parts of the world would have answered the same query differently, and this is definitely not desirable.
Part of the transition to the new HM was to simulate the communication of an external user with an anycast node serving a zone already generated by the new HM. To do this, we disconnected one DNS anycast location from the Internet, ran a DNS resolver in that location, and set it up to query that very disconnected anycast node. We also arranged for this anycast node to receive zone updates from the new HM, and then we tested whether DNS queries sent through the local resolver to the disconnected DNS node with the new zone were handled correctly. For example, we verified that DNSSEC validates correctly. A number of further checks were carried out as well (e.g. whether the .CZ zone is signed correctly, whether the new zone can be distributed to all anycast locations in a short time, etc.). The detailed plan for the transition to the new HM also contained a part that, fortunately, was not used: a plan for restoring the original state if any of the steps did not get us where we expected.
In my summary of the changes, I leave out many “details”, such as the fact that we maintained a functional dual signing system for DNSSEC (signing individual zones using the ZSK and signing DNSKEY records using the KSK, which is still performed for us by the CSIRT team; see this article, for example). Well, not exactly “maintained”: they have also made the transition to Knot DNS, to make the change even more complex.
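The staged approach with a prepared way back can be sketched in a few lines. This is an illustration under stated assumptions rather than the procedure our administrators actually ran: the zone names are placeholders, and switch, verify and rollback are hypothetical callbacks standing in for whatever really moves a zone to the new HM, checks it, and restores the original state.

```python
from typing import Callable, List, Optional, Sequence, Tuple

# Stage order as described in the text; zone names are placeholders.
STAGES: List[Tuple[str, List[str]]] = [
    ("test domains", ["test.example."]),
    ("second-order production domains", ["secondary.example."]),
    ("ENUM domain", ["enum.example."]),
    (".CZ domain", ["cz."]),
]


def migrate(
    stages: Sequence[Tuple[str, Sequence[str]]],
    switch: Callable[[str], None],
    verify: Callable[[str], bool],
    rollback: Callable[[str], None],
) -> Tuple[List[str], Optional[str]]:
    """Move zones to the new hidden master stage by stage.

    Returns the zones that stayed migrated and the zone whose verification
    failed (None if every stage passed). On failure, the failed zone and all
    previously migrated zones are rolled back in reverse order.
    """
    migrated: List[str] = []
    for _stage_name, zones in stages:
        for zone in zones:
            switch(zone)
            if not verify(zone):
                # Restore the original state, most recent change first.
                for done in [zone] + migrated[::-1]:
                    rollback(done)
                return [], zone
            migrated.append(zone)
    return migrated, None


if __name__ == "__main__":
    log: List[str] = []
    result = migrate(
        STAGES,
        switch=lambda z: log.append(f"switch {z}"),
        verify=lambda z: True,          # pretend every check passes
        rollback=lambda z: log.append(f"rollback {z}"),
    )
    print(result)
    print(log)
```

Ordering the stages from the least to the most important zone means that a failed check interrupts the migration before it ever reaches the .CZ zone, and the rollback path is exercised by the same function that performs the switch.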
To test whether the whole operation went well, we used tools well known among administrators, such as https://dnsviz.net and https://dnssec-analyzer.verisignlabs.com (for verifying and visualizing the correctness of DNSSEC settings and the currently used KSK and ZSK), or just the ordinary dig.
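For readers who want to try a check of this kind with the ordinary dig, here is a small sketch. It is an assumption about how such a check might look, not a record of the commands we ran: the server address is a documentation placeholder (192.0.2.53), the sample SOA line is invented, and the helper names are hypothetical. It relies only on standard dig options (+short, +norecurse, +dnssec) and on the usual presentation format of SOA data.

```python
import subprocess
from typing import List, Optional


def dig_command(server: str, zone: str, rrtype: str = "SOA",
                dnssec: bool = False) -> List[str]:
    """Build a dig command that asks one specific server without recursion."""
    cmd = ["dig", "+short", "+norecurse"]
    if dnssec:
        cmd.append("+dnssec")          # also request the RRSIG records
    cmd += [f"@{server}", zone, rrtype]
    return cmd


def soa_serial(dig_output: str) -> Optional[int]:
    """Extract the serial from `dig +short ... SOA` output.

    SOA data has exactly seven fields: mname, rname, serial, refresh,
    retry, expire, minimum. Other lines (for example RRSIG data returned
    with +dnssec) do not match this shape and are skipped.
    """
    for line in dig_output.splitlines():
        fields = line.split()
        if len(fields) == 7 and all(f.isdigit() for f in fields[2:]):
            return int(fields[2])
    return None


def query_serial(server: str, zone: str) -> Optional[int]:
    """Run dig (must be installed) and return the zone's SOA serial."""
    result = subprocess.run(dig_command(server, zone),
                            capture_output=True, text=True, check=True)
    return soa_serial(result.stdout)


if __name__ == "__main__":
    print(" ".join(dig_command("192.0.2.53", "cz.", dnssec=True)))
    # Invented sample output, for illustration only.
    sample = "ns.example. hostmaster.example. 2024010101 900 300 604800 900\n"
    print(soa_serial(sample))
```

Querying each node directly with recursion switched off shows what that particular server holds, rather than what a resolver somewhere along the way has cached.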
So, since last week, the .CZ registry generation and signing system has been running on a completely new core, which is more powerful and safer. And since it is based on our own Knot DNS, we can count on its further development to meet our needs.
Are you feeling, as I am right now, like bowing down before the service guys and slipping them a little something for their trouble? Has this changed the way you see their work, especially here at CZ.NIC? If at least one of you answers “yes”, the article has fulfilled its purpose. May your DNS and your car run well!