DNS resolvers keep adding features without removing any, but this trend cannot continue indefinitely because the software would eventually break under its own weight. Which features are used in practice and which can be safely removed? We present preliminary results of a survey among DNS resolver administrators, and also invite readers to participate in a cross-vendor survey which is open until 2020-06-30.
Why vendors need feedback
The DNS protocol has been with us for 33 years now and its complexity is daunting: its specification has grown from 132 pages in 1987 to 3000+ pages today, and it keeps growing! DNS software offers lots of vendor-specific features as well, which adds even more complexity, and all this complexity in turn makes manuals longer, configuration more error-prone, and software buggier and less reliable.
In theory, vendors could remove obsolete features and code, making bugs less likely … if only they knew which features their users actually rely on. Getting rid of obsolete code would help both parties. But exactly this kind of feedback from administrators is missing, so vendors who try to be conservative keep adding options without removing anything, which is obviously not a feasible long-term strategy.
How does this historical baggage look in practice? Let’s look at the documentation for various software packages, estimate the number of options, and compare the totals with the usage indicated by the 120 detailed survey responses received so far.
BIND estimate:
$ man named.conf | sed -e 's/ //g' | sort -u | wc -l
BIND 9.16 named.conf supports roughly 400 options, many of which can be used in various contexts (global, view, zone) and interact with each other, and also with the authoritative DNS server which is part of BIND. At the moment the survey data show that only 65 out of 400 options are used in practice.
The candidate for the most obscure option not yet seen in the survey: “dialup”, with 6 modes to choose from. Is anyone still using it? (Probably yes.)
Unbound estimate:
$ man unbound.conf | grep '^ *[a-zA-Z0-9_-]*:' | sed -e 's/ //g' -e 's/:.*$//' | sort -u | wc -l
Unbound 1.10.0 unbound.conf supports roughly 230 options, but so far the survey shows only 30 options in active use, possibly because only a few Unbound administrators have submitted their configuration in the survey.
The candidate for the most obscure option not yet seen in survey responses: “dlv-anchor”.
PowerDNS Recursor estimate:
$ pdns_recursor --help | grep -F -- -- | wc -l
PowerDNS Recursor 4.3.0 comes with 152 options in its configuration file, which is a significantly lower number, but it also has a built-in Lua interpreter for configuration, making its configuration file Turing-complete.
At the moment the survey does not have enough data from PowerDNS users, but it has a candidate for the most obscure option: “distribution-pipe-buffer-size”.
Knot Resolver 5.1.1 is the newest kid on the block, but its innocent-looking configuration file is practically a Lua program with infinite possibilities. Currently the survey data show that some users actually do use Lua for scripting their own functions inside the resolver, but the majority of respondents use only the pre-baked functions shipped with the software. This raises the question of whether Lua configuration is worth the complexity, or whether it could be replaced with something more user-friendly.
The candidate for the most obscure option not yet seen in survey responses: modules.unload('detect_time_jump')
As you can see, all four implementations have vast configuration possibilities. That’s a lot of code to maintain and test, especially because the features often interact with each other. At the same time our survey suggests that a lot of options might not be used, which opens the possibility of removing historical cruft. Please participate in the survey; it will help determine which obsolete parts should be removed to eliminate bugs and simplify configuration.
Hopefully it is now clearer why vendors need your feedback!
How bad is the lack of feedback?
To illustrate the scale of the problem, let’s make a back-of-the-envelope estimate of how many operators give feedback to DNS resolver vendors.
Guess no. 1: Number of people talking to vendors
Here we use public sources to estimate the number of people who actually talked to their vendors.
All four projects have public mailing lists, so we can download the archives and use a couple of regexes to get the number of unique e-mail addresses:
$ grep -o -h '^From [^ ]\+ at [^ ]\+ ' *-users/2019-*.txt | sort -fu | wc -l
This gives us 534 e-mail addresses, including contributors working on all four projects.
All four projects also have public bug trackers, so we can count users who reported issues or commented on them in 2019. To make this task feasible we will simplify the analysis:
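To see what the pipeline above actually counts, here is a minimal sketch that runs it on a made-up archive. The addresses, the `example-users` directory and the `count_senders` helper are hypothetical; the sketch assumes the archives are pipermail-style text files in which each message starts with a `From user at domain` line, and a grep that accepts `\+` as the original command does.

```shell
#!/bin/sh
# Run the address-counting pipeline on hypothetical sample data.
# The file content below is invented for illustration only.

count_senders() {
    # -o prints only the matched part of the line, -h hides file names;
    # sort -f folds case and -u keeps one line per distinct sender.
    grep -o -h '^From [^ ]\+ at [^ ]\+ ' "$@" | sort -fu | wc -l
}

tmp=$(mktemp -d)
mkdir "$tmp/example-users"
cat > "$tmp/example-users/2019-January.txt" <<'EOF'
From alice at example.org  Tue Jan  1 10:00:00 2019
Subject: question
From Alice at Example.org  Wed Jan  2 11:00:00 2019
From bob at example.net  Thu Jan  3 12:00:00 2019
EOF

senders=$(count_senders "$tmp"/*-users/2019-*.txt)
echo "unique senders: $senders"
rm -rf "$tmp"
```

The two spellings of the first address collapse into one sender because of the case folding, which is why the pipeline uses `sort -fu` rather than plain `sort -u`.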
- We will not attempt to subtract interactions by vendors themselves.
- BIND and PowerDNS do not separate their repositories for recursor and authoritative server, so we will count all communication in these two repositories.
- We will not de-duplicate people between GitHub and private GitLab instances used by different vendors.
A simple script based on the GitHub API and GitLab CSV export produces 400 accounts posting at least one comment on public trackers in 2019 (certainly an overestimate).
Lastly, we also need to count customers talking to vendors in private, which is much harder to do. Luckily ISC publishes a detailed annual report which reveals that roughly 100 more customers could be talking to ISC in private. Other vendors do not publish these numbers, so let’s extrapolate the ISC numbers to all four vendors and add 400 more people talking in private.
Finally we can summarize the total number of people talking to vendors in 2019:
- public mailing lists = 530 (including vendor employees)
- public bug trackers = 400 (without de-duplication across projects)
- estimated number of customers talking in private = 4 * 100 = 400 (extrapolated from ISC)
The total is 1330 people, which is almost surely an overestimate… but how does it compare with the number of DNS resolver operators?
Guess no. 2: Number of operators
It is impossible to obtain a precise number, but we can establish a range of possible values.
A very conservative lower bound could be the number of Autonomous Systems in use on the Internet. At the moment roughly 67 000 operators care enough about Internet infrastructure to run their own AS, and are thus likely to operate other essential Internet services like a DNS resolver.
The upper bound is much harder to establish. If we limit ourselves to recursive DNS resolvers, we could base our guess on the number of unique IP addresses sending DNS queries to the DNS root over a period of one day, but the number of unique source IP addresses calculated over all root server instances is not available. We need to resort to the independent statistics of each root server operator. From these we can see that L-root seems to have the highest number of unique source IP addresses seen during a day, varying around 8 million.
This gives us a very broad range, from 67 000 to 8 000 000.
What portion of operators talk to vendors?
Finally we can estimate how many operators talked to a DNS vendor in 2019:
upper bound = (1330 people talking to vendors in 2019)/(67 000 Autonomous Systems) = 2 %
lower bound = (1330 people talking to vendors in 2019)/(8 000 000 source addresses) = 0.017 %
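The percentages above can be re-derived with a short shell sketch. The inputs are the rounded figures quoted in this article; the variable names and the `percent` helper are ours and purely illustrative.

```shell
#!/bin/sh
# Back-of-the-envelope check of the figures quoted in the article.
# Variable and function names are illustrative assumptions.

mailing_lists=530               # unique addresses on public mailing lists (rounded)
bug_trackers=400                # accounts commenting on public bug trackers
private_customers=$((4 * 100))  # ISC estimate extrapolated to four vendors

talkers=$((mailing_lists + bug_trackers + private_customers))

# percent NUMERATOR DENOMINATOR
# Print the ratio as a percentage rounded to two significant digits.
percent() {
    awk -v n="$1" -v d="$2" 'BEGIN { printf "%.2g\n", 100 * n / d }'
}

echo "people talking to vendors: $talkers"
echo "upper bound: $(percent "$talkers" 67000) %"
echo "lower bound: $(percent "$talkers" 8000000) %"
```

Note that the "upper bound" on the share of talking operators comes from the lower bound on the number of operators, and vice versa, because the operator count sits in the denominator.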
We can conclude that only 0.017 % to 2 % of operators talked to a DNS resolver vendor in 2019, so vendors lack feedback and are left with the following options:
- Keep maintaining all features, including unused ones, thus producing software which has more bugs and is harder to configure for everyone.
- Remove features which are not used by the small fraction of “talking” user population, possibly removing features other users depend on.
If neither of these options sounds appealing to you, participate in the survey and help us fix that!
Better talk late than never
The survey is open until 2020-06-30 and gives you an opportunity to tell vendors which features users need and should not be removed, and to express wishes about further development of configuration interfaces, DNS-over-TLS and DNS-over-HTTPS support, etc.
Of course, a manual survey based on a web page with forms has significant limitations, so the survey itself also touches on the possibility of built-in “call home” features which could automate future surveys.
What do you think? Tell us!
Collecting data from resolvers is a dumb idea!
Please fill in the questionnaire linked above and ideally also explain the reasons why you oppose the idea:
Are there technical concerns? Privacy concerns? Ideological opposition?
Can any of these concerns be alleviated by measures of some sort? E.g. something based on https://crypto.stanford.edu/prio/ ?
Do you have another proposal to solve the problem described in this article?
All these concerns need to be weighed carefully against each other, so just stating “it is a dumb idea” does not help to move the discussion forward.
There is a danger in your volume-based model assumptions of which I am sure you’re aware, but which bears explicit comment: Many (most?) of the more exotic features that are embedded in various authoritative and recursive resolver software are there because of the needs of a small number of high-volume operators. These operators discover problems or edge cases that result from their size and complexity of deployment. The advantages of these new features (or bug fixes) often are utilized by progressively smaller systems who would otherwise not have been able to understand or invest in the time/effort to document or request the features, but I would suspect there is a fairly short tail of use. However, these larger operators represent the bulk of the DNS query volume so even though they may be relatively small in IP address quantity, they are significant in size. Also adding to the fog of uncertainty is the tendency that the larger the operator, the less likely they are to share their configuration files with all these interesting features in use.
Instead of sharing their configuration, they can describe what features they need. The survey branch that does not ask for config snippets poses a few specific questions and then offers a free-form “What other features do you use?”.
Yes, John is certainly right and this aspect needs to be taken into account. I guess it could be summarized as: “If you require exotic features you had better talk to your vendor”. After all, this is an attempt to fix the lack of feedback from users.