As a planned milestone of the ADAM project (Advanced DNS Analytics and Measurements), CZ.NIC Laboratories, in cooperation with CSIRT.CZ, are about to commence regular operation of the DNS crawler. This tool will periodically scan all second-level domains under the .cz TLD, collect selected publicly available data about them, and process the data further in various ways. Despite its name, the DNS crawler will not collect data from DNS alone; it will also communicate with each domain's web and e-mail servers.
We plan to run the tool on two schedules: most data items will be collected weekly, while the contents of the main web pages <domain>.cz or www.<domain>.cz will be retrieved less frequently, once a month. In addition, newly registered domains will be subject to extra scrutiny: their data will be retrieved daily for the first two weeks of their existence. The DNS crawler software is designed to minimize its impact on the operation of second-level domains and on network infrastructure in general.
Data obtained from the crawler will be used for these principal purposes:
- for various statistics and analyses, either regular or ad hoc, that will contribute to effective administration and strategic planning of the DNS services run by CZ.NIC
- for early discovery of DNS problems and anomalies, whether caused by misconfiguration, invalid zone data or malicious activity
- for classifying web pages via machine learning, with the primary aim of increasing security of the .cz zone (e.g. by detecting fake e-shops or domains misused by malware).
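The collection schedule described above (weekly for most data, monthly for main web page contents, daily for the first two weeks of a newly registered domain) can be illustrated with a minimal Python sketch. This is not taken from the DNS crawler source code: the function and constant names are hypothetical, "once a month" is approximated as 30 days, and the sketch assumes that the daily regime for new domains covers web page contents as well.

```python
from datetime import date, timedelta
from typing import List, Optional

# Hypothetical constants derived from the schedule described in the text.
NEW_DOMAIN_WINDOW = timedelta(days=14)  # "first two weeks" after registration
DAILY = timedelta(days=1)
WEEKLY = timedelta(days=7)
MONTHLY = timedelta(days=30)            # assumption: "once a month" taken as 30 days


def is_due(last_run: Optional[date], today: date, interval: timedelta) -> bool:
    """A task is due if it has never run or the interval has elapsed."""
    return last_run is None or today - last_run >= interval


def due_tasks(registered: date,
              last_general: Optional[date],
              last_web: Optional[date],
              today: date) -> List[str]:
    """Return which collection tasks are due for one domain today.

    "general" stands for the weekly data items, "web_content" for the
    monthly retrieval of the main web page (both names are illustrative).
    """
    is_new = today - registered < NEW_DOMAIN_WINDOW
    # Assumption: new domains get every item daily, including web content.
    general_interval = DAILY if is_new else WEEKLY
    web_interval = DAILY if is_new else MONTHLY

    tasks = []
    if is_due(last_general, today, general_interval):
        tasks.append("general")
    if is_due(last_web, today, web_interval):
        tasks.append("web_content")
    return tasks
```

For example, a domain registered four days ago and last scanned yesterday would have both tasks due, whereas a year-old domain scanned five days ago would be due only for web content if its last page retrieval was more than 30 days back.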
We are well aware that such large-scale scanning of network resources is a double-edged sword, in that it is virtually indistinguishable from the less welcome activities of network intruders. With this in mind, we have decided to be as open as possible about the operation of the DNS crawler:
- The DNS crawler software that we use for .cz zone scanning is open source, which means that everyone can test it and/or inspect its source code (written in Python).
- We publish a complete list of data that is being collected, as well as the internal policy for using this data.
- We also publish the identity (IP addresses) of servers that perform the scanning.
Detailed information about the operation of the DNS crawler, including contact addresses, is available at https://csirt.cz/en/dns-crawler. We would also like to take this opportunity to ask everybody who may be affected by this activity, i.e. network operators, ISPs and providers of other services, for their cooperation. If you notice any problems related to the operation of the DNS crawler, please let us know, for instance by sending an e-mail to firstname.lastname@example.org. Thanks in advance!