
AI project aimed at detecting fraudulent domain name registrations

18 July 2023

DNS Belgium and SIDN are working together on software to detect apparently malicious domain name registrations before they go live. A malicious registration involves someone registering a domain name for an abusive or criminal purpose, such as phishing, malware distribution or domain squatting. Both registries already run systems of their own for flagging up suspect registrations. The reason for teaming up is to see what can be learnt from each other, and whether the 2 teams might ultimately combine their detection systems.

The Dutch system

SIDN has developed a system called RegCheck for detecting suspect domain name registrations. It's been in use since summer 2022, checking all 2,000 to 3,000 domain name applications received each day for a variety of telltale (negative) characteristics. The system assigns a weighted hazard score to each negative characteristic it detects, adding up to an overall score for the application. If that overall score exceeds a threshold value, the case is referred to SIDN's abuse analysts for manual assessment. Whenever the manual checks suggest that there's cause for concern, the registrant is asked to prove their identity. And, if they fail to do so within 3 working days, SIDN can delink or modify the domain's name servers, effectively preventing access to the associated website.
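The scoring step described above can be illustrated with a minimal sketch. The characteristic names, weights and threshold below are hypothetical, chosen only for illustration; the article does not disclose which characteristics RegCheck checks or what values it uses.

```python
# Hypothetical characteristics and weights, for illustration only.
# RegCheck's real features and scores are not public.
WEIGHTS = {
    "disposable_email": 2.5,
    "name_resembles_brand": 3.0,
    "registrant_phone_invalid": 1.5,
}

# Assumed threshold above which an application goes to manual review.
THRESHOLD = 4.0


def hazard_score(detected):
    """Add up the weights of all negative characteristics that were detected."""
    return sum(WEIGHTS[name] for name in detected)


def needs_manual_review(detected):
    """Refer the application to an analyst only if the score exceeds the threshold."""
    return hazard_score(detected) > THRESHOLD


if __name__ == "__main__":
    application = {"disposable_email", "name_resembles_brand"}
    print(hazard_score(application), needs_manual_review(application))
```

In this sketch a single weak signal stays below the threshold, while a combination of signals pushes the application into the analysts' queue.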

SIDN uses a machine learning (ML) algorithm to assess the interrelationships between negative characteristics and to calculate the hazard scores. For SIDN, it's very important that every ID request made to a registrant can be justified. Consequently, the ML algorithm underpinning RegCheck isn't based on a neural network ('fuzzy black box'), but on logistic regression (a statistical technique).
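To show why logistic regression makes a decision easier to justify, here is a small sketch in which each characteristic's contribution to the outcome can be read off directly. The coefficients, the intercept and the feature names are assumptions made for the example, not values from RegCheck.

```python
import math

# Hypothetical learned coefficients (log-odds per characteristic) and intercept.
COEFFICIENTS = {
    "disposable_email": 2.0,
    "name_resembles_brand": 1.0,
}
INTERCEPT = -3.0


def abuse_probability(features):
    """Logistic regression: sigmoid of the intercept plus the weighted feature sum."""
    z = INTERCEPT + sum(COEFFICIENTS[name] * value for name, value in features.items())
    return 1.0 / (1.0 + math.exp(-z))


def explain(features):
    """List each characteristic's contribution to the log-odds, largest first."""
    contributions = {name: COEFFICIENTS[name] * value for name, value in features.items()}
    return sorted(contributions.items(), key=lambda item: abs(item[1]), reverse=True)


if __name__ == "__main__":
    features = {"disposable_email": 1, "name_resembles_brand": 1}
    print(abuse_probability(features))
    print(explain(features))
```

Because the model is a weighted sum passed through a single function, an analyst can point to the specific characteristics that raised the score, which is much harder with a neural network.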

High precision is another key requirement for SIDN. In other words, the system has to be very good not only at detecting abusive registrations, but at minimising the number of false positive detections. The fewer false positives there are, the less SIDN has to trouble legitimate registrants with ID requests.

As many as possible, as early as possible

DNS Belgium's solution is based on a somewhat different philosophy. The Belgian registry's new ML system (currently undergoing trials) is designed to detect as many suspect registrations as possible, as early as possible. Any domain name that gets a high hazard score from the ML system at the application stage isn't added to the .be zone until the registrant has proven their identity. "If we have to send 200 legitimation requests to prevent 20 malicious registrations, that's a good trade-off as far as we're concerned," says Maarten Bosteels, who oversees R&D at DNS Belgium.
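The trade-off in that quote can be expressed as precision, the share of flagged registrations that really are malicious. The sketch below assumes the quote means 200 requests in total, of which 20 concern malicious registrations; the function and variable names are illustrative.

```python
def precision(true_positives, false_positives):
    """Share of flagged registrations that were actually malicious."""
    flagged = true_positives + false_positives
    if flagged == 0:
        raise ValueError("precision is undefined when nothing was flagged")
    return true_positives / flagged


# Figures from the quote, read as 200 requests in total (an assumption).
requests_sent = 200
malicious_prevented = 20
legitimate_registrants_asked = requests_sent - malicious_prevented

if __name__ == "__main__":
    print(precision(malicious_prevented, legitimate_registrants_asked))
    print(legitimate_registrants_asked)
```

On that reading, DNS Belgium accepts a precision of about 10 per cent, with 180 legitimate registrants asked to identify themselves for every 20 malicious registrations stopped, whereas SIDN tunes its system to keep that number of false positives low.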

Whereas SIDN has been operating its RegCheck system for less than a year, Belgium's registry has been assessing new domain name registrations for more than a decade. "We originally checked all registrations manually," explains Bosteels.

However, with roughly 1,000 registrations a day coming in, it was obviously hard to sustain that approach. At the end of 2020, therefore, DNS Belgium introduced a rule-based system for identifying suspect registrations.

Collaboration

It was last year that the 2 registries had the idea of collaborating on further development of their ML systems for the early detection of suspect registrations. "SIDN and DNS Belgium are up against the same problems," explains Bosteels, "and we're working on similar solutions. So we can inevitably learn from each other." The organisations have since exchanged the source code of their systems. "We're currently busy studying SIDN's code. After that, we plan to train SIDN's algorithm using the same dataset that we used for our system, and see how that works out."

Meanwhile, SIDN will be mirroring DNS Belgium's research, having already incorporated certain features of the Belgian system into RegCheck. "Our Belgian colleagues had identified a number of signs of abuse that we hadn't yet looked at, so we adapted RegCheck to scan for those characteristics as well," says Machine Learning Research Engineer Thijs van den Hout. "That definitely improved the performance of the software."

Distinct philosophies, methods and approaches mean that knowledge exchange brings immediate benefits.

Valuable exchange

As well as differing in philosophy, methodology and approach, the 2 registries' systems have other distinct features that make the exchange of technical knowledge immediately beneficial to both sides. SIDN's RegCheck system is more production-oriented, while the Belgian system is more research-focused. During development of their system, DNS Belgium have done extensive research and explored a variety of different pathways.

"The new ML software was written by a doctoral student from KU Leuven," explains Bosteels. "Our code is more complex, has more features and third-party libraries, and reflects the knowledge gained from a lot of experimentation. As a result, our software is harder to get started with."

"DNS Belgium have been working on their software for a few years," adds SIDN's Machine Learning Engineer Thymen Wabeke. "By contrast, we started with a clean slate. Also, we've designed our software as more of a framework, so that we've got the option of integrating other models at a later date. We wanted the most generic solution we could come up with."

Shared codebase

Combined with its production-oriented design, that architecture means that SIDN's RegCheck software is the more likely candidate for use as the basis for any future shared codebase. The desirability and feasibility of moving towards a joint solution are amongst the questions now being examined by the registries.

Another idea under consideration is to go a step further and develop the existing software into a solution suitable for use by multiple registries. Ideally, the code could then be maintained by a group of 3 or 4 stakeholders. However, evolution into a public, open-source package isn't seen as an option, because insight into the detection methods used by registries would help malicious actors to devise and test avoidance strategies. Collaboration through CENTR, the umbrella body for European ccTLD registries, is more appropriate in this case, Bosteels and Van den Hout believe.

Although it's too early for formation of a multi-registry development team, SIDN and DNS Belgium are proceeding with longer-term ambitions in mind. Once the current evaluation phase is complete, Bosteels envisages the need to get other registries onboard to provide input on progress to date and new ideas about where to go next.