Genetic Algorithms: Using Natural Selection to Block Bot Traffic

The process of selection, reproduction, and mutation mimics natural selection and genetic variation, allowing the algorithm to explore the solution space and converge towards better solutions over generations.

Leveraging Genetic Algorithms for Bot Protection

The goal of this new rules-based detection method is to use genetic algorithms to discover segments of traffic generated by malicious activity, and then to create rules that automatically block those segments.

While DataDome relies heavily on ML techniques to detect malicious bot traffic, our detection engine can also execute millions of rules in a few milliseconds. Thousands of rules -- generated both manually and automatically -- already filter traffic efficiently. Multiplying the methods we use to create rules makes protection as complete as possible. By introducing randomness and making no assumptions about the nature of the requests to target, this algorithm can find bot traffic that would not have been detected otherwise.

Our goal is for the algorithm to generate rules that specifically target bot traffic that has not already been blocked by our other detection methods. These rules should be defined as combinations of predicates, to target specific request signatures.

1. Generate Our Potential Solutions

To begin, we collect a set of unique key-values from the signatures of the requests stored in recent days. This set contains the predicates we will use to build our potential solutions (i.e., our rules targeting specific segments of traffic).

If we receive a POST request coming from France with Chrome as a user agent, we add method = POST, country = FR, and user agent = Chrome to our set of predicates.
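
As a rough sketch in Python, collecting such a predicate set could look like the following; the signature fields and values here are hypothetical, not DataDome's actual schema:

```python
from itertools import chain

# Hypothetical request signatures as key-value pairs extracted from
# recent traffic; field names and values are illustrative only.
requests = [
    {"method": "POST", "country": "FR", "user_agent": "Chrome"},
    {"method": "GET", "country": "DE", "user_agent": "Firefox"},
]

# A predicate is a single (key, value) pair; a set keeps them unique.
predicates = set(chain.from_iterable(r.items() for r in requests))
# e.g. {("method", "POST"), ("country", "FR"), ("user_agent", "Chrome"), ...}
```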

We are then able to create a base set of rules by randomly combining elements from our set of predicates. This base set will serve as the starting point for the algorithm's evolution.

For example, we create rule A, defined as country = FR AND method = POST, and rule B as user agent = Chrome.
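
A minimal sketch of that step, reusing the predicates set from above and representing a rule as an AND-combination (a frozenset) of predicates:

```python
import random

def random_rule(predicates, max_size=3):
    # A rule is an AND-combination of 1..max_size predicates,
    # stored as a frozenset so it is hashable and order-free.
    size = random.randint(1, max_size)
    return frozenset(random.sample(sorted(predicates), size))

# Base population: the starting point for the algorithm's evolution.
population = [random_rule(predicates) for _ in range(200)]
```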

Our hypothesis is that we will be able to derive new rules from these base rules through an evolutionary process. These new rules will combine predicates in ways that effectively match bot traffic.

2. Evaluate Our Potential Solutions

Then, we need to be able to evaluate the fitness of each rule -- that is, how good the rule is at targeting bot traffic. This part is challenging: since we are trying to discover bot activity that wasn't detected by any other method, we have no reliable labels to assess whether the requests matched by our rule are bot-made or human-made. The solution we adopted was to look at the time series of the requests that would have been matched by the rule. In particular, we combine two metrics to give a fitness score to each rule: how strongly that time series exhibits the characteristics of bot traffic, and how different it is from our known human traffic time series.

If a time series presents the characteristics of a bot traffic time series and is different from all of our known human time series, the associated rule would be deemed good and get assigned a high score. Otherwise, the rule would get assigned a low score.
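
The sketch below illustrates how such a score could be assembled; bot_likeness and distance are simple stand-ins for illustration, not DataDome's actual metrics:

```python
import math

def bot_likeness(series):
    # Placeholder metric: bot traffic often arrives in sharp bursts, so
    # use the peak-to-mean ratio of the request counts as a crude proxy.
    mean = sum(series) / len(series)
    return max(series) / mean if mean else 0.0

def distance(a, b):
    # Placeholder metric: Euclidean distance between two count series.
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def fitness(rule_series, human_series_set):
    # High score when the matched traffic both looks bot-like and is
    # far from every known human time series.
    return bot_likeness(rule_series) * min(
        distance(rule_series, h) for h in human_series_set
    )
```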

3. "Evolve" Our Rules to a Satisfactory Solution

Once we have assigned a score to each of our rules, we can retain the ones that perform the best -- "survival of the fittest" -- and generate new rules by combining the ones we have retained.

We can combine our rule A and our rule B to create a rule C that could be country = FR AND user agent = Chrome.
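
A possible crossover sketch, continuing the example with the illustrative rules A and B from earlier:

```python
import random

def crossover(rule_a, rule_b):
    # Child rule: a random subset of the parents' combined predicates.
    pool = sorted(rule_a | rule_b)
    return frozenset(random.sample(pool, random.randint(1, len(pool))))

rule_a = frozenset({("country", "FR"), ("method", "POST")})
rule_b = frozenset({("user_agent", "Chrome")})
rule_c = crossover(rule_a, rule_b)
# one possible result: {("country", "FR"), ("user_agent", "Chrome")}
```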

Next, we apply random, infrequent mutations to our rules in order to inject new predicates into the population of potential solutions.

We can randomly mutate rule C by removing part of the rule and replacing it with another key-value from our set of predicates, such as swapping the user agent = Chrome predicate for a different key-value drawn from that set.
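
One way to sketch such a mutation, again assuming rules are frozensets of predicates:

```python
import random

def mutate(rule, predicates, rate=0.05):
    # With small probability, swap one predicate for a random unused one,
    # injecting fresh genetic material into the population.
    unused = predicates - rule
    if random.random() > rate or not rule or not unused:
        return rule
    dropped = random.choice(sorted(rule))
    added = random.choice(sorted(unused))
    return (rule - {dropped}) | {added}
```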

By introducing mutations, the algorithm is able to explore new areas of the solution space and potentially find better solutions that it would have missed otherwise.

Having created this new generation, we can repeat the process of scoring our potential solutions and combining them into new ones, generation after generation, increasing the fitness of our population each time.
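
Putting the pieces together, a hypothetical main loop could look like the following; score_rule is an assumed hook that fetches a rule's matched time series and returns the fitness described above, and crossover and mutate are the sketches from earlier:

```python
import random

def evolve(population, predicates, score_rule, generations=50, survivors=50):
    # Illustrative loop: score, select, and breed each generation.
    for _ in range(generations):
        ranked = sorted(population, key=score_rule, reverse=True)
        parents = ranked[:survivors]  # "survival of the fittest"
        children = [
            mutate(crossover(*random.sample(parents, 2)), predicates)
            for _ in range(len(population) - len(parents))
        ]
        population = parents + children
    return population
```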

Once we've gathered enough satisfactory rules, we're able to stop the evolution process and leverage the new rules through our detection system.
