Radu Angelescu

ian. 03, 2016

Evolving the best driver using differential evolution

Children (light blue) and parents(dark blue) run in the same generation simulation. The red car is the current best individual, the white ones are the old best individual (evaluated in the same generation).

So I finally implemented differential evolution in my Carvatar project (commit f6aba7e ).

If you followed my other articles about using differential evolution algorithm to fit data to a polynomial you probably are up to par with the theory :) . Of course I implemented it in C++ now and for my real-time simulator so there are a few modifications we will discuss.

First I feel like I ignored the taxonomy part of the algorithm description, the main reason for that is that I consider taxonomy of optimization algorithms to be kind of a waste of time. Nonetheless it has it’s merits for remembering the algorithm and adding a story to our numbers.

So the evolution algorithms story is : We have a population of individuals (each individual has a genome ), they mate and make babies. A baby has a genome that is a combination of it’s parents genomes and it may or may not have a small mutation (modification that is not present in the parents, this happens in nature because the ADN and ARN copy is not perfect). After a generation has passed only the fittest members remain and go on to reproduce. So basically it’s a natural algorithm that maximizes the fitness function.

There are many evolution algorithms variations (based on the way we select the individuals that reproduce, what individuals get to remain in the population, the way they reproduce). The “normal” reproduction is based on selecting a crossover point and mixing the genomes, but this also varies. In our case we have genomes that have real values so to search the space optimally it’s kind of clear we need the reproduction process to be more complex then just crossover points and switching variables, so our reproduction algorithm implies 3 individuals (yeah.. so the story starts to get kind of awkward :)) ) . We make a genome between 2 individuals by a sort of differential average and then that genome gets the normal crossover with a third. Some birds require 3 partners to mate but the algorithm kind of steers from the story. Because it does not stick to the story it actually gets good results for optimizing real values.

Ok so enough theory (for a better understanding, check my previous articles or the internet :) ). Let’s get down to business.

Implementating Differential Evolution in Carvatar

The first AI problem we want to solve using evolution is: “Find the optimal parameters for our basic AI controller that get our best race driver (best time and less wall hits)”. Solving this problem should give us an unbeatable AI (almost perfect) and it’s trajectory will be our best race line (a better approximation that we get by using a smooth filter). The main steps for implementing this in our real-time system are:

  • Mate individuals and create Children
  • Run the race simulation to get the fitness for each individual
  • Replace the parents with the children with better fitness scores (again this doesn’t fit the natural story :) )

We will achieve this by implementing a simple FSM (state machine).

1
2
3
4
5
6
7
8
9
 enum DE_STATE
 {
    DE_CREATECHILDREN,
    DE_RUNSIMULATION,
    DE_EVOLVE,
    DE_END,

    DE_NUMSTATES
 };

We will inherit from our RaceManager because the whole thing will actually be a special race for life that is always repeating :) . Lets’s check our init function:

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
 void DifferentialEvolution::init()
{
    //Read the values from our TOML file
    loadRaceFromTOML("racesetup.TOML");

    //Set the first state to create children
    m_currentState = DE_CREATECHILDREN;

    //All our racers (parents and children)
    m_numRacers = m_populationSize * 2;

    // Creating the controllers, and cars
    m_controllers = new IController*[m_numRacers];
    m_childrenControllers = &m_controllers[m_populationSize];

    for (unsigned int i = 0; i < m_numRacers; i++)
    {

        TopdownCar * car = new TopdownCar(i, b2Color(0.3f, 0.5f, 0.8f));
        car->setPosition(TRACK->getSectorPoint(0).center);
        m_controllers[i] = new BasicAIController();
        m_controllers[i]->initController(car);
        // Our first guys are the parents
        if (i <= m_populationSize)
        {
            float paramsInit[EBASICAI_NUM];

            //We init the parameters with smart random values
            // we could do this with actually random values, without knowing the interval
            // but that implies a bigger population size and a lower convergence rate
            paramsInit[EBASICAI_ANGLETOTURN]                = randomInterval(0, 60);
            paramsInit[EBASICAI_LOOKAHEAD_DISTANCE]         = randomInterval(2, 20);
            paramsInit[EBASICAI_ANGLETOTURNSPEEDINFLUENCE]  = 1.0f / randomInterval(1, 10000);
            paramsInit[EBASICAI_MAXSPEED]                   = randomInterval(0, 110) / 100.0f;
            paramsInit[EBASICAI_DISTANCETOFRONTWALL_STOP]   = 1.0f / randomInterval(1, 1000);

            //Set our parameters
            ((BasicAIController*)m_controllers[i])->setParams(paramsInit);
        }
        else // These are the children
        {
            m_controllers[i]->getCar()->setDebugColor(b2Color(0.6f, 0.6f, 0.6f));
            //children
        }
    }

    // We don't have a best candidate yet
    m_bestCandidate = NULL;
}

We create all our controllers and cars. We have 2 * population size racers because we want to have the parents race their children so we can see if they are better or not. We hold the parents first and then our children. We also use a pointer to hold the beginning of our children array so it’s all clean and easy to follow.

The loadRaceFromTOML function just loads the parameters from the toml file. I’ll paste it here so you can follow the parameters through the code:

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
void DifferentialEvolution::loadRaceFromTOML(const char *filename)
{
    // Load our parameters from the TOML file
    std::ifstream ifs(filename);
    toml::Parser parser(ifs);
    toml::Value documentRoot = parser.parse();
    toml::Value* raceSettings = documentRoot.find("DifferentialEvolution");

    m_numLaps              = raceSettings->find("lap_number")->as<int>();
    m_maxRaceTime          = raceSettings->find("max_race_time")->as<int>();
    m_populationSize       = raceSettings->find("population_size")->as<int>();
    m_maxIterations        = raceSettings->find("max_iterations")->as<int>();
    m_differentialWeight   = raceSettings->find("differential_weight")->as<double>();
    m_crossoverProbability = raceSettings->find("crossover_probability")->as<double>();
}

Our state machine looks like this (Note it is as simple as possible, for bigger FSMs it would be better to separate the state transitions from the state execution code):

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
void DifferentialEvolution::updateControllers()
{
    switch (m_currentState)
    {
        case DE_CREATECHILDREN:
        {
            // Decrease our iteration number
            m_maxIterations--;
            // Create the actual children
            stepDifferential();

            if (m_maxIterations > 0)
            {// We go into the simulation phase
                m_currentState = DE_RUNSIMULATION;
                m_isRaceEnded = false;
                m_currentRaceTicks = 0;
            }
            else
            {// We need to stop (we reached our number of generations)
                m_currentState = DE_END;
                printf("Best candidate params are: ");
                prinfvector(m_bestCandidate->getParams(), EBASICAI_NUM);
                printf("Best candidate params are: ");
            }
            break;
        }
        case DE_RUNSIMULATION:
        {// We run the simulation (race) until it ends
            RaceManager::updateControllers();
            // Color the best individual (eyecandy :) )
            colorBestFit();

            if (m_isRaceEnded == true)
            {
                m_currentState = DE_EVOLVE;
            }
            break;
        }
        case DE_EVOLVE:
        {
            evolve();
            //Reset racers
            for (unsigned int i = 0 ; i < m_numRacers; i++)
            {
                TopdownCar * car = m_controllers[i]->getCar();
                car->setPosition(TRACK->getSectorPoint(0).center);
                car->reset();
            }

            m_currentState = DE_CREATECHILDREN;
            break;
        }
        case DE_END:
        {
            // DO NOTHING
            break;
        }
    }
}

Ok so the meat of the Differential algorithm is the stepDifferential function. Where we actually create children for each parent by considering their BasicAiController parameters as genomes:

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
void DifferentialEvolution::stepDifferential()
{
    for (unsigned int i = 0; i < m_populationSize; i++)
    {
        int pickIdx[3];

        pickIdx[0] = pickUnique(pickIdx, 0);
        pickIdx[1] = pickUnique(pickIdx, 1);
        pickIdx[2] = pickUnique(pickIdx, 2);
        // pick random R
        int R = randomInterval(0, EBASICAI_NUM);

        BasicAIController * x = (BasicAIController*)m_controllers[i];
        // Crossover
        BasicAIController * individualA = (BasicAIController*)m_controllers[pickIdx[0]];
        BasicAIController * individualB = (BasicAIController*)m_controllers[pickIdx[1]];
        BasicAIController * individualC = (BasicAIController*)m_controllers[pickIdx[2]];

        float newParams[EBASICAI_NUM];

        memcpy(newParams, x->getParams(), sizeof(newParams));

        for (unsigned int j = 0; j < EBASICAI_NUM; j++)
        {
            float pickRi = randomInterval(0,10000)/10000.0f;

            if(pickRi < m_crossoverProbability || j == R)
            {
                newParams[j] = individualA->getParams()[j] + m_differentialWeight * (individualB->getParams()[j] - individualC->getParams()[j]);
            }

        }
        ((BasicAIController*)m_childrenControllers[i])->setParams(newParams);
        ((BasicAIController*)m_childrenControllers[i])->getCar()->setDebugColor(b2Color(0.6f, 0.6f, 0.6f));
    }
}

The pickUnique function is similar to the one we wrote in javascript. Picks a random index that was not picked before (not in the blackList array):

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17
18
19
20
 int DifferentialEvolution::pickUnique(int *blackList, int size)
{
    bool isNotUnique = false;
    int newIdx = -1;
    do
    {
        isNotUnique = false;
        newIdx = floor(rand() % (m_populationSize - 1));
        for (unsigned int i = 0; i < size; i++)
        {
            if (newIdx == blackList[i])
            {
                isNotUnique = true;
                break;
            }
        }
    } while (isNotUnique);

    return newIdx;
}

After running the simulation we make a new generation of parents by replacing the kids that were better. We do this in our evolve function:

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
void DifferentialEvolution::evolve()
{
    float bestCurrentFitness = -1000000000;
    if (m_bestCandidate != NULL)
    {
        bestCurrentFitness = m_bestCandidate->getCar()->getFitnessData()->ComputeFitness();
        m_bestCandidate->getCar()->setDebugColor(b2Color(1.0f, 1.0f, 1.0f));
    }

    for (unsigned int i = 0; i < m_populationSize; i++)
    {
        float parentFitness = ((BasicAIController*)m_controllers[i])->getCar()->getFitnessData()->ComputeFitness();
        float childFitness = ((BasicAIController*)m_childrenControllers[i])->getCar()->getFitnessData()->ComputeFitness();

        if (parentFitness < childFitness)
        {
            ((BasicAIController*)m_controllers[i])->setParams(((BasicAIController*)m_childrenControllers[i])->getParams());
        }
        if (bestCurrentFitness < parentFitness || bestCurrentFitness < childFitness)
        {
            m_bestCandidate = (BasicAIController*)m_controllers[i];
        }
    }
    if (m_bestCandidate != NULL)
        m_bestCandidate->getCar()->setDebugColor(b2Color(1.0f, 0, 0));

}

For this to be completely understood we need to show our fitness data structure and computation function:

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
struct FitnessData
{
    float data[FT_NUM];

    FitnessData()
    {
        memset(data, 0, sizeof(data));
    }

    float ComputeFitness()
    {
        float empiricalWeights[FT_NUM];
        empiricalWeights[FT_NUMCRASHES]     = -10.0f;
        empiricalWeights[FT_INVERSELAPTIME] = 1000.0f;
        empiricalWeights[FT_DISTANCELEFT]   = 100.0f;

        float fitness = 0;
        // dot product
        for (unsigned int i = 0; i < FT_NUM; i++)
        {
            fitness += empiricalWeights[i] * data[i];
        }
        return fitness;
    }
};

We may change the ComputeFitness function as we wish (by changing the weights parameters). The fitness values we use are actually the number of crashes, the inverse time the car took to make a lap and the distance it managed to travel. I feel that the empiricalWeights values should be read from a TOML but I am not yet convinced of this :) .

So this was it, enjoy my results in the youtube video above.