Programming in Java Evolutionary Algorithms

Programming in Java
Fundamentals of Programming in Java
Assignment 2
Evolutionary Algorithms
Marks available: 100
Date issued: 19/11/2018
Deadline: 1200, 06/12/2018
Submission instructions
Submit your work by the deadline above. The accepted submission format is a zip archive. Submit only
Java source les in your archive. Place all of the les in the same zip. Do NOT submit .class les. Please
do not submit Eclipse (or other IDE) projects.
Please make sure that the following package declaration is the rst line of your Java les:
package com.bham.pij.assignments.a2a;
Do not output anything to the console unless explicitly asked to. You may, of course, use console output
to debug your programs. Just make sure that you comment out or delete such debug statements when you
submit.
Where applicable, your class for each of the exercises must have the name shown in the exercise heading.
In each of these exercises, you are not restricted to only creating the required methods. If you want to,
you can create others. However, you must create at least the required methods.
Do not use any input method (such as Scanner) in any method that is required by this specication.
Introduction
For this assignment you will implement a genetic algorithm (GA), which is a type of evolutionary algorithm.
1
As discussed in the lecture, the genetic algorithm is a search algorithm that is inspired by natural evo-
lution. Two key processes are adopted: genetic mutation and crossover. These processes are described
below. Using these two processes, the GA proceeds from an initial collection of random solutions to nd
some desired solution. The critical component that determines the success or otherwise of the search is
called the tness measure. The tness measure assigns a numerical tness to each solution, enabling the
algorithm to decide which ones are the `ttest', i.e. the best. You do not need to know anything about
biology to implement this assignment.
This assignment is partly an implementation exercise but is also a design exercise. The design that you
choose will aect your mark. You should seek to utilise the object-oriented principles that we have been
discussing in lectures since week 3.
Algorithm overview
The GA starts with a collection of random individuals. This collection is usually of a xed size, for ex-
ample, 100. An individual is an attempted solution of the particular problem. Since all initial individuals
are random, none of them probably represents a particularly good solution. It will usually be the case,
however, that even though they are random, some individuals will be better than others.
Each individual has a chromosome. A chromosome is a collection of genes of xed size. Each gene on a
chromosome can have one value from a set of values. For example, in a text-based task, each gene might
have the value of one of the alphabetic characters (like the `weasel' example we discussed). So, a decision
must be taken about what are the legal values of a gene. In a task involving binary numbers, for example,
gene values can only be 1 or 0. No other values will be legal.
The GA follows the following steps. Please note that other implementations of the GA might follow a
dierent process. This is the process that we are adopting, however.
(1) The rst step of the GA is to create a population of random individuals. Each individual has a chro-
mosome that will have been initialised to random values from the range of legal values. The population
has a xed size, e.g. 100.
(2) The mutation operation is then applied to the whole population. Mutation is discussed in more detail
below.
(3) The GA then assesses the tness of the population. This means it must decide which are the best
individuals in the population. To do this a tness measure is required. The GA applies a tness measure
to each individual and assigns each individual its own tness value. The tness measure depends entirely
on the purpose of the search. For example, to search for a binary string that consists only of ones, we
would use a tness measure that rewards how many ones there are in a string: the more the better. It is
a good idea to sort the population according to this tness measure.
(4) The next step involves the creation of new individuals. We decide how many new individuals we want
(for example, 20 out of a population of 100). We then select the best individuals from the population so
that they can be used in a reproduction process to create the new individuals.
For example, we might choose 10 of the best individuals. These are called the parents. If the population
has been sorted for tness, this step is easy. We then randomly select a pair of individuals from these best
10 and take the following steps with them.
2
(5) We create two new individuals (children) from each pair of parents by using the crossover operation.
This is described in more detail below.
(6) Once we have performed these operations and have our new 20 individuals (from doing this repeatedly),
we prepare to put them into the general population. To maintain the population size, we remove the 20
worst individuals and add the new 20 individuals.
(7) We then return to step (2) and continue with this process until we reach the desired goal or until a
certain number of generations have been processed.
Crossover
The crossover operation is a simple array copying exercise. Assuming an individual chromosome is stored
in some kind of array then we do the following to perform crossover.
Assume we have selected two parents: parent1 and parent2.
We randomly select a crossover point. This needs to be somewhere between the start and the end of the
chromosome. Technically, we view the crossover point as lying between two genes. We then create two new
chromosomes (hence, individuals). The rst new chromosome is created by copying the gene values from
parent1 up to the crossover point and from parent2 after the crossover point. The second new chromosome
is created by copying the gene values from parent2 up to the crossover point and then from parent 1 after it.
Consider the following example that uses binary strings:
parent1 = [1,0,1,1,0,1,1,0]
parent2 = [0,0,0,1,1,0,1,1]
Select a random crossover point, e.g. 3. We will assume, therefore, that the crossover occurs between genes
3 and 4.
child1 = [1,0,1,1,1,0,1,1]
child2 = [0,0,0,1,0,1,1,0]
Note that crossover is not always performed. Crossover occurs according to some (low) probability. If
crossover does not occur for a particular pair of parents then each of them is simply copied to create the
new children. See the section on parameters.
Mutation
Mutation involves changing the value of a gene. Of course, its new value must come from the legal set of
values. For binary strings, for example, we simply change 0 to 1 and vice versa.
Consider individual1 and individual2 below.
We choose a random mutation point for individual1, e.g. 5. We then mutate the gene value at that position.
individual1 = [1,0,1,1,1,1,1,1]
3
We choose a random mutation point for individual2, e.g. 0. We then mutate the gene value at that position.
individual2 = [1,0,0,1,0,1,1,0]
We do this for the entire population.
Again, mutation of any gene value is not always performed. It is performed according to some (low)
probability. Each gene is independently mutated or not mutated. That means there is a high chance that
no gene will be mutated but also a very small chance that more than one will be mutated. See the section
on parameters.
Parameters
The GA requires that some parameters be specied. Not command line parameters, but some constants
that it uses to dictate the operation of the algorithm. These parameters are largely down to you but there
are some guidelines below.
Size of population: You should start out with a population size of 100. You can vary this value to see
if you get better results.
Number of parents: This should start at something like 10% of the population size. Again, you can
vary this to see its eect.
Number of children: This will be the same as the previous parameter. Each pair of parents has two
children.
Crossover probability: This should be set to a low value, e.g. 0.02. This means that each pair of parents
has a very low chance of being involved in the crossover operation. Again, experiment with this value to
see its eect.
Mutation probability: This should be set to a low value, e.g. 0.01. This means that each gene has a
very probability of being mutated. Again, experiment with this value to see its eect.
Implementation
You need to implement the following classes.
Individual
This class represents an individual in the GA. It must have the following methods:
public double getFitness()
This method returns the current tness of this individual.
public String toString()
This method returns a String representation of the individual. For any individual that uses characters,
this must be a string containing the chromosome but as a single string of characters.
4
GAApplication
This class is the base class for all types of GA.
It must contain the following methods:
public void run()
This method processes one generation of the GA. This means that in this method you must perform the
mutation operations, crossover operations, etc. for one iteration of the algorithm.
public Individual getBest()
This method returns the current best individual.
BinaryMaximiser [50 marks]
The BinaryMaximiser class represents a GA that will maximise a binary string of arbitrary length.
This class extends GAApplication.
It must have a constructor that receives an integer: the length of the binary string to maximise.
This application seeks a binary string that contains only ones. Thus, the tness measure should be a count
of the number of ones in a string. The mutation operation should change a zero to a one and and a one to
a zero. The binary string should be represented as a String for ease. Thus, each character will be `1' or
'0'.
Weasel [25 marks]
The Weasel class represents a GA that can nd an arbitrary search string.
This class extends GAApplication.
It must have a constructor that receives a String: the string to nd.
The tness measure for this application requires a little more thought. It seems right that the closer to
the target string a string is, the tter it is. This naturally gives a lower number for a higher tness. You
should consider how you will deal with this.
The mutation operation should change the value of each alphabetic character either `up' or `down'. For
example, a `B' becomes a `C' or an `A'. What should an `A' become if it is mutated downwards? This
depends on what `alphabet' you select for your strings. There is no need to restrict the strings to only the
alphabetic characters (plus space), apart from the fact that it will nd the solution more quickly if you
do. If you look at the ASCII table, it is actually convenient to use the space character as the lower limit
of the legal value. You should, however, exclude all of the control characters (lower than ASCII value 32).
Maths [25 marks]
The Maths class represents a GA that can nd an integer target that is greater than zero from a mathe-
matical expression.
5
This class extends GAApplication.
This class should have a constructor that receives two parameters: one being an integer representing a
value greater than zero, that will be the target; the other being an integer representing the length of the
expressions that will be created.
This application works by creating strings that represent mathematical expressions using the four arith-
metic operators. There are no brackets in the expressions but the correct order of operations should
be respected. Again, since this application uses strings, you need to look at the ASCII table to nd a
suitable range of values. You will clearly need to include the arithmetical operators and digits in the range.
The tness measure is the result of evaluating the expressions. An expression that has a result that is
closer to the desired integer is tter. Note that some strings will be impossible to evaluate. The initial
population is randomly generated and the GA operations might result in impossible operations. In these
cases you should evaluate the strings to have the lowest possible tness. You can decide what that is.
Gene value ranges
As you have read above, you need to select appropriate gene value ranges for the individuals in each ap-
plication.
For BinaryMaximiser the range is easy. Each gene can only have the value `0' or `1' (remember: these are
characters not numbers).
For Weasel you can use most of the ASCII codes. You should exclude the control characters though (they
have values less than 32). The more characters you include in the range, the longer it will take to nd the
target string. This is OK, though. And it's a bit more interesting to watch your target string emerge from
a `soup' of characters.
For Maths, you can also use the same alphabet as for Weasel but I recommend restricting it a bit more.
Just try and include the arithmetical operators and the digits 0 to 9.
Hint
The best way to approach this problem is by initially creating the BinaryMaximiser class. This will teach
you most of the things you need to know to implement the whole assignment.
6