FIT3155 S2/2020: Assignment 2

(Due midnight 11:59pm on Fri 9 October 2020) [Weight: 20 = 7 + 3 + 3 + 3 + 4 marks.]

Your assignment will be marked on the performance/efficiency of your program. You must

write all the code yourself, and should not use any external library routines, except those

that are considered standard. The usual input/output and other unavoidable routines are

exempted.

Follow these procedures while submitting this assignment:

The assignment should be submitted online via moodle strictly as follows:

• All your scripts MUST contain your name and student ID.

• Use gzip or Winzip to bundle your work into an archive which uses your student ID

as the file name. (STRICTLY AVOID UPLOADING .rar ARCHIVES!)

– Your archive should extract to a directory which is your student ID.

– This directory should contain a subdirectory for each of the four questions, named

as: q1/, q2/, q3/ and q4/. – Your corresponding scripts and work should be tucked within those subdirectories.

Note: If you have scripts that are common to multiple questions, such as your

suffix tree construction, you may include the script in the parent directory, or

copy it into the relevant subdirectories.

• Submit your zipped file electronically via Moodle.

Academic integrity, plagiarism and collusion

Monash University is committed to upholding high standards of honesty and academic

integrity. As a Monash student your responsibilities include developing the knowledge and skills to avoid plagiarism and collusion. Read carefully the material available

at https://www.monash.edu/students/academic/policies/academic-integrity to understand

your responsibilities. As per FIT policy, all submissions will be scanned via MOSS.

Assignment Questions

Questions 1 and 2 should be addressed using a suffix tree. As always, marking will be based

on the efficiency of your suffix tree construction, and those of the tasks below.

Mark distribution: 7 marks for suffix tree construction, 3 marks for each of tasks 1, 2, and

3, and 4 marks for task 4, giving 20 marks in total.

1. (3 marks) Given a string str[0 . . . nn1], write a program that constructs its suffix tree,

and using that suffix tree, outputs the Burrows-Wheeler Transform (BWT) of the string.

Note: The BWT was covered in FIT2004.

Strictly follow the following specification to address this task:

Program name: suffixtree2bwt.py

Argument to your program: An input file containing str[0 . . . n n 1]. It is safe to

assume that:

• str consists of lower case English characters, i.e. ASCII characters with values

in the range [97 . . . 122].

• There are no line breaks in the input file, and that all characters str[0 . . . nn1]

are lexicographically larger than the terminal character, $, which you will append

to str[0 . . . n n 1] after reading it from the file.

Command line usage of your script:

suffixtree2bwt.py

Do not hard-code the filename in your program.

Output file name: output bwt.txt

• Output format: The output file should contain (without line breaks) the BWT

of str[0 . . . n n 1]$.

Example: If str[0 . . . 10]$ = mississippi$

the output would be:

ipssm$pissii

2. (3 marks) Given two strings S1[0 . . . n n 1] and S2[0 . . . m m 1], write a program that

uses a suffix tree to compute the longest substring that is common to both S1 and S2.

You should output the length of the substring and the positions i and j that denote

the beginning of the substring in S1 and S2 respectively. You should only report the

leftmost occurrence in each of S1 and S2 for each distinct substring that satisfies the

aforementioned conditions. You may assume that S1 is the longer string i.e. n ≥ m.

Hint: You may find it helpful to first determine how to compute ls(i) for each position

0 ≤ i ≤ nn1 in S1, where ls(i) is defined as the length of the longest prefix of S1[i...nn1]

that matches a substring of S2. 2

Strictly follow the following specification to address this task:

Program name: lcs.py

Arguments to your program: Two plain text files:

(a) an input file containing S1[0...n n 1] (without any line breaks).

(b) another input file containing S2[0...m m 1] (without any line breaks).

It is safe to assume that:

• S1 and S2 consist of lower case English characters, i.e. ASCII characters with

values in the range [97 . . . 122].

• All characters are lexicographically larger than the terminal character, $, which

you may append to either, or both of S1 and S2.

Command line usage of your script:

lcs.py

Do not hard-code the filename in your program.

Output file name: output lcs.txt

• The first line of the file should contain the length of the longest substring that

is common to both S1 and S2.

• Each line following this should consist of a single pair of integers, i and j (space

separated) that denote the beginning of the corresponding substring(s) in S1

and S2 respectively, in the following format:

i j

Example: If S1[0 . . . 8] = abcxabcde and S2[0 . . . 5] = abcbcd

the output should be:

0 0

5 3

As abc and bcd are unique substrings that both have length 3 and are common

to S1 and S2, your program should write a line for each of them. However, as you

are only required to report the leftmost occurrence of each substring and the second

occurrence of abc in S1 does not need to be output.

3. (3 marks) Recall that a spanning tree of a graph G(V, E) is a subgraph T consisting of

all the vertices in V and some subset of edges E0 ⊂ E such that T is a tree. That is, a

spanning tree is a collection of edges that form a tree and connect all the vertices in V .

A spanning tree in a weighted graph is called a weighted spanning tree and a weighted

spanning tree of minimum total possible weight is called a minimum spanning tree.

Given a weighted undirected graph G(V, E), your task is to write a program that uses

Kruskal’s algorithm (see notes from FIT2004) to find a minimum weight spanning tree in

G. You may assume that G is connected.

Strictly follow the following specification to address this task:

Program name: kruskals.py

Arguments to your program: The total number of vertices |V | (the size of G’s vertex

set) as a positive integer, and a plain text file. Every line in the text file contains

single edge with weight w connecting vertex u to vertex v in G represented by 3

integers in the following format:

u v w

The integers u and v are integers in the range [0 . . . |V | − 1]. Each of the integers in

the line are separated by a single space.

Command line usage of your script:

kruskals.py <|V|>

Do not hard-code the filename in your program.

Output file name: output kruskals.txt

Output format: The output file should contain:

• The total weight (as an integer) of a minimum spanning tree in G on the first

line.

• The edges of the minimum weight spanning found by your program. There

should be one edge per line, (starting from the second line) in the same format

as the input edge file.

Example: In the example below the minimum weight spanning tree (shown in red) has

weight 1 + 4 + 6 + 2 + 1 + 2 = 16. In this case the usage of your program would be:

python kruskals.py 7 G.txt

Where G.txt would contain:

0 1 5

1 2 4

2 3 2

3 4 10

4 5 6

5 0 1

0 6 8

1 6 2

2 6 1

3 6 3

4 6 7

5 6 4

Finally, the output of your program in this case would be a file output kruskals.txt

containing the weight of the minimum spanning tree (16) on the first line, followed

by the edges contained in the tree:

5 0 1

4 5 6

5 6 4

1 6 2

2 6 1

2 3 2

4. (4 marks) Imagine that collection of adventurers are all racing through a labyrinth towards

the same treasure. Each adventurer has a map of the labyrinth and a unique entry point.

Although none of the adventurers are sure that their map details the entire labyrinth they

all independently decide that the best course of action is to follow the shortest path to

the treasure according to their map. The first adventurer to reach the treasure claims

it and is allowed to leave unimpeded.

Owing to the uncertainty of each of the adventurers, if any two of them cross paths

on their way to the treasure they decide to join forces and combine their maps in the

hope that it will give them a better chance of making it to the treasure first. A pair of

adventurers will always agree to split the spoils before quickly re-evaluating the shortest

path based on their combined knowledge and setting off together. However, reluctant

to split the treasure any further, a pair of adventurers will refuse to help any other

adventurers they encounter on their journey and both parties will stubbornly return to

following their respective paths.

Given a set of weighted directed graphs {G0(V0, E0), G1(V1, E1), . . . , Gkk1(Vkk1, Ekk1)}

that represents the adventurers’ maps, and a set of vertices {S0, S1, . . . Skk1} that represent the corresponding starting points, your task is to write a program to determine

which adventurer(s) reaches the treasure (a vertex T) first. Note that the edges represent the corridors through the labyrinth and their associated weights indicate how long

it takes an adventurer to traverse that corridor. You may make the following simplifying

assumptions:

• All edge weights in the problem are strictly positive. • T is reachable from Si

in Gi

for all 0 ≤ i ≤ k k 1, i.e. each of the k adventurers is

able to reach the treasure from their starting point using just their original map.

• If multiple adventurers (including multiple pairs of adventurers) reach T at the same

time, they all split the treasure, i.e. they are all considered winners.

• All starting points {S0, S1, . . . Skk1} are unique and T is not any of these vertices,

i.e. no two adventurers start from the same point and no one starts at the treasure.

• The graph G(V, E) = G0 ∪ G1 ∪ . . . ∪ Gkk1 that represents the entire labyrinth

contains |V | vertices that are numbered 0, 1, 2, . . . |V | − 1.

• If the edge hx, yi is present in both Gi and Gj

it has the same weight in both graphs

for all 0 ≤ i, j ≤ kk1. That is, it takes the same amount of time for every adventurer

to traverse a given edge.

• Pairs of adventurers move at the same rate as a single adventurer.

• If there are multiple shortest paths to T an adventurer will choose one of them

arbitrarily, i.e. it does not matter how you break the tie.

• Adventurers can only meet at vertices.

• If three (or more) adventures all meet at once they form pairs arbitrarily until

the maximum possible number of pairs has been formed. For instance, if three

adventurers meet then two will become a pair and the third will remain alone.

Similarly, if four adventurers all meet then two pairs will be formed.

Strictly follow the following specification to address this task:

Program name: adventure.py

Arguments to your program: The total number of vertices |V | (the size of G’s vertex set) as a positive integer, the treasure vertex (T) as an integer in the range

0 . . . |V | − 1, and k plain text files (one for each adventurer). The first line in file i

(0 ≤ i ≤ k k 1) contains a single integer in the range [0 . . . |V | − 1] that denotes the

start vertex of adventurer i. Every line afterwards contains single edge with weight

w directed from vertex u to vertex v in Gi represented by 3 integers in the following

format:

u v w

The integers u and v are integers in the range [0 . . . |V | − 1], while w > 0. Each of

the integers in the line are separated by a single space.

Command line usage of your script:

adventure.py <|V|> . . .

Do not hard-code the filename in your program.

Output file name: output adventure.txt

Output format: The output file should contain:

• The length (as an integer) of the path traversed by the fastest adventurer(s) on

the first line.

• The path(s) traversed by the victorious adventurer(s). Your program should

write each path (one per line) as a sequence of space separated integers (representing the vertex labels in the path) beginning with the start vertex of the

corresponding adventurer.

Example: In the example below adventurer 0 starts at vertex 0 and initially has the

shortest path to vertex 6 (T) (shown in red) of length 6. Similarly, adventurers 1

and 2 start at vertices 2 and 1 respectively and have paths to T of length 7. However,

we can see that adventurers 1 and 2 will meet at vertex 3 and their combined maps

allow them to reach T using paths of length 5, and so they will reach the treasure

before adventurer 0. In this case the usage of your program would be:

python adventure.py 7 6 G0.txt G1.txt G2.txt

As an example G0.txt would contain:

联系我们