首页 > > 详细

解析Python编程、Python设计辅导留学生、讲解SCC graph

0. Read the note at the end!
1. (10pt) A context-free grammar, G, is a device that generates strings. It has a set of symbols, V,
of two types: terminals, , and nonterminals, V , and a set of rules R (V ) V . Starting
from a designated starting symbol S2V , G generates strings by applying production rules; a
rule A! is used by replacing A with . The language L(G) of G is the set of terminal strings
(i.e., over ) produced from S.
A grammar is in Chomsky normal form. if productions are of the form. A!BC or A!a, for
A;B;C2V , a2 . Here is an example: G with V =fS;A;L;Rg, =f(;)g, and R:
1: S ! SS
2: S ! LA
3: A ! SR
4: S ! LR
5: L ! (
6: R ! )
The language of G, L(G), consists of all strings with correctly nested parentheses. For example,
the string w = (())()2L(G) can be generated as follows (the nonterminal used is underlined and
rule number is indicated on top of the double arrow):
S 1) SS 2) LAS 3) LSRS 4) LLRRS 4) LLRRLR 5) (LRRLR 5) ((RRLR 5) ((RR(R 6)
(()R(R 6)(())(R 6)(())()
Give an O(n3)-time dynamic programming algorithm that checks whether a given string w =
w0w1:::wn 1 over belongs to the language L(G). Note that the grammar G is xed; the
algorithm is built using G. Its complexity depends only on the length n of the input string w.
Explain why your algorithm is correct, give the pseudocode, and analyze its running time. Also,
run your algorithm on the above grammar G and string w = (())() and show your work.
2. (10pt) A directed graph G = (V;E) is semi-connected if, for all pairs of vertices u;v2V, there is
a path from u to v or a path from v to u. Give an O(n+m)-time algorithm to determine whether
or not G is semi-connected (n = jVj;m = jEj). Explain why your algorithm is correct, give the
pseudocode, and analyze its running time. Do not include pseudocode for algorithms we studied.
Just call them.
Hint: Think about the SCC graph.
3. (10pt) Given a connected weighted undirected graph, G, with n vertices and m edges, such that
the weight of each edge is an integer in the interval [1::c], for a xed constant c > 0, give an
O(n+m)-time algorithm to solve the single-source shortest paths problem in G. Explain the idea
only. No pseudocode required.
Hint: Modify Dijkstra’s algorithm to run in O(n+m)-time or, simpler, :::.
4. (10pt) (textbook) A-15.3. Suppose NASA wants to link n stations spread out over the solar system
using free-space optical communication, which is a technology that involves shooting lasers through
space between pairs of stations to facilitate communication. Naturally, the energy needed to connect
two stations in this way depends on the distance between them, with some connections requiring
more energy than others. Therefore, each pair of stations has a di erent known energy that is
needed to allow this pair of stations to communicate. NASA wants to connect all these stations
together using the minimum amount of energy possible. Describe an algorithm for constructing such
a communication network in O(n2) time. (The gas used to travel between stars is not included in
the cost.) Explain the idea only. No pseudocode required.
5. (10pt) (textbook) A-16.1. In 2006, the city of Beijing, China, instituted a policy that limits
residents to own at most one dog per household. Imagine you are running an online pet adoption
website for the city. Your website contains pictures of adorable puppies that are available for
adoption, and it allows for dog-less Beijing residents to click on as many puppies as they like, with
the understanding that they can adopt at most one. Suppose now that you have collected the puppy
preferences from among n Beijing residents for your m puppies. Describe an e cient algorithm for
assigning puppies to residents that provides for the maximum number of puppy adoptions possible
while satisfying the constraints that each resident will only adopt a puppy that he or she likes and
that no resident can adopt more than one puppy.
A-16.2. The city of Irvine, California, allows for residents to own a maximum of three dogs per
household without a breeder’s license. Imagine you are running an online pet adoption website
for the city, as in the previous exercise, but now for n Irvine residents and m puppies. Describe
an e cient algorithm for assigning puppies to residents that provides for the maximum number of
puppy adoptions possible while satisfying the constraints that each resident will only adopt puppies
that he or she likes and that no resident can adopt more than three puppies.
6. (10pt) The birthday paradox is not an actual paradox but a mathematical fact that is rather
unexpected for most people. It says that, given 23 people, the probability that two of them have
the same birthday is more than 50%. This is surprising since there are 366 possible birthdays, much
more than 23.
Here is an interesting application. A building has k apartments and each apartment is given
a randomly chosen 3-digit PIN code to enter the main door. What is the smallest number of
apartments in the building such that the probability that at least two PINs are the same is larger
than 50%?
7. (40pt) The purpose of this question is two-fold. First, you implement and compare two pattern
matching algorithms: a fast and a slow one. Second, you design and implement a pattern searching
algortihm in a text index and verify its searching power.
Write a program, patternMatching.py that implements:
(a) Brute Force pattern matching.
(b) Boyer-Moore pattern matching.
(c) Su x Array pattern searching (that is, binary search in the su x array; see slides chapter
23). You need to design the su x array pattern searching algorithm, binSAsearch, such that it
computes the number of occurrences of the pattern in O(mlogn)-time, where n is the length
of the text and m is the length of the pattern. Note that only the number of occurrences, not
the actual positions is required. Explain why your algorithm is correct, give the pseudocode,
and analyze its running time.
For each text and its associated patterns, the program outputs the number of occurrences of the
pattern in the text as well as the time in seconds. The program will take three command line
parameters: the input text: text.txt, the su x array of the text: text.sa, and the list of
patterns: text.pat
The command line is:
python patternMatching.py text.txt text.sa text.pat
I provide two input test les, Dostoyevsky.txt (Fyodor Dostoevsky’s famous novel \Crime and
Punishment"), 1.2MB, and E.coli (the genome sequence of Escherichia coli, a common bacterium),
4.7MB. I provide also their su x arrays: Dostoyevsky.sa and E.coli.sa and the lists of patterns
to be searched for: Dostoyevsky.pat and E.coli.pat. The two texts are very di erent from
the pattern matching point of view as one has a large alphabet (English), the other very small
(fA,C,G,T,Ng). Also, the patterns provided range from short to long and and occurring frequently
to occurring uniquely.
Here is what the output should look like:
text: "Dostoyevsky.txt"
pattern: "the"
occurrences: 11411
Suffix Array: 0.0001 s
Brute Force: 0.565772 s
Boyer-Moore: 0.416118 s
text: "Dostoyevsky.txt"
pattern: "Raskolnikov"
occurrences: 784
Suffix Array: 0.0001 s
Brute Force: 0.618492 s
Boyer-Moore: 0.126854 s
text: "Dostoyevsky.txt"
pattern: "dishevelled"
occurrences: 1
Suffix Array: 9.10000000003e-05 s
Brute Force: 0.59254 s
Boyer-Moore: 0.121932 s
text: "E.coli.txt"
pattern: "AAA"
occurrences: 104915
Suffix Array: 0.000219 s
Brute Force: 2.411402 s
Boyer-Moore: 1.794779 s
text: "E.coli.txt"
pattern: "ACGT"
occurrences: 13733
Suffix Array: 0.000134 s
Brute Force: 2.484364 s
Boyer-Moore: 2.784615 s
text: "E.coli.txt"
pattern: "NNNNNNNNNNNNNNNNNNNNNNNNNNN"
occurrences: 11406
Suffix Array: 0.000278 s
Brute Force: 2.791543 s
Boyer-Moore: 0.288293 s
text: "E.coli.txt"
pattern: "ATATCAGTTATATTTAAACTAAATTAAAGTCATGAATAAT"
occurrences: 1
Suffix Array: 0.000147999999999 s
Brute Force: 3.59928 s
Boyer-Moore: 0.913273 s
What you notice is the time di erence. Boyer-Moore algorithm is up to ten times faster than Brute
Force but it can be slightly slower when the string is very short, the alphabet small, and the string
has all alphabet letters. Also, note that the length of the pattern is important, the number of
distinct characters is important, but the number of occurrences is not. Boyer-Moore performs the
best on the long one-letter pattern.
The Su x Array on the other hand is four orders of magnitude faster, as its complexity depends
only logarithmically on the text size. You need to be careful how you implement the Su x Array
search so that you obtain this di erence.
In case you want to compute your own su x arrays, I provide also a program for that: suffixArray.py.
This is not a very fast implementation, but rather a simple one. Highly optimized algorithms exist.
Note { You may search the web and discuss with your colleagues but do not copy! It’s not useful!
{ Submit your solutions, as one pdf le, and programs as py les to owl.uwo.ca. If any output is
required from the programs, include it in the pdf le.
{ The assignments must be typed, preferably in LATEX1 (do not submit the tex le).
{ The programs must be written in Python 2.7. They may be tested on any legal inputs, not only
those provided with the assignment.
{ The algorithms are given in pseudocode unless speci ed otherwise.2
{ Understanding the assignment is part of solving it. Read everything carefully and try your
best to understand it before asking questions.
{ The solution can use anything as long as it arguably solves the problem as given.
{ When explaining solutions, provide the essential information. Do not omit necessary details
but also do not waste time writing obvious facts. As a guideline, explain only what is not obvious
to you. Explaining things clearly and succinctly is a very important skill.
{ There is no penalty for submitting multiple times (no limit). Only the last submission is graded.
1It is free and by far the best for scienti c writing. Here is a quick start: https://tobi.oetiker.ch/lshort/lshort.pdf
2There are many LATEXpackages for writing pseudocode; for the assignments I am using the algorithmicx package:

联系我们
  • QQ:99515681
  • 邮箱:99515681@qq.com
  • 工作时间:8:00-21:00
  • 微信:codinghelp
热点标签

联系我们 - QQ: 99515681 微信:codinghelp
程序辅导网!