Overview
For this project, you will develop an algorithm to identify the most likely sequence of typos a text underwent in the process of being typed. Designing and implementing this solution will require you to model the problem using dynamic programming, then understand and implement your model.
You are only allowed to consult the class slides, the textbook, the TAs, and the professor. In particular, you are not allowed to use the Internet. This is a group project. The only people you can work with on this project are your group members. This policy is strictly enforced. You may wish to review section 8.2 in the text, which describes a related problem.
In addition to the group submission, you will also evaluate your teammates’ cooperation and contribution. These evaluations will form a major part of your grade on this project, so be sure that you respond to messages promptly, communicate effectively, and contribute substantially to your group’s solution. Details for your team evaluations are in Section 5.2. You will submit the peer evaluations to another assignment on Canvas, labelled “Project 2 (individual).”
A word of warning: this project is team-based, but it is quite extensive and a nontrivial task. You are highly encouraged to start working on (and start asking questions about) this project early; teams who wait to start until the week before the due date may find themselves unable to complete it in time.
Problem Description
Your algorithm will accept a target string and a typo string, and it will compute the minimum cost for transforming the target string into the typo string, as well as a sequence of typos with this cost. The algorithm should recognize 4 different kinds of typos: insertions (adding extra characters), deletions (leaving out characters), substitutions (typing one character in place of another), and transpositions (swapping the order of two characters that appear next to one another).
Typo cost
Computing the cost of a particular typo is more involved: it depends on the kind of typo and on where the keys for the related characters are located on the keyboard. The rules for computing the typo cost are summarized in the table below:
| Typo | Condition | Cost |
| Inserting | Repeated character | 1 |
| | Space after key on bottom row | 2 |
| | Space after something else | 6 |
| | Character before a space | 6 |
| | Before or after another key on same hand | d(k1, k2) |
| | Before or after a key on opposite hand | 5 |
| Deleting | Repeated character | 1 |
| | Space | 3 |
| | Character after another key on same hand | 2 |
| | Character after space or key on different hand | 6 |
| | First character in string | 6 |
| Substituting | Space for anything or anything for space | 6 |
| | Key for another on same hand | d(k1, k2) |
| | Key for another on same finger, other hand | 1 |
| | Key for another on different finger, other hand | 5 |
| Transposing | Space with anything else | 3 |
| | Keys on different hands | 1 |
| | Keys on the same hand | 2 |
Note that rules in the table above that reference a “hand” do not apply to spaces, which have separate rules. When more than one rule applies to a typo, you should report the minimum cost. This matters most when inserting a character between two others. For example, if you inserted an ‘m’ between an ‘a’ and an ‘m’, the cost would be 1 (for the repeated character), not 5 (for inserting after a character on the opposite hand).
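The minimum-cost rule can be illustrated with a small Python sketch. It covers only the three insertion rules that do not involve spaces (repeated character, same hand, opposite hand). The handout does not say which keys belong to which hand, so the split used here (the five leftmost columns for the left hand, the rest for the right) is an assumption, and the names `hand`, `key_distance`, and `insertion_cost_no_spaces` are hypothetical.

```python
# Keyboard rows as listed in the handout.
ROWS = ["1234567890", "qwertyuiop", "asdfghjkl;", "zxcvbnm,."]
POSITION = {key: (r, c) for r, row in enumerate(ROWS) for c, key in enumerate(row)}


def hand(key):
    """ASSUMPTION: columns 0-4 belong to the left hand, columns 5-9 to the right."""
    return "left" if POSITION[key][1] < 5 else "right"


def key_distance(k1, k2):
    """Keyboard distance max(|r2 - r1|, |c2 - c1|) between two keys."""
    (r1, c1), (r2, c2) = POSITION[k1], POSITION[k2]
    return max(abs(r2 - r1), abs(c2 - c1))


def insertion_cost_no_spaces(new, neighbors):
    """Cheapest applicable insertion rule for `new` next to non-space `neighbors`.

    Only the repeated-character, same-hand, and opposite-hand rules are
    modeled; the space-related rules from the table are left out.
    """
    candidates = []
    for other in neighbors:
        if other == new:
            candidates.append(1)                         # repeated character
        elif hand(other) == hand(new):
            candidates.append(key_distance(new, other))  # another key, same hand
        else:
            candidates.append(5)                         # key on the opposite hand
    return min(candidates)  # raises ValueError if there are no neighbors


# The example from the text: 'm' inserted between 'a' and 'm'.
print(insertion_cost_no_spaces("m", ["a", "m"]))
```

Collecting one candidate cost per applicable rule and taking the minimum keeps the ambiguity handling in one place; a full cost function would add the space rules as further candidates.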
The costs for two of the cases in this table are given as d(k1, k2), which represents the keyboard distance between the two keys involved. When computing keyboard distance, we assume that the typist is using a standard QWERTY keyboard whose keys are arranged in 4 rows and 10 columns:
1234567890
qwertyuiop
asdfghjkl;
zxcvbnm,.
(You may assume that the strings will consist only of these characters and space.) The distance between a key in row r1 and column c1 and a key in row r2 and column c2 can be computed as:
d(k1, k2) = max{|r2 - r1|, |c2 - c1|}
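As a minimal sketch, the distance formula can be computed in Python by looking up each key's row and column in the layout above. The names `ROWS`, `POSITION`, and `key_distance` are illustrative and not part of the assignment.

```python
# Keyboard rows as listed in the handout (the last row has only nine keys).
ROWS = ["1234567890", "qwertyuiop", "asdfghjkl;", "zxcvbnm,."]

# Map each key to its (row, column) position, both counted from 0.
POSITION = {key: (r, c) for r, row in enumerate(ROWS) for c, key in enumerate(row)}


def key_distance(k1, k2):
    """Return max(|r2 - r1|, |c2 - c1|) for the two keys."""
    r1, c1 = POSITION[k1]
    r2, c2 = POSITION[k2]
    return max(abs(r2 - r1), abs(c2 - c1))


print(key_distance("q", "w"))  # adjacent keys in the same row
print(key_distance("q", "z"))  # same column, two rows apart
```

Because only differences of rows and columns enter the formula, it does not matter whether positions are counted from 0 or from 1.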
When minimizing this cost function, we can assume that typos are generated in left-to-right order (i.e., the order in which the typist is typing). This rule exists to prevent the algorithm from generating dubious typos like substituting one character for another on the same finger of the other hand, inserting or deleting the next character, then undoing the substitution. It also has the happy side effect of making this problem an excellent candidate for dynamic programming. You should consider transpositions to affect the first character being transposed, deletions to affect the position of the deleted character, and insertions and substitutions to affect the location of the new character.
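To see why the left-to-right assumption suits dynamic programming, consider the following Python sketch. It is not the solution to this project: every typo is given a cost of 1 as a stand-in for the cost table, and only the minimum cost is returned, not the sequence of typos. With unit costs this is the standard "optimal string alignment" variant of edit distance; the function name `min_typo_cost` is hypothetical.

```python
def min_typo_cost(target, typo):
    """Minimum number of unit-cost typos that turn `target` into `typo`.

    ASSUMPTION: each insertion, deletion, substitution, and transposition
    costs 1; the project's keyboard-based costs are not modeled here.
    """
    n, m = len(target), len(typo)
    # cost[i][j] = cheapest way to turn target[:i] into typo[:j].
    cost = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        cost[i][0] = i  # delete every character of target[:i]
    for j in range(1, m + 1):
        cost[0][j] = j  # insert every character of typo[:j]

    for i in range(1, n + 1):
        for j in range(1, m + 1):
            same = target[i - 1] == typo[j - 1]
            best = min(
                cost[i - 1][j] + 1,                       # deletion
                cost[i][j - 1] + 1,                       # insertion
                cost[i - 1][j - 1] + (0 if same else 1),  # match or substitution
            )
            # Transposition: the last two characters appear swapped.
            if (i > 1 and j > 1
                    and target[i - 1] == typo[j - 2]
                    and target[i - 2] == typo[j - 1]):
                best = min(best, cost[i - 2][j - 2] + 1)
            cost[i][j] = best
    return cost[n][m]


print(min_typo_cost("the", "teh"))  # one transposition
```

Each table entry depends only on entries for shorter prefixes, which is what processing typos in typing order guarantees. Replacing the constant 1 with position-dependent costs and recording which choice produced each minimum would be the natural next steps.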
An example of how to score various typos in a string is given.
Project report