
Python Tutoring: CS4117 Music Artist Lyrics Model


Core Description

For the core, you will implement a program that creates a model of a music artist’s lyrics. This model receives lyric data as input and ultimately generates new lyrics in the style of that artist. To do this, you will leverage an NLP concept called an n-gram and use an NLP technique called language modeling.
Your understanding of the linked concepts and definitions is crucial to your success, so make sure to understand n-grams, language modeling, Python dictionaries as taught in the warmup, and classes and inheritance in Python before attempting to implement the core.
The core does not require you to include any external libraries beyond what has already been included for you. Use of any other external libraries is prohibited on this part of the project.
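To make the n-gram idea concrete, here is a minimal sketch that lists the unigrams, bigrams, and trigrams of a short token list. The helper name extract_ngrams is hypothetical and is not part of the starter code; it only illustrates what an n-gram is, using nothing beyond built-in Python.

```python
def extract_ngrams(tokens, n):
    """Return every run of n consecutive tokens as a tuple.

    Hypothetical helper for illustration only; not part of the project files.
    """
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


words = ['let', 'it', 'be', 'let', 'it', 'be']

unigrams = extract_ngrams(words, 1)   # single words
bigrams = extract_ngrams(words, 2)    # pairs of adjacent words
trigrams = extract_ngrams(words, 3)   # triples of adjacent words

print(bigrams[0])      # first pair of adjacent words
print(len(trigrams))   # number of three-word windows
```

A language model built on these n-grams counts how often each one occurs and uses those counts to decide which word is likely to come next.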

Core Structure

In the language-models/ folder, you will find four files which contain class definitions: nGramModel.py, unigramModel.py, bigramModel.py, and trigramModel.py. You must complete the prepData, weightedChoice, and getNextToken functions in nGramModel.py. You must also complete the trainModel, trainingDataHasNGram, and getCandidateDictionary functions in each of the other three files.
In the root CreativeAI repository, there is a file called generate.py, which will be the driver for generating both lyrics and music. For the core, you will implement the trainLyricsModels, selectNGramModel, generateSentence, and runLyricsGenerator functions; these functions will be called, directly or indirectly, by main, which is written for you.
We recommend that you implement the functions in the order they are listed in the spec; start with prepData and work your way down to runLyricsGenerator.

Getting New Lyrics (Optional)

If your group chooses to use lyrics from an artist other than the Beatles, you can use the web scraper we have written to get the lyrics of the new artist and save them in the data/lyrics directory for you. A web scraper is a program that gets information from web pages; ours lives in the data/scrapers directory.
If you navigate to the data/scrapers folder and run the lyricsWikiaScraper.py file, you will be prompted to input the name of an artist. If that artist is found on lyrics.wikia.com, the program will make a folder in the data/lyrics directory for that artist, and save each of the artist’s songs as a .txt file in that folder.

Explanation of Functions to Implement

prepData

The purpose of this function is to take input data in the form of a list of lists, and return a copy of that list with symbols added to both ends of each inner list.
For the core, these inner lists will be sentences, which are represented as lists of strings. The symbols added to the beginning of each sentence will be ^::^ followed by ^:::^, and the symbol added to the end of each sentence will be $:::$. These are arbitrary symbols, but make sure to use them exactly and in the correct order.
For example, if the function is passed this list of lists:

[ ['hey', 'jude'], ['yellow', 'submarine'] ]

Then it would return a new list that looks like this:

[ ['^::^', '^:::^', 'hey', 'jude', '$:::$'], ['^::^', '^:::^', 'yellow', 'submarine', '$:::$'] ]

The purpose of adding two symbols at the beginning of each sentence is so that you can look at a trigram containing only the first English word of that sentence. This captures information about which words are most likely to begin a sentence; without these symbols, you would not be able to use the trigram model at the beginning of sentences because there would be no trigrams to look at until the third word.
The purpose of adding a symbol to the end of each sentence is to be able to generate sentence endings. If you ever see $:::$ while generating a sentence in the generateSentence function, you know the sentence is complete.
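The behavior described for prepData can be sketched as follows. This is a minimal standalone version under stated assumptions: in the project, prepData lives inside the class in nGramModel.py, and its exact signature there is not shown in this spec, so the plain function below and its parameter name text are illustrative only.

```python
def prepData(text):
    """Return a copy of text with start and end symbols around each sentence.

    Standalone sketch; the real function belongs to the class in
    nGramModel.py, whose signature is assumed here.
    """
    textCopy = []
    for sentence in text:
        # List concatenation builds a new inner list, so the input
        # sentences are left unmodified.
        textCopy.append(['^::^', '^:::^'] + sentence + ['$:::$'])
    return textCopy


lyrics = [['hey', 'jude'], ['yellow', 'submarine']]
prepared = prepData(lyrics)

# The first trigram of a prepared sentence holds both start symbols
# plus the first real word of that sentence.
firstTrigram = tuple(prepared[0][:3])
print(prepared)
print(firstTrigram)
```

Because each sentence now begins with two start symbols, the very first window of three tokens already contains the opening word, which is what lets a trigram model be used from the start of a sentence.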

trainModel

This function trains the NGramModel child classes on the input data by building their dictionary of n-grams and respective counts, self.nGramCounts. Note that the special starting and ending symbols also count as words for all NGramModels, which is why you should use the return value of prepData before you create the self.nGramCounts dictionary for each language model.

  • For the unigram model, self.nGramCounts will be a one-dimensional dictionary of {unigram: unigramCount} pairs, where each unique unigram appears somewhere in the input data, and unigramCount is the number of times the model saw that particular unigram in the data. The unigram model should not consider the special symbols ‘^::^’ and ‘^:::^’.
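The unigram counting described in the bullet above might look roughly like this. It is a sketch under assumptions, not the required implementation: the function name countUnigrams is hypothetical (in the project this logic belongs in trainModel and stores its result in self.nGramCounts), the input is assumed to be the return value of prepData, and the end symbol '$:::$' is assumed to be counted because only the two start symbols are named as excluded.

```python
def countUnigrams(preparedText):
    """Count each token in prepared sentences, skipping the start symbols.

    Hypothetical helper; in the project the counts would be stored in
    self.nGramCounts by trainModel.
    """
    startSymbols = ('^::^', '^:::^')
    counts = {}
    for sentence in preparedText:
        for token in sentence:
            if token in startSymbols:
                continue  # assumption: start symbols are not counted
            # dict.get supplies 0 the first time a token is seen
            counts[token] = counts.get(token, 0) + 1
    return counts


preparedText = [
    ['^::^', '^:::^', 'hey', 'jude', '$:::$'],
    ['^::^', '^:::^', 'hey', 'you', '$:::$'],
]
unigramCounts = countUnigrams(preparedText)
print(unigramCounts)
```

The result is the one-dimensional {unigram: unigramCount} dictionary the bullet describes, built with plain Python dictionaries as covered in the warmup.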