Python辅导：CS4117 Music Artist Lyrics Model辅导Python编程

NLP，。，。

Core Description

For the core, you will implement a program that creates a model of a music artist’s lyrics. This model receives lyric data as input and ultimately generates new lyrics in the style of that artist. To do this, you will leverage an NLP concept called an n-gram and use an NLP technique called language modeling.
Your understanding of the linked concepts and definitions is crucial to your success, so make sure to understand n-grams, language modeling, Python dictionaries as taught in the warmup, and classes and inheritance in Python before attempting to implement the core.
The core does not require you to include any external libraries beyond what has already been included for you. Use of any other external libraries is prohibited on this part of the project.

Core Structure

In the language-models/folder, you will find four files which contain class definitions: nGramModel.py, unigramModel.py, bigramModel.py, and trigramModel.py. You must complete the prepData, weightedChoice, and getNextToken functions in nGramModel.py. You must also complete the trainModel, trainingDataHasNGram, and getCandidateDictionary functions in each of the other three files.
In the root CreativeAI repository, there is a file called generate.py, which will be the driver for generating both lyrics and music. For the core, you will implement the trainLyricsModels, selectNGramModel, generateSentence, and runLyricsGenerator functions; these functions will be called, directly or indirectly, by main, which is written for you.
We recommend that you implement the functions in the order they are listed in the spec; start with prepData and work your way down to runLyricsGenerator.

Getting New Lyrics (Optional)

If your group chooses to use lyrics from an artist other than the Beatles, you can use the web scraper we have written to get the lyrics of the new artist and save them in the data/lyrics directory for you. A web scraper is a program that gets information from web pages: ours, which lives in the data/scrapers directory.
If you navigate to the data/scrapers folder and run the lyricsWikiaScraper.py file, you will be prompted to input the name of an artist. If that artist is found on lyrics.wikia.com, the program will make a folder in the data/lyrics directory for that artist, and save each of the artist’s songs as a .txt file in that folder.

Explanation of Functions to Implement

prepData

The purpose of this function is to take input data in the form of a list of lists, and return a copy of that list with symbols added to both ends of each inner list.
For the core, these inner lists will be sentences, which are represented as lists of strings. The symbols added to the beginning of each sentence will be ^::^ followed by ^:::^, and the symbol added to the end of each sentence will be $:::$ . These are arbitrary symbols, but make sure to use them exactly and in the correct order.
For example, if the function is passed this list of lists:

[ [apos;heyapos;, apos;judeapos;], [apos;yellowapos;, apos;submarineapos;] ]

Then it would return a new list that looks like this:

[ [apos;^::^apos;, apos;^:::^apos;, apos;heyapos;, apos;judeapos;, apos;$:::$apos;], [apos;^::^apos;, apos;^:::^apos;, apos;yellowapos;, apos;submarineapos;, apos;$:::$apos;] ]

The purpose of adding two symbols at the beginning of each sentence is so that you can look at a trigram containing only the first English word of that sentence. This captures information about which words are most likely to begin a sentence; without these symbols, you would not be able to use the trigam model at the beginning of sentences because there would be no trigrams to look at until the third word.
The purpose of adding a symbol to the end of each sentence is to be able to generate sentence endings. If you ever see $:::$ while generating a sentence in the generateSentence function, you know the sentence is complete.

trainModel

This function trains the NGramModel child classes on the input data by building their dictionary of n-grams and respective counts, self.nGramCounts. Note that the special starting and ending symbols also count as words for all NGramModels, which is why you should use the return value of prepData before you create the self.nGramCounts dictionary for each language model.