We’re taking the raw data we obtained in Milestone 1 and building a data model for
it. This can be anything you like (for example: SQL relationships, a class hierarchy, setting up
Pandas dataframes, SQLAlchemy, etc.; this list is not exhaustive!). You have the freedom to
interface with your data however you’d like, but keep in mind that, regardless of how simple you
think the data is, your solution will be graded on how useful, extensible, modular, and robust it
is. Better solutions get better scores!
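As one illustration of a modular, extensible model, here is a minimal sketch built around the lat/long-and-voting-records example used later in this handout. The table names, columns, and the `DataModel` and `Location` names are hypothetical; it uses only Python's built-in `sqlite3` and `dataclasses` modules, so it needs no extra installs.

```python
import sqlite3
from dataclasses import dataclass
from typing import Optional


@dataclass
class Location:
    """One lat/long point from the raw data (hypothetical schema)."""
    lat: float
    lon: float
    city: Optional[str] = None


class DataModel:
    """Wraps an SQLite database holding two related tables: location and vote."""

    def __init__(self, path: str = ":memory:") -> None:
        # ":memory:" keeps everything in RAM; pass a filename to persist to disk.
        self.conn = sqlite3.connect(path)
        # SQLite only enforces REFERENCES constraints when this pragma is on.
        self.conn.execute("PRAGMA foreign_keys = ON")
        self.conn.executescript(
            """
            CREATE TABLE IF NOT EXISTS location (
                id   INTEGER PRIMARY KEY,
                lat  REAL NOT NULL,
                lon  REAL NOT NULL,
                city TEXT
            );
            CREATE TABLE IF NOT EXISTS vote (
                id          INTEGER PRIMARY KEY,
                location_id INTEGER NOT NULL REFERENCES location(id),
                candidate   TEXT NOT NULL,
                num_votes   INTEGER NOT NULL
            );
            """
        )

    def add_location(self, loc: Location) -> int:
        """Insert a location and return its generated id."""
        cur = self.conn.execute(
            "INSERT INTO location (lat, lon, city) VALUES (?, ?, ?)",
            (loc.lat, loc.lon, loc.city),
        )
        self.conn.commit()
        return cur.lastrowid

    def add_vote(self, location_id: int, candidate: str, num_votes: int) -> None:
        """Insert one voting record linked to an existing location."""
        self.conn.execute(
            "INSERT INTO vote (location_id, candidate, num_votes) VALUES (?, ?, ?)",
            (location_id, candidate, num_votes),
        )
        self.conn.commit()

    def votes_for_city(self, city: str) -> list:
        """Return (candidate, total_votes) pairs for one city, highest total first."""
        return self.conn.execute(
            """
            SELECT v.candidate, SUM(v.num_votes) AS total
            FROM vote AS v
            JOIN location AS l ON l.id = v.location_id
            WHERE l.city = ?
            GROUP BY v.candidate
            ORDER BY total DESC
            """,
            (city,),
        ).fetchall()


if __name__ == "__main__":
    model = DataModel()
    loc_id = model.add_location(Location(39.78, -89.65, "Springfield"))
    model.add_vote(loc_id, "A", 120)
    model.add_vote(loc_id, "B", 95)
    model.add_vote(loc_id, "A", 10)
    print(model.votes_for_city("Springfield"))  # [('A', 130), ('B', 95)]
```

Keeping all of the SQL inside one class means the rest of the script never touches the database directly, so switching to Pandas or SQLAlchemy later would change only this class.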
You are to turn in your Python code for your project so far, including the code you wrote in
Milestone 1 (i.e. this new code should integrate with the old code). You can turn in any number
of supporting files (libraries, modules, etc.), but you must follow the same format as before:
Name your script LASTNAME_FIRSTNAME_M2.py (you will LOSE points if you don’t do this!)
Your script should be modular in that it allows you to obtain the data from the scraper/API (as
in Milestone 1) but also from local storage. How you implement this (text files, CSV,
cached webpages, SQL files, Feather-serialized dataframes, etc.) is up to you. There should be a
--source=remote or --source=local command-line parameter (remember the lecture on args and
kwargs!)
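A minimal sketch of the `--source` switch using the standard-library `argparse` module, assuming the data can be cached as a JSON file. `fetch_remote`, `CACHE_FILE`, and the sample record are hypothetical placeholders for your own Milestone 1 code; any of the storage formats listed above could replace JSON.

```python
import argparse
import json
from pathlib import Path

CACHE_FILE = Path("m2_cache.json")  # hypothetical local cache location


def fetch_remote() -> list:
    """Placeholder for the Milestone 1 scraper/API code (hypothetical sample data)."""
    return [{"lat": 40.0, "lon": -75.0}]


def save_local(records: list, path: Path = CACHE_FILE) -> None:
    """Write records to the local cache as JSON."""
    with path.open("w") as f:
        json.dump(records, f)


def fetch_local(path: Path = CACHE_FILE) -> list:
    """Read records back from the local cache."""
    with path.open() as f:
        return json.load(f)


def get_data(source: str, path: Path = CACHE_FILE) -> list:
    """Dispatch on the --source value."""
    if source == "remote":
        records = fetch_remote()
        save_local(records, path)  # refresh the cache so --source=local works next time
        return records
    return fetch_local(path)


def parse_args(argv=None) -> argparse.Namespace:
    parser = argparse.ArgumentParser(description="Milestone 2 data loader (sketch)")
    parser.add_argument(
        "--source",
        choices=["remote", "local"],
        default="remote",
        help="fetch from the scraper/API (remote) or from the cached copy (local)",
    )
    return parser.parse_args(argv)


if __name__ == "__main__":
    args = parse_args()
    data = get_data(args.source)
    print(f"Loaded {len(data)} records from the {args.source} source")
```

Run it once with `python LASTNAME_FIRSTNAME_M2.py --source=remote` to populate the cache, then with `--source=local` to reuse it; `argparse` also accepts the space-separated form `--source local`.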
When invoked, your Python script should grab the data (either locally or remotely), stick it into
your data model, and then retrieve it and manipulate it in some way. How you do this is up to
you; just imagine doing one of the computations you’ll end up doing for the project. For
example, if your data sources were, say, lat/long combinations, a Google API, and voting records,
you might grab the lat/long, ask the Google API for the closest city, and then get the voting
records for that city. You’d display a “result” (just one!). [You’ll save the “final” result/conclusion
for the last part of the project.]
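The lat/long example above might be sketched end to end as follows: load the raw data into a model, retrieve a point, look up its city, query the matching voting records, and print one result. Everything here is hypothetical: `KNOWN_CITIES` and `closest_city` stand in for the Google API call (a real script would make an HTTP request instead), and the vote rows are made-up sample data, not real election results.

```python
import math
import sqlite3

# Hypothetical stand-in for the Google API: a tiny table of known city centers.
KNOWN_CITIES = {"Springfield": (39.80, -89.64), "Shelbyville": (39.41, -88.79)}


def closest_city(lat: float, lon: float) -> str:
    """Nearest known city by straight-line distance in degrees (a rough approximation)."""
    return min(KNOWN_CITIES, key=lambda c: math.dist((lat, lon), KNOWN_CITIES[c]))


def run_pipeline(points: list, votes: list) -> tuple:
    """Store the raw data, retrieve it, and compute one result: the leading candidate."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE point (lat REAL, lon REAL)")
    conn.execute("CREATE TABLE vote (city TEXT, candidate TEXT, num_votes INTEGER)")
    conn.executemany("INSERT INTO point VALUES (?, ?)", points)
    conn.executemany("INSERT INTO vote VALUES (?, ?, ?)", votes)

    # Retrieve a point from the model and resolve it to a city.
    lat, lon = conn.execute("SELECT lat, lon FROM point LIMIT 1").fetchone()
    city = closest_city(lat, lon)

    # Aggregate that city's voting records and keep the top candidate.
    row = conn.execute(
        """
        SELECT candidate, SUM(num_votes) AS total
        FROM vote WHERE city = ?
        GROUP BY candidate ORDER BY total DESC LIMIT 1
        """,
        (city,),
    ).fetchone()
    conn.close()
    if row is None:
        raise LookupError(f"no voting records for {city}")
    return city, row[0], row[1]


if __name__ == "__main__":
    sample_points = [(39.78, -89.65)]
    sample_votes = [
        ("Springfield", "A", 120),
        ("Springfield", "B", 95),
        ("Shelbyville", "B", 80),
        ("Springfield", "B", 40),
    ]
    city, candidate, total = run_pipeline(sample_points, sample_votes)
    print(f"{city}: {candidate} leads with {total} votes")  # Springfield: B leads with 135 votes
```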
Questions:
In addition, you should turn in a plain text file named LASTNAME_FIRSTNAME_M2_1.txt
(NO DOC, PDF, OR ANYTHING ELSE) that answers the following questions:
1. What are the strengths of your data modeling format?
2. What are the weaknesses? (Does your data model support sorting the information?
Re-ordering it? Only obtaining a certain subset of the information?)
3. How do you store your data on disk?
4. Let’s say you find another data source that relates to all 3 of your data sources (i.e. a
data source that relates to your existing data). How would you extend your model to
include this new data source? How would that change the interface?
5. How would you add a new attribute to your data? (i.e. imagine you had a lat/long column
in a database. You might use that to access an API to get a city name. How would you
add the city name to your data?)
Now that we’ve acquired our data and built a structure to store/manipulate it, we’re going
to draw some conclusions!
You’re to submit a zip file with all the code you used, as well as a README text file that
documents:
1. How to run your code (what command-line switches there are, what happens when
you invoke the code, etc.)
2. Any major “gotchas” to the code (i.e. things that don’t work, go slowly, could
be improved, etc.)
3. Anything else you feel is relevant to your project.
Again, call this file README and put it in the root of your archive.
Questions:
Next, answer the following questions:
1. What did you set out to study? (i.e. what was the point of your project? This should
be close to your Milestone 1 assignment, but if you switched gears or changed
things, note it here.)
2. What did you discover/what were your conclusions? (i.e. what were your findings?
Were your original assumptions confirmed, etc.?)
3. What difficulties did you have in completing the project?
4. What skills did you wish you had while you were doing the project?
5. What would you do “next” to expand or augment the project?
Turn in a plain text file named LASTNAME_FIRSTNAME_M2_2.txt (NO DOC, PDF,
OR ANYTHING ELSE) that answers these questions, in the root of your zip file as well.
The rubric for M2 is as follows: