Project Overview
You will create a program to scrape and search information about National Sites (Parks,
Heritage Sites, Trails, and other entities) from nps.gov. You will also add the ability to look up
nearby places using the Google Places API and to display National Sites and Nearby Places on
a map using plotly.
Also, please observe the following:
● Do not change the name of the file proj2_nps.py
● Do not change any of the contents of the file proj2_nps_test.py
○ You can create other files, including other test files, if you would like, but you may
not change this file or rename the main program file.
Failure to follow these guidelines may result in point deductions, or worse.
Part 1 (6 points)
In Part 1 you will scrape nps.gov with the goal of being able to print out information about any
National Site listed on the site, organized by state. Information will include the site name, site
type, and the physical (or mailing) address. Your program will start crawling at
https://www.nps.gov/index.htm, and from there crawl pages for particular states and then pages
for particular sites. The links to state pages can be accessed from the dropdown box labeled
“FIND A PARK.”
To pass the included tests, you will need to create a function
get_sites_for_state(state_abbr) that takes a state abbreviation and returns a list of
NationalSites that are in that state. The required attributes for the NationalSite class can be
seen in the skeleton code file (proj2_nps.py).
At the basic level, each NationalSite (instance) should be created with a name, type (e.g.,
‘National Park’, ‘National Monument’, ‘National Historic Site’), and description. All of these can
be found on the landing page for a particular state (e.g.,
https://www.nps.gov/state/mi/index.htm).
In addition, you should visit the detail page for each site to extract additional information, in
particular the physical address of the site. To do this, you will have to crawl one level deeper
into the site and extract information from the site-specific pages (e.g.,
https://www.nps.gov/isro/index.htm).
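The two crawl levels described above boil down to building a state URL and then resolving each site link found on that state page. The sketch below assumes the state URL pattern shown in the Michigan example holds for every state, and assumes site links on a state page may be relative paths such as /isro/index.htm (check the actual HTML). The helper names state_url and site_url are hypothetical, not part of the skeleton.

```python
from urllib.parse import urljoin


def state_url(state_abbr):
    """Build a state landing-page URL, e.g. 'MI' -> https://www.nps.gov/state/mi/index.htm."""
    # Assumption: every state follows the /state/<abbr>/index.htm pattern.
    return "https://www.nps.gov/state/{}/index.htm".format(state_abbr.strip().lower())


def site_url(page_url, href):
    """Resolve a link found on a state page to an absolute site URL."""
    # urljoin leaves absolute hrefs alone and resolves relative ones against page_url.
    return urljoin(page_url, href)
```

For example, site_url(state_url("mi"), "/isro/index.htm") yields https://www.nps.gov/isro/index.htm, the Isle Royale detail page.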
Each NationalSite should return a string representation of itself (using __str__( )) of the
following form: <site name> (<site type>): <address>
For example:
Isle Royale (National Park): 800 East Lakeshore Drive, Houghton, MI
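A minimal sketch of a __str__( ) that produces this format. The attribute names here are assumptions for illustration; the authoritative list of required attributes is in proj2_nps.py, which may, for instance, store the address in separate street/city/state fields. The (type, name) argument order mirrors the NationalSite(‘National Lakeshore’, ‘Sleeping Bear Dunes’) call used in Part 3.

```python
class NationalSite:
    """Hypothetical sketch; use the attributes required by proj2_nps.py."""

    def __init__(self, site_type, name, desc="", address=""):
        self.type = site_type       # e.g. 'National Park'
        self.name = name            # e.g. 'Isle Royale'
        self.description = desc     # scraped from the state landing page
        self.address = address      # scraped from the site detail page

    def __str__(self):
        # <site name> (<site type>): <address>
        return "{} ({}): {}".format(self.name, self.type, self.address)
```

Usage: str(NationalSite("National Park", "Isle Royale", address="800 East Lakeshore Drive, Houghton, MI")) gives the example string above.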
Finally, though you should really consider doing this first to dramatically speed up your
development time, implement caching so that you only have to visit each URL within nps.gov
once (subsequent attempts to visit, say, https://www.nps.gov/state/mi/index.htm or
https://www.nps.gov/isro/index.htm should be satisfied from the cache rather than by another HTTP
request).
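One way to do this is a dictionary keyed by URL that is saved to a JSON file, so the cache survives between runs. The following is a minimal sketch under that assumption; the file name nps_cache.json and the function names are hypothetical. The fetch parameter lets you swap in any HTTP function (or a fake one while testing).

```python
import json
import os
import urllib.request

CACHE_FILE = "nps_cache.json"  # hypothetical file name


def load_cache(path=CACHE_FILE):
    """Return the saved {url: page_text} dict, or an empty dict on the first run."""
    if os.path.exists(path):
        with open(path, encoding="utf-8") as f:
            return json.load(f)
    return {}


def save_cache(cache, path=CACHE_FILE):
    """Write the whole cache back to disk as JSON."""
    with open(path, "w", encoding="utf-8") as f:
        json.dump(cache, f)


def http_get(url):
    """Plain HTTP GET returning the response body as text."""
    with urllib.request.urlopen(url) as resp:
        return resp.read().decode("utf-8")


def cached_get(url, cache, fetch=http_get, path=CACHE_FILE):
    """Return the page for url, making an HTTP request only on a cache miss."""
    if url not in cache:
        cache[url] = fetch(url)
        if path:
            save_cache(cache, path)  # persist so later runs also skip the request
    return cache[url]
```

Typical use: cache = load_cache(), then html = cached_get("https://www.nps.gov/state/mi/index.htm", cache). A second call with the same URL returns the stored text without touching the network.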
Grading
● [2 points] Implement basic searching by state and creation of NationalSites with name
and type. Pass TestStateSearch.test_basic_search( )
● [2 points] Implement adding address information to NationalSites by crawling. Pass
TestStateSearch.test_addresses( )
● [1 point] Implement __str__( ) as specified. Pass TestStateSearch.test_str( )
● [1 point] Add caching so that you never have to visit a page on the nps site more than
once.
Part 2 (6 points)
Implement a function get_nearby_places(site_object) that looks up a site by name
using the Google Places API and returns a list of up to 20 nearby places, where “nearby” is
defined as within 10 km (note: 20 results is the default maximum number returned by the Google
Places API without paging).
Getting the list of nearby places will require two calls to the Google API: one to get the GPS
coordinates for a site (tip: do a text search for <site name> <site type> to ensure a more precise
match; it turns out there are lots of places called “Death Valley” that aren’t National Parks!), and
another one to get the nearby places. Documentation on the Google Places API can be found
here: https://developers.google.com/places/web-service/search.
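A sketch of the two request URLs, assuming the JSON Text Search and Nearby Search endpoints of the Places web service as documented on the page above (Google has since relabeled this version of the API as “legacy”, so confirm the endpoints before relying on them). Nearby Search takes its radius in metres, so 10 km becomes 10000. The function names are hypothetical.

```python
from urllib.parse import urlencode

TEXT_SEARCH_URL = "https://maps.googleapis.com/maps/api/place/textsearch/json"
NEARBY_SEARCH_URL = "https://maps.googleapis.com/maps/api/place/nearbysearch/json"


def text_search_url(site_name, site_type, api_key):
    """First call: search for '<site name> <site type>' to get the site's coordinates."""
    params = {"query": "{} {}".format(site_name, site_type), "key": api_key}
    return TEXT_SEARCH_URL + "?" + urlencode(params)


def nearby_search_url(lat, lng, api_key, radius_m=10000):
    """Second call: places within radius_m metres (10 km by default) of the site."""
    params = {"location": "{},{}".format(lat, lng), "radius": radius_m, "key": api_key}
    return NEARBY_SEARCH_URL + "?" + urlencode(params)
```

urlencode escapes spaces and commas, so a query such as “Isle Royale National Park” is sent as Isle+Royale+National+Park.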
You will need to get a Google API key by following the instructions linked from the above page (see “Get
a Key”). Implementing caching for this portion of the project is STRONGLY recommended, as
Google Places limits you to 100 API calls per day, and text search calls count as 10 each. It is
incredibly easy to burn through these calls quickly. (Note: you can increase your API limit by
entering a credit card. Google says they won’t charge the card; they just use it for identification.
It’s fine if you want to do this, but you are by no means required to, and if you use
caching and restrict yourself to working on a few examples at a time you can avoid needing to
do it.)
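Caching Google requests works like the nps.gov cache, with one wrinkle: the request includes your API key. A reasonable design choice, sketched below, is to build the cache key from the sorted query parameters minus the key, so the key never ends up in your cache file and identical searches always map to the same entry. The fetch callable is a hypothetical stand-in for whatever performs the request (for example requests.get(base_url, params=params).json() if you use the requests library).

```python
def cache_key(base_url, params, private_keys=("key",)):
    """Deterministic cache key built from the request, with the API key left out."""
    # Sorting makes parameter order irrelevant; dropping 'key' keeps secrets out of the cache.
    public = sorted((k, str(v)) for k, v in params.items() if k not in private_keys)
    return base_url + "?" + "&".join("{}={}".format(k, v) for k, v in public)


def cached_request(base_url, params, cache, fetch):
    """Call fetch(base_url, params) only if this exact search is not cached yet."""
    key = cache_key(base_url, params)
    if key not in cache:
        cache[key] = fetch(base_url, params)
    return cache[key]
```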
get_nearby_places(site_object) should return a list of NearbyPlace objects.
At a minimum, a NearbyPlace needs to have the name of the place as an attribute. You may
find it useful to add other attributes as well. A NearbyPlace should include a __str__( )
method, which simply returns the place name.
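A minimal sketch of NearbyPlace plus a helper that converts a Places response into a list of them. It assumes each entry of the response’s "results" list carries a "name" and a "geometry" → "location" → "lat"/"lng" object, as in the Places web service JSON. Keeping the coordinates is an optional extra, assumed here because Part 3 needs them for plotting; places_from_results is a hypothetical name.

```python
class NearbyPlace:
    """Only name is required; lat/lng are optional extras for mapping."""

    def __init__(self, name, lat=None, lng=None):
        self.name = name
        self.lat = lat
        self.lng = lng

    def __str__(self):
        return self.name


def places_from_results(results, limit=20):
    """Convert the 'results' list of a Places JSON response into NearbyPlace objects."""
    places = []
    for r in results[:limit]:
        # Missing geometry leaves lat/lng as None instead of raising KeyError.
        loc = r.get("geometry", {}).get("location", {})
        places.append(NearbyPlace(r["name"], loc.get("lat"), loc.get("lng")))
    return places
```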
Note:
● If you do a search on a NationalSite using <site name> <site type> (e.g., “Death Valley
National Park” or “Motor Cities National Heritage Area”) and Google Places does not
return any results (or returns results, but none of them have the specific name you
searched for), the list of “Nearby Places” should be an empty list.
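The rule in this note can be sketched as a lookup that only accepts a text-search result whose name matches the query; when it returns None, get_nearby_places should return an empty list without making the second API call. Case-insensitive matching is a design choice assumed here, as is the response shape ("name" plus "geometry" → "location"). The function name is hypothetical.

```python
def find_site_location(results, query):
    """Return (lat, lng) of the result whose name matches query, or None."""
    wanted = query.strip().lower()
    for r in results:
        if r.get("name", "").strip().lower() == wanted:
            loc = r["geometry"]["location"]
            return loc["lat"], loc["lng"]
    # No result carries the exact name: treat the site as not found.
    return None
```

With this check, a text search for “Death Valley National Park” that only returns, say, a town named Death Valley produces no coordinates, and therefore no nearby places.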
Grading:
● [4 points] Return a list of up to 20 NearbyPlaces from
get_nearby_places(site_object). Each place has a properly configured name
attribute. Pass test_nearby_search( ).
● [2 points] Add caching so that you only do a particular nearby search once (items in
your cache don’t need to expire, even though technically the data could change given
enough time).
Part 3 (6 points)
Use plotly to display maps of NationalSites within a state and NearbyPlaces near a
NationalSite.
Implement two functions:
plot_sites_for_state(state_abbr) and plot_nearby_for_site(site_object)
Here are some details about each function:
● plot_sites_for_state(state_abbr):
○ Takes a state abbreviation
○ Creates a map scatter plot on the plotly site that contains all of the NationalSites
found for that state that Google Places was able to find GPS coordinates for.
■ Any Sites that don’t have GPS coordinates should be removed before
creating the map
○ The map should be centered and scaled appropriately so that all of the sites are
visible and that there is a reasonable amount of “padding” around the edges of
the map (i.e., so that all of the sites are comfortably within the map frame and not
all the way at the edge)
○ All Sites should be displayed with the same type of marker, and each should
display the name of the site when a user hovers over the marker (this is the
default behavior in plotly if each data point has a ‘text’ field correctly set).
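Centering and padding can be computed from the coordinates alone. The sketch below drops points without coordinates, pads each axis by a fraction of its span (with a minimum, so a single site still gets a frame), and returns a dictionary whose keys match plotly’s layout.geo properties center, lataxis.range and lonaxis.range. The 10% fraction and 0.5° minimum are arbitrary assumptions, and whether a given projection honors the axis ranges is worth checking in your own maps. The function name is hypothetical.

```python
def padded_bounds(points, pad_frac=0.1, min_pad=0.5):
    """Center and padded lat/lon ranges for (lat, lon) points; None if none are usable."""
    # Sites that Google could not locate are removed before anything is computed.
    pts = [(lat, lon) for lat, lon in points if lat is not None and lon is not None]
    if not pts:
        return None
    lats = [p[0] for p in pts]
    lons = [p[1] for p in pts]
    # Pad by a fraction of each span, but at least min_pad degrees.
    lat_pad = max((max(lats) - min(lats)) * pad_frac, min_pad)
    lon_pad = max((max(lons) - min(lons)) * pad_frac, min_pad)
    return {
        "lataxis": {"range": [min(lats) - lat_pad, max(lats) + lat_pad]},
        "lonaxis": {"range": [min(lons) - lon_pad, max(lons) + lon_pad]},
        "center": {"lat": (min(lats) + max(lats)) / 2, "lon": (min(lons) + max(lons)) / 2},
    }
```

The result can be merged into the geo part of the layout, for example geo=dict(scope="usa", projection=dict(type="albers usa"), **bounds).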
● plot_nearby_for_site(site_object)
○ Takes a NationalSite object
○ Creates a map scatter plot on the plotly site that contains all of the NearbyPlaces
for the specified site.
■ If a NationalSite is provided that Google Places can’t find GPS
coordinates for, the map should not be created (you can handle the error
however you deem appropriate)
○ The map should be centered and scaled appropriately so that all of the places
are visible and that there is a reasonable amount of “padding” around the edges
of the map (i.e., so that all of the sites are comfortably within the map frame and
not all the way at the edge)
○ The NationalSite should be displayed with a different marker than the
NearbyPlaces. Note that the NationalSite may be returned as a result by Google
Places, in which case it needs to be removed from the NearbyPlaces before the map is plotted. The
NationalSite and all NearbyPlaces should display their name when a user hovers
over the marker in plotly.
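The marker and filtering requirements can be sketched as two trace dictionaries in plotly’s scattergeo format: one trace for the NationalSite and one for the NearbyPlaces, with the site removed from the places if Google returned it. Plotly figures accept plain dicts like these (for example plotly.graph_objects.Figure(data=traces)). Passing places as (name, lat, lng) tuples, and the particular marker symbols and sizes, are assumptions for illustration; nearby_traces is a hypothetical name.

```python
def nearby_traces(site_name, site_lat, site_lng, places):
    """Two scattergeo trace dicts: the site (star) and its nearby places (circles).

    places is a list of (name, lat, lng) tuples; the site itself is filtered out.
    """
    others = [p for p in places
              if p[0] != site_name and p[1] is not None and p[2] is not None]
    site_trace = {
        "type": "scattergeo", "mode": "markers",
        "lat": [site_lat], "lon": [site_lng], "text": [site_name],  # text = hover label
        "marker": {"symbol": "star", "size": 20},
    }
    places_trace = {
        "type": "scattergeo", "mode": "markers",
        "lat": [p[1] for p in others], "lon": [p[2] for p in others],
        "text": [p[0] for p in others],
        "marker": {"symbol": "circle", "size": 8},
    }
    return [site_trace, places_trace]
```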
Here are examples of each:
On the left is the result of calling plot_sites_for_state(‘mi’).
On the right is the result of calling plot_nearby_for_site(NationalSite(‘National
Lakeshore’, ‘Sleeping Bear Dunes’)).
Don’t worry about the fact that some of the markers appear to be off by a few fractions of a
degree. This seems to have something to do with the projection we are using for plotly in our
code (‘albers usa’), which doesn’t agree with the coordinates being produced by Google. If the
data is correct and the maps are more or less in the right area, you will not get points off.
Grading:
● [3 points] Correct implementation of plot_sites_for_state(state_abbr)
● [3 points] Correct implementation of plot_nearby_for_site(site_object)
Part 4 (6 points)
Make the program interactive. Here is a list of commands your program should accept and how
it should handle them:
● list
○ available anytime
○ lists all National Sites in a state
○ valid inputs: a two-letter state abbreviation
● nearby
○ available only if there is an active result set
○ lists all places near a given result
○ valid inputs: an integer from 1 to the number of items in the result set
● map
○ available only if there is an active result set
○ displays the current results on a map
● exit
○ exits the program
● help
○ lists available commands (these instructions)
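Most of the “handle bad inputs” work is validation, which can live in one function that the input loop calls before doing anything else. In this sketch the input syntax (for example “list mi” and “nearby 3”) is an assumption, parse_command is a hypothetical name, and only the shape of the state abbreviation is checked, not whether the state actually exists.

```python
def parse_command(line, result_set_size):
    """Validate one line of input; return (command, argument) or raise ValueError.

    result_set_size is the length of the active result set (0 if there is none).
    """
    parts = line.strip().lower().split()
    if not parts:
        raise ValueError("Please enter a command (type 'help' for options).")
    cmd, args = parts[0], parts[1:]
    if cmd in ("exit", "help"):
        return cmd, None
    if cmd == "list":
        if len(args) != 1 or len(args[0]) != 2 or not args[0].isalpha():
            raise ValueError("Usage: list <two-letter state abbreviation>")
        return cmd, args[0]
    if cmd in ("nearby", "map") and result_set_size == 0:
        raise ValueError("'{}' needs an active result set; try 'list' first.".format(cmd))
    if cmd == "map":
        return cmd, None
    if cmd == "nearby":
        if len(args) != 1 or not args[0].isdigit():
            raise ValueError("Usage: nearby <result number>")
        n = int(args[0])
        if not 1 <= n <= result_set_size:
            raise ValueError("Please choose a number from 1 to {}.".format(result_set_size))
        return cmd, n
    raise ValueError("Unknown command '{}'. Type 'help' for options.".format(cmd))
```

The main loop can then read a line with input(), call parse_command inside a try block, print the ValueError message on bad input, and dispatch to get_sites_for_state, get_nearby_places, or the plotting functions otherwise.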
Note: a “result set” here refers to a list of NationalSites for a state or a list of NearbyPlaces for a
NationalSite. You can implement this concept however you like, as long as the above semantics
are preserved. Here is a sample run of the program:
Grading:
● [1 point] Implement ‘list’ command correctly
● [1 point] Implement ‘nearby’ command
● [1 point] Implement ‘map’ command
● [1 point] Implement ‘help’ command
● [1 point] Implement ‘exit’ command
● [1 point] Handle bad inputs elegantly