Project 8-f (Final Project): World-wide Earthquake Watch
Due: Tuesday, November 21, 2017, 11:59 p.m.
FINAL PROJECTS MUST BE SUBMITTED; GRADE WILL NOT BE DROPPED.
Goals
Use file processing and data mining to discover patterns of earthquake activity around the world
over the past year; plot results on a world map.
Practice with
• data mining and file processing
• reading, understanding, and revising a program that has mostly been written already. This
will require a thorough understanding of the k-means cluster analysis algorithm and file
processing.
Background – Data Mining
In this project we continue to study earthquake activity. We will use a file with data once again
obtained from the United States Geological Survey website, about earthquakes of magnitude 5 or
greater that have occurred across the planet over the past year (since November 1, 2016).
We will implement a k-means cluster analysis data mining algorithm from Ch. 7 of the text and
use it to analyze the earthquake data. We will then use Python turtle graphics to graphically
report the earthquake clusters discovered by the k-means algorithm on a world map.
Requirements
Functions to read and process earthquake data saved in a file are adapted from Ch. 7 of the text.
Function readFile is based on the function in Ch. 7 of the text (p. 255). You will need to
revise it to accommodate the earthquake data file analyzed in this project: the earthquake file we
are using has a header line, each line ends with a return (\n) character, the values are comma (not
space) separated, and the longitude and latitude are in different fields ("columns") than in the data
file used in the text.
(readFile will be very similar to equake_readf from Project 7-2, except that it will return
a dictionary rather than a list.)
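One possible shape for readFile is sketched here. The running integer key and the field positions (latitude in field 1, longitude in field 2, based on the sample header later in this handout) are assumptions you should verify against your actual file:

```python
def readFile(fileName):
    """(str) -> dict
    Read comma-separated earthquake data from fileName, skipping the
    header line, and return a dictionary mapping a running integer key
    to a [longitude, latitude] pair for each earthquake.
    """
    dataDict = {}
    with open(fileName, 'r') as dataFile:
        dataFile.readline()            # skip the header line
        key = 0
        for line in dataFile:
            fields = line.strip().split(',')
            # latitude is field 1 and longitude field 2 in the sample
            # header shown later in this handout -- verify against
            # your own file
            key += 1
            dataDict[key] = [float(fields[2]), float(fields[1])]
    return dataDict
```

Note that storing the pair as [longitude, latitude] (x before y) makes later plotting more natural.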
Functions euclidD, createCentroids, and createClusters are all copied from Ch. 7
of the text. You will need to add docstrings and helpful in-line comments.
Note that the createClusters function includes code to report the results of the cluster
analysis (for c in clusters …). This is helpful for testing createClusters with a small data
dictionary (data file), but you may want to comment this out for the large earthquake file.
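For reference, euclidD computes the Euclidean distance between two points of the same dimension; a sketch consistent with that behavior, with the kind of docstring and example your functions should carry:

```python
import math

def euclidD(point1, point2):
    """(tuple, tuple) -> float
    Return the Euclidean distance between two points of the same
    dimension.
    >>> euclidD((0, 0), (3, 4))
    5.0
    """
    total = 0
    for i in range(len(point1)):
        total += (point1[i] - point2[i]) ** 2
    return math.sqrt(total)
```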
visualizeQuakes is based on the function in Ch. 7 of the text. visualizeQuakes in this
project has k (the number of clusters) and r (the number of times to repeat cluster analysis) as
parameters, along with the dataFile parameter. As in the text, visualizeQuakes should
call readFile, createCentroids, and createClusters.
visualizeQuakes will also call a new function, eqDraw, to plot the earthquake data on a
world map. visualizeQuakes should return None. You will need to write a docstring and
add comments to visualizeQuakes.
eqDraw is a new function that will be called by visualizeQuakes to do the work of plotting
the results of the k-means analysis on a world map. eqDraw will have three
parameters: k; eqDict, the earthquake data dictionary; and eqClusters, the list of clusters
generated for the earthquake data by the k-means cluster analysis.
eqDraw will comprise the drawing code that is included in visualizeQuakes in the text.
You should change the screensize method arguments to 180 and 90, to reflect the size of
the world map .gif that we will use. You can also use the anonymous turtle from the turtle
module (i.e., you do not need to define quakeT). Tip: include speed('fastest').
Finally, define a main function to assign values to k, r, and f, and call visualizeQuakes.
main should return None. Include code in your .py file to call the main function.
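The required structure might look like the following sketch. The visualizeQuakes stub here is only a placeholder standing in for the real function you will have written above it, and the k, r, and f values are examples, not required choices:

```python
def visualizeQuakes(k, r, dataFile):
    """Placeholder stub so this sketch runs on its own; your real
    visualizeQuakes performs the cluster analysis and drawing."""
    return None

def main():
    """() -> None
    Assign values to k, r, and f, then run the earthquake
    visualization. Returns None.
    """
    k = 6                     # number of clusters (example value)
    r = 7                     # cluster-analysis repetitions (example)
    f = 'earthquakes.txt'     # earthquake data file name
    visualizeQuakes(k, r, f)
    return None

main()
```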
Testing your code
doctest.testmod will be useful for testing individual functions. Use small, made-up
examples to test each individual function before calling it from another function. Start with a
short version of the earthquake file (around 15 lines not including the header should be small
enough to understand, but large enough to generate results).
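For example, doctest.testmod runs the examples embedded in your docstrings and reports any that fail; a minimal, self-contained pattern (halve is a toy function for illustration only):

```python
import doctest

def halve(x):
    """(number) -> float
    Return half of x.
    >>> halve(10)
    5.0
    """
    return x / 2

if __name__ == '__main__':
    doctest.testmod()   # silent when every docstring example passes
```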
Getting Started
Open a new Python editor window. Save it with the name p8f_equakes_vis.py.
Download the earthquake file (earthquakes.txt) and the world map
(worldmap1800_900.gif) you will use for plotting earthquake points. Make sure the
earthquake and map files are in the same folder as p8f_equakes_vis.py.
This program is somewhat longer than prior programs you have written, but when it is viewed
function by function it should not seem so different. A good strategy for implementing a
medium-sized project such as this one is to understand the code from the top down, and
implement the code from the bottom up, with careful testing of each function at each step.
For example:
Start by carefully reading Ch. 7 of the text and reviewing class notes. When you are comfortable
with your understanding of the k-means cluster algorithm, have a closer look at the code that is
given in Ch. 7 to implement the algorithm. Type the code into a Python file, one function at a
time.
For each auxiliary cluster algorithm function in turn (euclidD,
createCentroids and createClusters), write the docstring for the function.
Test each individual function to make sure you understand and trust the function. Create your
own small test dictionaries to test the functions. Use simple examples, like individual scores or
earthquake magnitudes, at first. Then, try examples with two-dimensional points, such as
earthquake longitudes and latitudes.
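For instance, here is a hand-run version of the assignment step on a tiny one-dimensional dictionary. The data values and centroids are made up for the test; the real createClusters from Ch. 7 repeats this step several times and recomputes the centroids after each pass:

```python
import math

def euclidD(p, q):
    """Euclidean distance between two same-dimension points."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))

# A tiny test dictionary: 1-D "scores" with two obvious groups.
testDict = {1: (2,), 2: (3,), 3: (50,), 4: (52,)}
centroids = [(2,), (51,)]          # hand-picked centroids for the test

# One assignment pass: each key goes to its nearest centroid.
clusters = [[] for _ in centroids]
for key in testDict:
    dists = [euclidD(testDict[key], c) for c in centroids]
    clusters[dists.index(min(dists))].append(key)

print(clusters)   # prints [[1, 2], [3, 4]]
```

Because the groups are well separated, you can predict the result by hand and check that the code agrees.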
Next, write the function that will access earthquake data, readFile. Note that you wrote a
similar function for Project 7-2. You may want to start with that function and revise it to access
longitude and latitude data (rather than magnitudes) and store the data into a dictionary (rather
than a list).
You may want to create a short version of the earthquake files, for testing readFile.
The data in the earthquake files is structured as in Project 7-2:
time,latitude,longitude,depth,mag,magType,nst,gap,dmin,rms,net,id,updated,place,type,horizontalError,depthError,magError,magNst,status,locationSource,magSource
2010-07-28T16:12:05.610Z,43.756,-125.815,10,5.2,mwc,193,143.9,,0.93,us,usp000hh0t,2017-08-01T16:34:36.951Z,"off the coast of Oregon",earthquake,,,reviewed,us,gcmt
1993-12-04T22:15:19.720Z,42.2915,-122.0086667,4.797,5.1,md,126,113,,0.11,uw,uw10316468,2017-04-13T22:06:07.852Z,"Oregon",earthquake,0.468,0.56,0.04,7,reviewed,uw,uw
The first line of the file is a header, which gives the meaning of each field (column) of the data.
The data itself is comma-separated values, with one value (or the empty string) per field. Use the
header to determine how to access latitude and longitude data. Note that when you access the
earthquake file to create the dictionary, you will need to skip the header to get to the earthquake
data.
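One way to avoid hard-coding field positions is to look them up in the header line itself. A sketch using the header shown above (the variable names are illustrative):

```python
header = ("time,latitude,longitude,depth,mag,magType,nst,gap,dmin,rms,"
          "net,id,updated,place,type,horizontalError,depthError,"
          "magError,magNst,status,locationSource,magSource")

fields = header.strip().split(',')
latIndex = fields.index('latitude')    # 1 for this header
lonIndex = fields.index('longitude')   # 2 for this header

# Apply the indices to a data line.
sample = "2010-07-28T16:12:05.610Z,43.756,-125.815,10,5.2"
values = sample.split(',')
lat = float(values[latIndex])
lon = float(values[lonIndex])
print(lat, lon)    # prints 43.756 -125.815
```

This keeps readFile working even if the positions differ from what you expected.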
Next, start putting it together by writing visualizeQuakes. First call only readFile.
When this is working, add a call to createCentroids, and so on.
When visualizeQuakes is working, write the main function and add a call to main
(main()) to your program.
Programming style is important!
Programs are read much more than they are written. All programs must include
a program header at the top of the file with
• project identification
• author
• credit for any and all sources of significant help with the project excluding the class text
and CIS 210 instructional staff. This includes programming partners, other texts, online
information, etc.
• short, precise file description
Also remember to
• include a docstring in every function (type contract, brief description of the function,
examples of use)
• use whitespace between operators and operands
• use descriptive variable names
• add appropriate in-code comments (why, not what).
Finishing & submitting your work
When you have completed all of the problems, use the Save command from the IDLE File menu
to save the Editor window as a file with the name p8f_equakes_vis.py.
Submit your file via Canvas. You may re-submit the project until the project deadline (as
long as the submission link is available). Only the final submission will be graded.
Note: as per CIS 210 class policy (see Syllabus), it is not possible to submit a project after the
deadline. Projects that are not submitted by the deadline will receive a default grade of zero.
Four project grades will be dropped at the end of the term, to provide the flexibility you need for
busy weeks, individual technical difficulties, misunderstandings, etc. You do not need to contact
the CIS 210 instructional staff about this; it will happen automatically.
Even if you do not submit a project, you should complete the project.
Always check the solution posted at the class website.
Grading Rubric
All submitted programs should be a good faith attempt to meet the project specification. In
particular, to get credit programs must at a minimum include (1) header (title, author,
credits, brief description), (2) complete docstrings (type contract, brief description,
examples of use) for every function, and (3) code that can be executed (no syntax errors).
20 pts. possible
• 12 pts. – 2 pts. per function – Program has correct structure and conforms with CIS 210 style
guidelines, including a file header with project identification, author, credits, and short file
description at the top of the program file;
each function (readFile, euclidD, createCentroids, createClusters,
visualizeQuakes, eqDraw) includes a docstring with type contract, description, and
simple examples of function use; each function has a return statement; functions have helpful
in-line comments as needed;
• 8 pts. – 2 pts. per function - Code for new and revised functions (readFile,
visualizeQuakes, eqDraw, and main) runs and returns correct results per project
specification.
Optional (and optimal) – Extra Credit
(0) Improve the visualization part of the program. For example, the size of the dots might reflect
the magnitude of the earthquake. (This would mean information about the magnitude of the
earthquake would need to be stored in the datadict.)
(1) We've been looking at earthquake data, but there are many other types of data you may be
interested in exploring, e.g., newspapers, electoral votes, bus schedules, etc. The CORGIS
project includes many data sets (https://think.cs.vt.edu/corgis/csv/index.html); many others
(sometimes not as clean) are available on the internet. Find a data set that interests you and
analyze the data using your data analysis and data mining functions.