CSE 6242/CX 4242: Data and Visual Analytics | Georgia Tech | Fall 2017
Homework 2: D3 Graphs and Visualization
Due: Wednesday, October 11, 2017, 11:55 PM EST
Prepared by Kiran Sudhir, Varun Bezzam, Yuyu Zhang, Akanksha Bindal,
Vishal Bhatnagar, Vivek Iyer, Polo Chau
Submission Instructions and Important Notes:
It is important that you read the following instructions carefully and also those about the deliverables at the end of
each question or you may lose points.
❏ Always check to make sure you are using the most up-to-date assignment PDF (e.g., re-download it from the
course homepage if unsure).
❏ Submit a single zipped file, called “HW2-{YOUR_LAST_NAME}-{YOUR_FIRST_NAME}.zip”, containing all
the deliverables including source code/scripts, data files, and readme. Example: ‘HW2-Doe-John.zip’ if your
name is John Doe. Only .zip is allowed (no other format will be accepted).
❏ You may collaborate with other students on this assignment, but you must write your own code and give the
explanations in your own words, and also mention the collaborators’ names on T-Square’s submission page.
All GT students must observe the honor code. Suspected plagiarism and academic misconduct will be
reported to and directly handled by the Office of Student Integrity (OSI). Here are some examples similar
to Prof. Jacob Eisenstein’s NLP course page (grading policy):
  ❏ OK: discuss concepts and strategies (e.g., how cross-validation works, use hashmap instead of array)
  ❏ Not OK: several students work on one master copy together (e.g., by dividing it up), sharing solutions, or
  using solutions from previous years or from the web.
❏ If you use any “slip days”, you must write down the number of days used in the T-Square submission page.
For example, “Slip days used: 1”. Each slip day equals 24 hours. E.g., if a submission is late for 30 hours,
that counts as 2 slip days.
❏ At the end of this assignment, we have specified a folder structure about how to organize your files in a single
zipped file. 5 points will be deducted for not following this strictly.
❏ We will use auto-grading scripts to grade some of your deliverables (there are hundreds of students), so it is
extremely important that you strictly follow our requirements. Marks may be deducted if our grading scripts
cannot execute on your deliverables.
❏ Wherever you are asked to write down an explanation for the task you perform, stay within the word limit or
you may lose points.
❏ In your final zip file, please do not include any intermediate files you may have generated to work on the task,
unless your script is absolutely dependent on it to get the final result (which it ideally should not be).
❏ After all slip days are used, there is a 5% deduction for every 24 hours of delay (e.g., 5 pts for a 100-point homework).
❏ We will not consider late submission of any missing parts of a homework assignment or project deliverable.
To make sure you have submitted everything, download your submitted files to double check.
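The slip-day rule above rounds up to whole days. A minimal sketch of that arithmetic follows; the function name slipDaysUsed is hypothetical and only illustrates the "30 hours late counts as 2 slip days" example.

```javascript
// Each started 24-hour period after the deadline consumes one slip day.
// slipDaysUsed is an illustrative helper, not part of the homework skeleton.
function slipDaysUsed(hoursLate) {
  if (hoursLate <= 0) {
    return 0; // submitted on time: no slip days consumed
  }
  return Math.ceil(hoursLate / 24);
}
```

With this rule, a submission that is 30 hours late gives slipDaysUsed(30) === 2, matching the example in the instructions.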
1 Version 2
Grading
The maximum possible score for this homework is 120 points. Students in the undergraduate section
(CX 4242) can choose to complete any 100 points worth of work to receive the full 15% of the final
course grade. For example, if a CX 4242 student scores 120 pts, that student will receive (120 / 100) *
15 = 18 pts towards the final course grade. To receive the full 15% score, students in the CSE 6242
sections will need to complete all 120 points.
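The grading arithmetic can be sketched as follows. The function name courseGradePoints is hypothetical, and the linear scaling for CSE 6242 (score out of 120) is an assumption; the text only states that all 120 points are needed for the full 15%.

```javascript
// Contribution of this homework to the final course grade, in points.
// Assumption: the score is scaled linearly against the section's full-credit total.
function courseGradePoints(score, section) {
  var HOMEWORK_WEIGHT = 15; // this homework is worth 15% of the course grade
  var fullCredit = section === "CX4242" ? 100 : 120;
  return (score * HOMEWORK_WEIGHT) / fullCredit;
}
```

For the example in the text, courseGradePoints(120, "CX4242") evaluates to 18.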
Important Prerequisites
Download the HW2 Skeleton that contains files you will use in this homework.
We highly recommend that you use the latest Firefox browser to complete this homework. We will
grade your work using Firefox 55.0.2 (or newer).
For this homework, you will work with version 3 of D3, provided to you in the lib folder. You must
NOT use any other d3 libraries (d3*.js) other than the ones provided.
You may need to set up an HTTP server to run your D3 visualizations (depending on which web
browser you are using, as discussed in the D3 lecture). The easiest way is to use SimpleHTTPServer
in Python (for Python version 2.x). You should run your local HTTP server in the root
(hw2-skeleton) folder.
All d3*.js files in the lib folder must be referenced using relative paths, e.g., “../lib/” in
your html files (e.g., those in folders Q2, Q3, etc.). For example, suppose the file “Q2/graph.html”
uses d3, its header should contain:
It is incorrect to use an absolute path such as:
You can and are encouraged to decouple the style, functionality and markup in the code for each
question. That is, you can use separate files for css, javascript and html.
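Returning to the relative-path requirement, the sketch below shows one way to tell a relative library reference from an absolute one. The helper name isRelativeLibPath and the sample paths are hypothetical; the file name d3.v3.min.js is taken from the folder listing at the end of this document.

```javascript
// Returns true when a script reference from a question folder (e.g., Q2/)
// points into the shared lib folder through a relative path.
// Illustrative helper only; it is not used by the grading scripts.
function isRelativeLibPath(src) {
  // URLs with a scheme (http:, file:, C:), protocol-relative URLs (//host)
  // and filesystem paths starting with a slash are treated as absolute.
  var isAbsolute = /^([a-z][a-z0-9+.\-]*:|\/|\\)/i.test(src);
  return !isAbsolute && src.indexOf("../lib/") === 0;
}
```

Under these assumptions, "../lib/d3.v3.min.js" passes the check, while a path such as "/home/user/hw2-skeleton/lib/d3.v3.min.js" or any "http://..." URL does not.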
Q1 [10 pts] Designing a good table. Visualizing data with Tableau.
Imagine you are a data scientist working with United Nations High Commissioner for Refugees
(UNHCR) and the Uniform Crime Reporting division of Federal Bureau of Investigation (FBI). Perform
the first subtask to aid UNHCR’s understanding of persons of concern and the second subtask to aid
FBI in analysing changing crime rates.
a. [5 pts] Good table design. Create a table to display the details of the refugees (Total
Population) in the year 2012 from the data provided in unhcr_persons_of_concern.csv [1]. You
can use any tool (e.g., Excel, HTML) to create the table. Keep suggestions from class in mind
when designing your table (see lecture slides, specifically slide #43 “How to fix the defaults”,
for what to try, but you are not limited to the techniques described). Describe your reason for
choosing the techniques you use in explanation.txt in no more than 50 words.
b. [5 pts] Tableau: Visualize how different crime rates (e.g., Burglary rate, Property Crime Rate,
etc.) change over the given years, in the dataset crime_rates_FBI.csv [2] (in Q1 folder), using a
single line chart or multiple line charts.
Our main goal here is for you to try out Tableau, a popular information visualization tool. Thus,
we keep this part more open-ended, so you can practice making design decisions. We will
accept most designs from you all. We show one possible design in the figure below, based
on the tips from Multi-Measure Dual Axis Charts, and you are not limited to the techniques
presented there.
○ Your design should visualize at least 3 crime rates over time. You are welcome to
choose which specific crime rates to visualize. You may use a single line chart, or a
combination of multiple line charts.
○ Scale a line’s thickness (over the years) based on the crime rates that it represents (over
the years).
Note: Tableau can visualize a line whose thickness varies over its path (see the
image below). If you cannot figure out how, you may use lines with uniform thicknesses (and
decide on how you want those thicknesses to be computed).
○ Explain your choices of techniques in explanation.txt, using no more than 50 words.
○ Save the chart as timeseries.(png/pdf).
Tableau has provided us with student licenses. Go to tableau activation and select “Get
Started”. On the form, enter your Georgia Tech email address for “Business email” and “Georgia
Institute of Technology” for “Organization”. The Desktop Key for activation is available in
T-Square Resources as “Tableau Desktop Key”. This key is for your use in this course only. Do
not share the key with anyone.
Q1 Deliverables:
The directory structure should be as follows:
Q1/
table.(png / pdf)
timeseries.(png / pdf)
explanation.txt
unhcr_persons_of_concern.csv
crime_rates_FBI.csv
● table.(png / pdf) - An image/screenshot of the table in Q1.a (png or pdf format only).
● timeseries.(png / pdf) - An image of the chart in Q1.b (png or pdf format only, Tableau
workbooks will not be graded!). The image should not be an exact copy of the example
visualization. The image should be clear and of high-quality.
● explanation.txt - Your explanation for part Q1.a and Q1.b in this file (maximum 100 words).
● unhcr_persons_of_concern.csv and crime_rates_FBI.csv - the datasets
Fig 1b: Example visualisation of different crime rates. At the least, please make some changes to this
visualization. We will deduct points for an exact copy.
[1] Source: http://popstats.unhcr.org/en/overview
[2] Source: https://ucr.fbi.gov/
Q2 [15 pts] Force-directed graph layout
You will experiment with many aspects of D3 for graph visualization. To help you get started, we have
provided the graph.html file (in the Q2 folder). Note: You are welcome to split graph.html into
graph.html, graph.css, and graph.js.
a. [3 pts] Adding node labels: Modify graph.html to show a node label (the node name, i.e., the
source) to the right of each node. If a node is dragged, its label must also move with the node.
b. [3 pts] Coloring links: Color the links based on the “value” field in the links array. Assign the
following colors:
If the value of the edge is = 3.0 and 4.0 : assign Blue color to the link.
c. [3 pts] Scaling node sizes:
1. Scale the radius of each node in the graph based on the degree of the node.
2. In explanation.txt, using no more than 40 words, discuss the scaling method you have used
and explain why you think it is a good choice. There are many possible ways to scale, e.g.,
scale the radii linearly, by the square root of the degree, etc.
d. [6 pts] Pinning nodes (fixing node positions):
1. Modify the html so that when you double click a node, it pins the node’s position such that it will
not be modified by the graph layout algorithm (note: pinned nodes can still be dragged around
by the user but they will remain at their positions otherwise). Node pinning is an effective
interaction technique to help users spatially organize nodes during graph exploration.
2. Mark pinned nodes to visually distinguish them from unpinned nodes, e.g., pinned nodes are
shown in a different color, border thickness or visually annotated with an “asterisk” (*), etc.
3. Double clicking a pinned node should unpin (unfreeze) its position and unmark it.
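For parts c and d, the underlying bookkeeping can be sketched without any D3 calls. The helper names (computeDegrees, radiusForDegree, togglePin), the pinned flag, and the link shape with string endpoints are all assumptions made for illustration; the D3 v3 force layout does read a node's fixed property to decide whether to move it. Square-root scaling is shown as one option, not as the required method.

```javascript
// Count how many links touch each node.
// Assumption: links look like { source: "A", target: "B", value: 1 },
// with node names given as strings.
function computeDegrees(links) {
  var degree = {};
  links.forEach(function (link) {
    degree[link.source] = (degree[link.source] || 0) + 1;
    degree[link.target] = (degree[link.target] || 0) + 1;
  });
  return degree;
}

// Square-root scaling: the circle's area is proportional to the degree.
function radiusForDegree(degree, baseRadius) {
  return baseRadius * Math.sqrt(degree);
}

// Toggle a node's pin state on double click.
// A separate "pinned" flag (hypothetical) records the user's choice, and
// "fixed" is the property the D3 v3 force layout checks.
function togglePin(node) {
  node.pinned = !node.pinned;
  node.fixed = node.pinned;
  return node;
}
```

In a page, togglePin would typically be called from a "dblclick" handler, and the pinned flag could also drive the visual marking (for example, a CSS class).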
Q2 Deliverables:
The directory structure should be as follows:
Q2/
graph.html
explanation.txt
graph.js, graph.css (if not included in graph.html)
● graph.html - the html file created.
● explanation.txt - the text file explaining your design choices for Q2.
● graph.(js / css) - the js / css files if not included in graph.html
Q3 [15 pts] Scatter plots
Use the dataset provided in the file diabetes.csv (in the folder Q3) to create a scatterplot.
Refer to the tutorial for scatter plot here.
Attributes in the dataset:
Feature 1: Number of times pregnant
Feature 2: Plasma glucose concentration at the 2nd hour in an oral glucose tolerance test
Feature 3: Diastolic blood pressure (mm Hg)
Feature 4: Triceps skin fold thickness (mm)
Feature 5: 2-hour serum insulin (mu U/ml)
Feature 6: Body mass index (weight in kg/(height in m)^2)
Feature 7: Diabetes pedigree function
Feature 8: Age (years)
Class: 0 or 1 (class value 1 means “tested positive for diabetes”)
Dataset source: http://archive.ics.uci.edu/ml/datasets/Pima+Indians+Diabetes
A. [8 pts] Creating scatter plots:
1. [6 pts] Create two scatter plots, one for each feature combination specified below. In the
scatter plots, visualize “negative” class instances as blue circles, and “positive” instances as red
triangles. Add a legend showing how symbols map to the classes.
○ Feature 2 (plasma glucose) vs. Feature 5 (insulin)
○ Feature 6 (BMI) vs. Feature 3 (blood pressure)
2. [2 pts] In explanation.txt, use no more than 50 words to discuss which feature combination is
better at separating the classes and why.
Your scatter plots should be placed one after the other on a single HTML page, similar to the example
image below (Figure 3). Note that your design need NOT be identical to the example.
Based on the scatter plot created for Feature 2 and Feature 5 (Plasma Glucose vs. Insulin),
create new plots for the following questions:
B. [3 pts] Scaling symbol sizes. Set the size of each symbol in the plot to be proportional to the
product of plasma glucose and insulin values. Create a new scatter plot for this part (append to
the HTML page). Set the scaling coefficient properly to make the scatter plot legible.
C. [4 pts] Axis scales in D3. Create two plots for this part (append to the HTML page) to try out two
axis scales in D3: the first one uses the square root scale for its y-axis (only), and the second one
uses the log scale for its y-axis (only). Note that the x-axes (Plasma Glucose) should be kept in
linear scale, and only the y-axes (Insulin) are affected. Explain in no more than 50 words, in
explanation.txt, when we may want to use square root scale and log scale in charts.
Hint: You may need to carefully set the scale domain to handle the 0s in data.
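The symbol scaling in part B and the hint about zeros in part C can be sketched in plain JavaScript. All function names and the floor value are hypothetical. Clamping zeros to a small positive floor is only one possible way to handle them on a log axis, since log(0) is undefined.

```javascript
// Symbol size proportional to the product of the two features.
// The coefficient is a tuning knob chosen so the plot stays legible.
function symbolArea(glucose, insulin, coefficient) {
  return coefficient * glucose * insulin;
}

// Domain for a log axis: start at a positive floor instead of 0.
function logSafeDomain(values, floor) {
  var max = Math.max.apply(null, values);
  return [floor, Math.max(max, floor)];
}

// Values at or below the floor are drawn at the floor, so that
// zero readings still appear at the bottom of the log axis.
function clampForLog(value, floor) {
  return Math.max(value, floor);
}
```

In D3 v3, the resulting domain would be passed to a scale created with d3.scale.log(), and the computed area could feed the size of a d3.svg.symbol() generator.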
Q3 Deliverables:
The directory structure should be organized as follows:
Q3/
scatterplot.(html / js / css)
explanation.txt
scatter_plots.pdf
diabetes.csv
● scatterplot.(html / js / css) - the html / js / css files created.
● explanation.txt - the text file explaining your observations for Q3.A.2 and Q3.C.
● scatter_plots.pdf - a PDF document showing the screenshots of the five scatter plots created
above (two for Q3.A.1, one for Q3.B and two for Q3.C). You may print the HTML page as a PDF
file, and each PDF page shows one plot. Hint: To make it work, one way is to use CSS page
break (refer to stackoverflow). Clearly title the plots in the document (see examples in Figure 3),
using the following titles:
● Plasma Glucose vs. Insulin
● BMI vs. Blood Pressure
● Plasma Glucose vs. Insulin, scaled symbols
● Plasma Glucose vs. Insulin (square-root-scaled)
● Plasma Glucose vs. Insulin (log-scaled)
● diabetes.csv - the dataset.
Q4 [15 pts] Heatmap and Select Box
Example: 2D Histogram, Select Options
Use the dataset provided in heatmap.csv (in the folder Q4) that describes the number of appearances [4]
of characters from each house in HBO’s Game of Thrones across episodes and seasons. Visualize the
data using D3 heatmaps.
a. [6 pts] Create a heatmap of the number of appearances of characters from each house for
season 1 of Game of Thrones. Place the episode on the heatmap’s horizontal axis and the
house on its vertical axis. Number of appearances for each house will be represented by colors
in the heatmap. There should be 9 color gradations.
b. [3 pts] Add axis labels and a legend to the chart. Place the name of the house (“Baratheon”,
“Lannister”, “Stark”, etc.) on the vertical axis and the episode number on the horizontal axis.
c. [6 pts] Now create a dropdown select box with D3 that is populated with the season numbers (1
to 6) in ascending order. When the user selects a different season in this select box, the
heatmap and the legend should both be updated with values corresponding to the selected
season. Note the differences in the legends for season 1 and 3 in the images below. While the 9
color gradations in the legend remain the same, the threshold values are different. The default
season when the page loads should be 1 (i.e., the first season).
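The idea that the 9 gradations stay fixed while the thresholds change per season can be sketched as follows. The function names are hypothetical and the column layout of heatmap.csv is not assumed; the input is simply the array of appearance counts for the selected season. Equal-width buckets are an assumption here; quantile-based buckets are another reasonable choice.

```javascript
// Compute the cut points that split [min, max] into equal-width buckets.
// For 9 buckets this yields 8 thresholds, recomputed for each season.
function bucketThresholds(values, buckets) {
  var min = Math.min.apply(null, values);
  var max = Math.max.apply(null, values);
  var step = (max - min) / buckets;
  var cuts = [];
  for (var i = 1; i < buckets; i++) {
    cuts.push(min + i * step);
  }
  return cuts;
}

// Index (0-based) of the color gradation a value falls into.
function bucketIndex(value, cuts) {
  var index = 0;
  while (index < cuts.length && value >= cuts[index]) {
    index++;
  }
  return index;
}
```

When the select box changes, calling bucketThresholds again on the new season's counts gives the updated legend labels, while the 9 colors themselves stay the same.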
[4] Source: https://github.com/fredhohman/a-viz-of-ice-and-fire
Q4 Deliverables:
The directory structure should look like (remember to include the d3 library):
Q4/
heatmap.(html / js / css)
heatmap.csv
● heatmap.(html / js / css) - the html / js / css files created.
● heatmap.csv - the dataset
Q5 [25 pts] Sankey Chart
Example: Sankey diagram from formatted JSON
Formula One racing is a championship sport in which race drivers represent teams to compete for
points over several races (also called Grand Prix) in a season. The team with the most points at the
end of a season wins the prestigious Formula One World Constructors' Championship award. You will
visualize the flow of points for the races held in 2016. The drivers win points according to their final
standing in each race, which finally get added to their respective team’s total.
Note: The implementation of certain parts in this question may be quite challenging.
a. [15 pts] Create a Sankey Chart using the provided datasets (races.csv and teams.csv) in the
Q5 folder. The chart should visualize the flow of points in the order:
race → driver → team
You must use the sankey.js provided in the lib folder. You can keep the blocks’ vertical
positions static. Your chart should look similar to the example Sankey Chart for the 2015 season
as shown in the above image.
Note: For this part, you will have to read in the csv files and combine the data into a format that
can be passed to the sankey library. To accomplish this, you may find the following javascript
functions useful: d3.nest(), array.filter(), array.map()
b. [6 pts] Use the d3-tip library to add tooltips as shown in the above image. You are welcome to
make your own visual style choices using css properties.
Note: You must create the tooltip by only using d3.tip.v0.6.3.js present in the lib folder.
c. [4 pts] From the visualization you have created, determine the following:
1. [1 pt] Which driver won the Grand Prix 2016?
2. [1 pt] Which team won the Grand Prix 2016?
3. [1 pt] Which driver won the Spanish Grand Prix?
4. [1 pt] Which team has the highest number of players?
Put your answers in observations.txt. Modify the template provided to you (in Q5 folder) by
replacing team_name/driver_name with your answer.
Sample observations.txt
1.driver_name
2.team_name
3.driver_name
4.team_name
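For the data-combination step mentioned in the note under Q5.a, a rough sketch follows. The column names (race, driver, team, points) are hypothetical, since the layout of races.csv and teams.csv is not described here. The output shape, a nodes array plus links that refer to nodes by index with a numeric value, is the format commonly accepted by the sankey.js plugin; check it against the copy in the lib folder.

```javascript
// Combine parsed rows into { nodes, links } for a race -> driver -> team flow.
// Assumed (hypothetical) row shapes:
//   raceRows: { race: "...", driver: "...", points: "25" }
//   teamRows: { driver: "...", team: "...", points: "43" }
function buildSankeyGraph(raceRows, teamRows) {
  var nodes = [];
  var indexByName = {};

  // Return the index of a named node, creating the node on first use.
  function nodeIndex(name) {
    if (!(name in indexByName)) {
      indexByName[name] = nodes.length;
      nodes.push({ name: name });
    }
    return indexByName[name];
  }

  // Keep only rows that actually carry points.
  function hasPoints(row) {
    return +row.points > 0;
  }

  var raceLinks = raceRows.filter(hasPoints).map(function (row) {
    return { source: nodeIndex(row.race), target: nodeIndex(row.driver), value: +row.points };
  });
  var teamLinks = teamRows.filter(hasPoints).map(function (row) {
    return { source: nodeIndex(row.driver), target: nodeIndex(row.team), value: +row.points };
  });

  return { nodes: nodes, links: raceLinks.concat(teamLinks) };
}
```

This version uses array.filter() and array.map() only. d3.nest() would be an alternative for aggregating rows before building the links.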
Q5 Deliverables:
The directory structure should be as follows:
Q5/
races.csv
teams.csv
viz.(html/js/css)
observations.txt
● races.csv and teams.csv - the data sets (unmodified)
● viz.(html/js/css) - The html, javascript, css to render the visualization in Q5.a and b.
● observations.txt - Your answer for Q5.c.
Q6 [20 pts] Interactive visualization
Use the dataset provided in the data.txt file (in the Q6 folder) to create an interactive bar chart. Each
line in the file represents an English football club, and its value in millions over the past five years.
You will have to integrate the data provided in dataset.txt directly in an array variable in the script.
Example: var data=[];
a. [5 pts] Create a horizontal bar chart with its vertical axis denoting the club names and its horizontal
axis denoting the total values (in millions) over the past 5 years. Each bar should have the total value
(in millions) labelled inside it. Refer to the example shown in Figure 6a.
Note: The vertical axis of the chart should use club names as labels.
b. [10 pts] On hovering over a bar, a smaller line chart representing the value of that club for each year
(2013-2017) should be displayed in the top right corner. For example, Liverpool has a value of $651M,
$704M, $982M, $1548M, $1492M for the years 2013, 2014, 2015, 2016 and 2017 respectively. On
hovering over the bar representing Liverpool, a line chart depicting these 5 values is displayed. See
Figure 6b for an example.
c. [3 pts] On mouse out, the line chart should no longer be visible.
d. [2 pts] On hovering over any horizontal bar representing a club, the color of the bar should change.
You can use any color that is visually distinct from the regular bars. On mouse out, the color should be
reset.
Figure 6b. On hovering over the bar for Liverpool, a smaller line chart representing its value in millions over the
past 5 years is displayed at the top right corner.
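The two data views needed for Q6 (a total per club for the bars, and a per-year series for the hover line chart) can be derived from one array. The object shape with club and values keys and the helper names are assumptions for illustration; only the Liverpool figures come from the text above.

```javascript
var YEARS = [2013, 2014, 2015, 2016, 2017];

// Assumed (hypothetical) shape of the in-script data array.
var data = [
  { club: "Liverpool", values: [651, 704, 982, 1548, 1492] }
];

// Total value over the five years, used for the bar length and its label.
function totalValue(entry) {
  return entry.values.reduce(function (sum, v) {
    return sum + v;
  }, 0);
}

// One { year, value } point per year, used for the hover line chart.
function yearlySeries(entry) {
  return entry.values.map(function (v, i) {
    return { year: YEARS[i], value: v };
  });
}
```

In the page, totalValue would typically feed the horizontal scale of the bar chart, and yearlySeries would be computed inside the "mouseover" handler for the hovered bar.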
Q6 Deliverables:
The directory structure should be as follows:
Q6/
interactive.(html/js/css)
● interactive.(html/js/css) - The html, javascript, css to render the visualization in Q6 (dataset.txt is NOT
required to be included in the final directory structure as the data provided in dataset.txt should have
already been integrated into the “data” variable in your code)
Q7 [20 pts] Choropleth Map of County Data
Example: Unemployment rates
Use the provided datasets in median_ages.csv, us.json and median_earnings.json (in the folder Q7)
and visualize them as a choropleth map.
● Each record in median_ages.csv represents a county and has the fields id, name, and
median_age, where
○ id corresponds to the state the county is in
○ name is the county name
○ median_age is the median age of the people living in the county
● The median_earnings.json file contains a list of JSON objects, each having two fields: an id
field corresponding to a state in the United States, and a median_earnings field
corresponding to the median earnings of people in that state after 10 years.
● The us.json file is a TopoJSON topology containing three geometry collections: counties,
states, and nation.
a. [15 pts] Create a choropleth map using the provided datasets. The color of each state should
correspond to the median earnings in that state, i.e., darker colors correspond to higher median
earnings in that state and lighter colors correspond to lower median earnings in that state. Add a legend
showing how colors map to median earnings. Use d3-queue (in the lib folder) to easily load data from
multiple files into a function. Use topojson (present in lib) to draw the choropleth map.
b. [5 pts] Add a tooltip using the d3.tip library (in the lib folder) that, on hovering over a state, shows
the 5 counties in that state with the lowest median ages in ascending order, along with those ages. If a
state has fewer than 5 counties, show all counties available, along with their median ages. The tooltip
should appear on hovering over the state. On mouseout, the tooltip should disappear.
Note: You must create the tooltip by only using d3.tip.v0.6.3.js present in the lib folder.
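The tooltip content for Q7.b comes down to a group, sort, and slice over the county rows. A minimal sketch, assuming each parsed row has the id, name, and median_age fields described above; the function name youngestCountiesByState is hypothetical.

```javascript
// Group county rows by state id and keep the `limit` counties with the
// lowest median age in each state, sorted in ascending order of age.
function youngestCountiesByState(rows, limit) {
  var byState = {};
  rows.forEach(function (row) {
    if (!byState[row.id]) {
      byState[row.id] = [];
    }
    // CSV values arrive as strings, so coerce the age to a number.
    byState[row.id].push({ name: row.name, median_age: +row.median_age });
  });
  Object.keys(byState).forEach(function (id) {
    byState[id].sort(function (a, b) {
      return a.median_age - b.median_age;
    });
    // slice() returns all entries when a state has fewer than `limit`.
    byState[id] = byState[id].slice(0, limit);
  });
  return byState;
}
```

Computing this lookup once, after the files have loaded, keeps the hover handler simple: it only needs to read the list for the hovered state's id and format it into the tooltip.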
Q7 Deliverables:
The directory structure should be organized as follows:
Q7/
q7.(html/js/css)
median_ages.csv
median_earnings.json
us.json
● q7.(html/js/css) - The html/js/css file to render the visualization.
● median_ages.csv and median_earnings.json - The datasets used.
● us.json - Dataset needed to draw the map.
Note: d3-queue evaluates a number of asynchronous tasks concurrently -- in this question, each task would be loading
one data file. When all tasks have finished, d3-queue passes the results to a user-defined callback function.
Important Instructions on Folder structure
The directory structure must be as follows. The files that should be included in each question’s folder
(e.g., Q1 for question 1) have been clearly specified at the end of each question’s problem description
above.
HW2-LastName-FirstName/
|---lib/
|---- d3.v3.min.js
|---- d3.tip.v0.6.3.js
|---- sankey.js
|---- d3-queue.v3.min.js
|---- topojson.v1.min.js
|---Q1/
|---- ...
|---Q2/
|---- ...
|---Q3/
|---- ...
|---Q4/
|---- ...
|---Q5/
|---- ...
|---Q6/
|---- ...
|---Q7/
|---- ...