Homework #2: Computational Section
All computations should be done in this notebook using the R kernel. Working in small groups is allowed,
but it is important that you make an effort to master the material and hand in your own work.
You will be required to submit this notebook, fully compiled with your solutions,
as an HTML file to Canvas by 2pm on Friday, February 2.
Problem 1
Some claim that the final hours aboard the Titanic were marked by class warfare; others claim they were
characterized by male chivalry. Load the titanic data frame into R. This dataset contains information
pertaining to class status, survival of passengers, and gender, among others. You can
learn more about the data here: https://cran.r-project.org/web/packages/PASWR2/PASWR2.pdf
(a) Determine the fraction of survivors from each passenger class.
(b) Compute the fraction of survivors according to class and gender. Did men in
the first class or women in the third class have a higher survival rate?
(c) How would you characterize the distribution of (e.g., is it symmetric,
positively/negatively skewed, unimodal, multimodal)?
(d) Were the median and mean ages for females who survived higher or lower
than for females who did not survive? Report the median and mean ages as well
as an appropriate measure of spread for each statistic.
(e) Were the median and mean ages for males who survived higher or lower
than for males who did not survive? Report the median and mean ages as well
as an appropriate measure of spread for each statistic.
(f) What was the age of the youngest female in the first class who survived?
(g) Do the data suggest that the final hours aboard the Titanic were
characterized by class warfare, male chivalry, some combination of both, or
neither? Justify your answer based on computations above, or based on other
explorations of the data.
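The grouped computations in Problem 1 can be sketched on a small toy data frame. This is only an illustration under stated assumptions: the values below are made up, and the column names (pclass, sex, survived, age) and the 0/1 coding of survival are hypothetical, so check names(titanic) and str(titanic) before adapting anything.

```r
# Toy data frame with made-up values; NOT the real titanic data.
# Column names and the 0/1 survival coding are assumptions for illustration.
toy <- data.frame(
  pclass   = factor(c("1st", "1st", "2nd", "2nd", "3rd", "3rd", "3rd", "1st")),
  sex      = factor(c("female", "male", "female", "male",
                      "female", "male", "male", "female")),
  survived = c(1, 0, 1, 0, 0, 0, 1, 1),
  age      = c(29, 45, NA, 31, 22, 40, 18, 2)
)

# The mean of a 0/1 indicator is a proportion, so tapply() with mean
# gives the fraction of survivors within each group.
frac_by_class <- tapply(toy$survived, toy$pclass, mean)

# Passing a list of two factors returns a class-by-sex matrix of fractions.
frac_by_class_sex <- tapply(toy$survived, list(toy$pclass, toy$sex), mean)

# Summaries of age by survival status for one sex; na.rm = TRUE drops
# missing ages instead of returning NA.
females      <- subset(toy, sex == "female")
median_age_f <- tapply(females$age, females$survived, median, na.rm = TRUE)
mean_age_f   <- tapply(females$age, females$survived, mean, na.rm = TRUE)
iqr_age_f    <- tapply(females$age, females$survived, IQR, na.rm = TRUE)

# Logical indexing picks out one subgroup; min() then finds the youngest.
youngest <- min(
  toy$age[toy$pclass == "1st" & toy$sex == "female" & toy$survived == 1],
  na.rm = TRUE
)

print(frac_by_class)
print(frac_by_class_sex)
```

With the real data, a group that contains only missing ages would make min() return Inf with a warning, so it is worth checking how many NA values each subgroup has first.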
Problem 2
(a) Conduct a simulation in R to numerically illustrate the results from
theoretical question 3 (a) and (b).
(b) Verify theoretical question 1 (a) ii. with an example in R.
Problem 3
(a) Load the chocolate.csv data into R (perhaps using the function).
Print a summary of the data. Which variables are stored as factors, and which
are stored as numeric? The function may help.
(b) Change the variable names to: company, sorigin, ref, date, cocoa, location,
rating, type, borigin.
(c) Create a new data frame with just company, cocoa, location, and rating. Use
this data frame for all remaining questions.
(d) Should we clean this new data frame in any way?
(e) Which company makes the highest rated chocolate bar? The lowest?
(f) Are there any relationships between cocoa, location, and rating? Explore
these variables graphically and numerically, and write a short report (around a
paragraph) describing some possible relationships.
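The data-handling steps in Problem 3 (inspecting column types, renaming, subsetting, and cleaning) can be sketched on a toy stand-in for a freshly read CSV. Everything about the toy data is hypothetical: the original column names, the values, and in particular the idea that cocoa is stored as text with a trailing percent sign are assumptions, not facts about chocolate.csv.

```r
# Toy stand-in for a data frame read from a CSV file.
# Column names, values, and the "70%" text format are hypothetical.
raw <- data.frame(
  Company.Name  = c("A", "B", "C"),
  REF           = c(1876, 1676, 1141),
  Cocoa.Percent = c("70%", "64%", "85%"),
  Rating        = c(3.75, 2.50, 3.00),
  stringsAsFactors = TRUE
)

# Class of every column: shows which are factors and which are numeric.
col_classes <- sapply(raw, class)
print(col_classes)

# Rename all columns at once; the new names must follow the column order.
names(raw) <- c("company", "ref", "cocoa", "rating")

# Keep only the columns of interest in a new data frame.
small <- raw[, c("company", "cocoa", "rating")]

# One possible cleaning step: strip the percent sign and convert to numeric.
# Going through as.character() avoids converting factor level codes.
small$cocoa <- as.numeric(sub("%", "", as.character(small$cocoa), fixed = TRUE))

# Company with the highest and lowest single rating.
best  <- as.character(small$company[which.max(small$rating)])
worst <- as.character(small$company[which.min(small$rating)])

# A simple numeric summary of a linear relationship between two variables.
r_cocoa_rating <- cor(small$cocoa, small$rating)
```

Note that which.max() and which.min() return only the first match, so ties in the real data would need a comparison such as small$rating == max(small$rating) instead.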