# COMP90014 Assignment 2
### Semester 2, 2022
Version 1.0 Last edited 21/09/2022.
# Activate notebook styling
from IPython.core.display import HTML
HTML(open("./src/style/custom.css", "r").read())
### Fill in your student details here
NAME = ""
ID = ""
## Completing the assignment
This assignment should be completed by each student individually. Make sure you read this entire document, and ask for help if anything is not clear. Any changes or clarifications to this document will be announced via the LMS.
Please make sure you review the University's rules on academic honesty and plagiarism: https://academichonesty.unimelb.edu.au/
Do not copy any code from other students or from the internet. This is considered plagiarism.
Your completed notebook file containing all your answers will be turned in via LMS. Please also submit an HTML file.
To complete the assignment, finish the tasks in this notebook.
The tasks are a combination of writing your own code and answering related short-answer questions.
In some cases, we have provided test input and test output that you can use to try out your solutions. These tests are just samples and are **not** exhaustive - they may warn you if you've made a mistake, but they are not guaranteed to. It's up to you to decide whether your code is correct.
**Remember to save your work early and often.**
## Marking
Cells that must be completed to receive marks are clearly labelled. Some cells are code cells, in which you must complete the code to solve a problem. Others are markdown cells, in which you must write your answers to short-answer questions.
Cells that must be completed to receive marks are labelled like this:
`# -- GRADED CELL (1 mark) - complete this cell --`
### Completing code cells
- You will see the following text in graded code cells:
``` python
# YOUR CODE HERE
raise NotImplementedError()
```
- ***You must remove the `raise NotImplementedError()` line from the cell, and replace it with your solution.***
- Only add answers to graded cells. If you want to import a library or use a helper function, this must be included in a graded cell.
- Include code comments in your solutions. Well-commented code can help you receive partial marks even if the final solution is incorrect.
### Editing the notebook
**Only** graded cells will be marked.
- Don't enter solutions outside of graded cells
- Do **NOT** duplicate or remove cells from the notebook
- You may add new cells to test code, but new cells will not be graded.
- Word limits, where stated, will be strictly enforced. Answers exceeding the limit **will not be marked**.
### Marks
No marks are allocated to commenting in your code. We do, however, encourage efficient and well-commented code.
The total marks for the assignment add up to 45, and it will be worth 15% of your overall subject grade.
Section 1: 23 marks
Section 2: 22 marks
### Pseudocode
Pseudocode describes an algorithm as a series of logical steps that any programmer can understand.
Here is the pseudocode for a function fizzbuzz that, for each number from 1 to 100, prints "Fizz" if the number is divisible by 3 and "Buzz" if it is divisible by 5:
```
function fizzbuzz()
    For i = 1 to 100
        If i is divisible by 3 Then
            Print "Fizz"
        If i is divisible by 5 Then
            Print "Buzz"
```
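For comparison, here is a direct Python translation of the pseudocode above. It is only an illustration of how the plain-language steps map onto real syntax. The `limit` parameter and the returned list are additions that make the behaviour easy to check; the pseudocode itself only prints.

```python
def fizzbuzz(limit=100):
    """Follow the fizzbuzz pseudocode, collecting each printed word in a list."""
    output = []
    for i in range(1, limit + 1):  # "For i = 1 to 100" (inclusive)
        if i % 3 == 0:             # "If i is divisible by 3 Then"
            output.append("Fizz")
        if i % 5 == 0:             # "If i is divisible by 5 Then"
            output.append("Buzz")
    return output


for word in fizzbuzz():
    print(word)
```

Note that a number divisible by both 3 and 5 (e.g. 15) produces "Fizz" followed by "Buzz", because the two checks are independent.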
As you can see, the basic steps are shown, but there is no language-specific syntax.
In this manner, pseudocode explains the algorithm procedure in **direct, plain language.**
Note that if your function calls another function (let's call this function **boom**), you should write it as **boom()** in the pseudocode. The open/close brackets show that 'boom' is another function call.
There are no real conventions aside from the above, so please use a style which you think is **clearest.**
## Submitting
Before you turn this assignment in, make sure everything runs as expected. First, **restart the kernel** (in the menubar, select Kernel$\rightarrow$Restart) and then **run all cells** (in the menubar, select Cell$\rightarrow$Run All).
Make sure you fill in any place that says `YOUR CODE HERE` or "YOUR ANSWER HERE".
Your completed notebook file containing all your answers must be turned in via LMS in `.ipynb` format.
You must also submit a copy of this notebook in `html` format with the output cleared.
You can do this by using the `clear all output` option in the menu.
Your submission should include **only two** files with names formatted as: **Assignment_2.ipynb** and **Assignment_2.html**
# Setup
### Setup
If you are using JupyterLab online, all packages will be available. If you are running this notebook on your local computer, you may need to install some packages.
If you need to use additional packages, you must import them in a graded cell.
%matplotlib inline
import matplotlib.pyplot as plt
import seaborn as sns
import altair
import pandas as pd
import numpy as np
import networkx as nx
import scipy
import re
from io import StringIO
from copy import deepcopy
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler
from scipy.cluster.hierarchy import linkage, dendrogram, fcluster
from scipy.spatial.distance import squareform
### Settings
We will set numpy and pandas to display numbers to just two decimal places in this notebook - this won't affect the actual numbers, just their display.
np.set_printoptions(precision=2)
pd.options.display.precision=2
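The following small NumPy check shows that the precision setting only affects display. It uses `np.array2string`, which accepts the same `precision` option as a one-off argument instead of changing the global print options. The stored value keeps its full precision.

```python
import numpy as np

value = np.array([1 / 3])

# Displayed with two decimal places...
print(np.array2string(value, precision=2))  # [0.33]

# ...but the underlying number is unchanged.
print(value[0] == 1 / 3)  # True
```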
# SECTION 1: GENE EXPRESSION NETWORKS
### Background and data
WGCNA stands for weighted gene co-expression network analysis. It is a data analysis technique used for studying biological networks based on pairwise correlations of gene expression data. WGCNA is good at identifying clusters of genes that may be co-regulated, and therefore may have shared biological function.
For this assignment, you will primarily be using the [FlyAtlas](http://flyatlas.org) dataset. Instead of the probe-wise dataset, we will use the expression value for each gene.
### Read in data
To begin, we will import the FlyAtlas data into a pandas dataframe and inspect the first few rows. It should have gene names as row names and sample names as column names.
# Import data
raw_expression = pd.read_csv('src/data/flyatlas_subset.csv.gz', index_col=0)
# Print first 5 rows
raw_expression.head()
The data frame has 3114 rows (genes) and 136 columns (samples), so it is high-dimensional. The 136 columns represent 4 replicates from each of 34 different tissue types.
### Data labels
The following code snippet removes the replicate name from each sample, so we can use these labels as categories for plotting later.
# Make list of sample names without replicate number
tissues_list = [re.match(r'(.+?)(( biological)? rep\d+)', c).group(1)
                for c in raw_expression.columns]
tissues = pd.Series(tissues_list, index = raw_expression.columns)
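The snippet below illustrates what the pattern extracts. The lazy `(.+?)` captures as little as possible, so it stops just before the optional " biological" and the " repN" suffix. The column names are hypothetical examples of the two formats the pattern accounts for; the actual FlyAtlas column names may differ.

```python
import re

# Same pattern as in the cell above, written as a raw string.
pattern = r'(.+?)(( biological)? rep\d+)'

# Hypothetical sample names, with and without the word "biological".
examples = ['Adult Eye rep1', 'Adult Eye biological rep2']

labels = [re.match(pattern, name).group(1) for name in examples]
print(labels)  # ['Adult Eye', 'Adult Eye']
```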
### Transforming the data
It's common practice to take the log of expression values. Here is a visual motivation as to why this may be useful:
log_expression = np.log2(raw_expression + 1)
fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(16, 6))
ax1.hist(raw_expression.values.flatten(), bins=200)
ax1.set_title('raw expression')
ax1.set_xlabel('expression value')
ax1.set_ylabel('num occurrences')
ax2.hist(log_expression.values.flatten(), bins=200)
ax2.set_title('log expression')
ax2.set_xlabel('expression value')
ax2.set_ylabel('num occurrences')
plt.show()
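Here is a small numeric illustration of the transformation, using made-up values. Adding 1 before taking log2 maps zero expression to 0, since the log of 0 is undefined. Each doubling of (value + 1) then adds exactly one unit, which compresses the long right tail seen in the raw histogram.

```python
import numpy as np

# Made-up raw expression values spanning several orders of magnitude.
raw = np.array([0.0, 1.0, 3.0, 15.0, 1023.0])

# Same transformation as used for log_expression: log2 with a pseudocount of 1.
log_values = np.log2(raw + 1)
print(log_values.tolist())  # [0.0, 1.0, 2.0, 4.0, 10.0]
```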
From here on, we will use `log_expression` as our sample data.
log_expression = np.log2(raw_expression + 1)
# Task 1 - Building a correlation matrix
Question 1a
Challenge: The [FlyAtlas](http://flyatlas.org) dataset contains four biological replicates for each tissue. Combine the biological replicates by calculating the mean expression value for each gene in each tissue.
- [ ] Input: Pandas dataframe of gene expression data; and corresponding tissue type labels as a list, array, or Pandas Series.
- [ ] Grouping samples by tissue type, calculate the mean expression value for each gene in each tissue type.
- [ ] Return: A pandas dataframe with columns corresponding to the input tissue labels; row names (gene symbols) as per the input dataframe; and mean gene expression as values.
# ~~ GRADED CELL (2 marks) - complete this cell ~~
def average_by_tissue(expression, tissues):
    '''
    Given a DataFrame of gene expression data,
    and a list, array or Series of tissues corresponding to the columns of the dataframe,
    average over the expression values in each gene for each tissue type and
    return the resulting dataframe.
    The columns of the new dataframe should correspond to the provided tissues.
    '''
    # YOUR CODE HERE
    raise NotImplementedError()
    return averaged_expression
### Test your function
The below test case should return
```
      A    B
0   4.5  2.5
1  10.0  7.0
```
### Test your function here
test_df = pd.DataFrame([[5,4,3,2],[10,10,6,8]])
average_by_tissue(test_df, ['A','A','B','B'])
### Process the data
# Calculate average expression for each tissue in the flyatlas data
tissue_expression = average_by_tissue(log_expression, tissues)
# Inspect the new expression dataframe. There should be 34 columns, corresponding to the 34 tissue types.
tissue_expression.head()
### Grading Cells
# Testing cell - Do not alter.
# Inspect mean expression values for tissue "Adult Eye"
log_expression_copy = deepcopy(log_expression)
mean_tissue_expression = average_by_tissue(log_expression_copy, tissues)
print(f'\n1a student:\n{mean_tissue_expression["Adult Eye"]}\n\n')
Question 1b
Weighted gene co-expression network analysis (WGCNA) starts by building a pairwise correlation matrix of genes.
Challenge: Using the tissue-averaged log-expression matrix you just created, produce an *unsigned* correlation matrix where each cell contains the absolute value of the correlation coefficients.
Note: You can calculate the Pearson correlation values yourself, or use an existing function (from pandas, numpy, scipy, etc.) to do so.
- [ ] Input: Pandas dataframe of tissue-averaged log-expression values for genes by tissue type.
- [ ] Calculate Pearson correlation values between **pairs of genes**
- [ ] Convert to **unsigned** values
- [ ] Return: Unsigned Pearson correlation values as a numpy **array** (not a pandas dataframe)
# ~~ GRADED CELL (2 marks) - complete this cell ~~
def calculate_unsigned_correlation(expression):
    '''
    Produce the unsigned correlation matrix for a table of gene expression values.
    Assume that the columns of the expression matrix are samples and the rows are
    genes, and return an array of arrays giving the Pearson correlation between each pair of genes,
    in the same order as the rows of the expression table.
    '''
    # YOUR CODE HERE
    raise NotImplementedError()
    return corrMatrix.values
### Test your function
The below test case should return (if displayed to a precision of two decimal places)
```
array([[ 1. , 0.95, 0.96, 0.44, 0.3 , 0.15],
[ 0.95, 1. , 1. , 0.71, 0.59, 0.46],
[ 0.96, 1. , 1. , 0.67, 0.54, 0.41],
[ 0.44, 0.71, 0.67, 1. , 0.99, 0.95],
[ 0.3 , 0.59, 0.54, 0.99, 1. , 0.99],
[ 0.15, 0.46, 0.41, 0.95, 0.99, 1. ]])
```
test_df = pd.DataFrame([[3.8, 2.7, 4.5],
                        [4.3, 3.4, 6.2],
                        [5.3, 4.3, 7.0],
                        [4.6, 6.0, 7.7],
                        [5.2, 7.3, 8.8],
                        [6.2, 8.5, 9.4]],
                       columns=['Tissue1', 'Tissue2', 'Tissue3'],
                       index=['GeneA', 'GeneB', 'GeneC', 'GeneD', 'GeneE', 'GeneF'])
calculate_unsigned_correlation(test_df)
The below test case should return (if displayed to a precision of two decimal places)
```
array([[ 1. , 0.95, 0.3 , 0.15],
[ 0.95, 1. , 0.59, 0.46],
[ 0.3 , 0.59, 1. , 0.99],
[ 0.15, 0.46, 0.99, 1. ]])
```
test_df = pd.DataFrame([[3.8, 2.7, 4.5],
                        [4.3, 3.4, 6.2],
                        [5.2, 7.3, 8.8],
                        [6.2, 8.5, 9.4]],
                       columns=['Tissue1', 'Tissue2', 'Tissue3'],
                       index=['GeneA', 'GeneB', 'GeneC', 'GeneD'])
calculate_unsigned_correlation(test_df)
### Process the data
# Calculate the correlation matrix for the flyatlas data
unsigned_correlation = calculate_unsigned_correlation(tissue_expression)
_ = plt.hist(unsigned_correlation.flatten(), bins=100)
### Grading Cell
# Testing cell - Do not alter.
# Inspect correlation matrix
student_correlation_matrix = calculate_unsigned_correlation(mean_tissue_expression)
print(f'\n1b student:\n{student_correlation_matrix[:5, :5]}\n\n')
## Short answer question
Question 1c
Question: Why are we using an unsigned correlation matrix instead of a signed correlation matrix?
Answer in the cell below.
**2 marks**, maximum of 50 words
-- GRADED CELL (2 marks) - complete this cell --
YOUR ANSWER HERE
# Task 2 - Building an adjacency matrix
To use the correlation matrix to create a network, we will transform it into an adjacency matrix. You will create two types of adjacency matrix, a **binary adjacency matrix** and a **weighted adjacency matrix**.
Question 2a
Challenge: To create the binary adjacency matrix, transform the correlation matrix such that every correlation greater than or equal to a given threshold value is considered adjacent (represented by a 1 in the matrix), and every correlation below that value is considered not adjacent (represented by a 0). Set the diagonal of the adjacency matrix to 0, so that we don't consider a node to be adjacent to itself.
- [ ] Input: Correlation matrix as numpy array, threshold value as float.
- [ ] Set values below threshold to 0
- [ ] Set values >= threshold to 1
- [ ] Set diagonal values to 0
- [ ] Return: Binary adjacency matrix, an array with the same dimensions as the input array.
- [ ] Your function should **not** modify the original input array.
# ~~ GRADED CELL (1 mark) - complete this cell ~~
def calculate_binary_adjacencies(correlation, threshold):
    '''
    Given a correlation matrix between genes of shape (N,N),
    return the corresponding binary adjacency matrix of shape (N,N),
    where correlation values at or above the given threshold are 1,
    all other values are 0, and the diagonal is 0.
    '''
    # YOUR CODE HERE
    raise NotImplementedError()
    return binary_adjacency_matrix
### Test your function
The below test case should return (if displayed to a precision of two decimal places)
```
array([[ 0., 1., 0., 0.],
[ 1., 0., 1., 0.],
[ 0., 1., 0., 1.],
[ 0., 0., 1., 0.]])
```
test_corr = np.array([[1.  , 0.95, 0.3 , 0.15],
                      [0.95, 1.  , 0.59, 0.46],
                      [0.3 , 0.59, 1.  , 0.99],
                      [0.15, 0.46, 0.99, 1.  ]])
calculate_binary_adjacencies(test_corr, 0.5)
# Check that the input array has not been modified by your function
print(test_corr)
The below test case should return (if displayed to a precision of two decimal places)
```
array([[ 0., 1., 0., 0.],
[ 1., 0., 0., 0.],
[ 0., 0., 0., 1.],
[ 0., 0., 1., 0.]])
```
test_corr = np.array([[1.  , 0.95, 0.3 , 0.15],
                      [0.95, 1.  , 0.59, 0.46],
                      [0.3 , 0.59, 1.  , 0.99],
                      [0.15, 0.46, 0.99, 1.  ]])
calculate_binary_adjacencies(test_corr, 0.6)
### Process the data
# Calculate the binary adjacency matrix for the flyatlas data
unsigned_correlation = calculate_unsigned_correlation(tissue_expression)
adjacency_binary = calculate_binary_adjacencies(unsigned_correlation, 0.85)
unsigned_correlation
### Grading cell
# Testing cell - Do not alter.
# Inspect binary adjacency matrix
student_correlation_matrix = calculate_unsigned_correlation(mean_tissue_expression)
student_binary = calculate_binary_adjacencies(student_correlation_matrix, 0.5)
print(f'\n2a student:\n{student_binary[:10, :10]}\n\n')
Question 2b
Challenge: Calculate the connectivity of the adjacency matrix by dividing the total number of edges by the number of possible edges.
- [ ] Input: Binary adjacency matrix as array
- [ ] Return: Connectivity score as float
# ~~ GRADED CELL (1 mark) - complete this cell ~~
def calculate_connectivity(adjacency):
    '''
    Calculate the number of edges that exist in a given binary adjacency matrix,
    divided by the total number of possible edges between all nodes.
    '''
    # YOUR CODE HERE
    raise NotImplementedError()
    return edges / possible_edges
### Test your function
# Should return 0.5
calculate_connectivity(np.array([[0., 1., 0., 0.],
                                 [1., 0., 1., 0.],
                                 [0., 1., 0., 1.],
                                 [0., 0., 1., 0.]]))
# Should return 0.33
calculate_connectivity(np.array([[0., 1., 0., 0.],
                                 [1., 0., 0., 0.],
                                 [0., 0., 0., 1.],
                                 [0., 0., 1., 0.]]))
### Process the data
# Calculate connectivity for the flyatlas binary adjacency matrix
calculate_connectivity(adjacency_binary)
### Grading cell
# Testing cell - Do not alter.
# Check connectivity for our adjacency matrix
student_connectivity = calculate_connectivity(student_binary)
print(f'\n2b student:\n{student_connectivity}\n\n')
Question 2c
The weighted adjacency matrix can be created by raising each element of the correlation matrix to some power.
Challenge: Write a function that raises the correlation matrix to some power, `beta`, and sets the diagonal to `0`. For the rest of the assignment we will use `beta = 4` but your function should accept any integer.
- [ ] Input: Correlation matrix as array, beta value as int
- [ ] Raise each value in the matrix to the power beta (element-wise, not a matrix power)
- [ ] Set diagonal values to 0
- [ ] Return: Weighted adjacency matrix as array
- [ ] Your function should **not** modify the original input array.
# ~~ GRADED CELL (2 marks) - complete this cell ~~
def calculate_weighted_adjacencies(correlation, beta):
    '''
    Given a correlation matrix between genes of shape (N,N),
    return the corresponding weighted adjacency matrix of shape (N,N),
    where we use a power-law soft threshold with parameter beta.
    '''
    # YOUR CODE HERE
    raise NotImplementedError()
    return weighted_adj_matrix
### Test your function
The below test case should return (if displayed to a precision of two decimal places)
```
array([[ 0. , 0.9 , 0.09, 0.02],
[ 0.9 , 0. , 0.35, 0.21],
[ 0.09, 0.35, 0. , 0.98],
[ 0.02, 0.21, 0.98, 0. ]])
```
test_corr = np.array([[1.  , 0.95, 0.3 , 0.15],
                      [0.95, 1.  , 0.59, 0.46],
                      [0.3 , 0.59, 1.  , 0.99],
                      [0.15, 0.46, 0.99, 1.  ]])
calculate_weighted_adjacencies(test_corr, 2)
# Check that the input array has not been modified
print(test_corr)
The below test case should return (if displayed to a precision of two decimal places)
```
array([[ 0. , 0.86, 0.03, 0. ],
[ 0.86, 0. , 0.21, 0.1 ],
[ 0.03, 0.21, 0. , 0.97],
[ 0. , 0.1 , 0.97, 0. ]])
```
test_corr = np.array([[1.  , 0.95, 0.3 , 0.15],
                      [0.95, 1.  , 0.59, 0.46],
                      [0.3 , 0.59, 1.  , 0.99],
                      [0.15, 0.46, 0.99, 1.  ]])
calculate_weighted_adjacencies(test_corr, 3)
### Process the data
# Calculate the weighted adjacency matrix for the flyatlas data
unsigned_correlation = calculate_unsigned_correlation(tissue_expression)
adjacency_weighted = calculate_weighted_adjacencies(unsigned_correlation, 4)
### Grading cell
# Testing cell - Do not alter.
# Inspect weighted adjacencies
student_weighted_adjacencies = calculate_weighted_adjacencies(student_correlation_matrix, 4)
print(f'\n2c student:\n{student_weighted_adjacencies[:5, :5]}\n\n')
Question 2d
Question: How do you expect the network connectivity would change if the threshold for the binary adjacency matrix is increased or decreased?
Answer in the cell below.
**2 marks**, maximum of 50 words
-- GRADED CELL (2 marks) - complete this cell --
YOUR ANSWER HERE
# Task 3 - Dimensionality Reduction
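In this task we will perform Principal Component Analysis (PCA) on the log-transformed expression data. We will find how many components are needed to explain a given percentage of the variance, and identify the gene that contributes most to the first principal component.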
Question 3a
Challenge: Write a function that calculates the number of components required to explain X% of the variance.
- [ ] Input: fitted sklearn `PCA` object; target explained variance as a percentage (float, e.g. `90.0` for 90%).
- [ ] Return: The number of principal components (n, as int) required to explain X% of the variance.
# ~~ GRADED CELL (2 marks) - complete this cell ~~
# Here we will run a PCA on the log-transformed gene expression matrix
pca = PCA(n_components = 100)
expression_pca = pca.fit_transform(log_expression.values.T)
def find_n_components(pca, percent_exp_variance):
    # YOUR CODE HERE
    raise NotImplementedError()
### Grading Cell
# Testing cell - Do not alter.
n = find_n_components(pca, 90.0)
pca = PCA(n_components = n)
expression_pca = pca.fit_transform(log_expression.values.T)
print(f'\nExpected components: {n}')
# Fraction of the total variance explained by each of the selected components.
print(f'\nExplained Variance ratio: {pca.explained_variance_ratio_}')
# The amount of variance explained by each of the selected components.
print(f'\nExplained variance: {pca.explained_variance_}')
Question 3b
Challenge: Find the gene that contributes most to the first eigenvector (the first principal component) of the PCA.
- [ ] Input: `pca` object from previous question.
- [ ] Calculate loadings
- [ ] Identify the index of the variable (gene) that contributes most to the first principal component
- [ ] Return a tuple consisting of A) the index position of the gene that has the greatest contribution to PC1, and B) the name of the gene.
Hint: Gene names are stored in `log_expression.index.values`
# ~~ GRADED CELL (2 marks) - complete this cell ~~
def find_pc1_max_contributor(pca):
    # YOUR CODE HERE
    raise NotImplementedError()
    return max_gene_index, max_gene_name
### Grading cell
# Testing cell - Do not alter.
# Task 4 - Graph Metrics
Graph metrics help to characterise a network as a whole, and can also indicate the relative importance of specific nodes within it.
## Short-answer questions
Question 4a
Challenge: Describe an algorithm in pseudocode that returns the **normalised degree centrality** of a node (degree divided by the maximum node degree in the graph), receiving as parameters a node index **i** and its binary adjacency matrix **m**.
Answer in the cell below.
**2 marks**
-- GRADED CELL (2 marks) - complete this cell --
YOUR ANSWER HERE
Question 4b
Challenge: Describe an algorithm in **pseudocode** that returns the **normalised closeness centrality** of a node, receiving as parameters a node index **i** and its binary adjacency matrix **m**.
As part of your answer, you can assume the function *min_dist(a,b,m)* is available. This function returns the minimum distance between nodes a and b in a graph represented by the adjacency matrix m.
Answer in the cell below.
**2 marks**
-- GRADED CELL (2 marks) - complete this cell --
YOUR ANSWER HERE
Question 4c
Clustering Coefficient and Average Path Length are two important properties to distinguish between different network types.
Challenge: Consider that you are working with a particular biological network. Briefly describe, in your own words, how you would use these properties to verify whether your network is consistent with a:
A. Random Network,
B. Small-World Network, or
C. Regular Lattice Network.
(Use bullet points in your answer.)
Answer in the cell below.
**4 marks**, maximum of 150 words
-- GRADED CELL (4 marks) - complete this cell --
YOUR ANSWER HERE
# Section 2 - Genome Assembly
## Short-answer questions
Question 5a
Despite being slow, overlap-layout-consensus assembly algorithms can tolerate high error rates in the sequencing reads.
Question: How do these algorithms handle sequencing errors **during graph construction**?
Answer in the cell below.
**2 marks**, maximum of 50 words
-- GRADED CELL (2 marks) - complete this cell --
YOUR ANSWER HERE
Question 5b
Question: Starting with the same set of reads, why would it be faster to construct a de Bruijn graph of *k*-mers from the reads than to construct a graph of read overlaps?
Answer in the cell below.
**4 marks**, maximum of 50 words
-- GRADED CELL (4 marks) - complete this cell --
YOUR ANSWER HERE
Question 5c
Remember that in the worst case, a single error in a sequencing read will generate *k* erroneous *k*-mers.
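The sketch below illustrates this worst case. It uses a hypothetical 15 bp read and k = 5, chosen only for illustration. A single substitution changes every *k*-mer that overlaps it. When the error lies at least k - 1 bases from both ends of the read, that is exactly *k* *k*-mers. Errors closer to either end of the read are covered by fewer *k*-mers.

```python
def kmers(seq, k):
    """Return the k-mers of seq, ordered by start position."""
    return [seq[i:i + k] for i in range(len(seq) - k + 1)]


k = 5
correct = "ATGCGTACCTGAAGT"  # hypothetical 15 bp error-free read
pos = 7                      # position of a single substitution error

# Introduce the error: replace the base at `pos` with a different base.
new_base = "A" if correct[pos] != "A" else "C"
erroneous = correct[:pos] + new_base + correct[pos + 1:]

# Compare k-mers position by position and record which ones differ.
changed = [i for i, (a, b) in enumerate(zip(kmers(correct, k), kmers(erroneous, k)))
           if a != b]
print(changed)       # [3, 4, 5, 6, 7]
print(len(changed))  # 5, i.e. k erroneous k-mers
```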
Question: Describe how these incorrect *k*-mers will appear in the de Bruijn graph.
Answer in the cell below.
**5 marks**, maximum of 100 words
-- GRADED CELL (5 marks) - complete this cell --
YOUR ANSWER HERE
Question 5d
Question: If the sequencing error rate is low, and your read depth is high, how could an algorithm deal with the issues you described in Question 5c during graph simplification?
Answer in the cell below.
**4 marks**, maximum of 50 words
-- GRADED CELL (4 marks) - complete this cell --
YOUR ANSWER HERE
Question 5e
You are assembling a genome for a newly-discovered native bee species. Male bees of this species are haploid, meaning they have no heterozygosity.
You have received 12 gigabases ($12 \times 10^{9} \text{ bases}$) of 100 bp sequencing reads, which were produced from DNA from a single male bee.
From these reads, you produce a *k*-mer depth spectrum using $k = 51$. The main peak on the spectrum is at a depth of 40.
Challenge: Estimate the size of the bee's genome in megabases (Mbp).
(Hint: you could start by calculating the number of reads.)
Answer in the cell below.
**2 marks**, maximum of 100 words
-- GRADED CELL (2 marks) - complete this cell --
YOUR ANSWER HERE
Question 5f
Question: What additional feature would you expect to see on the k-mer depth spectrum described in Question 5e if the bee had been a heterozygous diploid?
Answer in the cell below.
**4 marks**, maximum of 100 words
-- GRADED CELL (4 marks) - complete this cell --
YOUR ANSWER HERE
# END OF ASSIGNMENT
## Submitting
Before you turn this assignment in, make sure everything runs as expected. First, **restart the kernel** (in the menubar, select Kernel$\rightarrow$Restart) and then **run all cells** (in the menubar, select Cell$\rightarrow$Run All).
Make sure you have filled in any place that says `YOUR CODE HERE` or "YOUR ANSWER HERE".
Your completed notebook file containing all your answers must be turned in via LMS in `.ipynb` format.
You must also submit a copy of this notebook in `html` format with the output cleared.
You can do this by using the `clear all output` option in the menu.
Your submission should include **only two** files with names formatted as: **Assignment_2.ipynb** and **Assignment_2.html**