Molecular evolution assignment -- Part 2

Read the README-FIRST (README-FIRST.ipynb) notebook first!

Fitting and interpreting maximum likelihood models

The questions in this notebook are worth 13%

There is an extension question in this notebook worth 1%.

Q1 – Hypothesis test comparing GTR and HKY85

Q1a

Perform a hypothesis test that compares GTR and HKY85. Refer to the online notes ( ) and the hypothesis test demo (demo_hypothesis_test.ipynb).

Apply your hypothesis test to the alignment at aln_path .
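
As a rough illustration of the arithmetic behind a likelihood ratio test for nested models, the sketch below computes the LR statistic and its chi-square tail probability using only the standard library. The log-likelihood values are hypothetical placeholders, not results from the alignment at aln_path, and the function names are made up for this example. The degrees of freedom are set to 4 on the assumption that GTR has five free exchangeability parameters against the single kappa of HKY85 and that both models treat base frequencies the same way; check this against the fitted models. The closed-form tail probability used here only holds for even degrees of freedom; scipy.stats.chi2.sf is the more general route.

```python
import math


def chi2_sf_even_df(x, df):
    """Upper-tail probability of a chi-square variate, closed form for even df."""
    if df <= 0 or df % 2:
        raise ValueError("closed form only holds for positive even df")
    if x <= 0:
        return 1.0
    half = x / 2.0
    term, total = 1.0, 1.0
    # sum_{i=0}^{df/2 - 1} (x/2)^i / i!
    for i in range(1, df // 2):
        term *= half / i
        total += term
    return math.exp(-half) * total


def likelihood_ratio_test(lnL_null, lnL_alt, df):
    """Return (LR, p-value) for nested models; LR = 2 * (lnL_alt - lnL_null)."""
    lr = 2.0 * (lnL_alt - lnL_null)
    return lr, chi2_sf_even_df(lr, df)


# Hypothetical log-likelihoods, NOT values from the assignment data.
lr, p_value = likelihood_ratio_test(lnL_null=-1000.0, lnL_alt=-995.0, df=4)
print(f"LR = {lr:.2f}, p = {p_value:.4f}")
```

In practice the two log-likelihoods and the number of free parameters would come from the fitted model objects, as shown in the hypothesis test demo.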

In [ ]:

Q1b

Which model is the alternate hypothesis?

What is your conclusion from the hypothesis test?

ANUID = ""

aln_path = "data/part2/cds/ENSG00000011007.fasta"

# This part worth 1

# Enter your code here

raise NotImplementedError # No Answer - remove if you provide an answer

YOUR ANSWER HERE

Q1c

Assuming the null hypothesis is correct, what is the probability of a likelihood ratio (LR) ≥ that observed?

YOUR ANSWER HERE

Q1d

Display the MLEs for the model you accepted

In [ ]:

Q1e

In ≤ 50 words.

Identify what you think is the most striking difference between the MLEs from the two models.

Explain why you thought it was striking and what it reveals about the nature of substitutions affecting

this gene.

Feel free to add a compute cell to display those estimates.

YOUR ANSWER HERE

Q1f

In ≤50 words.

List the assumptions common to both HKY85 and GTR models.

List the assumptions unique to HKY85

List the assumptions unique to GTR

YOUR ANSWER HERE

Q2 – Analysis of the substitution rates by sequence "class"

Background

# This part worth 0.1

# Enter your code here

raise NotImplementedError # No Answer - remove if you provide an answer

In the sequence comparison assignment you have sought an understanding of where the information

specifying Transcription Factor (TF) binding occurs. As discussed in the background material for that

topic, there is a tendency for these elements to lie within the "proximal promoter" region, i.e. 5'- of the

transcription start site of the gene.

Motivated by this evidence for function in the 5' proximal region, we are now seeking to evaluate

whether the number of substitutions in such "promoter" regions reflects this function.

Experimental design

Read the "data sampling for likelihood analyses" page (data_description.html), as it contains critical

information regarding the study design.

That work produced a tab delimited result file ( data/part2/results-summed_lengths.tsv ) that

contains estimates of the sum of branch lengths from the entire tree for each gene. (See the online

notes regarding branch lengths

( /molevol/substitution_models.html#time-in-molecular-evolution).)

Q2a

In ≤ 100 words.

With reference to the Neutral Theory, make a prediction regarding how you expect the branch

lengths to differ (or not) between at least 2 of the sequence types. Your prediction must include

the logical reasoning by which you arrived at the prediction.

YOUR ANSWER HERE

Q2b

This part worth 2

In ≤ 300 words.

Choose an appropriate statistical testing procedure (see the online notes

( /cogent3/statistical_tests.html) or use scipy.stats ) to test the

prediction you made above

select the sequence types you will use and justify the decision

specify what hypothesis tests you will conduct (each one will require specifying the null and

alternate) and what testing procedure you will use, to evaluate this prediction

justify why you chose that procedure

NOTE

If you think the best statistical procedure is not implemented in cogent3 or in scipy :

implement it yourself if you can; or

pick what you think is closest to it from those in cogent3 and explain what you see as

limitations of the choice.

Do some literature research to choose what seems the most suitable test for the hypothesis. Here's one

reference page (https://en.wikipedia.org/wiki/Location_test) for statistical testing, and another

(https://en.wikipedia.org/wiki/Paired_difference_test) to get you started.
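
To make the idea of a paired difference test concrete, here is a minimal sketch of an exact sign-flip (paired permutation) test written in pure Python. It is one possible procedure among several, not the recommended answer, and the numbers are invented for illustration. The function name and the choice of the absolute sum of differences as the test statistic are assumptions of this sketch. Under the null hypothesis that the paired values are exchangeable, each difference is equally likely to be positive or negative, so the test enumerates every sign assignment.

```python
from itertools import product


def paired_sign_flip_test(x, y):
    """Exact two-sided sign-flip test on paired observations.

    Returns the proportion of sign assignments whose |sum of differences|
    is at least as large as the observed value. Enumerates 2**n cases,
    so it is only practical for small n.
    """
    if len(x) != len(y):
        raise ValueError("x and y must be paired (same length)")
    diffs = [a - b for a, b in zip(x, y)]
    observed = abs(sum(diffs))
    extreme = 0
    total = 0
    for signs in product((1, -1), repeat=len(diffs)):
        total += 1
        stat = abs(sum(s * d for s, d in zip(signs, diffs)))
        # small tolerance guards against floating point round-off
        if stat >= observed - 1e-12:
            extreme += 1
    return extreme / total


# Invented paired values for four genes, NOT data from the assignment.
type_a = [0.5, 0.6, 0.7, 0.9]
type_b = [0.4, 0.4, 0.4, 0.5]
p_value = paired_sign_flip_test(type_a, type_b)
print(p_value)
```

For realistic sample sizes, full enumeration becomes infeasible and a random sample of sign assignments, or a rank-based paired test, is the usual substitute.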

YOUR ANSWER HERE

Q2c

Apply your chosen procedure to the stored data at tsv_path .

For information on how to manipulate the tabular data, refer to the online notes

( /cogent3/tables.html).
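
The sketch below shows one way to turn tab-delimited rows into paired values per gene using only the standard library. The column names (gene, seq_type, length), the sequence type labels, and the long-format layout are all assumptions made for illustration; inspect the header of the real file before adapting it. The cogent3 table tools described in the online notes are the intended route for the assignment.

```python
import csv
import io
from collections import defaultdict

# Hypothetical stand-in for the real file; column names and values are invented.
toy_tsv = (
    "gene\tseq_type\tlength\n"
    "G1\tcds\t0.20\n"
    "G1\tintron\t0.35\n"
    "G2\tcds\t0.10\n"
    "G2\tintron\t0.30\n"
)


def lengths_by_type(handle):
    """Group summed branch lengths as {seq_type: {gene: length}}."""
    reader = csv.DictReader(handle, delimiter="\t")
    grouped = defaultdict(dict)
    for row in reader:
        grouped[row["seq_type"]][row["gene"]] = float(row["length"])
    return grouped


grouped = lengths_by_type(io.StringIO(toy_tsv))
# keep only genes measured for both sequence types so the values stay paired
shared = sorted(set(grouped["cds"]) & set(grouped["intron"]))
pairs = [(grouped["cds"][g], grouped["intron"][g]) for g in shared]
print(pairs)
```

Restricting to genes present in both groups keeps the observations paired, which matters if a paired testing procedure is chosen.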

In [ ]:

Q2d

In ≤ 300 words.

From the results of the hypothesis test(s) you have performed, draw your conclusion regarding your

hypothesis.

Explain your result(s) with reference to the Neutral Theory. In your answer consider the following:

do the results make sense?

was there anything surprising?

if you think there are limitations of the design / analysis that compromise the ability to draw

conclusions, state them

by design, I mean the sequence sampling protocol

by analysis, I mean the properties of the methods for estimating the branch lengths

are there alternate explanations?

This part worth 3.2

YOUR ANSWER HERE

Q3 – extension question

Worth 1

tsv_path = "data/part2/results-summed_lengths.tsv"

# Enter your code here

raise NotImplementedError # No Answer - remove if you provide an answer

Is there a way to sample columns from the protein coding sequence alignments so that the variation

is more likely to be neutral?

Explain and justify (≤ 100 words)

implement it using only pure python and cogent3

apply it to a couple of alignments and compare the results to not using your procedure

what is the limitation of the approach
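
As a purely illustrative sketch of what "sampling columns" can look like in pure Python, the code below keeps every third column of an in-frame coding alignment stored as a dict of strings. Choosing third codon positions is an assumption of this example, not the stated answer to the question, and the function name and toy sequences are invented. It also assumes the alignment starts at codon position 1 and contains no frame-shifting gaps.

```python
def third_codon_positions(aln):
    """Return a new alignment holding only columns 3, 6, 9, ... of each sequence.

    aln: dict mapping sequence name -> aligned, in-frame coding sequence.
    """
    lengths = {len(seq) for seq in aln.values()}
    if len(lengths) != 1:
        raise ValueError("all sequences must have the same aligned length")
    if lengths.pop() % 3:
        raise ValueError("alignment length must be a multiple of 3")
    # slicing from index 2 with step 3 picks the third position of each codon
    return {name: seq[2::3] for name, seq in aln.items()}


# Invented toy alignment, NOT data from the assignment.
toy = {"human": "ATGGCCAAA", "mouse": "ATGGCTAAG"}
sampled = third_codon_positions(toy)
print(sampled)
```

A cogent3 alignment supports column slicing as well, so the same idea can be expressed on the loaded alignment object rather than on raw strings.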

YOUR ANSWER HERE

In [ ]: # your code

# Enter your code here
