Homework 8
(10 points)
- Download at least 200 PubMed articles (abstract, title, or whichever fields you prefer). 100 should
include the term “obesity” and the other 100 should include the term “cancer”. If your system
runs too slowly in the next step, you can download 50 from each category instead.
- Process the text corpus so that it can be fed into a dimensionality reduction
method.
- Apply the following dimensionality reduction methods to the text: PCA (3 points), t-SNE (3 points),
UMAP (4 points).
- Prepare a report on your tasks and findings, along with a video file describing what
you have done. You can copy and paste your code, its results, and your description into a Word
document, a Python notebook, or an R notebook.
- The deadline for submitting this homework is posted on Blackboard online. Please feel
free to ask questions, and prepare your work for presentation in the next session.
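For the download step, one possible approach (an assumption, since the assignment does not prescribe a tool) is the NCBI E-utilities web API: ESearch returns PubMed IDs for a search term, and EFetch returns the matching records. The sketch below uses only the Python standard library. The function names are hypothetical helpers, and the network call in `fetch_abstracts` is defined but not run here.

```python
import json
import urllib.parse
import urllib.request

# Base URL of the NCBI E-utilities service.
EUTILS = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils"


def esearch_url(term, retmax=100):
    """Build an ESearch URL asking for up to `retmax` PubMed IDs matching `term`."""
    query = urllib.parse.urlencode(
        {"db": "pubmed", "term": term, "retmax": retmax, "retmode": "json"}
    )
    return f"{EUTILS}/esearch.fcgi?{query}"


def efetch_url(pmids):
    """Build an EFetch URL asking for the plain-text abstracts of the given IDs."""
    query = urllib.parse.urlencode(
        {
            "db": "pubmed",
            "id": ",".join(pmids),
            "rettype": "abstract",
            "retmode": "text",
        }
    )
    return f"{EUTILS}/efetch.fcgi?{query}"


def fetch_abstracts(term, retmax=100):
    """Return one text blob with the abstracts of up to `retmax` articles.

    Requires network access. The blob still has to be split into
    individual articles before analysis.
    """
    with urllib.request.urlopen(esearch_url(term, retmax)) as resp:
        pmids = json.load(resp)["esearchresult"]["idlist"]
    with urllib.request.urlopen(efetch_url(pmids)) as resp:
        return resp.read().decode("utf-8")
```

Calling `fetch_abstracts("obesity")` and `fetch_abstracts("cancer")` would then give the two categories. Requesting `retmode="xml"` instead of plain text may make it easier to separate titles and abstracts per article.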
Hint: You might use this resource for Python: https://www.scikit-yb.org/en/latest/api/text/index.html , and these for R: https://cran.r-project.org/web/packages/umap/vignettes/umap.html , https://www.r-bloggers.com/quick-and-easy-t-sne-analysis-in-r/
However, there is no guarantee that these links and hints will be helpful.
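The text preparation and dimensionality reduction steps could be sketched as follows. This is a minimal illustration under stated assumptions, not a required solution: it assumes scikit-learn is installed, uses TF-IDF as one possible way to turn text into numeric features, and treats UMAP as an optional dependency from the separate `umap-learn` package. The helper names and the toy sentences are placeholders invented for this example, not real PubMed abstracts.

```python
from sklearn.decomposition import PCA
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.manifold import TSNE

# Placeholder sentences standing in for downloaded abstracts (not real data).
toy_texts = [
    "obesity increases body mass index and diabetes risk",
    "diet and exercise reduce obesity in adults",
    "childhood obesity is linked to sugar intake",
    "obesity and insulin resistance in adolescents",
    "tumor growth in breast cancer patients",
    "chemotherapy response in lung cancer",
    "cancer cells show uncontrolled proliferation",
    "screening detects colon cancer early",
]


def vectorize(texts):
    """Turn raw texts into a dense TF-IDF matrix (documents x terms)."""
    vectorizer = TfidfVectorizer(lowercase=True, stop_words="english")
    return vectorizer.fit_transform(texts).toarray()


def reduce_pca(X, n_components=2):
    """Project the feature matrix onto its first principal components."""
    return PCA(n_components=n_components, random_state=0).fit_transform(X)


def reduce_tsne(X, perplexity=30.0):
    """Embed the feature matrix in 2D with t-SNE.

    perplexity must be smaller than the number of documents.
    """
    tsne = TSNE(n_components=2, perplexity=perplexity, init="random", random_state=0)
    return tsne.fit_transform(X)


def reduce_umap(X, n_neighbors=15):
    """Embed the feature matrix in 2D with UMAP (needs `pip install umap-learn`)."""
    import umap  # imported here so the rest of the sketch runs without it

    reducer = umap.UMAP(n_components=2, n_neighbors=n_neighbors, random_state=0)
    return reducer.fit_transform(X)


X = vectorize(toy_texts)
pca_2d = reduce_pca(X)
tsne_2d = reduce_tsne(X, perplexity=2.0)  # small value because the toy corpus is tiny
```

With the real corpus, `toy_texts` would be replaced by the downloaded abstracts, and each 2D result could be drawn as a scatter plot colored by category ("obesity" or "cancer") to compare how well the three methods separate the two groups.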