
Homework 8


(10 points)
- Download at least 200 PubMed articles (abstract, title, or whatever you prefer). 100 should include the term “obesity” and the other 100 should include the term “cancer”. If your system runs too slowly for the next step, you can download 50 from each category.
- Analyze their textual corpus so that it can be fed into a dimensionality reduction method.
- Apply the following dimensionality reduction methods to the text: PCA (3 points), t-SNE (3 points), UMAP (4 points).
- Prepare a report on your tasks and findings, along with a video file describing what you have done. You can copy and paste your code, its results, and your description into a Word document, a Python notebook, or an R notebook.
- The deadline for delivering this homework is posted on the blackboard online. Please feel free to ask questions, and prepare your work for presentation at the next session.
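As one possible starting point for the download step, the sketch below uses the NCBI E-utilities endpoints (ESearch to find PubMed IDs, EFetch to retrieve the records as XML) with only the Python standard library. This is an assumption about tooling, not something the assignment prescribes. The helper names (`esearch_url`, `efetch_url`, `parse_pmids`, `parse_abstracts`, `download_abstracts`) are hypothetical, and the plain search terms may need refining.

```python
import json
import urllib.parse
import urllib.request
import xml.etree.ElementTree as ET

EUTILS = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils"


def esearch_url(term, retmax=100):
    """Build an ESearch URL asking for up to `retmax` PubMed IDs matching `term`."""
    query = urllib.parse.urlencode(
        {"db": "pubmed", "term": term, "retmax": retmax, "retmode": "json"}
    )
    return f"{EUTILS}/esearch.fcgi?{query}"


def efetch_url(pmids):
    """Build an EFetch URL that returns the given PubMed records as XML."""
    query = urllib.parse.urlencode(
        {"db": "pubmed", "id": ",".join(pmids), "retmode": "xml"}
    )
    return f"{EUTILS}/efetch.fcgi?{query}"


def parse_pmids(esearch_json):
    """Extract the list of PubMed IDs from an ESearch JSON response."""
    return json.loads(esearch_json)["esearchresult"]["idlist"]


def parse_abstracts(efetch_xml):
    """Return one abstract string per article; articles without an abstract are skipped."""
    root = ET.fromstring(efetch_xml)
    docs = []
    for article in root.iter("PubmedArticle"):
        # Structured abstracts have several AbstractText sections; join them.
        parts = ["".join(node.itertext()) for node in article.iter("AbstractText")]
        if parts:
            docs.append(" ".join(parts))
    return docs


def download_abstracts(term, retmax=100):
    """Search PubMed for `term` and return the abstracts found (needs network access)."""
    with urllib.request.urlopen(esearch_url(term, retmax)) as resp:
        pmids = parse_pmids(resp.read().decode("utf-8"))
    with urllib.request.urlopen(efetch_url(pmids)) as resp:
        return parse_abstracts(resp.read())
```

Calling `download_abstracts("obesity")` and `download_abstracts("cancer")` would give two lists of abstracts. Because articles without an abstract are skipped, it may be necessary to request somewhat more than 100 IDs per term to end up with 100 documents each.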
Hint: You might use this resource for Python: https://www.scikit-yb.org/en/latest/api/text/index.html, and these for R: https://cran.r-project.org/web/packages/umap/vignettes/umap.html, https://www.r-bloggers.com/quick-and-easy-t-sne-analysis-in-r/
However, there is no guarantee that these links and hints will be helpful.
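Independently of those resources, the analysis and reduction steps could be sketched in Python roughly as follows. The sketch assumes scikit-learn is installed (and, for the UMAP branch only, the separate `umap-learn` package). TF-IDF is just one reasonable way to turn text into a numeric matrix, not the required one. The helper names `vectorize` and `embed` are hypothetical, and `TOY_DOCS` holds made-up placeholder sentences, not real PubMed abstracts.

```python
from sklearn.decomposition import PCA
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.manifold import TSNE

# Made-up placeholder texts standing in for downloaded abstracts.
TOY_DOCS = [
    "obesity and diet in adults",
    "childhood obesity and physical activity",
    "obesity risk and body mass index",
    "weight loss interventions for obesity",
    "cancer cells and tumor growth",
    "breast cancer screening and survival",
    "chemotherapy response in lung cancer",
    "tumor mutation and cancer progression",
]
TOY_LABELS = ["obesity"] * 4 + ["cancer"] * 4


def vectorize(docs):
    """Turn raw texts into a dense TF-IDF matrix (rows = documents)."""
    vectorizer = TfidfVectorizer(stop_words="english")
    return vectorizer.fit_transform(docs).toarray()


def embed(X, method, random_state=0):
    """Project the rows of X to 2 dimensions with 'pca', 'tsne', or 'umap'."""
    n_samples = X.shape[0]
    if method == "pca":
        return PCA(n_components=2, random_state=random_state).fit_transform(X)
    if method == "tsne":
        # t-SNE requires perplexity < n_samples; shrink it for small corpora.
        perplexity = min(30, max(1, (n_samples - 1) // 3))
        tsne = TSNE(n_components=2, perplexity=perplexity,
                    init="random", random_state=random_state)
        return tsne.fit_transform(X)
    if method == "umap":
        import umap  # provided by the umap-learn package (assumed installed)
        reducer = umap.UMAP(n_components=2,
                            n_neighbors=min(15, n_samples - 1),
                            random_state=random_state)
        return reducer.fit_transform(X)
    raise ValueError(f"unknown method: {method!r}")


X = vectorize(TOY_DOCS)
pca_coords = embed(X, "pca")    # array of shape (n_documents, 2)
tsne_coords = embed(X, "tsne")  # array of shape (n_documents, 2)
```

Each result is a two-column array that can be drawn as a scatter plot, coloring points by `TOY_LABELS` (or by the real category of each article) to see whether the "obesity" and "cancer" documents separate. With a corpus this tiny the plots mean little; the same calls are intended for the 100 to 200 downloaded abstracts.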