DAT 500S – Machine Learning - Project Guidelines
Final Goal: Optimize the portfolio of (experimental) varieties to be grown at the target
farm. Information about the target farm is available in the evaluation dataset. The optimal
portfolio can have at most 5 varieties of soybean. It is not necessary but you are welcome
to use the methods you learn in prescriptive analytics class to construct the optimal
portfolio. If you are not familiar with optimization, come up with a meaningful heuristic to
construct the portfolio. An example heuristic approach was discussed in class on
November 21, 2020.
You are encouraged to divide the project work into three components: Descriptive
Analytics, Predictive Analytics, and Prescriptive Analytics.
I. Descriptive Analytics
Perform an exploratory data analysis to uncover patterns in the given data and
familiarize yourself with it. For example,
1. Plot the latitudes and longitudes on a map to visualize the locations of farms.
Identify where the target/evaluation farm is located. It should be noted that most of
the farms are located in the Midwest of the US.
2. Generate a frequency distribution for varieties. Decide if you have enough data for
each variety to build a dedicated prediction model for every variety.
3. Check to see if there is any relationship between the locations and varieties.
Explore if certain varieties are grown more often in some regions than in other
regions.
4. Look for patterns in weather variables. Explore relationships between locations
and weather-related variables.
5. Plot the distribution of the yield variables. Based on the plot, what do you think is a
realistic goal for the optimal portfolio at the target farm?
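As one way to approach step 2, the sketch below counts observations per variety and splits the varieties into those with enough data for a dedicated model and those that would be pooled. The record layout, the variety names, and the cutoff of 3 observations are assumptions made for illustration only; they are not taken from the project data.

```python
from collections import Counter

# Hypothetical records: (variety, yield) pairs. The real dataset has more columns.
records = [
    ("V1", 52.1), ("V1", 55.3), ("V1", 49.8), ("V1", 53.0),
    ("V2", 47.5), ("V2", 50.2),
    ("V3", 58.9),
]

def split_by_frequency(rows, min_obs):
    """Return (counts, dedicated, pooled) given a minimum observation count."""
    counts = Counter(variety for variety, _ in rows)
    dedicated = sorted(v for v, n in counts.items() if n >= min_obs)
    pooled = sorted(v for v, n in counts.items() if n < min_obs)
    return counts, dedicated, pooled

# min_obs=3 is an assumed cutoff; choose it based on your own data.
counts, dedicated, pooled = split_by_frequency(records, min_obs=3)

for variety, n in counts.most_common():
    print(f"{variety}: {n} observations")
print("Dedicated models:", dedicated)
print("Combined model:", pooled)
```

The same counts can also be plotted as a bar chart to see how skewed the variety distribution is.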
II. Predictive Analytics
Decide on a target variable to help you with the project goal. Variety_Yield and
Yield_Difference are good candidates for the target variable. Based on the frequency
distribution generated in the descriptive analytics, decide which varieties will have their
own prediction model. Also, decide which varieties are going to be combined in the same
model. Have an identifier for varieties in the combined model so that predictions can be
made for individual varieties. Generate models using the following algorithms (if your
target variable is continuous):
1. Linear Regression
2. LASSO
3. Regression Tree
4. Bagging
5. Random Forest
6. Boosted Trees
7. Neural Network
Generate models using the following algorithms (if your target variable is categorical):
1. Logistic Regression
2. Classification Tree
3. Bagging
4. Random Forest
5. Boosted Trees
6. Neural Network
7. Support Vector Machine
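The variety identifier for the combined model can be implemented in several ways. A minimal sketch, assuming indicator (one-hot) columns appended to the weather features, is shown below. The variety names and the two weather feature values are hypothetical.

```python
def one_hot_variety(variety, pooled_varieties):
    """Encode a variety as 0/1 indicator features, one per pooled variety."""
    return [1.0 if variety == v else 0.0 for v in pooled_varieties]

def build_row(variety, weather_features, pooled_varieties):
    """Concatenate weather features with the variety indicator columns."""
    return list(weather_features) + one_hot_variety(variety, pooled_varieties)

# Hypothetical varieties that share the combined model.
pooled_varieties = ["V2", "V3", "V7"]

# Hypothetical weather features, e.g. mean temperature and total rainfall.
row = build_row("V3", [21.5, 310.0], pooled_varieties)
print(row)  # [21.5, 310.0, 0.0, 1.0, 0.0]
```

For models with an intercept, such as linear regression, one indicator column is usually dropped to avoid perfect collinearity.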
Using these models, predict the yield or yield difference for every potential variety at
the target/evaluation farm. Depending upon the choice of your target variable, these
predictions need not be yield or yield difference. Make predictions under multiple
weather-related scenarios to account for uncertainty. Ensure that the chosen scenarios
are suitable for the location of the target/evaluation farm.
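Predicting under several weather scenarios can be sketched as follows. The function predict_yield is a hypothetical stand-in for whatever fitted model you choose, and the rainfall scenarios, variety names, and coefficients are invented for illustration. The summary gives each variety a mean predicted yield and a spread across scenarios, which can serve as a simple measure of risk.

```python
from statistics import mean, pstdev

def predict_yield(variety, scenario):
    """Hypothetical stand-in for a fitted model's prediction."""
    base = {"V1": 50.0, "V2": 48.0, "V3": 55.0}[variety]
    sensitivity = {"V1": 0.5, "V2": 0.1, "V3": 1.0}[variety]
    return base + sensitivity * (scenario["rain_mm"] - 300.0) / 10.0

# Assumed scenarios: dry, typical, and wet seasons at the target farm.
scenarios = [{"rain_mm": 250.0}, {"rain_mm": 300.0}, {"rain_mm": 350.0}]

def summarize(varieties, scenarios, predictor):
    """Return {variety: (mean prediction, std. dev. across scenarios)}."""
    summary = {}
    for v in varieties:
        preds = [predictor(v, s) for s in scenarios]
        summary[v] = (mean(preds), pstdev(preds))
    return summary

summary = summarize(["V1", "V2", "V3"], scenarios, predict_yield)
for v, (m, s) in summary.items():
    print(f"{v}: mean={m:.2f}, risk={s:.2f}")
```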
III. Prescriptive Analytics
Optimize the portfolio of (experimental) varieties to be grown at the target farm.
Experimental varieties are in the column identified as ‘Varieties’. The optimal portfolio can
have at most 5 varieties of soybean. It is not necessary, but you are welcome to use the
methods from the prescriptive analytics class or other optimization classes to construct
the optimal portfolio. If you are not familiar with optimization, you can invent your own
heuristic to make the recommendation. There will not be any grade penalty for not using
optimization. Using a good heuristic will be sufficient to get a good score for this part of
the project.
Your recommendation should explicitly identify the varieties to be grown and the
percentage of the farmland allocated to each variety. The percentages should add up
to 100 percent. Here are two sample heuristics:
1. Naïve Heuristic
Based on the predictions, rank the varieties according to their yield potential and
recommend the top 5 varieties to be grown at the farm. You could potentially allocate 20
percent of the land for each variety.
2. Mean-Risk Heuristic
Based on the predictions, rank varieties based on the mean yield and risk in yield.
Recommend the top 5 varieties in these rankings. Allocate land based on the mean yield
and risk in yield, for example by giving larger shares to varieties with higher mean yield
and lower risk.
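The two sample heuristics can be sketched as below. The scoring rule used for the mean-risk version (mean yield minus a risk_aversion multiple of the standard deviation) and the proportional allocation are assumptions chosen for illustration, not necessarily the approach discussed in class. The mean and risk values are hypothetical.

```python
def naive_portfolio(mean_yield, k=5):
    """Pick the k varieties with the highest predicted yield, equal shares."""
    top = sorted(mean_yield, key=mean_yield.get, reverse=True)[:k]
    share = 100.0 / len(top)
    return {v: share for v in top}

def mean_risk_portfolio(mean_yield, risk, risk_aversion=1.0, k=5):
    """Rank by (mean - risk_aversion * risk); allocate in proportion to score."""
    score = {v: mean_yield[v] - risk_aversion * risk[v] for v in mean_yield}
    top = sorted(score, key=score.get, reverse=True)[:k]
    # Proportional shares only make sense for positive scores;
    # otherwise fall back to an equal split.
    if min(score[v] for v in top) <= 0:
        return {v: 100.0 / len(top) for v in top}
    total = sum(score[v] for v in top)
    return {v: 100.0 * score[v] / total for v in top}

# Hypothetical predicted mean yields and risks (std. dev. across scenarios).
mean_yield = {"V1": 50.0, "V2": 48.0, "V3": 55.0, "V4": 53.0, "V5": 46.0, "V6": 51.0}
risk = {"V1": 2.0, "V2": 0.5, "V3": 4.0, "V4": 6.0, "V5": 1.0, "V6": 3.0}

naive = naive_portfolio(mean_yield)
mean_risk = mean_risk_portfolio(mean_yield, risk, risk_aversion=1.0)

for name, portfolio in (("Naive", naive), ("Mean-risk", mean_risk)):
    shares = ", ".join(f"{v}: {pct:.1f}%" for v, pct in portfolio.items())
    print(f"{name}: {shares}")
```

Raising risk_aversion penalizes volatile varieties more heavily. Both functions return percentages that sum to 100, as the recommendation requires.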
Key things to remember while writing the report
Perform a literature search using library resources to identify journal publications relevant
to your topic. In the literature, do you find interesting methods to make similar
recommendations? What do you think about those methods? How is your approach
different from those methods? Did your project add incremental value to these existing
publications? Note: Utilize at least six peer-reviewed journal (Management Science,
Interfaces, Operations Research, Journal of Operations Management, Production and
Operations Management, Journal of Portfolio Management, Journal of Finance, etc.) or
conference articles to synthesize your arguments about the existing methods in the
literature.
The final report should include Title of the project, Abstract, Keywords, Introduction,
Literature Review, Methodology and Analysis, Conclusion, and References.
Submit your project as a PDF file on Canvas by May 4, 11:59 PM CST.
Note: Please listen carefully to the plagiarism issues described in class. If you have
any questions on plagiarism-related issues, you should contact the instructor by
April 20th for clarification.
Remember to include the following components in your report:
1) Title. Convey a message using 12 words. Readers should understand the content
of the entire report by just reading the title. Note: The very first page of the
report should include the Title, your ID number, Abstract, Keywords, etc.
You should not include a blank page at the beginning of the report.
2) Abstract. Summarize your report using 300 words. Some readers would read just
the abstract to figure out if they would like to read the entire report. You should
write a captivating summary of the entire report here.
Note: You should just read the title and abstract of many publications as part of
your literature review before deciding on the articles that you would like to utilize
in your project.
3) Keywords. Include three to five keywords relevant to your project.
4) Introduction. This section should introduce your project. You should include
discussions about: What is the motivation behind this project? What is the goal of
this project? Which organization benefits from this study? What are the Research
Question(s) answered by this project? What methods were utilized? What are the
important results and conclusions?
5) Literature Review. Utilize library databases like JSTOR, INFORMS, PUBMED,
etc. to find studies (peer-reviewed articles) relevant to your topic. Do you find
publications addressing this same problem? Did you add more value to the
existing literature by completing this project?
Note: Do not base your opinions or findings on articles that are not peer
reviewed, i.e., newspaper and magazine articles alone are not adequate.
Note: Do not copy and paste content from other resources.
6) Methodology and Analysis. Concisely describe the methods and analysis used in
the project. Note: At least 60 percent of the report should focus on
Methodology and Analysis.
7) Conclusion. Summarize your methods, analysis, results, and recommendations.
What is unique about your work? What are the findings? Are there any surprises?
Are the findings beneficial to any organization?
8) References. Include references from your Literature Review.
9) Tables and figures should be numbered and titled. Table titles should appear on
the top. Figure titles should appear on the bottom. Every table and figure
presented in the report should be discussed in the report.
10) Formatting: Submit a Word report with all of the above discussed components.
Include your ID number on the first page (no title page) and include page numbers
on all pages. Your report should be at least 9 pages long and must not exceed
10 pages. Nothing may appear beyond page 10.
Font: Arial
Font Size: 12
Margins: 1 inch on all four sides
Spacing: 1.5 line spacing