首页 >
> 详细

Assignment 3

COMP 250, Fall 2019

[document last modified: November 4, 2019]

General instructions

• Add code where instructed, namely in the ADD CODE HERE block. You may also add

helper methods and extra fields if you wish.

• All import statements in your KDTree.java file will be removed, except for the ones that

are in the starter code.

• The starter code includes a tester class that you can run to test if your methods are correct.

• Your code will be tested on valid inputs only.

• You may submit multiple times, e.g. if you realize you made an error after you submit. For

any run of the Grader, you will receive the best score on your submissions.

• Late assignments will be accepted up to two days late and will be penalized by 10 points per

day. If you submit one minute late, this is equivalent to submitting 23 hours and 59 minutes

late, etc. So, make sure you are nowhere near that threshold when you submit.

• See the Submission Instructions at the end of this document for more details.

• Search for the keyword updated to find any the places where the PDF has been altered since

the original posting.

1 Introduction

In this assignment, we extend the powerful idea of binary search trees from one dimension to

multiple dimensions. In particular, we will work with points in <

k where k can be any positive

integer. Such points might represent image data, audio data, or text. For example, machine learning

methods often map such common data to feature vectors in a k − D feature space, and then use

these features in various ways.1

In binary search, one is given a key and tries to find that key amongst many that are stored in a

data structure. We will examine a related but different problem for k − D data. Our problem will

be to take a given point, which we will call a query point rather than a “key”, and search for the

nearest point in the data set. In the field of Machine Learning, the nearest point is typically referred

to as the nearest neighbor.

One way to find the nearest point would be to iterate by brute force over all data points, compute

the distance from the query point to each data point, and then return the data point that has

the minimum distance.2 However, this brute force search is inefficient, for the same reason that

linear search through a list is inefficient relative to binary search, since in the worst case it requires

examining each data point. One would instead like to have a “nearest point” method that is must

faster, by restricting the search to a small subset of the data points.

1.1 KD-Trees

One way to restrict the search for a query point is to use a tree data structure called a kd-tree. This

is a binary tree which is similar to a binary search tree, but there are important differences. One

difference between a kd-tree and a binary search tree concerns the data itself. In a binary search

tree, the data points (keys) are ordered as if on a line, or in Java terms they are Comparable. In a

kd-tree, there is no such overall ordering. Instead, we only have an ordering within each of the k

dimensions, and this ordering can differ between dimensions. For example, consider k = 2, and let

points be denoted (x1, x2). We can order two points by their x1 value and we can order two points

by their x2 value, but we don’t want to order (x1, x2) and (x01, x02) in general. For example, in the

case x1 < x01 but x2 > x02, we do not want to impose an ordering of these two points.

Another important difference is how binary search trees and kd-trees are typically used. With

a binary search tree, one typically searches for an exact match for the search key. With a kd-tree,

one typically searches for the nearest data point. We will use the Euclidean distance. Give a query

point xq ∈ Z

k we want to find a data point x[ ] that minimizes the squared distance,

sqrdist(xq[ ], x[ ]) ≡Xi(xq[i] − x[i])2.

We used squared distance rather than distance because taking the square root buys us no extra

information about which point is closest.

Note that there may be multiple data points that are the same (squared) distance from a query

point xq. One could define the problem slightly differently by returning all points in this non-unique

case. Or, one could ask for the nearest n points, and indeed this is common. We will just keep it

simple and ask for a single nearest point. If multiple points lie at that same nearest distance, then

any of these points may be returned.

Kd-trees which are a particular type of binary tree. Each internal node implicitly defines a

k−dimensional region in

联系我们

- QQ：99515681
- 邮箱：99515681@qq.com
- 工作时间：8:00-23:00
- 微信：codinghelp

- Csci 340作业代做、代写java程序语言作业、代做java实验作业代写 2019-12-12
- Data课程作业代做、代写nbershade作业、代做r课程设计作业、代写r 2019-12-12
- 代写csci 1100作业、Program课程作业代做、Python语言作业 2019-12-12
- Data留学生作业代做、代写sql实验作业、Sql编程有作业调试、Pseud 2019-12-12
- 代做g6077留学生作业、System课程作业代写、代做web编程语言作业、 2019-12-12
- 代写comp529作业、代做analysis留学生作业、代写java语言作业 2019-12-12
- Ce235留学生作业代写、Program课程作业代写、C/C++程序语言作业 2019-12-12
- 代写system留学生作业、代做python语言作业、代写java，C/C+ 2019-12-12
- 代写ma705留学生作业、代写python程序语言作业、代写python实验 2019-12-11
- Stat 3312作业代做、R语言作业代写、代做r编程设计作业、代写sas 2019-12-11
- Comp201作业代做、代写software Engineering作业、J 2019-12-11
- Statistics 3022作业代做、代写data留学生作业、R编程设计作 2019-12-11
- 代写canvas留学生作业、Python, R，Matlab编程作业代做、代 2019-12-11
- Cs 112留学生作业代做、Program编程语言作业、Python程序语言 2019-12-11
- 代写fre6831留学生作业、代做python程序语言作业、Python实验 2019-12-11
- Mathjax课程作业代写、代做html，Css作业、代写r编程设计作业、代 2019-12-11
- Stsci 5060作业代做、Sql编程语言作业调试、代写sql课程设计作业 2019-12-11
- 代做parser留学生作业、Programs课程作业代写、代写c++实验作业 2019-12-10
- 代写econ215留学生作业、代写python课程设计作业、Python编程 2019-12-10
- 代写databases作业、代做java，Python编程设计作业、代写c/ 2019-12-10