首页 >
> 详细

Assignment 3

COMP 250, Fall 2019

[document last modified: November 4, 2019]

General instructions

• Add code where instructed, namely in the ADD CODE HERE block. You may also add

helper methods and extra fields if you wish.

• All import statements in your KDTree.java file will be removed, except for the ones that

are in the starter code.

• The starter code includes a tester class that you can run to test if your methods are correct.

• Your code will be tested on valid inputs only.

• You may submit multiple times, e.g. if you realize you made an error after you submit. For

any run of the Grader, you will receive the best score on your submissions.

• Late assignments will be accepted up to two days late and will be penalized by 10 points per

day. If you submit one minute late, this is equivalent to submitting 23 hours and 59 minutes

late, etc. So, make sure you are nowhere near that threshold when you submit.

• See the Submission Instructions at the end of this document for more details.

• Search for the keyword updated to find any the places where the PDF has been altered since

the original posting.

1 Introduction

In this assignment, we extend the powerful idea of binary search trees from one dimension to

multiple dimensions. In particular, we will work with points in <

k where k can be any positive

integer. Such points might represent image data, audio data, or text. For example, machine learning

methods often map such common data to feature vectors in a k − D feature space, and then use

these features in various ways.1

In binary search, one is given a key and tries to find that key amongst many that are stored in a

data structure. We will examine a related but different problem for k − D data. Our problem will

be to take a given point, which we will call a query point rather than a “key”, and search for the

nearest point in the data set. In the field of Machine Learning, the nearest point is typically referred

to as the nearest neighbor.

One way to find the nearest point would be to iterate by brute force over all data points, compute

the distance from the query point to each data point, and then return the data point that has

the minimum distance.2 However, this brute force search is inefficient, for the same reason that

linear search through a list is inefficient relative to binary search, since in the worst case it requires

examining each data point. One would instead like to have a “nearest point” method that is must

faster, by restricting the search to a small subset of the data points.

1.1 KD-Trees

One way to restrict the search for a query point is to use a tree data structure called a kd-tree. This

is a binary tree which is similar to a binary search tree, but there are important differences. One

difference between a kd-tree and a binary search tree concerns the data itself. In a binary search

tree, the data points (keys) are ordered as if on a line, or in Java terms they are Comparable. In a

kd-tree, there is no such overall ordering. Instead, we only have an ordering within each of the k

dimensions, and this ordering can differ between dimensions. For example, consider k = 2, and let

points be denoted (x1, x2). We can order two points by their x1 value and we can order two points

by their x2 value, but we don’t want to order (x1, x2) and (x01, x02) in general. For example, in the

case x1 < x01 but x2 > x02, we do not want to impose an ordering of these two points.

Another important difference is how binary search trees and kd-trees are typically used. With

a binary search tree, one typically searches for an exact match for the search key. With a kd-tree,

one typically searches for the nearest data point. We will use the Euclidean distance. Give a query

point xq ∈ Z

k we want to find a data point x[ ] that minimizes the squared distance,

sqrdist(xq[ ], x[ ]) ≡Xi(xq[i] − x[i])2.

We used squared distance rather than distance because taking the square root buys us no extra

information about which point is closest.

Note that there may be multiple data points that are the same (squared) distance from a query

point xq. One could define the problem slightly differently by returning all points in this non-unique

case. Or, one could ask for the nearest n points, and indeed this is common. We will just keep it

simple and ask for a single nearest point. If multiple points lie at that same nearest distance, then

any of these points may be returned.

Kd-trees which are a particular type of binary tree. Each internal node implicitly defines a

k−dimensional region in

联系我们

- QQ：99515681
- 邮箱：99515681@qq.com
- 工作时间：8:00-23:00
- 微信：codinghelp2

- Tsp课程作业代写、代做algorithms留学生作业、代做java，C/C 2020-06-23
- Kit107留学生作业代做、C++编程语言作业调试、Data课程作业代写、代 2020-06-23
- Sta302h1f作业代做、代写r课程设计作业、代写r编程语言作业、代做da 2020-06-22
- 代写seng 474作业、代做data Mining作业、Python，Ja 2020-06-22
- Cmpsci 187 Binary Search Trees 2020-06-21
- Comp226 Assignment 2: Strategy 2020-06-21
- Math 504 Homework 12 2020-06-21
- Math4007 Assessed Coursework 2 2020-06-21
- Optimization In Machine Learning Assig... 2020-06-21
- Homework 1 – Math 104B 2020-06-20
- Comp1000 Unix And C Programming 2020-06-20
- General Specifications Use Python In T... 2020-06-20
- Comp-206 Mini Assignment 6 2020-06-20
- Aps 105 Lab 9: Search And Link 2020-06-20
- Aps 105 Lab 9: Search And Link 2020-06-20
- Mech 203 – End-Of-Semester Project 2020-06-20
- Ms980 Business Analytics 2020-06-20
- Cs952 Database And Web Systems Develop... 2020-06-20
- Homework 4 Using Data From The China H... 2020-06-20
- Assignment 1 Build A Shopping Cart 2020-06-20