首页 > > 详细

B+树/CPP讲解、btree程序辅导、讲解C/C++编程、解析cpp/btree报告解析Haskell程序|辅导留学生Prolog

Please keep this in mind when you are working on the lab.?For grading, you need to compile?random_tester_1.cpp?and?random_tester_2.cpp?with your implementation of B-trees.Double-check youself and make sure that you are doing it.?In multiple semesters, students have copied my?random_tester_1?and?random_tester_2?rather than compile their own. Of course, mine pass the gradescripts. You want to make sure that?yours?are passing the gradescripts.

I don't have B-Tree lecture notes that I have written. Instead, I rely on?this web page, by David Carlson and Isidore Minerd, which I think is very well done.
You are going to implement the "B+ Tree" variant, where internal nodes only hold keys and pointers to other nodes, and external nodes hold keys and pointers to values. One change from their description is that the value of a key in an internal node is going to be held in the largest val pointer in the key's predecessor subtree. Let me show an example, lifted from their notes:

In their example, the val pointer for J is the pointer to the left of K. In our trees, the val pointer for J will be the pointer to the right of I. Similarly:
The val pointer for F is the pointer to the right of E.
The val pointer for C is the pointer to the right of B.
The val pointer for M is the pointer to the right of L.
The val pointer for R is the pointer to the right of P.
The val pointer for U is the pointer to the right of T.
The val pointer to the right of Z is unused.
(As a corollary to this, when you delete an internal node, you swap it with the previous node in the tree, not the subsequent one. Since we are not doing deletion, that doesn't matter.....).

Jdisk: A disk emulator
We are using the same disk emulator,?jdisk?as in?The FAT lab. Please see that lab writeup for information on?jdisk.c/jdisk.h/jdisk_tester. Please also see that lab for information on how to view and modify binary files in VI.

What you have to write: b_tree.c
Your job is to write?b_tree.c. Its job is to implement the interface in?b_tree.h:
#ifndef _BP_TREE_
#define _BP_TREE_

#include "jdisk.h"

void *b_tree_create(char *filename, long size, int key_size);
void *b_tree_attach(char *filename);

unsigned int b_tree_insert(void *b_tree, void *key, void *record);
unsigned int b_tree_find(void *b_tree, void *key);

void *b_tree_disk(void *b_tree);
int b_tree_key_size(void *b_tree);
void b_tree_print_tree(void *b_tree);
#endif

What you are going to do is implement B-trees on top of jdisks. The keys will be buffers of exactly?key_size?bytes, which is defined when you create a btree. The vals will be buffers of exactly JDISK_SECTOR_SIZE bytes. Each node of the tree will fit into a sector of the disk.
The jdisks?must?have a specific format. That means that the jdisks that your programs create must be readable by my btree programs and vice versa (so long as they have the same sizes for longs and the same endian-ness). They don't have to be identical to mine. They just have to work. Here is the format:
Sector 0
Sector zero can have anything in it, so long as the first 16 (or 12) bytes contain the following:
?Bytes 0-3: The key size as an integer.
?Bytes 4-7: The LBA of the sector that holds the root node of the B-tree.
?Bytes 8-15 (or 8-11 if longs are 4 bytes): The first free LBA on the jdisk. You are going to allocate sectors consecutively from sector 0, and since you never deallocate a sector (our B-Trees don't allow deletion), you can keep track of the free sectors with a single number.
?Yes, the remaining 1008 bytes are wasted. You can use them, but since your program has to be interoperable with mine and others, their values will be ignored.
Let's stop there. Take a look at a file that is a jdisk for a B-Tree:
UNIX> ls -l tree-1.jdisk
-rw-r--r-- 1 plank loci 2048000 Feb 13 15:57 tree-1.jdisk
UNIX> jdisk_test R tree-1.jdisk hex 0 16
17 00 00 00 29 00 00 00 f1 01 00 00 00 00 00 00
UNIX>
The file is roughly 2MB, composed of 2,000 sectors. When we look at the first 16 bytes, we see the numbers in little endian format (which is the format of our current Intel processors). They key size is 0x17, or 23 bytes. The LBA of the root node is 0x29, and the first free sector is LBA 0x1f1 = 497.
Nodes
Now, the format of a sector holding a node of the tree is as follows. The first byte is a zero or one, specifying whether the node is external (0) or internal (1). The next byte says how many keys are in the node. How many keys can you fit into a node? The answer is (JDISK_SECTOR_SIZE - 6) / (key_size + 4). You'll see why in a minute. Let's call that value MAXKEY. Keys must be between 4 and 254 bytes (inclusive). Therefore, even if keys are four bytes, you can fit the number of keys into an unsigned char.
The next MAXKEY * key_size bytes are the keys. Then there can be some wasted bytes. The last (MAXKEY+1)*4 bytes are the LBA's, which are the pointers of the B-Trees. If the node is internal, then they are the LBA's of nodes that are pointed to by the node. If the node is external, then they are the LBA's of vals. If there are?nkeys?keys in the node, then there are?nkeys+1?LBA's.
Data (the vals)
Data is stored in a sector. The data *must* be JDISK_SECTOR_SIZE bytes.

A detailed example
Let's explore this a little. Let's try?tree-2.jdisk:
UNIX> ls -l tree-2.jdisk
-rw-r--r-- 1 plank staff 20480 Feb 11 22:58 tree-2.jdisk
UNIX> jdisk_test R tree-2.jdisk hex 0 16
c8 00 00 00 08 00 00 00 0c 00 00 00 00 00 00 00
UNIX>
This is a file with 20 sectors. The key size is 200 (0xc8), of which 12 (0x0c) are currently in use. The LBA of the root node is 8. Let's take a look at the first few bytes of that node:
UNIX> echo '8 1024 * p' | dc
8192
UNIX> jdisk_test R tree-2.jdisk hex 8192 32
01 01 45 6c 69 00 00 00 00 00 00 00 00 00 00 00
00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
UNIX>
So, this is an internal node, because the first byte is one. It holds one key, because the second byte is also one. The first key is in the next 200 bytes -- if you examine them, they are all zeros except for the first three. I know they hold a string (because I created this file), so let's look at them:
UNIX> jdisk_test R tree-2.jdisk string 8194 3
Eli
UNIX> jdisk_test R tree-2.jdisk string 8194 200
Eli
UNIX>
Just a note here about keys that are strings -- since they are null terminated,?jdisk_test?will print the string, regardless of whether you specify 3 characters or 200. In the example above,?jdisk_test?is reading 200 characters, but since the fourth character is '\0', it only prints out "Eli".
So, there is one key, which means that there are two pointers out of the node. Each of these pointers is an LBA of the sector holding the node to which the pointer points. How do we find these LBA's? Well, first, let's figure out what MAXKEY is: (1024 - 6) / (200+4) = 4.99. That means MAXKEY is four (and there is quite a bit of wasted space in our nodes. So it goes). Since a node can hold four keys, it can hold 5 LBA pointers. Those are in the last 5*4 bytes of the sector. Let's look at them:
UNIX> jdisk_test R tree-2.jdisk hex 9196 20
01 00 00 00 07 00 00 00 00 00 00 00 00 00 00 00
00 00 00 00
UNIX>
So, the first pointer is to block 1, and then second is to block 7. Let's look at block 1:
UNIX> jdisk_test R tree-2.jdisk hex 1024 32
00 04 41 6c 65 78 69 73 00 00 00 00 00 00 00 00
00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
UNIX>
It is an external node that holds four keys. The keys will start at bytes 1026, 1226, 1426 and 1626:
UNIX> jdisk_test R tree-2.jdisk hex 1024 32
00 04 41 6c 65 78 69 73 00 00 00 00 00 00 00 00
00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
UNIX> jdisk_test R tree-2.jdisk hex 1026 16
41 6c 65 78 69 73 00 00 00 00 00 00 00 00 00 00
UNIX> jdisk_test R tree-2.jdisk hex 1226 16
41 6c 6c 69 73 6f 6e 00 00 00 00 00 00 00 00 00
UNIX> jdisk_test R tree-2.jdisk hex 1426 16
43 61 6c 65 62 00 00 00 00 00 00 00 00 00 00 00
UNIX> jdisk_test R tree-2.jdisk hex 1626 16
44 61 6e 69 65 6c 00 00 00 00 00 00 00 00 00 00
UNIX> jdisk_test R tree-2.jdisk string 1026 16
Alexis
UNIX> jdisk_test R tree-2.jdisk string 1226 16
Allison
UNIX> jdisk_test R tree-2.jdisk string 1426 16
Caleb
UNIX> jdisk_test R tree-2.jdisk string 1626 16
Daniel
UNIX>
Nice -- this is looking like the keys are all string-based (however, they are still 200 characters -- it just so happens that all of the characters after a string are byte 0x0).
Now, let's look at the LBA's, which will start 20 bytes from the end of the sector:
UNIX> jdisk_test R tree-2.jdisk hex 2028 20
04 00 00 00 0b 00 00 00 0a 00 00 00 02 00 00 00
06 00 00 00
UNIX>
These are the vals:
Alexis' val is the sector at LBA 4.
Allison's val is the sector at LBA 11.
Caleb's val is the sector at LBA 10.
Daniel's val is the sector at LBA 2.
And Eli's val is the sector at LBA 6. Remember how vals work for internal nodes!
If you examine these LBA's, you'll see that they are also strings, where all of the bytes after the strings are zero. Here, I'll prove it for LBA 4, and then we'll look at the strings in those five sectors:
UNIX> jdisk_test R tree-2.jdisk hex 4096 32
47 79 72 6f 73 63 6f 70 65 00 00 00 00 00 00 00
00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
UNIX> echo 1024 16 / p | dc
64
UNIX> jdisk_test R tree-2.jdisk hex 4096 1024 | tail -n 63 | sort -u
00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
UNIX> jdisk_test R tree-2.jdisk string 4096 16
Gyroscope
UNIX> echo 11 1024 '*' p | dc
11264
UNIX> jdisk_test R tree-2.jdisk string 11264 16
Embellish
UNIX> jdisk_test R tree-2.jdisk string 10240 16
Sudanese
UNIX> jdisk_test R tree-2.jdisk string 2048 16
Shadowy
UNIX> echo 6 1024 '*' p | dc
6144
UNIX> jdisk_test R tree-2.jdisk string 6144 16
Eider
UNIX>
So, now we know:
?Key: Alexis - Val: Gyroscope
?Key: Allison - Val: Embellish
?Key: Caleb - Val: Sudanese
?Key: Daniel - Val: Shadowy
?Key: Eli - Val: Eider
Let's look at LBA 7 to see the keys greater than "Eli":
UNIX> echo 7 1024 '*' p | dc
7168
UNIX> jdisk_test R tree-2.jdisk hex 7168 16
00 03 47 72 61 63 65 00 00 00 00 00 00 00 00 00
UNIX> jdisk_test R tree-2.jdisk string 7170 16
Grace
UNIX> jdisk_test R tree-2.jdisk string 7370 16
James
UNIX> jdisk_test R tree-2.jdisk string 7570 16
Landon
UNIX> echo 8 1024 '*' 20 - p | dc
8172
UNIX> jdisk_test R tree-2.jdisk hex 8172 20
05 00 00 00 09 00 00 00 03 00 00 00 00 00 00 00
00 00 00 00
UNIX> jdisk_test R tree-2.jdisk string 5120 16
Procter
UNIX> jdisk_test R tree-2.jdisk string `echo 9 1024 '*' p | dc` 16
Chug
UNIX> jdisk_test R tree-2.jdisk string `echo 3 1024 '*' p | dc` 16
Delinquent
UNIX>
So, armed with that information, we can draw our B-Tree as follows:
 

联系我们
  • QQ:99515681
  • 邮箱:99515681@qq.com
  • 工作时间:8:00-21:00
  • 微信:codinghelp
热点标签

联系我们 - QQ: 99515681 微信:codinghelp
程序辅导网!