Homework #4
CS590-07 Big Data and Cloud Computing, Fall 2019
Due: December 13 (Friday)
Submit your homework (HW4_YourLastName_FirstName.zip) to the Blackboard course webpage.
Part I. HBase setup and test
1. Set up HBase, and verify that it’s running using the HBase shell. Refer to
https://hbase.apache.org/book.html
2. Save the source code of MyLittleHBaseClient from Example API Usage at
https://hbase.apache.org/apidocs/org/apache/hadoop/hbase/client/package-summary.html
and show that the program compiles and runs.
Part II. HBase Column-Based Store
You will use the HBase shell for Part II. You may find HBase Shell Commands from various sources, e.g.,
http://hadooptutorial.info/hbase-shell-commands-in-practice/
Write the shell command for each of the following questions and submit the execution result.
3. In the HBase shell, create a table called BookBigT whose schema will be able to hold each book's ISBN,
book-title, book-author, year-of-publication, publisher, and image-url. ISBN will be used as the row key.
Suppose that a book’s title and author will be saved and retrieved together, so they should be grouped.
Year-of-publication and publisher should also be grouped. Image-url is in a separate group. In this case,
it would be wise to place them into 3 column families, such as name, publish, and picture. Then the book’s
title and author become columns of “name”, year-of-publication and publisher become columns of “publish”,
and image-url becomes a column of “picture”.
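The grouping described in question 3 can be pictured as a map from row key to "family:qualifier" cells. The following is a minimal in-memory sketch in plain Java, not HBase API code. The qualifier names (title, author, year, publisher, image-url) are assumptions chosen for illustration; the assignment fixes only the row key and the three family names.

```java
import java.util.Map;
import java.util.TreeMap;

public class BookRowSketch {
    // Builds the cells of one row, keyed as "family:qualifier".
    // The qualifier names are hypothetical; only the families
    // (name, publish, picture) come from the assignment text.
    static Map<String, String> columnsFor(String title, String author, String year,
                                          String publisher, String imageUrl) {
        Map<String, String> cols = new TreeMap<>();
        cols.put("name:title", title);
        cols.put("name:author", author);
        cols.put("publish:year", year);
        cols.put("publish:publisher", publisher);
        cols.put("picture:image-url", imageUrl);
        return cols;
    }

    public static void main(String[] args) {
        // Row key (ISBN) -> cells; a TreeMap keeps row keys sorted,
        // loosely mirroring how HBase orders rows by key.
        Map<String, Map<String, String>> table = new TreeMap<>();
        table.put("2005018", columnsFor(
                "Clara Callan", "Richard Bruce Wright", "2001", "HarperFlamingo Canada",
                "http://images.amazon.com/images/P/0002005018.01.LZZZZZZZ.jpg"));

        for (Map.Entry<String, Map<String, String>> row : table.entrySet()) {
            for (Map.Entry<String, String> cell : row.getValue().entrySet()) {
                System.out.println(row.getKey() + "  " + cell.getKey() + " = " + cell.getValue());
            }
        }
    }
}
```

Cells that are read together share a family prefix, which is the reason the question groups title with author and year with publisher.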
4. Add the following data records to the BookBigT table.
ISBN | Book-Title | Book-Author | Year-Of-Publication | Publisher | Image-URL
195153448 | Classical Mythology | Mark P. O. Morford | 2002 | Oxford University Press | http://images.amazon.com/images/P/0195153448.01.LZZZZZZZ.jpg
2005018 | Clara Callan | Richard Bruce Wright | 2001 | HarperFlamingo Canada | http://images.amazon.com/images/P/0002005018.01.LZZZZZZZ.jpg
60973129 | Decision in Normandy | Carlo D'Este | 1991 | HarperPerennial | http://images.amazon.com/images/P/0060973129.01.LZZZZZZZ.jpg
5. Retrieve all the records in the BookBigT table.
6. Retrieve an entire record with ISBN 2005018 from the BookBigT table.
7. Retrieve only the title and author for the record with ISBN 2005018 from BookBigT.
8. Change the author name in the record whose author is Richard Bruce Wright to Richard B. Write.
Query the record to verify the change. Display both the new and the old value.
9. Retrieve title, author, and image url for the first 2 records.
10. Add the new records below to the BookBigT table. These records have varying
columns. For example, some books have co-authors. In that case, include columns such as co-author1,
co-author2, … as needed in the name family.
<1491913703, Data Analytics with Hadoop, Benjamin Bengfort, Jenny Kim, 2016, O'Reilly Media, no image>
<0321321367, Introduction to Data Mining 1st Edition, Pang-Ning Tan, Michael Steinbach, Vipin Kumar, 2005, Pearson, no image>
<1491901632, Hadoop: The Definitive Guide, Tom White, 2015, O'Reilly Media, no image>
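One possible way to derive the variable column names for question 10 is sketched below in plain Java (no HBase API). It assumes the first listed author goes into a qualifier named author and the remaining ones into co-author1, co-author2, and so on; the method name nameFamily and the title and author qualifiers are hypothetical, while the co-author naming follows the question text.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class CoAuthorColumns {
    // Returns the cells of the "name" family for one book.
    // Assumption: authors.get(0) is the main author; every later entry
    // becomes co-author1, co-author2, ... in list order.
    static Map<String, String> nameFamily(String title, List<String> authors) {
        Map<String, String> cols = new LinkedHashMap<>();
        cols.put("name:title", title);
        for (int i = 0; i < authors.size(); i++) {
            String qualifier = (i == 0) ? "author" : "co-author" + i;
            cols.put("name:" + qualifier, authors.get(i));
        }
        return cols;
    }

    public static void main(String[] args) {
        // Rows end up with different column sets, which a column-family
        // store allows because qualifiers are not fixed by the schema.
        System.out.println(nameFamily("Hadoop: The Definitive Guide",
                Arrays.asList("Tom White")));
        System.out.println(nameFamily("Data Analytics with Hadoop",
                Arrays.asList("Benjamin Bengfort", "Jenny Kim")));
        System.out.println(nameFamily("Introduction to Data Mining 1st Edition",
                Arrays.asList("Pang-Ning Tan", "Michael Steinbach", "Vipin Kumar")));
    }
}
```

Only the column families need to exist before these rows are written; the extra qualifiers appear on the rows that use them.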
11. Retrieve all the records, and then delete all three records.
Part III. HBase API Programming
For Part III, you will continue to use the BookBigT table you created in Part II. At this point, BookBigT
should be empty.
First, download the provided Books-sample-250records.csv.
Develop a Java program with the HBase API (and MapReduce API) for the following tasks, and show the
execution result.
12. First, read the given input file and insert all the data records into BookBigT table.
13. Then add code that uses the HBase API to perform each of the following tasks in sequence:
(1) Retrieve the first 10 records and display them on screen (standard output).
(2) Retrieve an entire record with ISBN 2005018 and display it on screen.
(3) Retrieve only the title and author for the record with ISBN 2005018 and display them on screen.
(4) Change the author name in the record whose author is Richard Bruce Wright to Richard B. Write.
(5) Retrieve an entire record with ISBN 2005018 and display it on screen to verify the previous
update.
(6) Delete all the records and display the total number of records deleted.
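For task 12, each line of the input file has to be split into fields before it can be written to the table. The sketch below is plain Java with no HBase calls and rests on an assumption about the file format, which the assignment does not state: comma-separated fields, optionally wrapped in double quotes, with a doubled quote standing for a literal quote. The class and method names are hypothetical. If Books-sample-250records.csv uses a different delimiter, the comma check would need to change.

```java
import java.util.ArrayList;
import java.util.List;

public class CsvLineSplitter {
    // Splits one CSV line into fields, honoring double-quoted fields so that
    // a comma inside quotes does not start a new field.
    // Assumed format (not confirmed by the assignment): comma delimiter,
    // optional double quotes, "" as an escaped quote inside a quoted field.
    static List<String> splitCsvLine(String line) {
        List<String> fields = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        boolean inQuotes = false;
        for (int i = 0; i < line.length(); i++) {
            char ch = line.charAt(i);
            if (ch == '"') {
                if (inQuotes && i + 1 < line.length() && line.charAt(i + 1) == '"') {
                    current.append('"'); // escaped quote inside a quoted field
                    i++;
                } else {
                    inQuotes = !inQuotes; // opening or closing quote
                }
            } else if (ch == ',' && !inQuotes) {
                fields.add(current.toString());
                current.setLength(0);
            } else {
                current.append(ch);
            }
        }
        fields.add(current.toString()); // last field has no trailing comma
        return fields;
    }

    public static void main(String[] args) {
        String line = "60973129,\"Decision in Normandy\",Carlo D'Este,1991,"
                + "HarperPerennial,http://images.amazon.com/images/P/0060973129.01.LZZZZZZZ.jpg";
        List<String> fields = splitCsvLine(line);
        // fields.get(0) would serve as the row key; the rest map to column values.
        for (String field : fields) {
            System.out.println(field);
        }
    }
}
```

A hand-written splitter keeps the sketch dependency-free; a CSV library is a reasonable alternative when the course setup allows one.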
Part IV. MongoDB Document-Based Store
For Part IV, you will use the MongoDB shell. First, set up MongoDB on your computer.
You can find MongoDB shell commands from various sources, e.g., Getting Started with the mongo Shell.
References: MongoDB Homepage, MongoDB Manual, Getting Started with the mongo Shell,
MongoDB Operator
For each question below, write a MongoDB script statement and submit the execution result.
14. Create a collection called ‘games’. We’re going to put some games in it.
15. Add 5 games to the database. Each document below has three properties: name, genre, and rating.
name: "Spy Hunter", genre: "Racing", rating: 76
name: "Mario Kart 64", genre: "Racing", rating: 96
name: "Tetris", genre: "Puzzle", rating: 83
name: "Mega Man 5", genre: "Platformer", rating: 81
name: "Star Fox", genre: "Action", rating: 71
If you make some mistakes and want to clean it out, use remove() on your collection.
16. Write a query that returns all the games.
17. Write a query to find one of your games by name, i.e., “Mario Kart 64” without using limit(). Use the
findOne method.
18. Write a query that returns the 3 highest rated games.
19. Using update(), update the game named “Star Fox” to have two achievements with the properties
below:
‘name’: ‘Game Master’, ‘points’: 100
‘name’: ‘Speed Demon’, ‘points’: 135
20. Write a query that returns all the games that have both the ‘Game Master’ and the ‘Speed Demon’
achievements.
21. Write a query that returns only games that have achievements. Note that not all of your games should
have achievements.