CSE 2431 Lab 4: Thread-Level Parallelism
Instructor: Adam C. Champion, Ph.D.
Due: Wednesday, November 6, 2019, 11:59 p.m. (40 points)
Group Size: 1, which means you are required to finish this lab assignment by yourself.
Goal: Gaining experience decomposing solutions using multiple threads to increase performance
(i.e., decreasing runtime) leveraging thread-level parallelism in computer systems.
Introduction: In this project, you will design programming solutions for matrix multiplication
that take advantage of multiple threads executing in parallel.
Let A be an n × m matrix and B be an m × p matrix. Then C = AB yields C, an n × p matrix.
The code below illustrates an algorithm for matrix multiplication C = AB. 1 for (i = 0; i < n; i++) {
2 for (j = 0; j < p; j++) {
3 c[i][j] = 0;
4 for (k = 0; k < m; k++) {
5 c[i][j] += a[i][k] * b[k][j];
6 }
7 }
8 }
Lab Assignment: In your program, matrix A should be 1200 × 1000 and matrix B should be
1000 × 500 (i.e., n = 1200, m = 1000, and p = 500). Initialize members of matrix A as follows:
a[i][j] = i + j for i = 0, 1, 2, . . . , nn 1 and j = 0, 1, 2, . . . , mm 1. Initialize members of matrix
B as follows: b[i][j] = j for i = 0, 1, 2, . . . , m m 1 and j = 0, 1, 2, . . . , pp 1. Please develop your
program using the following steps:
(1) Your program should first multiply these two matrices according to the algorithm given
above. Let matrix C1 store the result of this multiplication. Your program will display the time
needed to perform the calculation. Do not include the time required for initializing matrices
A and B.
(2) At this point, you should develop an algorithm for matrix multiplication that can exploit two
or more threads executing in parallel. Try to develop an algorithm that works for n threads
executing in parallel (where n = 2, 3, 4, . . .).
(3) Next, your program should perform multiplication based on the algorithm that you developed
in step (2). Your main thread should create two threads that will perform multiplication; each
thread terminates when it completes its task. Let matrix C store the result of this calculation.
Your program should display the time needed to perform the calculation. If the system has at
least two processors, you should see time improvements. Then compare matrix C with matrix
C1 and indicate if there are any differences in the values of each matrix. A well-developed
solution should have no differences.
Page 1 of 2
(4) Next, your program should perform multiplication based on an algorithm that decomposes
matrix multiplication such that three threads work in parallel. Your main thread should
initialize all integers in matrix C to zero and create three threads that will perform the
calculation; each thread terminates when it completes its task. Let matrix C store the result
of this calculation. Your program should display the time needed to perform the calculation.
If the system has at least three processors, you should see time improvements. Then compare
matrix C with matrix C1 and indicate if there are any differences. A well-developed solution
should have no differences.
(5) Repeat the calculation with four, five, and six threads similar to Steps (3) and (4) above.
(6) Bonus: Perform calculation with seven threads.
Output: Your program output should be similar to the following:
Threads Seconds Error
1 3.38
2 1.72 No error
3 1.16 No error
4 0.92 No error
5 1.05 No error
6 0.91 No error
Compilation: Use the following to compile your code in matrix.c and obtain the executable
exec_matrix:
gcc -O1 -Wall -o exec_matrix matrix.c -lpthread
Submission: Submit your file matrix.c in a .tar.gz or .zip file on Carmen using the commands
described for Lab 0. At the beginning of the file matrix.c, you need to tell the grader your full name,
how to compile your file, and how to run your compiled program. Make sure your instructions
work.