Programming Assignment -- CS 7820

The following resources will help you learn the basics of Pthreads, MPI, and transactional memory (TM) programming. After experimenting with the toy example programs, solve the problem below twice: once with shared memory (using Pthreads) and once with message passing (using LAM-MPI). Think about how transactional memory would influence performance and the programming experience. Due on Wednesday, 28th February.


Resources:

Optional reading:

Problem: Consider an application with one master thread, one producer thread, and four consumer threads.

The application has a central data structure: a 4-element array of doubly-linked lists (you will need a head pointer and a tail pointer to represent each doubly-linked list). Each linked list serves as the task queue for one of the four consumer threads. Each element in a linked list stores an "oldvalue" and a "newvalue" (in addition to pointers to the next and previous elements).
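One possible C layout for this data structure is sketched below. The names struct task, struct task_queue, queues, queues_init, and queue_is_empty are illustrative choices, not part of the assignment, and the sketch deliberately leaves out locking.

```c
#include <assert.h>
#include <stddef.h>

#define NUM_CONSUMERS 4

/* One task: the two payload values plus links in both directions. */
struct task {
    int oldvalue;
    int newvalue;
    struct task *next;
    struct task *prev;
};

/* One doubly-linked list, represented by its head and tail pointers. */
struct task_queue {
    struct task *head;
    struct task *tail;
};

/* The central data structure: one task queue per consumer thread. */
struct task_queue queues[NUM_CONSUMERS];

/* Mark every queue as empty. Call once before any thread is started. */
void queues_init(void) {
    for (int i = 0; i < NUM_CONSUMERS; i++) {
        queues[i].head = NULL;
        queues[i].tail = NULL;
    }
}

/* An empty list has a NULL head (and, by construction, a NULL tail). */
int queue_is_empty(const struct task_queue *q) {
    assert(q != NULL);
    return q->head == NULL;
}
```

Keeping both a head and a tail pointer lets the producer append and the consumer remove in constant time.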

The producer thread reads an input file (example format here) that contains a number of entries. Each entry contains the id of the consumer thread that should handle the task, followed by two other integers ("oldvalue" and "newvalue"). After reading an entry, the producer inserts the task at the tail of the task queue for the corresponding consumer. It then sleeps for a random amount of time between 0 and 1 millisecond before reading the next entry in the input file.
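The producer's two steps, parsing an entry and appending it to a queue, might look roughly like the sketch below. It assumes each entry is a line of three whitespace-separated integers (consumer id, oldvalue, newvalue); the real input format may differ. The helper names parse_entry and enqueue_tail are hypothetical, and the caller is expected to hold whatever lock protects the queue.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>

struct task {
    int oldvalue;
    int newvalue;
    struct task *next;
    struct task *prev;
};

struct task_queue {
    struct task *head;
    struct task *tail;
};

/* Parse one entry. ASSUMED format: "<consumer id> <oldvalue> <newvalue>".
 * Returns 1 if all three integers were read, 0 otherwise. */
int parse_entry(const char *line, int *id, int *oldv, int *newv) {
    return sscanf(line, "%d %d %d", id, oldv, newv) == 3;
}

/* Append a new task at the tail of q. The caller must hold the lock
 * that protects q. Returns 0 on success, -1 if allocation fails. */
int enqueue_tail(struct task_queue *q, int oldv, int newv) {
    struct task *t;
    assert(q != NULL);
    t = malloc(sizeof *t);
    if (t == NULL)
        return -1;
    t->oldvalue = oldv;
    t->newvalue = newv;
    t->next = NULL;
    t->prev = q->tail;
    if (q->tail != NULL)
        q->tail->next = t;   /* non-empty list: link after the old tail */
    else
        q->head = t;         /* empty list: new node is also the head */
    q->tail = t;
    return 0;
}
```

In a full producer loop, the id returned by parse_entry would select which of the four queues to pass to enqueue_tail.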

Each consumer thread removes the head of its task queue (doing nothing if the queue is empty) and appends the values of "oldvalue" and "newvalue" to its output file. It then sleeps for a random amount of time between 0 and 4 milliseconds and checks the task queue again. At the end of the program, the output file (for the input file listed above) should be something like this for consumer 0.
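A minimal sketch of the consumer's two building blocks follows: removing the head of a queue and sleeping for a random interval. The names dequeue_head and sleep_random_us are hypothetical. The sketch assumes a POSIX system (nanosleep and rand_r) and that the caller holds the relevant lock around dequeue_head but not around the sleep.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stdlib.h>
#include <time.h>

struct task {
    int oldvalue;
    int newvalue;
    struct task *next;
    struct task *prev;
};

struct task_queue {
    struct task *head;
    struct task *tail;
};

/* Remove the head of q. Returns 1 and stores its values in *oldv and
 * *newv, or returns 0 if the queue is empty. The caller must hold the
 * lock that protects q. */
int dequeue_head(struct task_queue *q, int *oldv, int *newv) {
    struct task *t;
    assert(q != NULL);
    t = q->head;
    if (t == NULL)
        return 0;
    q->head = t->next;
    if (q->head != NULL)
        q->head->prev = NULL;
    else
        q->tail = NULL;      /* the list is now empty */
    *oldv = t->oldvalue;
    *newv = t->newvalue;
    free(t);
    return 1;
}

/* Sleep for a pseudo-random time in [0, max_us) microseconds, where
 * max_us is at most 1000000. Each thread passes its own seed, because
 * rand_r keeps its state in the caller-supplied variable. */
void sleep_random_us(unsigned int *seed, long max_us) {
    struct timespec ts;
    if (max_us <= 0)
        return;
    ts.tv_sec = 0;
    ts.tv_nsec = (rand_r(seed) % max_us) * 1000L;
    nanosleep(&ts, NULL);
}
```

A consumer would call sleep_random_us(&seed, 4000) between checks, and the producer sleep_random_us(&seed, 1000).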

While all of the above is happening, the master thread keeps walking through all the linked lists and incrementing the value of "newvalue" in every element. After walking through every linked list, it sleeps for a random amount of time between 0 and 1 nanosecond (in practice, the actual sleep is longer than requested).
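The master's walk can be sketched as below. The names increment_list and master_pass are hypothetical, and locking is left to the caller, since where the locks go is exactly what the coarse-grained and fine-grained versions vary.

```c
#include <assert.h>
#include <stddef.h>

struct task {
    int oldvalue;
    int newvalue;
    struct task *next;
    struct task *prev;
};

struct task_queue {
    struct task *head;
    struct task *tail;
};

/* Increment newvalue in every element of one list and return the number
 * of elements visited. The caller must hold the lock that protects q. */
int increment_list(struct task_queue *q) {
    int visited = 0;
    assert(q != NULL);
    for (struct task *t = q->head; t != NULL; t = t->next) {
        t->newvalue++;
        visited++;
    }
    return visited;
}

/* One pass of the master over all lists; returns the total number of
 * elements touched. The master thread would call this in a loop for a
 * fixed number of iterations, sleeping briefly after each pass. */
int master_pass(struct task_queue *all_queues, int nqueues) {
    int total = 0;
    for (int i = 0; i < nqueues; i++)
        total += increment_list(&all_queues[i]);
    return total;
}
```

Because the master may increment a task any number of times before a consumer removes it, the "newvalue" written to an output file generally differs from run to run.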

For the Pthreads implementation, carry out the following analysis: (i) Write versions of your program with pthread_mutex_lock and pthread_mutex_trylock (read the corresponding man pages for more details). (ii) Write versions of your program with coarse-grained and fine-grained locks. (iii) With the help of pthread_mutex_trylock, keep track of the percentage of successful lock acquisitions in the coarse-grained and fine-grained versions (you may have to take the average of multiple runs to draw reasonable conclusions).
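One way to collect the statistic in (iii) is to wrap each mutex with two counters, as in the sketch below. The struct counted_mutex type and the functions counted_mutex_init, counted_lock, and success_percent are hypothetical names, and counting a trylock that returns 0 as a "successful acquire" is one interpretation of the requirement. A coarse-grained version would use a single counted_mutex for the whole array, while a fine-grained version would use one per linked list.

```c
#include <assert.h>
#include <pthread.h>

/* A mutex plus statistics on how often it was free on the first try. */
struct counted_mutex {
    pthread_mutex_t mutex;
    unsigned long attempts;     /* total acquisitions */
    unsigned long uncontended;  /* acquisitions where trylock succeeded */
};

void counted_mutex_init(struct counted_mutex *cm) {
    assert(cm != NULL);
    pthread_mutex_init(&cm->mutex, NULL);
    cm->attempts = 0;
    cm->uncontended = 0;
}

/* Try the lock first; if it is busy, fall back to a blocking lock.
 * Either way the caller owns the mutex on return, so the counters are
 * updated while the mutex is held and need no extra protection. */
void counted_lock(struct counted_mutex *cm) {
    int rc = pthread_mutex_trylock(&cm->mutex);
    if (rc != 0)
        pthread_mutex_lock(&cm->mutex);
    cm->attempts++;
    if (rc == 0)
        cm->uncontended++;
}

/* Percentage of acquisitions that succeeded on the first try. Call this
 * only after all worker threads have been joined. */
double success_percent(const struct counted_mutex *cm) {
    if (cm->attempts == 0)
        return 0.0;
    return 100.0 * (double)cm->uncontended / (double)cm->attempts;
}
```

Each call to counted_lock is paired with an ordinary pthread_mutex_unlock on the embedded mutex. A pure trylock version could instead skip the work and retry later when the lock is busy, which changes what the percentage measures.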

Feel free to experiment with other sleep durations (the above values were chosen to allow the program to finish in less than 10 seconds for the given input file), input files, and processors (multi-core/SMT). In my example, an "oldvalue" of 0 indicates that this is the last task for the consumer, so it can quit. Allow the master thread to walk through the linked lists for a fixed number of iterations, so that it can also complete in a reasonable time. The MPI algorithm may deviate a little from the Pthreads algorithm. Here's a skeleton single-thread program that you can use as a starting point.

What you need to submit to me (email me the tarball if that is most convenient): the C programs (appropriately titled and commented), a README file that provides the above analysis for the Pthreads version, and a short note describing your programming experience (which model was easier to program in, what did you learn, how would transactions help?). Also describe any additional experiments you may have tried. The entire assignment will likely take you no more than twenty hours. This is due on Wednesday, 28th February.

For more practice, try this second problem (taken from Dan Sorin's web page): a multi-threaded application that computes the Nth prime number (where N is a user-specified input).