Preparing serial C code for parallelization

Preparing serial C code for parallelization

  • Post author:
  • Post category:Programming

High performance software requires some amount of multi-threading. Threading allows parallel execution of code where data is shared between the threads. In C/C++ global variables are shared. Updating source code to support multi-threading will require analysis of data flow, isolation, and thought out sharing.

Many applications were written without multi-threading in mind and there is significant code bases to bring up to date. I worked on a very old application of historical significance written in C and here share how I prepared the code for parallelization. The example is Mapmaker/Exp 3.0b (aka Mapmaper/QTL 1.1), a well cited tool used in QTL analysis from Eric Lander at the Whitehead Institute for Biomedical Research. The original and updated code, with all the intermediate steps, is available on GitHub

Rough notes and step

1. Insert a clean copy of latest distribution into source control.

2. Use the best tools available.

Today, I use CLion for editing code. The ability to quickly move between use and definition and declaration of a function is useful. CLion is not well able to tie together calls and definitions in the case of older K&R C code, where declarations lack arguments. I wait until later to clean up the arguments. A small warning: the code formatting is wonderful, but delay making large formatting changes because these can create a lot of noise when comparing versions.

3. Modify only what is necessary so that the code compiles.

The Mapmaker was maintain over the 1987-1992 period on operating systems available at the time. Code that touches the operating system or user interface are the most likely to need updating. In the case of Mapmaker, a copy of a modified GNU readline library was included. The code was updated to use a current version installed on the system and dynamically linked. Mapmaker was also supported on various operating systems including DOS, Ultrix, AUX, Solaris, and Mac. The differences were wrapped in preprocessor flags and controlled various return types primarily. Eventually, these will be removed entirely, but initially find the minimal set of changes to get the code compiling.

4. Create a gold standard test cases.

Once the code can be run, create as many test cases as possible, exercising as many options as possible, capturing as many results as possible. Results should include a transcript of the user interface interaction, if practical. All input-output pairs and logs should be archived. Ideally, the tests should be able to be scripted. The goal is to have a set of results to compare to as subsequent modifications are made. While building the results, if some commands fail, e.g., there are bugs in the original code, consider repairing them now or leaving the repair till later. An argument could be made for either. If the repair is simple, I would probably make it now, but if the repair is complex, I might wait. For each of the subsequent steps, repeated many times during the step, I rerun the test cases and compare to the gold standards. Eventually, there’s likely to be an initial change that changes the accepted results, perhaps a bug is discovered and repaired that changes the output. That’s fine, the gold standards must be updated then, but do so only when certain of correctness.

5. Use the best tools available.

This one is repeated. At this point, or perhaps in step 3, I prefer to convert old Makefile or even old autoconf files to cmake. This is compatible CLion and portable across platforms.

5. Clean up compiler warnings.

Compiler warnings can hint at problems with the code that might create surprises later. Enable all the GCC/G++ warnings with the option -Wall. On some codes, a style or method might have been used so often that the warnings are distracting, but try to let the compiler help you but at least fixing the most argeous problems including “uninitialized variables” “functions with a type not returning a value” “fallthrough in switch statements” “printf/scanf calls with format types not matching arguments” and on and on. The gcc compiler will generate different warnings with the optimizer enabled. I start with only -g, clean up those warnings, and then add -O2.

6. Run the software through Valgrind, checking for memory problems.

Problems potentially leading to memory corruption, double frees, conditions dependent on uninitialized data, etc. are all examples of what Valgrind can catch.

7. Clean up argument lists.

Older C code was lax on arguments. It was common to have a forward declaration with an empty argument list. Without the argument list, the compiler can’t check if types match. For some codes, cleaning up and properly declaring complete argument lists for all functions will be a lot of work. I don’t know of a good tool for converting form the older argument-types-outside-of-the-argument-list to the more recent form. If you do, please let me know. Making these changes will also make updating argument lists, removing unused parameters, and adding const where possible.

So, if your code looks like this, this is called the K&R C style, with initialization lists instead of argument lists. This can create limit static analysis of the code. It’s best to update to ANSI style.

DISTRIBUTION *make_distribution(sigma,inc,limit,mean,prob_func)
real sigma, inc, limit, mean;   /* inc and limit here are offsets from mean */
real (*prob_func)();        /* a ptr to a function */

To only generate a list of declarations, such was those in header files, check out ‘cproto’, which can optionally update the definitions (the code body) in the .c files with ‘cproto -a <filename>’. The style of code that cproto generates might not be quite to your taste, but cproto will save a lot of work getting to a state where static analysis can be performed and there are other tools for updating coding styles.

make_distribution (
    real sigma,
    real inc,
    real limit,
    real mean,   /* inc and limit here are offsets from mean */
    real (*prob_func)(void)        /* a ptr to a function */

8. Remove unused functions.

At this point, CLion should be able to identify functions reliably that are not being called. I comment these out and later remove them. These functions may have been used for something once and replaced, or intended for a future feature. Regardless, for code that’s as old as Mapmaker, it’s unlikely someone will attempt to figure out what the original authors were planning to implement, and finish the code. It’s hard to discard the unused code thinking there may be valuable work represented there, but discarding unused code also prevents time from being spent considering it for thread safety during parallelization.

9. Clean up argument lists and globals

At this point, I remove unused arguments from functions and to add const where possible. This helps with understanding how each function modifies data. The modification of data initially needs to be understood when converting a program to be multi-threaded. These are the areas that are most likely to be thread unsafe, especially modification of global variables. Start to isolate these by changing global variables to have file scope to start, if possible. This at least restricts the functions that need to be examined for this global.

10. Bundle data that’ll be modified

Data that each thread needs to function I like to bundle into a structure or class, creating thread-specific copies where needed. Data that is read-only can be shared with all the threads without making a copy, conserving memory. Perhaps a large matrix is shared and each thread receives as an argument the range the thread will process. More generally, the bundled data will include space for processing and returning results allocated on the heap. The bundles can be inserted into a queue that a thread pool processes and the results reduced by the main thread.

This concludes the notes taken while prepping MapMaker for parallelization.