linux-sysprog · advanced · ~15 min

Sum an array using multiple threads

Parallel reduction without shared mutation — pass each thread a unique partial-sum slot.

Challenge

Sum an integer array in parallel: split it into contiguous chunks, let each thread total its own chunk into a private slot, then add the partial sums. No shared accumulator, so no mutex is needed.

Task

Implement long parallel_sum(const int *a, int n, int nthreads) that returns the sum of a[0..n-1] computed across nthreads threads.

Input

a, n: the array and its length.
nthreads: how many threads to split the work across.

Output

The sum of a[0..n-1], returned as a long. The function divides the array into nthreads roughly equal contiguous chunks, has each thread sum its chunk into its own partial-result slot, joins all threads, and returns the sum of the partials. It returns -1 if nthreads <= 0 or nthreads > 64.

Example

a = [0,1,2,...,99]   (sum = 4950)
parallel_sum(a, 100, 4)   ->   4950
parallel_sum(a, 100, 1)   ->   4950
parallel_sum(a, 100, 7)   ->   4950   (uneven split still totals correctly)
parallel_sum(a, 0,   4)   ->   0
parallel_sum(a, 100, 0)   ->   -1     (invalid thread count)
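
The expected total in the example follows from the closed form 0 + 1 + ... + 99 = 99 * 100 / 2 = 4950. When checking a threaded version, a single-threaded reference can act as the oracle. The helper below is a minimal sketch; the name sequential_sum is hypothetical and not part of the exercise.

```c
/* Hypothetical single-threaded reference, used only to check results.
   Accumulates in a long, matching the return type of parallel_sum. */
long sequential_sum(const int *a, int n) {
    long total = 0;
    for (int i = 0; i < n; i++)
        total += a[i];
    return total;
}
```

Any thread count accepted by parallel_sum should produce the same value this function returns for the same array.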

Edge cases

n == 0: returns 0.
nthreads <= 0 or > 64: returns -1.
The last chunk absorbs any remainder when n doesn't divide evenly.
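
One way to compute the chunk boundaries described above is sketched here. It assumes every chunk except the last has size n / nthreads (integer division) and that the last chunk runs to the end of the array. The names chunk_range and chunk_bounds are hypothetical.

```c
/* Hypothetical helper: half-open index range [begin, end) for chunk t.
   Assumes nthreads >= 1, 0 <= t < nthreads, and n >= 0. */
typedef struct {
    int begin;  /* first index in the chunk */
    int end;    /* one past the last index in the chunk */
} chunk_range;

chunk_range chunk_bounds(int n, int nthreads, int t) {
    int base = n / nthreads;  /* size of every chunk except the last */
    chunk_range r;
    r.begin = t * base;
    /* The last chunk ends at n, so it also covers the n % nthreads leftovers. */
    r.end = (t == nthreads - 1) ? n : r.begin + base;
    return r;
}
```

With n = 100 and nthreads = 7, base is 14: chunks 0 through 5 cover 14 elements each, and the last chunk covers [84, 100), which is 16 elements. Under this scheme, if nthreads exceeds n, base is 0 and the last chunk covers the whole array while the others are empty.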

Rules

Each thread writes only its own partial slot — no shared accumulator, no mutex.
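
The private-slot rule can be sketched as follows. This is one possible shape, not the reference solution: the names sum_slot, sum_worker, and parallel_sum_sketch are hypothetical, n is assumed to be non-negative, and falling back to summing a chunk inline when pthread_create fails is an assumption the task does not specify.

```c
#include <pthread.h>

#define MAX_THREADS 64  /* upper bound taken from the task statement */

/* Hypothetical per-thread record: the chunk to read plus a private result. */
typedef struct {
    const int *a;
    int begin;     /* first index of this thread's chunk */
    int end;       /* one past the last index */
    long partial;  /* written by exactly one thread */
} sum_slot;

static void *sum_worker(void *arg) {
    sum_slot *s = arg;
    long total = 0;
    for (int i = s->begin; i < s->end; i++)
        total += s->a[i];
    s->partial = total;  /* the only write, and only to this thread's slot */
    return NULL;
}

long parallel_sum_sketch(const int *a, int n, int nthreads) {
    if (nthreads <= 0 || nthreads > MAX_THREADS)
        return -1;

    pthread_t tid[MAX_THREADS];
    sum_slot slot[MAX_THREADS];
    int spawned[MAX_THREADS];
    int base = n / nthreads;  /* last chunk absorbs the remainder */

    for (int t = 0; t < nthreads; t++) {
        slot[t].a = a;
        slot[t].begin = t * base;
        slot[t].end = (t == nthreads - 1) ? n : slot[t].begin + base;
        slot[t].partial = 0;
        spawned[t] = (pthread_create(&tid[t], NULL, sum_worker, &slot[t]) == 0);
        if (!spawned[t])
            sum_worker(&slot[t]);  /* assumption: sum this chunk inline on failure */
    }

    long total = 0;
    for (int t = 0; t < nthreads; t++) {
        if (spawned[t])
            pthread_join(tid[t], NULL);  /* join before reading the slot */
        total += slot[t].partial;
    }
    return total;
}
```

The main thread reads slot[t].partial only after pthread_join returns for that thread, so the read is ordered after the worker's write and no lock is required.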

Why this matters

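Parallel reduction is a standard pattern: split the work into chunks, have each thread sum its own chunk, then let the main thread add the partial sums. Because no two threads write to the same slot, the threads never contend for a lock.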

Input format

An int array a of length n, plus nthreads (the number of worker threads).

Output format

The total sum of the array, or -1 if nthreads <= 0 or > 64.

Constraints

Split into nthreads contiguous chunks; each thread sums into its own slot. n == 0 gives 0.

Starter code

#include <pthread.h>
long parallel_sum(const int *a, int n, int nthreads) {
    /* TODO */
    return -1;
}
