Performance Tests — CPU vs WebAssembly vs Node.js vs OpenCL vs Browser Pthread
This project benchmarks the same arithmetic workload, element-wise addition of two float arrays, across multiple execution models:
c[i] = a[i] + b[i]
Array size: ~15 million float elements.
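As a rough illustration of the workload, the sketch below performs the same addition in plain Node.js. It is not the repository's code: the function names are hypothetical, and the initialization (a[i] = i, b[i] = 2 * i) is an assumption chosen only because it reproduces the validation values c[0] = 0 and c[1] = 3 reported in the results.

```javascript
// Illustrative sketch of the benchmark workload; not the repository's code.
const { performance } = require("perf_hooks");

// Assumed initialization: a[i] = i, b[i] = 2 * i, so c[0] = 0 and c[1] = 3.
function makeInputs(n) {
  const a = new Float32Array(n);
  const b = new Float32Array(n);
  for (let i = 0; i < n; i++) {
    a[i] = i;
    b[i] = 2 * i;
  }
  return { a, b };
}

// Element-wise addition into a preallocated output: c[i] = a[i] + b[i].
function addArrays(a, b, c) {
  for (let i = 0; i < c.length; i++) {
    c[i] = a[i] + b[i];
  }
}

const N = 15_000_000; // roughly the array size used by the benchmarks
const { a, b } = makeInputs(N);
const c = new Float32Array(N);

// Only the addition loop is timed; allocation and initialization are excluded.
const start = performance.now();
addArrays(a, b, c);
const elapsedMs = performance.now() - start;

console.log(`n=${N} calc=${elapsedMs.toFixed(2)} ms c[0]=${c[0]} c[1]=${c[1]}`);
```

Timing only the loop corresponds to a calc-style measurement; wrapping the allocation and initialization as well would give a total-style measurement.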
Compared implementations:
- Native C: single core
- Native C: Pthreads
- Native C: OpenMP
- Node.js: JavaScript
- Node.js: WebAssembly (single core)
- Node.js: WebAssembly + Pthreads (multi-core)
- Browser: WebAssembly + Pthreads (multi-core)
- OpenCL: CPU and GPU
Build & Run (All Tests)
Run every benchmark automatically (native, Node.js, WebAssembly, OpenCL):
bash compile.sh
The script:
- Compiles all binaries (.c → ./binaries/*)
- Builds WebAssembly versions using Emscripten
- Executes each performance test in sequence
- Prints timing + result samples
Requirements:
- gcc / clang
- Node.js
- Emscripten (for WASM builds)
- OpenCL dev libs (optional, for OpenCL tests)
Run Browser WebAssembly + Pthreads Version
cd wasm_pthread_fast/web
node server.js
Then open:
http://localhost:1234
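This runs the multithreaded WASM benchmark inside the browser. The local server is needed because browsers expose SharedArrayBuffer, which WASM threads depend on, only to cross-origin isolated pages, and that isolation requires specific response headers (Cross-Origin-Opener-Policy and Cross-Origin-Embedder-Policy).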
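For readers unfamiliar with the SharedArrayBuffer requirement, here is a minimal sketch of a static file server that sends the two cross-origin isolation headers. It is a hypothetical stand-in, not the contents of the repository's server.js; the names `ISOLATION_HEADERS` and `createIsolatedServer` are invented for illustration.

```javascript
// Hypothetical stand-in for a server like server.js; not the repository's code.
const http = require("http");
const fs = require("fs");
const path = require("path");

// Headers that make a page cross-origin isolated, which browsers require
// before exposing SharedArrayBuffer (and therefore WASM threads).
const ISOLATION_HEADERS = {
  "Cross-Origin-Opener-Policy": "same-origin",
  "Cross-Origin-Embedder-Policy": "require-corp",
};

const MIME_TYPES = {
  ".html": "text/html",
  ".js": "text/javascript",
  ".wasm": "application/wasm",
};

// Build (but do not start) a static file server rooted at rootDir.
function createIsolatedServer(rootDir) {
  const root = path.resolve(rootDir);
  return http.createServer((req, res) => {
    const urlPath = req.url.split("?")[0];
    const relative = urlPath === "/" ? "index.html" : urlPath.slice(1);
    const filePath = path.resolve(root, relative);

    // Refuse any path that resolves outside rootDir.
    if (!filePath.startsWith(root + path.sep)) {
      res.writeHead(403, ISOLATION_HEADERS);
      res.end("Forbidden");
      return;
    }

    fs.readFile(filePath, (err, data) => {
      if (err) {
        res.writeHead(404, ISOLATION_HEADERS);
        res.end("Not found");
        return;
      }
      const type = MIME_TYPES[path.extname(filePath)] || "application/octet-stream";
      res.writeHead(200, { ...ISOLATION_HEADERS, "Content-Type": type });
      res.end(data);
    });
  });
}
```

Calling `createIsolatedServer(__dirname).listen(1234)` would start it on the port used above. In the browser console, `crossOriginIsolated` should then evaluate to `true`.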
Example Results
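Every implementation that runs to completion validates its output values (c[0] = 0, c[1] = 3, …)
Times are in milliseconds. Entries labeled calc cover the computation only and entries labeled total cover the whole run, so the two kinds are not directly comparable: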
| Method | Platform | Cores | Total / Calc Time (ms) | Status |
|---|---|---|---|---|
| Native C | CPU | 1 | 210.63 | OK |
| Node.js | CPU | 1 | 215.15 | OK |
| Wasm Node.js | CPU | 1 | 219.81 | OK |
| OpenMP | CPU | multi | 140.58 | OK |
| C Pthreads | CPU | multi | 21.98 (calc) | Fastest CPU |
| Wasm + Pthreads (Node.js) | CPU | multi | 23.08 (calc) | Very fast |
| Wasm + Pthreads (Browser) | CPU | multi | 35.21 (calc) | Fast |
| OpenCL CPU only | CPU | many? | 162.36 (total) | OK |
| OpenCL GPU | GPU | many | Crash | Driver dependent |
Folder Overview
| Path | Description |
|---|---|
| add_single_core.c | Single-threaded C baseline |
| pthread_add.c | Multi-core with Pthreads |
| openmp_add.c | Multi-core with OpenMP |
| opencl_add_cpu.c | CPU via OpenCL runtime |
| opencl_add_gpu.c | GPU compute attempt |
| wasm_add.c | WebAssembly (single-core) |
| wasm_add_pthread.c | WebAssembly (multi-core) |
| wasm_node.js | Node test for single-core WASM |
| wasm_pthread_fast/ | Multi-threaded WASM version |
| wasm_pthread_fast/web | Browser runner + local server |
| compile.sh | Complete build + test pipeline |
Findings
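- Multi-core execution with Pthreads is roughly 9 to 10 times faster than the single-threaded baseline (21.98 ms vs 210.63 ms), although the Pthreads figure is calc time only. OpenMP gains much less (140.58 ms).
- Node.js + WebAssembly threads come within about 5% of native Pthreads (23.08 ms vs 21.98 ms).
- Browser WebAssembly threads are about 1.6 times slower than native Pthreads (35.21 ms vs 21.98 ms) but still far ahead of every single-threaded version.
- The GPU run crashed, so no GPU timing is available. For a memory-bound kernel like this one, memory transfer is expected to be the bottleneck; a GPU should win when computation per element is higher.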
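The ratios behind these findings can be recomputed from the example results table. The sketch below does that relative to the single-core native C run; the timings are copied from the table, the GPU row is left out because it has no timing, and the `note` field simply mirrors the table's calc and total labels. Since some rows are calc-only, the ratios are indicative rather than exact.

```javascript
// Timings (ms) copied from the example results table. The OpenCL GPU row is
// omitted because that run crashed and has no timing.
const results = [
  { method: "Native C", ms: 210.63, note: "" },
  { method: "Node.js", ms: 215.15, note: "" },
  { method: "Wasm Node.js", ms: 219.81, note: "" },
  { method: "OpenMP", ms: 140.58, note: "" },
  { method: "C Pthreads", ms: 21.98, note: "calc" },
  { method: "Wasm + Pthreads (Node.js)", ms: 23.08, note: "calc" },
  { method: "Wasm + Pthreads (Browser)", ms: 35.21, note: "calc" },
  { method: "OpenCL CPU only", ms: 162.36, note: "total" },
];

// Speedup of each method relative to the single-core native C baseline.
const baseline = results.find((r) => r.method === "Native C").ms;
const speedups = results.map((r) => ({ ...r, speedup: baseline / r.ms }));

for (const r of speedups) {
  const name = r.method.padEnd(26);
  const time = r.ms.toFixed(2).padStart(7);
  console.log(`${name} ${time} ms  ${r.speedup.toFixed(2)}x ${r.note}`);
}
```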
Future Expansion
- Higher compute complexity test kernels
- Multi-run average statistics
- Visual charts comparing performance gaps
- GPU-friendly workloads showing real acceleration crossover
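One possible shape for the multi-run statistics item is sketched below in Node.js. It is only an illustration under stated assumptions: the helper names `timeRuns` and `summarize`, the run and warm-up counts, and the small stand-in workload are all hypothetical and not part of the repository.

```javascript
// Illustrative multi-run timing harness; not part of the repository.
const { performance } = require("perf_hooks");

// Call fn() `warmup` times untimed, then `runs` times timed.
// Returns the per-run durations in milliseconds.
function timeRuns(fn, runs = 10, warmup = 2) {
  for (let i = 0; i < warmup; i++) fn();
  const samples = [];
  for (let i = 0; i < runs; i++) {
    const t0 = performance.now();
    fn();
    samples.push(performance.now() - t0);
  }
  return samples;
}

// Mean, min, max, and sample standard deviation of the timings.
function summarize(samples) {
  const n = samples.length;
  const mean = samples.reduce((sum, x) => sum + x, 0) / n;
  const variance =
    n > 1 ? samples.reduce((sum, x) => sum + (x - mean) ** 2, 0) / (n - 1) : 0;
  return {
    n,
    mean,
    min: Math.min(...samples),
    max: Math.max(...samples),
    stddev: Math.sqrt(variance),
  };
}

// Small stand-in workload: the same element-wise addition on 1M floats.
const n = 1_000_000;
const a = new Float32Array(n).fill(1);
const b = new Float32Array(n).fill(2);
const c = new Float32Array(n);

const stats = summarize(
  timeRuns(() => {
    for (let i = 0; i < n; i++) c[i] = a[i] + b[i];
  })
);
console.log(stats);
```

Reporting the minimum alongside the mean is a common choice for microbenchmarks, since the minimum is less sensitive to scheduler noise while the standard deviation shows how stable the runs were.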