# Performance Tests — CPU vs WebAssembly vs Node.js vs OpenCL vs Browser Pthread

This project benchmarks performance of the same arithmetic workload across multiple execution models:

```
c[i] = a[i] + b[i]
```

Array size: ~15 million `float` elements.

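As a point of reference, the single-core baseline amounts to one loop over the arrays. The sketch below is not taken from `add_single_core.c`: the names `add_arrays`, `elapsed_ms` and `time_add_ms` are made up for illustration, and timing with `clock_gettime(CLOCK_MONOTONIC, ...)` is an assumption about how a run could be measured on a POSIX system.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stddef.h>
#include <time.h>

/* Element-wise addition: c[i] = a[i] + b[i] for every i in [0, n). */
static void add_arrays(const float *a, const float *b, float *c, size_t n)
{
    assert(a != NULL && b != NULL && c != NULL);
    for (size_t i = 0; i < n; i++) {
        c[i] = a[i] + b[i];
    }
}

/* Milliseconds elapsed between two timespec samples. */
static double elapsed_ms(struct timespec start, struct timespec end)
{
    return (double)(end.tv_sec - start.tv_sec) * 1e3
         + (double)(end.tv_nsec - start.tv_nsec) / 1e6;
}

/* Hypothetical helper: time a single call of add_arrays with the
 * monotonic clock, so only the calculation itself is measured. */
static double time_add_ms(const float *a, const float *b, float *c, size_t n)
{
    struct timespec t0, t1;
    clock_gettime(CLOCK_MONOTONIC, &t0);
    add_arrays(a, b, c, n);
    clock_gettime(CLOCK_MONOTONIC, &t1);
    return elapsed_ms(t0, t1);
}
```

At ~15 million elements each array takes about 60 MB (4 bytes per `float`), so the three arrays together need about 180 MB and would normally be allocated with `malloc` rather than on the stack.
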
Compared implementations:

- Native C: single core
- Native C: Pthreads
- Native C: OpenMP
- Node.js: JavaScript
- Node.js: WebAssembly (single core)
- Node.js: WebAssembly + Pthreads (multi-core)
- Browser: WebAssembly + Pthreads (multi-core)
- OpenCL: CPU and GPU

---

## Build & Run (All Tests)

Run every benchmark automatically (native, Node.js, WebAssembly, OpenCL):

```
bash compile.sh
```

The script:

1. Compiles all binaries (`.c` → `./binaries/*`)
2. Builds WebAssembly versions using Emscripten
3. Executes each performance test in sequence
4. Prints timing + result samples

Requirements:

- gcc / clang
- Node.js
- Emscripten (for WASM builds)
- OpenCL dev libs (optional, for OpenCL tests)

---

## Run Browser WebAssembly + Pthreads Version

```
cd wasm_pthread_fast/web
node server.js
```

Then open:

```
http://localhost:1234
```

This runs the multithreaded WASM benchmark inside the browser with correct SharedArrayBuffer support. Browsers expose SharedArrayBuffer only on cross-origin isolated pages, which depends on the `Cross-Origin-Opener-Policy` and `Cross-Origin-Embedder-Policy` response headers, so the page has to be loaded through the local server rather than opened directly from disk.

---

## Example Results

Every implementation checks its output against the expected values (`c[0] = 0`, `c[1] = 3`, …).

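The two sample values are consistent with inputs of the form `a[i] = i` and `b[i] = 2*i`, which gives `c[i] = 3*i`. That input pattern is an assumption, not something this README states, and `fill_inputs` and `count_mismatches` are hypothetical names. Under that assumption, a validation pass could look like this sketch:

```c
#include <assert.h>
#include <stddef.h>

/* ASSUMPTION: inputs follow a[i] = i and b[i] = 2*i, so c[i] should be 3*i.
 * The real benchmark sources may initialise the arrays differently. */
static void fill_inputs(float *a, float *b, size_t n)
{
    assert(a != NULL && b != NULL);
    for (size_t i = 0; i < n; i++) {
        a[i] = (float)i;
        b[i] = 2.0f * (float)i;
    }
}

/* Count the elements of c that differ from 3*i by more than a small
 * relative tolerance. A return value of 0 means the output is valid. */
static size_t count_mismatches(const float *c, size_t n)
{
    assert(c != NULL);
    size_t bad = 0;
    for (size_t i = 0; i < n; i++) {
        float expected = 3.0f * (float)i;
        float diff = c[i] - expected;
        if (diff < 0.0f) diff = -diff;
        float tol = 1e-6f * expected; /* expected is never negative */
        if (diff > tol) bad++;
    }
    return bad;
}
```

Checking every element instead of the first few also catches a thread that skipped or overran its part of the array.
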
Times are in milliseconds. Entries marked `(calc)` cover only the computation phase and the entry marked `total` covers the whole run; the unmarked entries do not say which, so compare across rows with care:

| Method                    | Platform | Cores | Total / Calc Time (ms) | Status           |
| ------------------------- | -------- | ----- | ---------------------- | ---------------- |
| Native C                  | CPU      | 1     | 210.63                 | OK               |
| Node.js                   | CPU      | 1     | 215.15                 | OK               |
| Wasm Node.js              | CPU      | 1     | 219.81                 | OK               |
| OpenMP                    | CPU      | multi | 140.58                 | OK               |
| C Pthreads                | CPU      | multi | 21.98 (calc)           | **Fastest CPU**  |
| Wasm + Pthreads (Node.js) | CPU      | multi | 23.08 (calc)           | **Very fast**    |
| Wasm + Pthreads (Browser) | CPU      | multi | 35.21 (calc)           | **Fast**         |
| OpenCL CPU only           | CPU      | many? | 162.36 total           | OK               |
| OpenCL GPU                | GPU      | many  | Crash                  | Driver dependent |

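To put the table in context, the timings can be converted into a speedup and an effective memory bandwidth. This is only a sketch: it assumes exactly 15,000,000 elements (the README says "~15 million") and 4-byte floats, and the helper names are invented for this example.

```c
#include <assert.h>
#include <stddef.h>

/* Bytes touched per run: read a[i], read b[i], write c[i]. */
static double bytes_moved(size_t n)
{
    return 3.0 * (double)n * (double)sizeof(float);
}

/* Effective bandwidth in GB/s (10^9 bytes per second) for a run of `ms`. */
static double bandwidth_gb_s(size_t n, double ms)
{
    assert(ms > 0.0);
    return bytes_moved(n) / (ms * 1e-3) / 1e9;
}

/* How many times faster `ms` is than `baseline_ms`. */
static double speedup(double baseline_ms, double ms)
{
    assert(ms > 0.0);
    return baseline_ms / ms;
}
```

With the figures above, `speedup(210.63, 21.98)` is about 9.6 and `bandwidth_gb_s(15000000, 21.98)` is about 8.2 GB/s. The speedup compares an unmarked time with a `(calc)` time, so it is an upper estimate if the single-core figure includes setup.
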

---

## Folder Overview

| Path                  | Description                    |
| --------------------- | ------------------------------ |
| add_single_core.c     | Single-threaded C baseline     |
| pthread_add.c         | Multi-core with Pthreads       |
| openmp_add.c          | Multi-core with OpenMP         |
| opencl_add_cpu.c      | CPU via OpenCL runtime         |
| opencl_add_gpu.c      | GPU compute attempt            |
| wasm_add.c            | WebAssembly (single-core)      |
| wasm_add_pthread.c    | WebAssembly (multi-core)       |
| wasm_node.js          | Node test for single-core WASM |
| wasm_pthread_fast/    | Multi-threaded WASM version    |
| wasm_pthread_fast/web | Browser runner + local server  |
| compile.sh            | Complete build + test pipeline |

---

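## Findings

- Multi-core CPU execution with Pthreads is far faster than the single-threaded versions: 21.98 to 35.21 ms of calculation time against 210.63 to 219.81 ms for the single-core runs. The OpenMP build (140.58 ms) gains much less.
- Node.js + WebAssembly threads approach native CPU performance (23.08 ms against 21.98 ms for C Pthreads).
- Browser WASM threading is still fast at 35.21 ms, roughly 1.5× the Node.js time.
- This workload does not benefit from the GPU because memory transfers dominate: each element needs one addition but 12 bytes of memory traffic. The OpenCL GPU run crashed in these results, so this point is not backed by a measurement here.
  - A GPU is expected to win when computation per element is higher.
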
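The multi-core results rest on splitting the index range into one contiguous chunk per thread. The sketch below shows that idea with POSIX threads. It is not the code in `pthread_add.c`: the `add_task` struct, the `MAX_THREADS` limit and the function names are assumptions made for this example.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

#define MAX_THREADS 64 /* arbitrary cap for this sketch */

/* One contiguous slice [begin, end) of the arrays per thread. */
typedef struct {
    const float *a;
    const float *b;
    float *c;
    size_t begin;
    size_t end;
} add_task;

static void *add_worker(void *arg)
{
    const add_task *t = arg;
    for (size_t i = t->begin; i < t->end; i++) {
        t->c[i] = t->a[i] + t->b[i];
    }
    return NULL;
}

/* Returns 0 on success, -1 if a thread could not be created. */
static int add_arrays_parallel(const float *a, const float *b, float *c,
                               size_t n, size_t num_threads)
{
    assert(num_threads >= 1 && num_threads <= MAX_THREADS);
    pthread_t threads[MAX_THREADS];
    add_task tasks[MAX_THREADS];
    size_t chunk = (n + num_threads - 1) / num_threads; /* round up */
    size_t started = 0;
    int status = 0;

    for (size_t t = 0; t < num_threads; t++) {
        size_t begin = t * chunk;
        size_t end = begin + chunk;
        if (begin > n) begin = n; /* trailing threads may get no work */
        if (end > n) end = n;
        tasks[t] = (add_task){ a, b, c, begin, end };
        if (pthread_create(&threads[t], NULL, add_worker, &tasks[t]) != 0) {
            status = -1;
            break;
        }
        started++;
    }
    /* Wait for every thread that was actually started. */
    for (size_t t = 0; t < started; t++) {
        pthread_join(threads[t], NULL);
    }
    return status;
}
```

The slices do not overlap, so the workers need no locking. With gcc or clang this builds with `-pthread`; Emscripten accepts the same flag for its WebAssembly threads support.
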

---

## Future Expansion

- Higher compute complexity test kernels
- Multi-run average statistics
- Visual charts comparing performance gaps
- GPU-friendly workloads showing real acceleration crossover

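For the multi-run statistics item, one possible starting point is a small helper that summarises a series of timings. This is a sketch only: `run_stats` and `summarize_runs` are hypothetical names and nothing like them exists in the repository yet.

```c
#include <assert.h>
#include <stddef.h>

/* Summary of several timing samples, all in milliseconds. */
typedef struct {
    double min_ms;
    double max_ms;
    double mean_ms;
} run_stats;

/* Compute minimum, maximum and arithmetic mean of `count` samples. */
static run_stats summarize_runs(const double *samples_ms, size_t count)
{
    assert(samples_ms != NULL && count > 0);
    run_stats s = { samples_ms[0], samples_ms[0], 0.0 };
    double sum = 0.0;
    for (size_t i = 0; i < count; i++) {
        if (samples_ms[i] < s.min_ms) s.min_ms = samples_ms[i];
        if (samples_ms[i] > s.max_ms) s.max_ms = samples_ms[i];
        sum += samples_ms[i];
    }
    s.mean_ms = sum / (double)count;
    return s;
}
```

Reporting the minimum next to the mean helps separate the cost of the kernel from one-off effects such as page faults on the first run.
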