
# Performance Tests — CPU vs WebAssembly vs Node.js vs OpenCL vs Browser Pthread

This project benchmarks the same arithmetic workload, an element-wise array addition, across several execution models:

```
c[i] = a[i] + b[i]
```

Array size: ~15 million `float` elements.
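
As a rough illustration of the workload, here is a minimal single-core sketch. It is not the repository's `add_single_core.c`: the function names (`add_arrays`, `now_ms`, `time_add_ms`) are hypothetical, and the input pattern `a[i] = i`, `b[i] = 2*i` is an assumption chosen only because it matches the validation values `c[0] = 0`, `c[1] = 3` reported in this README.

```c
#define _POSIX_C_SOURCE 199309L  /* for clock_gettime */
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <time.h>

/* Element-wise add: c[i] = a[i] + b[i] for i in [0, n). */
static void add_arrays(const float *a, const float *b, float *c, size_t n) {
    for (size_t i = 0; i < n; i++) {
        c[i] = a[i] + b[i];
    }
}

/* Monotonic wall-clock time in milliseconds. */
static double now_ms(void) {
    struct timespec ts;
    clock_gettime(CLOCK_MONOTONIC, &ts);
    return (double)ts.tv_sec * 1000.0 + (double)ts.tv_nsec / 1e6;
}

/* Hypothetical driver: fills the inputs (ASSUMED a[i] = i, b[i] = 2*i),
 * times only the add loop, and stores c[1] in *sample as a sanity check.
 * Returns elapsed milliseconds, or -1.0 if allocation fails. */
static double time_add_ms(size_t n, float *sample) {
    assert(n >= 2 && sample != NULL);
    float *a = malloc(n * sizeof *a);
    float *b = malloc(n * sizeof *b);
    float *c = malloc(n * sizeof *c);
    double elapsed = -1.0;
    if (a && b && c) {
        for (size_t i = 0; i < n; i++) {
            a[i] = (float)i;
            b[i] = 2.0f * (float)i;
        }
        double start = now_ms();
        add_arrays(a, b, c, n);
        elapsed = now_ms() - start;
        *sample = c[1];
    }
    free(a);
    free(b);
    free(c);
    return elapsed;
}
```

Calling `time_add_ms(15000000, &sample)` from a `main` function and printing the result should approximate the single-core measurement. Timing only the add loop, and not allocation or initialization, is a design choice of this sketch; the repository's tests may measure differently.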

Compared implementations:

• Native C — single core
• Native C — Pthreads
• Native C — OpenMP
• Node.js — JavaScript
• Node.js — WebAssembly (single core)
• Node.js — WebAssembly + Pthreads (multi-core)
• Browser — WebAssembly + Pthreads (multi-core)
• OpenCL — CPU and GPU

---

## Build & Run (All Tests)

Run every benchmark automatically (native, Node.js, WebAssembly, OpenCL):

```
bash compile.sh
```

The script:

1. Compiles the C sources to native binaries (`.c` → `./binaries/*`)
2. Builds WebAssembly versions using Emscripten
3. Executes each performance test in sequence
4. Prints timing + result samples

Requirements:

• gcc / clang
• Node.js
• Emscripten (for WASM builds)
• OpenCL dev libs (optional, for OpenCL tests)

---

## Run Browser WebAssembly + Pthreads Version

```
cd wasm_pthread_fast/web
node server.js
```

Then open:

```
http://localhost:1234
```

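This runs the multithreaded WASM benchmark inside the browser. Load the page through `server.js` rather than opening the file directly: browsers expose `SharedArrayBuffer`, which WASM threads need, only to cross-origin isolated pages. That requires the `Cross-Origin-Opener-Policy: same-origin` and `Cross-Origin-Embedder-Policy: require-corp` response headers, which the local server is presumably there to send.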

---

## Example Results

Every implementation checks its output against the expected values (`c[0] = 0`, `c[1] = 3`, …).

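Times are in milliseconds. The `(calc)` and `total` labels are kept as reported; unlabeled rows do not state which of the two they measure, so cross-row comparisons are approximate.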

| Method                    | Platform | Cores | Total / Calc Time (ms) | Status           |
| ------------------------- | -------- | ----- | ---------------------- | ---------------- |
| Native C                  | CPU      | 1     | 210.63                 | OK               |
| Node.js                   | CPU      | 1     | 215.15                 | OK               |
| Wasm Node.js              | CPU      | 1     | 219.81                 | OK               |
| OpenMP                    | CPU      | multi | 140.58                 | OK               |
| C Pthreads                | CPU      | multi | 21.98 (calc)           | **Fastest CPU**  |
| Wasm + Pthreads (Node.js) | CPU      | multi | 23.08 (calc)           | **Very fast**    |
| Wasm + Pthreads (Browser) | CPU      | multi | 35.21 (calc)           | **Fast**         |
| OpenCL CPU only           | CPU      | many? | 162.36 total           | OK               |
| OpenCL GPU                | GPU      | many  | Crash                  | Driver dependent |
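
The multi-threaded rows depend on splitting the array into contiguous chunks, one per thread. The following is a minimal sketch of that idea with POSIX threads, not the code in `pthread_add.c`. The names `add_task`, `add_worker`, `add_arrays_parallel`, and the `MAX_THREADS` cap are hypothetical.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

#define MAX_THREADS 64  /* arbitrary cap chosen for this sketch */

/* Half-open index range [begin, end) handled by one thread. */
typedef struct {
    const float *a;
    const float *b;
    float *c;
    size_t begin;
    size_t end;
} add_task;

/* Thread entry point: adds one contiguous chunk. */
static void *add_worker(void *arg) {
    add_task *t = arg;
    for (size_t i = t->begin; i < t->end; i++) {
        t->c[i] = t->a[i] + t->b[i];
    }
    return NULL;
}

/* Splits [0, n) into nthreads chunks and adds them in parallel.
 * Returns 0 on success, -1 if a thread could not be created. */
static int add_arrays_parallel(const float *a, const float *b, float *c,
                               size_t n, int nthreads) {
    assert(nthreads >= 1 && nthreads <= MAX_THREADS);
    pthread_t threads[MAX_THREADS];
    add_task tasks[MAX_THREADS];
    size_t chunk = (n + (size_t)nthreads - 1) / (size_t)nthreads; /* ceiling */
    int started = 0;
    int status = 0;

    for (int t = 0; t < nthreads; t++) {
        size_t begin = (size_t)t * chunk;
        if (begin > n) begin = n;          /* trailing threads may get no work */
        size_t end = begin + chunk;
        if (end > n) end = n;
        tasks[t] = (add_task){a, b, c, begin, end};
        if (pthread_create(&threads[t], NULL, add_worker, &tasks[t]) != 0) {
            status = -1;
            break;
        }
        started++;
    }
    /* Join only the threads that were actually started. */
    for (int t = 0; t < started; t++) {
        pthread_join(threads[t], NULL);
    }
    return status;
}
```

Because each thread writes a disjoint range of `c`, no locking is needed. Link with `-pthread` when building with gcc or clang.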

---

## Folder Overview

| Path                  | Description                    |
| --------------------- | ------------------------------ |
| add_single_core.c     | Single-threaded C baseline     |
| pthread_add.c         | Multi-core with Pthreads       |
| openmp_add.c          | Multi-core with OpenMP         |
| opencl_add_cpu.c      | CPU via OpenCL runtime         |
| opencl_add_gpu.c      | GPU compute attempt            |
| wasm_add.c            | WebAssembly (single-core)      |
| wasm_add_pthread.c    | WebAssembly (multi-core)       |
| wasm_node.js          | Node test for single-core WASM |
| wasm_pthread_fast/    | Multi-threaded WASM version    |
| wasm_pthread_fast/web | Browser runner + local server  |
| compile.sh            | Complete build + test pipeline |

---

## Findings

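• Multi-core execution with Pthreads is about 9.6× faster than the single-threaded C baseline (21.98 ms vs 210.63 ms), although the Pthreads figure covers calculation time only; OpenMP gains much less (140.58 ms)
• Node.js + WebAssembly threads approach native Pthreads performance (23.08 ms vs 21.98 ms)
• Browser WASM threading is still fast, but at 35.21 ms it is roughly 1.6× slower than native Pthreads, so its overhead is noticeable
• The OpenCL GPU run crashed, so no GPU timing was measured, and OpenCL on the CPU (162.36 ms total) did not beat the threaded versions. For a kernel with one addition per element, host-to-device memory transfers are expected to dominate
– A GPU should win when computation per element is higher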
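
To see why this workload is bound by memory traffic and not by arithmetic, it helps to count bytes moved per floating-point operation. The helpers below are a hypothetical back-of-the-envelope sketch (`bytes_moved`, `arithmetic_intensity`, and `effective_gb_per_s` are illustrative names). They assume 4-byte floats, exactly 15,000,000 elements, and only the three explicit array streams, ignoring cache effects such as write-allocate traffic.

```c
#include <stddef.h>

/* Bytes touched by one pass: read a[i], read b[i], write c[i]. */
static double bytes_moved(size_t n) {
    return 3.0 * (double)n * (double)sizeof(float);
}

/* Floating-point operations per byte: one add per three float accesses. */
static double arithmetic_intensity(void) {
    return 1.0 / (3.0 * (double)sizeof(float));
}

/* Effective memory throughput in GB/s (10^9 bytes per second)
 * for n elements processed in elapsed_ms milliseconds. */
static double effective_gb_per_s(size_t n, double elapsed_ms) {
    return bytes_moved(n) / (elapsed_ms / 1000.0) / 1e9;
}
```

Under these assumptions one pass moves 180 MB for a single addition per 12 bytes, and the 21.98 ms Pthreads result corresponds to roughly 8.2 GB/s. A GPU would first have to copy the inputs to the device and the result back, which adds transfer time without adding useful arithmetic.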

---

## Future Expansion

• Higher compute complexity test kernels
• Multi-run average statistics
• Visual charts comparing performance gaps
• GPU-friendly workloads showing real acceleration crossover