# Minimal Attention Model Demo (Browser-Only)

This project is a small in-browser demonstration of the key components of a transformer-style attention mechanism. It runs entirely in JavaScript using ES modules.

It includes:

- Word embeddings
- Positional encoding
- Scaled dot-product attention
- Softmax scoring
- A simple training loop (cross-entropy loss)
- Prediction of the next token from the input context

No third-party machine learning libraries are used.

---

## Files

| File         | Purpose                              |
| ------------ | ------------------------------------ |
| `index.html` | Basic UI output + script inclusion   |
| `real.js`    | Full attention model implementation  |
| `Vector.js`  | Basic vector operations              |
| `Matrix.js`  | Basic dense matrix operations        |
| `server.js`  | Minimal static HTTP server (Node.js) |

---

## Vocabulary

The demo uses a tiny fixed vocabulary:

```
The, Cat, Sat, On, Mat, Bench, Book, Great, Is
```

Tokens are mapped to integer indices.

---

## Training

Training data sequences:

```
["The Book Is Great"]
["The Cat Sat On The Mat"]
["The Cat Sat On The Bench"]
…
```

Each epoch loops over all sequences and performs:

1. Embedding lookup
2. Positional encoding added to the embeddings
3. Query / Key / Value projections
4. Scaled dot-product attention
5. Weighted sum → logits → softmax probabilities
6. Cross-entropy loss + weight updates on:
   - the output projection matrix
   - the token embeddings

The system prints intermediate progress into DOM elements. (Hedged code sketches of these steps appear in the appendix at the end of this README.)

---

## Output

Once trained, the model prints predictions:

```
Next word after 'The Book Is': ...
Next word after 'The Cat Sat': ...
Next word after 'The Cat': ...
...
```

Predictions are appended to the `.prediction` container in the page.

---

## How to Run

### 1 — Start the server

From the folder containing `server.js` and the HTML/JS files:

```bash
node server.js
```

The server will listen on:

```
http://localhost:1234
```

### 2 — Open the demo in a browser

Navigate to:

```
http://localhost:1234
```

The demo will:

- Load the embeddings
- Run the training loop
- Display the loss progression
- Show the final predictions

---

## Notes

- This is a simplified demonstration intended for clarity, not accuracy.
- No batching, dropout, layer norm, or multi-head attention.
- The update rules only modify the embeddings and the output projection (queries/keys/values are not updated).
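
---

## Appendix: Code Sketches

The snippets below are minimal, self-contained sketches of the steps described above, not excerpts from `real.js`. They use plain arrays instead of the project's `Vector.js` / `Matrix.js` helpers, and every function and variable name in them is illustrative.

First, sinusoidal positional encoding in its standard transformer formulation (the demo's actual encoding scheme and embedding width `dModel` are assumptions here):

```js
// Sketch: sinusoidal positional encoding.
// pe[pos][i] = sin(pos / 10000^(2k/dModel)) for even i, cos(...) for odd i,
// where k = floor(i / 2). `dModel` is the embedding width (assumed).
function positionalEncoding(seqLen, dModel) {
  const pe = [];
  for (let pos = 0; pos < seqLen; pos++) {
    const row = new Array(dModel);
    for (let i = 0; i < dModel; i++) {
      const angle = pos / Math.pow(10000, (2 * Math.floor(i / 2)) / dModel);
      row[i] = i % 2 === 0 ? Math.sin(angle) : Math.cos(angle);
    }
    pe.push(row);
  }
  return pe;
}

// The encoding is added elementwise to the token embeddings:
// embeddings[pos][i] += pe[pos][i]
```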
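
Next, softmax and scaled dot-product attention. Each query is scored against every key, the scores are scaled by √dk and normalized with softmax, and the output for that query is a weighted sum of the value vectors:

```js
// Numerically stable softmax: subtract the max before exponentiating.
function softmax(xs) {
  const max = Math.max(...xs);
  const exps = xs.map(x => Math.exp(x - max));
  const sum = exps.reduce((a, b) => a + b, 0);
  return exps.map(e => e / sum);
}

function dot(a, b) {
  return a.reduce((s, ai, i) => s + ai * b[i], 0);
}

// Q, K, V are arrays of row vectors, one per token.
function attention(Q, K, V) {
  const dk = K[0].length;
  return Q.map(q => {
    const scores = K.map(k => dot(q, k) / Math.sqrt(dk)); // scaled scores
    const weights = softmax(scores);                      // attention weights
    // weighted sum of the value vectors
    return V[0].map((_, j) =>
      weights.reduce((s, w, t) => s + w * V[t][j], 0)
    );
  });
}
```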
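
The training step combines cross-entropy loss with a plain gradient update on the output projection, reusing the `softmax` helper above. The convenient identity is that for softmax followed by cross-entropy, the gradient of the loss with respect to the logits is simply `probs - onehot(target)`. The names `h`, `W`, `target`, and `lr` are all hypothetical:

```js
// Sketch: one cross-entropy training step on the output projection W.
// h: final attention output vector; W: vocabSize x dModel matrix (rows);
// target: index of the true next token; lr: learning rate (all assumed).
function trainStep(h, W, target, lr) {
  // logits[v] = W[v] · h
  const logits = W.map(row => row.reduce((s, w, i) => s + w * h[i], 0));
  const probs = softmax(logits);
  const loss = -Math.log(probs[target]);

  // dL/dlogits = probs - onehot(target),
  // so dL/dW[v][i] = (probs[v] - 1{v == target}) * h[i]
  for (let v = 0; v < W.length; v++) {
    const grad = probs[v] - (v === target ? 1 : 0);
    for (let i = 0; i < h.length; i++) {
      W[v][i] -= lr * grad * h[i];
    }
  }
  return loss;
}
```

The analogous gradient flows back into the token embeddings as well; per the Notes above, the query/key/value projections are left fixed.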
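
Finally, prediction is just an argmax over the softmax probabilities, mapped back through the vocabulary. The `vocab` array matches the word list above; `predictNext` is an illustrative name:

```js
const vocab = ["The", "Cat", "Sat", "On", "Mat", "Bench", "Book", "Great", "Is"];
const tokenToId = Object.fromEntries(vocab.map((t, i) => [t, i]));

// Pick the highest-probability token and map it back to a word.
function predictNext(probs) {
  let best = 0;
  for (let v = 1; v < probs.length; v++) {
    if (probs[v] > probs[best]) best = v;
  }
  return vocab[best];
}
```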