
Minimal Attention Model Demo (Browser-Only)

This project is a small in-browser demonstration of key components of a transformer-style attention mechanism. It runs entirely in JavaScript using ES modules.

It includes:

• Word embeddings
• Positional encoding
• Scaled dot-product attention
• Softmax scoring
• Simple training loop (cross-entropy loss)
• Prediction of the next token based on input context

No third-party machine learning libraries are used.


Files

File         Purpose
index.html   Basic UI output + script inclusion
real.js      Full attention model implementation
Vector.js    Basic vector operations
Matrix.js    Basic dense matrix operations
server.js    Minimal static HTTP server (Node.js)

Vocabulary

The demo uses a tiny fixed vocabulary:

The, Cat, Sat, On, Mat, Bench, Book, Great, Is

Tokens are mapped to integer indices.
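
In code, that mapping could look roughly like the sketch below; the variable names are illustrative and real.js may structure it differently:

    // Hypothetical mapping from tokens to integer indices; real.js may differ.
    const vocab = ["The", "Cat", "Sat", "On", "Mat", "Bench", "Book", "Great", "Is"];
    const tokenToIndex = Object.fromEntries(vocab.map((word, i) => [word, i]));
    // e.g. tokenToIndex["The"] === 0, tokenToIndex["Cat"] === 1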


Training

Training data sequences:

["The Book Is Great"]
["The Cat Sat On The Mat"]
["The Cat Sat On The Bench"]
…

Each epoch loops over all sequences and performs the following steps (schematic sketches of the attention computation and the loss follow this list):

  1. Embedding lookup
  2. Positional encoding added to embeddings
  3. Query / Key / Value projections
  4. Scaled dot-product attention
  5. Weighted sum → logits → softmax probabilities
  6. Cross-entropy loss + weight updates on:
     • Output projection matrix
     • Token embeddings
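
As a rough illustration of steps 2–5, a schematic version using plain arrays might look like this; the actual code in real.js builds on the Matrix.js/Vector.js helpers, and its function names, dimensions, and positional-encoding scheme may differ:

    // Illustrative sketch only, not the exact routines in real.js.

    // Sinusoidal positional encoding for position `pos` in d dimensions
    // (a common choice; the scheme used by real.js may be different).
    function positionalEncoding(pos, d) {
      const pe = new Array(d);
      for (let i = 0; i < d; i++) {
        const angle = pos / Math.pow(10000, (2 * Math.floor(i / 2)) / d);
        pe[i] = i % 2 === 0 ? Math.sin(angle) : Math.cos(angle);
      }
      return pe;
    }

    // Numerically stable softmax over an array of scores.
    function softmax(xs) {
      const max = Math.max(...xs);
      const exps = xs.map(x => Math.exp(x - max));
      const sum = exps.reduce((a, b) => a + b, 0);
      return exps.map(e => e / sum);
    }

    // Scaled dot-product attention for a single query vector.
    // q: [d], keys: [n][d], values: [n][d] -> weighted sum of value vectors.
    function attend(q, keys, values) {
      const d = q.length;
      const scores = keys.map(k =>
        k.reduce((acc, kj, j) => acc + kj * q[j], 0) / Math.sqrt(d)
      );
      const weights = softmax(scores);
      return values[0].map((_, col) =>
        values.reduce((acc, v, row) => acc + weights[row] * v[col], 0)
      );
    }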

The system prints intermediate progress into DOM elements.
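
For step 6, the loss and the standard softmax/cross-entropy gradient with respect to the logits could be computed along these lines (again a sketch; how real.js propagates this into its weights is not shown here):

    // Illustrative helpers, not the exact update code in real.js.
    function crossEntropyLoss(probs, targetIndex) {
      // probs: softmax output over the vocabulary, targetIndex: correct token id
      return -Math.log(probs[targetIndex] + 1e-12); // epsilon guards against log(0)
    }

    function logitGradient(probs, targetIndex) {
      // dLoss/dLogits for softmax + cross-entropy: probs minus a one-hot target
      return probs.map((p, i) => (i === targetIndex ? p - 1 : p));
    }

This gradient drives the updates to the output projection matrix and the token embeddings; as stated under Notes, the query/key/value projections are left untouched.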


Output

Once trained, the model prints predictions:

Next word after 'The Book Is': ...
Next word after 'The Cat Sat': ...
Next word after 'The Cat': ...
...

Predictions are appended to the .prediction container in the page.
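
A minimal sketch of that DOM update, assuming one element per prediction line (real.js may format its output differently):

    // Hypothetical snippet; the element type and text formatting are assumptions.
    const container = document.querySelector(".prediction");
    const line = document.createElement("div");
    line.textContent = "Next word after 'The Cat Sat': ...";
    container.appendChild(line);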


How to Run

1 — Start the server

From the folder containing server.js and the HTML/JS files:

node server.js

The server will listen on:

http://localhost:1234
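
server.js is not reproduced here, but a minimal static file server along these lines would be enough to serve the demo (a hypothetical sketch, not the actual server.js):

    // Hypothetical minimal static server; the real server.js may differ.
    const http = require("http");
    const fs = require("fs");
    const path = require("path");

    const MIME = {
      ".html": "text/html",
      ".js": "text/javascript",
      ".css": "text/css",
    };

    http.createServer((req, res) => {
      // Map "/" to index.html, otherwise serve the requested file from this folder.
      // No path sanitization: intended for a local demo only.
      const urlPath = req.url === "/" ? "/index.html" : req.url;
      const filePath = path.join(__dirname, urlPath);
      fs.readFile(filePath, (err, data) => {
        if (err) {
          res.writeHead(404);
          res.end("Not found");
          return;
        }
        const type = MIME[path.extname(filePath)] || "application/octet-stream";
        res.writeHead(200, { "Content-Type": type });
        res.end(data);
      });
    }).listen(1234, () => console.log("Listening on http://localhost:1234"));

Serving over HTTP matters here because most browsers refuse to load ES modules from file:// URLs, so opening index.html directly from disk would not work.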

2 — Open the demo in a browser

Navigate to:

http://localhost:1234

The demo will:

• Load embeddings
• Run the training loop
• Display loss progression
• Show final predictions


Notes

• This is a simplified demonstration intended for clarity, not accuracy
• No batching, dropout, layer-norm, or multi-head attention
• Update rules only modify embeddings + output projection (queries/keys/values are not updated)