# Embeddings

UniLM.jl supports text embeddings across multiple providers. The default is OpenAI's `text-embedding-3-small` (1536 dimensions), but you can target Ollama, Gemini, Mistral, or any OpenAI-compatible server via the `service` parameter.
## Basic Usage

```julia
using UniLM

# Single text
emb = Embeddings("Julia is a high-performance programming language")
println("Model: ", emb.model)
println("Embedding dimensions: ", length(emb.embeddings))
println("Pre-allocated (all zeros): ", all(x -> x == 0.0, emb.embeddings))
```

```
Model: text-embedding-3-small
Embedding dimensions: 1536
Pre-allocated (all zeros): true
```

After calling the API, the embeddings are filled in place:
```julia
emb = Embeddings("Julia is a high-performance programming language for technical computing.")
embeddingrequest!(emb)
println("First 5 dimensions:")
for v in emb.embeddings[1:5]
    println("  ", round(v, digits=6))
end
println("L2 norm: ", round(sqrt(sum(x^2 for x in emb.embeddings)), digits=4))
```

```
First 5 dimensions:
  -0.03952
  -0.009293
  0.001699
  -0.028168
  0.063354
L2 norm: 1.0004
```

## Batch Embeddings
Embed multiple texts in a single API call:
```julia
texts = [
    "Julia is fast",
    "Python is popular",
    "Rust is safe"
]
emb = Embeddings(texts)
println("Model: ", emb.model)
println("Number of texts: ", length(emb.input))
println("Embeddings per text: ", length(emb.embeddings[1]), " dimensions")
```

```
Model: text-embedding-3-small
Number of texts: 3
Embeddings per text: 1536 dimensions
```

```julia
embeddingrequest!(emb)
println("Embedding dimensions per text: ", length(emb.embeddings[1]))
```

```
Embedding dimensions per text: 1536
```

## Computing Similarity
A common use case is computing cosine similarity between embeddings:
```julia
using LinearAlgebra

emb = Embeddings(["Julia", "Python", "Rust", "Fortran"])
embeddingrequest!(emb)
sim = dot(emb.embeddings[1], emb.embeddings[4]) /
      (norm(emb.embeddings[1]) * norm(emb.embeddings[4]))
println("Cosine similarity (Julia vs Fortran): ", round(sim, digits=4))
```

```
Cosine similarity (Julia vs Fortran): 0.3111
```

## Available Models
| Model | Dimensions | Notes |
|---|---|---|
| `text-embedding-3-small` | 1536 | Default, good balance |
| `text-embedding-3-large` | 3072 | Higher quality, more dimensions |
The default `Embeddings` constructor pre-allocates for 1536 dimensions (`text-embedding-3-small`). To use `text-embedding-3-large`, you would need to adjust the embedding vector size accordingly.
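As a rough sketch, if the constructor accepts the same `model` keyword shown for other providers below, and `emb.embeddings` is a plain resizable `Vector` (both assumptions — check the API reference), the adjustment might look like:

```julia
# Hypothetical sketch: switch to text-embedding-3-large (3072 dimensions).
# Assumes `model` is accepted as a keyword here and that `emb.embeddings`
# is a resizable Vector{Float64}; verify both against the API reference.
emb = Embeddings("test"; model="text-embedding-3-large")
resize!(emb.embeddings, 3072)   # grow the pre-allocated buffer
fill!(emb.embeddings, 0.0)      # keep the all-zeros-before-request invariant
embeddingrequest!(emb)
```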
## Using Other Providers

Pass `service` and `model` to embed with a different backend:
```julia
# Ollama (local)
emb = Embeddings("test"; service=OllamaEndpoint(), model="nomic-embed-text")
embeddingrequest!(emb)

# Gemini
emb = Embeddings("test"; service=GEMINIServiceEndpoint, model="gemini-embedding-001")
embeddingrequest!(emb)
```

Different providers return different embedding dimensions. The default pre-allocation assumes 1536 dimensions (OpenAI's `text-embedding-3-small`).
## In-Place Design

The `Embeddings` struct pre-allocates the embedding vectors at construction time. `embeddingrequest!` fills them in place, so there is no allocation on the hot path. This is idiomatic Julia for performance-sensitive workloads.
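The pattern itself is plain Julia and can be sketched without the library (the "response" vector here is fabricated for illustration):

```julia
# Sketch of the in-place pattern, independent of UniLM.jl:
# allocate the buffer once, then overwrite it on every request.
buffer = zeros(Float64, 1536)    # pre-allocated at construction time
response = rand(Float64, 1536)   # stand-in for a parsed API response
copyto!(buffer, response)        # fill in place; no new allocation
@assert buffer == response
```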
```julia
emb = Embeddings("test")
println("Pre-allocated length: ", length(emb.embeddings))
println("All zeros before API call: ", all(x -> x == 0.0, emb.embeddings))
```

```
Pre-allocated length: 1536
All zeros before API call: true
```

## Retry Behaviour
`embeddingrequest!` automatically retries on HTTP 429, 500, and 503 errors with exponential backoff and jitter (up to 30 attempts, 60 s maximum delay). On 429 responses, the `Retry-After` header is respected.
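The backoff schedule can be illustrated with a small sketch (this is not the library's actual retry code): the delay doubles per attempt, is capped at 60 s, and gets multiplicative jitter so concurrent clients do not retry in lockstep.

```julia
# Illustrative exponential backoff with jitter, capped at 60 seconds.
# Not UniLM.jl's implementation; shown only to explain the behaviour above.
function backoff_delay(attempt::Integer; base=1.0, cap=60.0)
    d = min(cap, base * 2.0^(attempt - 1))
    return d * (0.5 + 0.5 * rand())   # uniform jitter in [d/2, d]
end
```

A retry loop would sleep for `backoff_delay(attempt)` seconds after each failed request, substituting the server's `Retry-After` value on 429 responses when one is provided.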
## API Reference

See the `Embeddings` API page for full type documentation.