Related: Configuration, Relation, Compiler, DX, Models, Observability

Vector search lets you find documents by meaning rather than exact keywords. The gem wraps Typesense’s vector search capabilities with a Ruby DSL covering schema declarations, query composition, and automatic compiler behaviour.

Configuration

Set a default embedding model once in your initializer. Every embedding declaration that does not specify its own model inherits this value.
SearchEngine.configure do |c|
  # Required for auto-embedding fields
  c.embedding.model = 'ts/all-MiniLM-L12-v2'

  # Optional: API key for remote providers (OpenAI, PaLM, etc.)
  c.embedding.api_key = ENV['OPENAI_API_KEY']

  # Optional: extra model_config merged into every embed block
  c.embedding.model_config = {
    indexing_prefix: 'passage:',
    query_prefix:    'query:'
  }
end
| Field | Type | Default | Purpose |
| --- | --- | --- | --- |
| embedding.model | String | nil | Typesense model name (e.g. ts/all-MiniLM-L12-v2, openai/text-embedding-3-small) |
| embedding.api_key | String | nil | API key for remote embedding providers |
| embedding.model_config | Hash | nil | Extra keys merged into the Typesense embed.model_config block |

If no model is set globally or per-field, the embedding macro raises SearchEngine::Errors::ConfigurationError at load time.
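
The fallback described above can be pictured with a small plain-Ruby sketch. It is illustrative only and not the gem's implementation: resolve_embedding_model and the local ConfigurationError class are hypothetical stand-ins, and the sketch covers auto-embedding fields only.

```ruby
# Hypothetical stand-in for SearchEngine::Errors::ConfigurationError.
class ConfigurationError < StandardError; end

# per_field_model: the macro's model: option (may be nil)
# global_model:    the value of c.embedding.model (may be nil)
def resolve_embedding_model(per_field_model, global_model)
  model = per_field_model || global_model
  raise ConfigurationError, 'no embedding model configured' if model.nil?

  model
end

# The per-field value wins; otherwise the global default is inherited.
resolve_embedding_model('openai/text-embedding-3-small', 'ts/all-MiniLM-L12-v2')
resolve_embedding_model(nil, 'ts/all-MiniLM-L12-v2')
```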

Schema: the embedding macro

Declare embedding fields on your model with the embedding macro. It registers a :vector attribute internally and builds the Typesense embed block (or num_dim for external embeddings) at schema compile time.

Basic forms

class SearchEngine::Product < SearchEngine::Base
  collection 'products'

  attribute :name, :string, sort: true
  attribute :description, :string
  attribute :brand_name, :string, facet: true
  attribute :category, :string, facet: true

  # Shortest form: derive field name from class, explicit sources
  # => field "product_embedding", from: [:name, :description]
  embedding from: %i[name description]

  # Named with auto-suffix: single source inferred
  # => field "name_embedding", from: [:name]
  embedding :name

  # Named with explicit sources
  # => field "search_embedding", from: [:name, :description, :category]
  embedding :search, from: %i[name description category]

  # Already-suffixed name: used as-is
  # => field "product_embedding", from: [:name, :description]
  embedding :product_embedding, from: %i[name description]

  # Per-field model override
  embedding :visual, from: %i[name],
    model: 'openai/text-embedding-3-small', num_dim: 512

  # Suppress auto-suffix
  # => field "vectors", from: [:name, :description]
  embedding :vectors, from: %i[name description], suffix: false

  query_by %i[name description brand_name]
end

Unindexed embed sources

If an embed source field is only needed for embedding (not for keyword search, filtering, or sorting), declare it with index: false. The schema compiler detects the from: reference and emits the field with Typesense-native "index": false instead of omitting it entirely. This avoids keyword-indexing overhead and the auto-generated *_blank hidden field.
class SearchEngine::Product < SearchEngine::Base
  collection 'products'

  attribute :name, :string, sort: true
  attribute :brand_name, :string, index: false  # stored for embedding, not keyword-indexed

  embedding from: %i[name brand_name]
end
The compiled schema will include brand_name as { "name": "brand_name", "type": "string", "index": false, "optional": true }. Typesense stores the value on disk and feeds it to the embedding model, but it consumes no in-memory index space.

External embeddings

When your application generates vectors outside Typesense, declare num_dim: without from:. The mapper expects your map block to supply a float array of the declared dimension.
embedding :custom_embedding, num_dim: 768
No embed block is emitted in the schema; Typesense stores the field as float[] with the given dimension constraint.
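
Because the field carries a dimension constraint, it can help to check vectors in your own pipeline before they reach the map block. The helper below is a hypothetical sketch (check_vector! is not part of the gem), and whether the gem performs a similar check itself is not stated here.

```ruby
# Hypothetical helper: verify that a vector matches the declared num_dim
# and normalise its elements to floats.
def check_vector!(vector, num_dim)
  unless vector.is_a?(Array) && vector.all? { |v| v.is_a?(Numeric) }
    raise ArgumentError, 'embedding must be an array of numbers'
  end
  unless vector.length == num_dim
    raise ArgumentError, "expected #{num_dim} dimensions, got #{vector.length}"
  end

  vector.map(&:to_f)
end

check_vector!([1, 2.5, 0], 3) # => [1.0, 2.5, 0.0]
```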

Name resolution rules

| # | Condition | Resolved field name | from: |
| --- | --- | --- | --- |
| 1 | No name, from: given | "#{klass.demodulize.underscore}_embedding" | as given |
| 2 | Name given, suffix: true (default), name does not end with _embedding | "#{name}_embedding" | as given, or inferred as [name] |
| 3 | Name given, name already ends with _embedding | as-is | as given |
| 4 | Name given, suffix: false | as-is | as given, or inferred as [name] |
| 5 | No name, no from:, no num_dim: | raises ArgumentError | |
| 6 | num_dim: given without from: | external embedding, no embed block | |

Model resolution precedence: per-field model: > config.embedding.model > raises ConfigurationError.

from: inference: when from: is omitted and the field is not external, the macro infers from: [bare_name] where bare_name is the positional argument before the _embedding suffix.
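
The naming rules can be sketched as a plain-Ruby function. This is an illustration of the rules as listed, not the gem's code: resolve_embedding and demodulized_underscore are hypothetical names, the class name is passed as a string, and the behaviour for "no name, num_dim: only" (class-derived name, external) is an assumption.

```ruby
# Plain-Ruby stand-in for ActiveSupport's demodulize + underscore.
def demodulized_underscore(klass_name)
  klass_name.split('::').last.gsub(/([a-z\d])([A-Z])/, '\1_\2').downcase
end

# Hypothetical helper: returns the field name, the source attributes,
# and whether the embedding is external (no embed block).
def resolve_embedding(klass_name, name = nil, from: nil, suffix: true, num_dim: nil)
  if name.nil? && from.nil? && num_dim.nil?
    raise ArgumentError, 'embedding needs a name, from:, or num_dim:'
  end

  field =
    if name.nil?
      "#{demodulized_underscore(klass_name)}_embedding"
    elsif suffix && !name.to_s.end_with?('_embedding')
      "#{name}_embedding"
    else
      name.to_s
    end

  external = from.nil? && !num_dim.nil?
  sources =
    if from
      Array(from).map(&:to_sym)
    elsif external
      nil
    else
      # Infer the source from the bare name before the _embedding suffix.
      [name.to_s.delete_suffix('_embedding').to_sym]
    end

  { field: field, from: sources, external: external }
end

resolve_embedding('SearchEngine::Product', from: %i[name description])
# => { field: "product_embedding", from: [:name, :description], external: false }
resolve_embedding('SearchEngine::Product', :name)
# => { field: "name_embedding", from: [:name], external: false }
```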

Macro signature

embedding(name = nil,
  from: nil,
  suffix: true,
  model: nil,
  api_key: nil,
  num_dim: nil,
  hnsw: nil,
  model_config: nil)
| Param | Type | Purpose |
| --- | --- | --- |
| name | Symbol, String, nil | Field name (auto-derived when omitted) |
| from: | Array<Symbol> | Source attribute names to embed from |
| suffix: | Boolean | Append _embedding to the name (default: true) |
| model: | String | Per-field embedding model override |
| api_key: | String | Per-field API key for remote providers |
| num_dim: | Integer | Vector dimensions for external embeddings |
| hnsw: | Hash | HNSW index tuning ({ ef_construction:, M: }) |
| model_config: | Hash | Extra model_config merged into the embed block |

When the query string is not "*" and no explicit query: vector is provided, Typesense auto-embeds the query text with the same model used at index time and performs nearest-neighbor search.
SearchEngine::Product
  .search("comfortable ergonomic chair")
  .vector_search(:product_embedding, k: 200)
The compiler auto-appends product_embedding to query_by so Typesense performs rank fusion between keyword and vector results.

Field auto-resolution

When a model declares exactly one embedding, the field argument can be omitted entirely. The gem resolves it automatically:
# Given a model with a single embedding declaration:
class SearchEngine::Article < SearchEngine::Base
  attribute :title, :string
  attribute :body, :string
  embedding from: %i[title body]
end

# These two calls are equivalent:
SearchEngine::Article.search("climate change").vector_search
SearchEngine::Article.search("climate change").vector_search(:article_embedding)
This makes vector_search act as a behaviour modifier: chain it onto a relation to enable semantic or hybrid mode.
Auto-resolution raises InvalidVectorQuery when the model has zero or multiple embeddings. In those cases, pass the field explicitly.
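
A minimal sketch of that resolution rule, written as standalone Ruby. The names resolve_vector_field and the local InvalidVectorQuery class are hypothetical stand-ins for the gem's internals, and the sketch does not check an explicit field against the declared embeddings.

```ruby
# Hypothetical stand-in for the gem's InvalidVectorQuery error.
class InvalidVectorQuery < StandardError; end

# declared: names of the embeddings declared on the model
# field:    the optional positional argument given to vector_search
def resolve_vector_field(declared, field = nil)
  return field.to_s if field

  unless declared.size == 1
    raise InvalidVectorQuery,
          "cannot infer the embedding field (#{declared.size} declared); pass it explicitly"
  end

  declared.first.to_s
end

resolve_vector_field([:article_embedding])                           # => "article_embedding"
resolve_vector_field(%i[name_embedding search_embedding], :search_embedding)
# => "search_embedding"
```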
When k: is omitted, Typesense applies its server default (10). You only need to specify k: when you want a different value.

Combine keyword and vector results with an explicit alpha weight. Typesense uses rank fusion:
rank_fusion_score = (1 - alpha) * keyword_rank + alpha * vector_rank
SearchEngine::Product
  .search("chair")
  .vector_search(:product_embedding, alpha: 0.8, k: 200)
| alpha | Behaviour |
| --- | --- |
| 0.0 | Pure keyword search |
| 0.5 | Equal weight |
| 1.0 | Pure vector search |

Start with alpha: 0.7 and tune based on your dataset. Higher alpha favours semantic similarity; lower alpha favours exact keyword matches.
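
To see how alpha shifts the blend, the formula above can be evaluated directly. This is a worked example of the formula as written on this page, not of Typesense's internal scoring; rank_fusion_score is a hypothetical helper and the ranks are made up.

```ruby
# Evaluates the rank fusion formula exactly as stated above.
def rank_fusion_score(keyword_rank, vector_rank, alpha)
  (1 - alpha) * keyword_rank + alpha * vector_rank
end

# A document ranked 3rd by keyword search and 1st by vector search:
rank_fusion_score(3, 1, 0.0) # => 3.0 (only the keyword rank counts)
rank_fusion_score(3, 1, 0.5) # => 2.0 (both ranks weigh equally)
rank_fusion_score(3, 1, 1.0) # => 1.0 (only the vector rank counts)
```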

Find similar documents

find_similar is sugar over vector_search with id:. Typesense retrieves the stored embedding for the given document and finds its nearest neighbors.
SearchEngine::Product
  .find_similar("product-42", field: :product_embedding, k: 20)
When the model has a single embedding, field: can be omitted:
SearchEngine::Article.find_similar("article-7", k: 20)
Chain filters to refine results:
SearchEngine::Product
  .find_similar("product-42", field: :product_embedding)
  .where(category: "furniture")
  .where.not(id: "product-42")

Historical queries

Weight multiple past queries and let Typesense blend their embeddings server-side.
SearchEngine::Product
  .vector_search(:product_embedding,
    queries: ["ergonomic keyboard", "standing desk"],
    weights: [0.7, 0.3],
    k: 20)
weights: must sum to approximately 1.0 (tolerance: 0.01). The gem validates this and raises InvalidVectorQuery if violated.
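
The sum check can be approximated as below. This is a sketch under assumptions: validate_weights! is a hypothetical name, the error class is a local stand-in, the length check is an added assumption, and whether the gem treats the tolerance boundary as inclusive is not stated.

```ruby
# Hypothetical stand-in for the gem's InvalidVectorQuery error.
class InvalidVectorQuery < StandardError; end

def validate_weights!(queries, weights, tolerance: 0.01)
  # Assumption: one weight per query string.
  unless queries.length == weights.length
    raise InvalidVectorQuery, 'queries and weights must have the same length'
  end

  sum = weights.sum
  unless (sum - 1.0).abs <= tolerance
    raise InvalidVectorQuery, "weights must sum to ~1.0 (got #{sum})"
  end

  weights
end

validate_weights!(['ergonomic keyboard', 'standing desk'], [0.7, 0.3]) # passes
```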

External vector queries

When you generate embeddings in your own pipeline, pass the float array directly.
SearchEngine::Product
  .vector_search(:custom_embedding,
    query: [0.2, 0.4, 0.1, 0.8, ...],
    k: 100)
This bypasses Typesense’s auto-embedding; the provided vector is used as-is for nearest-neighbor search.

Sort by vector distance

Use order(vector_distance: :asc) as a secondary sort to rank keyword results by their semantic proximity.
SearchEngine::Product
  .search("chair")
  .vector_search(:product_embedding)
  .order(vector_distance: :asc)
The compiler resolves vector_distance to the Typesense token _vector_query(product_embedding:([])):asc.
order(vector_distance: ...) requires .vector_search to be chained on the same relation. Using it without vector search raises InvalidVectorQuery.
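
The token format shown above is simple string assembly. The sketch below reproduces it with a hypothetical helper, vector_distance_sort_token; how the gem builds the token internally is not documented here.

```ruby
# Hypothetical helper reproducing the sort token format shown above.
def vector_distance_sort_token(field, direction = :asc)
  unless %i[asc desc].include?(direction.to_sym)
    raise ArgumentError, "direction must be :asc or :desc, got #{direction.inspect}"
  end

  "_vector_query(#{field}:([])):#{direction}"
end

vector_distance_sort_token(:product_embedding, :asc)
# => "_vector_query(product_embedding:([])):asc"
```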

Distance threshold

Cap results by cosine distance to filter out low-relevance matches.
SearchEngine::Product
  .search("chair")
  .vector_search(:product_embedding, alpha: 0.7, distance_threshold: 0.3)

HNSW tuning

Override HNSW search parameters per query.
SearchEngine::Product
  .search("chair")
  .vector_search(:product_embedding, k: 200, ef: 100)
For small result sets, bypass HNSW with brute-force flat search:
SearchEngine::Product
  .where(category: "shoes")
  .vector_search(:product_embedding, k: 50, flat_search_cutoff: 20)

Auto-exclude behaviour

Embedding fields contain large float arrays (384 to 1536 dimensions). To avoid inflating response payloads, the compiler automatically adds the embedding field to exclude_fields unless you explicitly select it.

To include the raw vectors in the response:
SearchEngine::Product
  .vector_search(:product_embedding, k: 10)
  .select(:product_embedding)
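
The auto-exclude rule amounts to a set difference. The following sketch shows the idea in plain Ruby; exclude_fields_for is a hypothetical helper, and the way the gem merges it with user-supplied exclusions is an assumption.

```ruby
# Hypothetical helper: embedding fields are excluded unless selected.
def exclude_fields_for(embedding_fields, selected: [], excluded: [])
  auto = embedding_fields.map(&:to_s) - selected.map(&:to_s)
  (excluded.map(&:to_s) + auto).uniq
end

exclude_fields_for([:product_embedding])
# => ["product_embedding"]
exclude_fields_for([:product_embedding], selected: [:product_embedding])
# => []
```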
Vector search works inside multi_search blocks:
SearchEngine.multi_search do |m|
  m.add :semantic, SearchEngine::Product
    .search("chair")
    .vector_search(:product_embedding, k: 50)

  m.add :keyword, SearchEngine::Product
    .search("chair")
    .per(10)
end

Compiler mapping

The compiler:
  1. Builds the vector_query string: product_embedding:([], k:200, alpha:0.8)
  2. Auto-appends the embedding field to query_by in hybrid mode (text query + vector search, no explicit query: array)
  3. Auto-adds the embedding field to exclude_fields unless explicitly selected
  4. Resolves order(vector_distance: ...) to the real Typesense sort token
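
Step 1 can be illustrated with a small string builder. This is a sketch that reproduces the vector_query format shown on this page for the k and alpha options only; build_vector_query is a hypothetical name and the gem's compiler may order or format parameters differently.

```ruby
# Hypothetical builder for the vector_query string. An empty vector
# renders as [], which leaves query embedding to Typesense.
def build_vector_query(field, vector: [], k: nil, alpha: nil)
  params = []
  params << "k:#{k}" if k
  params << "alpha:#{alpha}" if alpha

  inner = (["[#{vector.join(', ')}]"] + params).join(', ')
  "#{field}:(#{inner})"
end

build_vector_query(:product_embedding, k: 200, alpha: 0.8)
# => "product_embedding:([], k:200, alpha:0.8)"
build_vector_query(:custom_embedding, vector: [0.2, 0.4], k: 100)
# => "custom_embedding:([0.2, 0.4], k:100)"
```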

DX & explain

All DX helpers include vector search state. Raw float arrays are redacted to [<N dims>] in all surfaces.
rel = SearchEngine::Product
  .search("chair")
  .vector_search(:product_embedding, alpha: 0.7, k: 100)

rel.explain
# => includes vector search mode, field, k, alpha

rel.to_curl
# => curl command with vector_query param

rel.dry_run!
# => { url:, body: { ..., vector_query: "product_embedding:([], k:100, alpha:0.7)" }, ... }
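
The redaction format can be sketched as a recursive walk over a payload. This is an illustration only: redact_vectors is a hypothetical helper, and treating any non-empty all-Float array as a vector is an assumption about how such arrays might be detected.

```ruby
# Hypothetical helper: replace float arrays with a "[<N dims>]" marker.
def redact_vectors(value)
  case value
  when Array
    if !value.empty? && value.all? { |v| v.is_a?(Float) }
      "[<#{value.length} dims>]"
    else
      value.map { |v| redact_vectors(v) }
    end
  when Hash
    value.transform_values { |v| redact_vectors(v) }
  else
    value
  end
end

redact_vectors({ field: 'custom_embedding', query: [0.2, 0.4, 0.1], k: 100 })
# => { field: "custom_embedding", query: "[<3 dims>]", k: 100 }
```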

Method signatures

def vector_search(field = nil, k: nil, alpha: nil, query: nil, id: nil,
                  distance_threshold: nil, queries: nil, weights: nil,
                  ef: nil, flat_search_cutoff: nil)
| Param | Type | Purpose |
| --- | --- | --- |
| field | Symbol, String, nil | Embedding field name. Auto-resolved when the model has exactly one embedding. |
| k: | Integer | Number of nearest neighbors (server default: 10) |
| alpha: | Float (0.0 to 1.0) | Hybrid blend weight |
| query: | Array<Numeric> | Explicit embedding vector |
| id: | #to_s | Document ID for similarity search |
| distance_threshold: | Float (>= 0) | Maximum cosine distance |
| queries: | Array<String> | Historical query strings |
| weights: | Array<Numeric> | Per-query weights (sum to ~1.0) |
| ef: | Integer | HNSW ef search override |
| flat_search_cutoff: | Integer | Brute-force threshold |

Mutually exclusive modes: query:, id:, and queries:. Providing more than one raises InvalidVectorQuery.

The last vector_search call wins (Typesense supports one vector_query per search).
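
The mutual-exclusion rule can be expressed as a count of the mode options that were supplied. The sketch below is a hypothetical illustration (exclusive_vector_mode! and the local error class are not the gem's names).

```ruby
# Hypothetical stand-in for the gem's InvalidVectorQuery error.
class InvalidVectorQuery < StandardError; end

# Returns the single mode option that was given (:query, :id, :queries),
# or nil when none was given and Typesense embeds the query text itself.
def exclusive_vector_mode!(query: nil, id: nil, queries: nil)
  given = { query: query, id: id, queries: queries }.compact
  if given.size > 1
    raise InvalidVectorQuery, "mutually exclusive options: #{given.keys.join(', ')}"
  end

  given.keys.first
end

exclusive_vector_mode!(id: 'product-42') # => :id
exclusive_vector_mode!                   # => nil
```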

find_similar

def find_similar(document_id, field: nil, k: nil, distance_threshold: nil)
Sugar over vector_search(field, id: document_id, ...). When field: is omitted, the sole embedding on the model is used automatically (same resolution as vector_search).

Observability

The compiler emits a search_engine.vector.compile event with:
| Key | Value |
| --- | --- |
| field | Embedding field name |
| mode | :semantic, :hybrid, :similar, :historical, or :external |
| query_vector_present | Whether an explicit query: array was provided |
| dims | Size of query: array (nil if auto-embedded) |
| k | Requested nearest neighbors |
| hybrid_weight | Alpha value |
| ann_params_present | Whether ef or flat_search_cutoff was set |

Compact logging redacts raw vector arrays. OTel spans include these attributes when enabled. See Observability for event payloads and log format.