Schema

Related: CLI, Troubleshooting → Schema

The schema layer turns a model class (our DSL) into a Typesense-compatible schema hash and compares it to the live, currently aliased physical collection to surface drift.

API

SearchEngine::Schema.compile(klass) → returns a Typesense-compatible schema hash built from the DSL. Pure and deterministic (no network I/O).
SearchEngine::Schema.diff(klass) → resolves alias → physical, fetches the live schema, and returns a structured diff plus a compact human summary.
SearchEngine::Schema.update!(klass) → attempts an in-place schema patch (Typesense PATCH /collections/:name) when the diff only contains field additions/drops; returns true when the schema is already in sync or successfully patched.
SearchEngine::Schema.apply!(klass, force_rebuild: false) → blue/green lifecycle (create new physical, reindex, swap alias, retention). By default it first tries update! and only falls back to blue/green when incompatible changes are detected. Returns { logical, new_physical, previous_physical, alias_target, dropped_physicals, action: :update|:rebuild }.
SearchEngine::Schema.rollback(klass) → swap alias back to previous retained physical; returns { logical, new_target, previous_target }.

These methods are documented with YARD. Keys are returned as symbols; empty/nil values are omitted. The schema returned by compile is deeply frozen.

In-place schema updates (Typesense v29+)

SearchEngine::Schema.update!(klass, client: …) inspects the live diff and issues a PATCH /collections/:name when the changes are limited to field additions or drops. Type changes, reference changes, or collection-level option differences automatically return false, signalling that a full blue/green rebuild is required.
klass.update_collection! (available on every SearchEngine::Base subclass) is a convenience wrapper that logs console guidance and delegates to Schema.update!.
Schema.apply!(force_rebuild: false) now attempts an in-place update first. Pass force_rebuild: true when you explicitly need to skip PATCH (for example, when you want a new physical even if only field additions are pending).

CLI tasks such as bin/rails search_engine:schema:apply[Collection] inherit the same behavior because they call Schema.apply! under the hood.
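
The choice between an in-place patch and a rebuild can be sketched as a pure function over a diff hash. This is an illustration only, not the gem's implementation: `schema_action` is a hypothetical helper, and it assumes the diff uses the keys documented under Diff shape.

```ruby
# Hypothetical helper (not part of the gem): decides which path a diff would take.
# Assumes the diff hash uses the keys listed under "Diff shape".
def schema_action(diff, force_rebuild: false)
  return :rebuild if force_rebuild

  added   = diff.fetch(:added_fields, [])
  removed = diff.fetch(:removed_fields, [])
  changed = diff.fetch(:changed_fields, {})
  options = diff.fetch(:collection_options, {})

  # Type changes or collection-level differences cannot be patched in place.
  return :rebuild unless changed.empty? && options.empty?
  # Nothing to add or drop: the live schema already matches.
  return :in_sync if added.empty? && removed.empty?

  :update
end

additions_only = { added_fields: [{ name: "brand", type: "string" }] }
type_change    = { changed_fields: { "price" => { "type" => %w[float int64] } } }

puts schema_action(additions_only)                      # update
puts schema_action(type_change)                         # rebuild
puts schema_action(additions_only, force_rebuild: true) # rebuild
```

In this sketch both :in_sync and :update correspond to update! returning true, while :rebuild corresponds to the blue/green fallback.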

Type mapping (DSL → Typesense)

:string → string
:integer → int64 (chosen consistently for wider range)
:float / :decimal → float
:boolean → bool
:time / :datetime → int64 (epoch seconds)
:time_string / :datetime_string → string (ISO8601 timestamps)
Arrays like [:string] → string[]
:auto (regex-style field names such as ".*_facet") → auto; enables Typesense auto schema detection and wildcard ingestion. The DSL enforces that :auto can only be used when the attribute name looks like a regex (contains metacharacters such as *, ., etc.).
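
The mapping above can be expressed as a small lookup table. The constant and method names below are hypothetical and only mirror the list; they are not the gem's internals.

```ruby
# Illustrative lookup table mirroring the mapping listed above.
# DSL_TO_TYPESENSE and typesense_type are hypothetical names.
DSL_TO_TYPESENSE = {
  string: "string",
  integer: "int64",
  float: "float",
  decimal: "float",
  boolean: "bool",
  time: "int64",
  datetime: "int64",
  time_string: "string",
  datetime_string: "string",
  auto: "auto"
}.freeze

def typesense_type(dsl_type)
  # A one-element array such as [:string] denotes an array field.
  if dsl_type.is_a?(Array)
    raise ArgumentError, "expected a single element type" unless dsl_type.size == 1

    return "#{typesense_type(dsl_type.first)}[]"
  end

  DSL_TO_TYPESENSE.fetch(dsl_type) do
    raise ArgumentError, "unknown DSL type: #{dsl_type.inspect}"
  end
end

puts typesense_type(:integer)   # int64
puts typesense_type([:string])  # string[]
puts typesense_type(:datetime)  # int64
```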

Array empty filtering (hidden fields)

When declaring an array attribute, you can enable automatic empty filtering by adding empty_filtering: true:

attribute :promotion_ids, [:string], empty_filtering: true

Behavior:

Schema includes a hidden boolean field promotion_ids_empty.
The mapper auto-populates it per document as: promotion_ids.nil? || promotion_ids.empty?.
Hidden fields are not exposed via public APIs or inspect; they are internal.

Constraints:

empty_filtering is only valid for array types (e.g., [:string]); setting it on scalars raises an error.
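
As a rough sketch of the behavior and constraint above, the following helper derives the hidden flag for each document. `add_empty_flags` and the attribute-options hash are hypothetical stand-ins for the gem's mapper and DSL metadata.

```ruby
# Hypothetical mapper step: derives hidden "<name>_empty" flags for array
# attributes declared with empty_filtering: true.
def add_empty_flags(document, attributes)
  attributes.each_with_object(document.dup) do |(name, opts), doc|
    next unless opts[:empty_filtering]
    unless opts[:type].is_a?(Array)
      raise ArgumentError, "empty_filtering is only valid for array types (#{name})"
    end

    value = doc[name]
    doc[:"#{name}_empty"] = value.nil? || value.empty?
  end
end

attributes = { promotion_ids: { type: [:string], empty_filtering: true } }

p add_empty_flags({ id: "1", promotion_ids: [] }, attributes)
p add_empty_flags({ id: "2", promotion_ids: ["p1"] }, attributes)
```

The first document gets promotion_ids_empty set to true, the second to false; a missing or nil promotion_ids also counts as empty.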

Query rewrite:

.where(promotion_ids: []) → promotion_ids_empty:=true
.where.not(promotion_ids: []) → promotion_ids_empty:=false
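
The rewrite can be pictured as a small translation step. The helper below is a hypothetical sketch covering only the non-joined case; its name and the `empty_filtering_fields` parameter are assumptions made for illustration.

```ruby
# Hypothetical rewrite step: turns an empty-array predicate into a filter on
# the hidden boolean field. Returns nil for non-empty values, which are left
# to the regular filter compiler.
def rewrite_empty_array_filter(field, value, negated: false, empty_filtering_fields: [])
  return nil unless value.is_a?(Array) && value.empty?
  unless empty_filtering_fields.include?(field)
    raise ArgumentError, "empty array is not a valid filter value for #{field}"
  end

  "#{field}_empty:=#{!negated}"
end

enabled = [:promotion_ids]
puts rewrite_empty_array_filter(:promotion_ids, [], empty_filtering_fields: enabled)
# promotion_ids_empty:=true
puts rewrite_empty_array_filter(:promotion_ids, [], negated: true, empty_filtering_fields: enabled)
# promotion_ids_empty:=false
```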

Joins:

For joined filters like .joins(:brand).where(brand: { promotion_ids: [] }), the rewrite applies only if the joined collection has attribute :promotion_ids, [:string], empty_filtering: true (hidden $brand.promotion_ids_empty exists). Otherwise an empty array remains invalid.

System field: `doc_updated_at`

Always present on every collection. Cannot be disabled—the gem automatically injects this field during document creation/upsert.
Stored in Typesense as int64 (epoch seconds). If declared in the model DSL, its type will be coerced to int64 at compile time to ensure consistency.
On hydration and console output, it is converted to a Time in the current timezone (uses Time.zone when available, falling back to Time).
When using instance attributes, :doc_updated_at is returned as a Time object. Unknown fields remain available under :unknown_attributes.
Typesense limitation: This field is required by Typesense for internal tracking. The gem enforces its presence to maintain compatibility.
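
The epoch-seconds round trip can be sketched as follows. Both helper names are hypothetical; the only assumption about the environment is that Time.zone exists when ActiveSupport is loaded and is absent in plain Ruby.

```ruby
# Hypothetical hydration helper: converts stored epoch seconds into a Time.
# Time.zone is only defined when ActiveSupport is loaded, so it is probed first.
def hydrate_doc_updated_at(epoch_seconds)
  zone = Time.respond_to?(:zone) ? Time.zone : nil
  zone ? zone.at(epoch_seconds) : Time.at(epoch_seconds)
end

# Inverse direction, as needed when writing a document.
def dump_doc_updated_at(time)
  time.to_i
end

stamp = hydrate_doc_updated_at(1_700_000_000)
puts stamp.year                  # 2023
puts dump_doc_updated_at(stamp)  # 1700000000
```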

Collection options

If declared in the DSL in the future, the builder may include top-level options like default_sorting_field, token_separators, symbols_to_index. Today, these are omitted to avoid noisy diffs.

Nested fields (auto-enabled)

When any attribute is declared with type :object or [:object], the schema compiler will automatically set enable_nested_fields: true at the collection level.
This is required by Typesense to accept object / object[] field types; otherwise the server responds with 400 RequestMalformed.
The option is included in Schema.apply! create payloads and appears under collection_options in Schema.diff.
If you don’t need nested objects, consider flattening fields or storing JSON as a :string.
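
A minimal sketch of the auto-enable rule, assuming attribute types are available as a name-to-type hash. Both method names are hypothetical and do not reflect the compiler's actual structure.

```ruby
# Hypothetical check: does any declared attribute use :object or [:object]?
def nested_fields_required?(attribute_types)
  attribute_types.values.any? { |type| Array(type).first == :object }
end

# Collection-level options derived from the declared attributes.
def collection_options_for(attribute_types)
  nested_fields_required?(attribute_types) ? { enable_nested_fields: true } : {}
end

flat   = { title: :string, tags: [:string] }
nested = { title: :string, retail_prices: [:object] }

p collection_options_for(flat)
p collection_options_for(nested)
```

Only the second call yields enable_nested_fields: true; flat schemas produce no collection-level option, which keeps diffs quiet.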

Declaring nested subfields

Declare subfields inline via the nested: option on the base attribute:

attribute :retail_prices, [:object], nested: {
  current_price: :float,
  general_price: :float,
  current_discount_percent: :float,
  current_minimum_quantity: :integer,
  price_type: :string
}

Multiplicity rule:

Base :object → subfields are scalars (float, int64, string).
Base [:object] → subfields are arrays (float[], int64[], string[]).
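
The multiplicity rule can be illustrated by expanding a nested: declaration into field entries. This is a sketch under assumptions: the dotted subfield names follow Typesense's nested-field notation, and `expand_nested` and `NESTED_SCALARS` are hypothetical names rather than the gem's API.

```ruby
# Scalar DSL types used in the example declaration above.
NESTED_SCALARS = { float: "float", integer: "int64", string: "string" }.freeze

# Hypothetical expansion: a base of :object yields scalar subfields, a base of
# [:object] yields array subfields. Dotted names are an assumption.
def expand_nested(base_name, base_type, nested)
  array_base = base_type.is_a?(Array)

  nested.map do |sub_name, sub_type|
    ts_type = NESTED_SCALARS.fetch(sub_type)
    { name: "#{base_name}.#{sub_name}", type: array_base ? "#{ts_type}[]" : ts_type }
  end
end

subfields = { current_price: :float, current_minimum_quantity: :integer, price_type: :string }

expand_nested(:retail_prices, [:object], subfields).each do |field|
  puts "#{field[:name]} => #{field[:type]}"
end
# retail_prices.current_price => float[]
# retail_prices.current_minimum_quantity => int64[]
# retail_prices.price_type => string[]
```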

See also: Typesense docs on enable_nested_fields in collections.create (typesense.org).

Diff shape

{
  collection: { name: String, physical: String },
  added_fields: [ { name: String, type: String }, ... ],
  removed_fields: [ { name: String, type: String }, ... ],
  changed_fields: { "field" => { "type" => [compiled, live] } },
  collection_options: { /* option => [compiled, live] */ }
}

Field comparison is name-keyed and order-insensitive.
Only changed keys appear under changed_fields.
When the live collection is missing, added_fields contain all compiled fields and collection_options includes live: :missing.
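
A name-keyed, order-insensitive comparison along these lines could look like the sketch below. `diff_fields` is a hypothetical helper that covers only the three field-level keys; it is not the gem's diff implementation.

```ruby
# Hypothetical field comparison. Fields are hashes with at least :name and
# :type; the order of either list does not affect which fields are reported.
def diff_fields(compiled, live)
  compiled_by_name = compiled.to_h { |f| [f[:name], f] }
  live_by_name     = live.to_h { |f| [f[:name], f] }

  added   = compiled.reject { |f| live_by_name.key?(f[:name]) }
  removed = live.reject { |f| compiled_by_name.key?(f[:name]) }

  changed = {}
  compiled_by_name.each do |name, field|
    next unless live_by_name.key?(name)

    other = live_by_name[name]
    keys  = (field.keys | other.keys) - [:name]
    # Record only the keys whose values differ, as [compiled, live].
    delta = keys.each_with_object({}) do |key, acc|
      acc[key.to_s] = [field[key], other[key]] unless field[key] == other[key]
    end
    changed[name] = delta unless delta.empty?
  end

  { added_fields: added, removed_fields: removed, changed_fields: changed }
end

compiled = [{ name: "title", type: "string" }, { name: "price", type: "float" }]
live     = [{ name: "price", type: "int64" }, { name: "legacy", type: "string" }]

p diff_fields(compiled, live)
```

Here title is reported as added, legacy as removed, and price as changed with "type" => ["float", "int64"].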

Pretty print

The human summary includes:

Header: logical and physical names
+ Added fields: name:type
- Removed fields: name:type
~ Changed fields: field.attr compiled→live
~ Collection options: shown only when differing

Example (no changes):

Collection: products
No changes
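
One possible way to render such a summary from a diff hash is sketched below. `summarize_diff` is hypothetical, and the header follows the "no changes" example, which shows only the logical name; the gem's exact layout may differ.

```ruby
# Hypothetical renderer for the summary lines listed above.
def summarize_diff(diff)
  lines = ["Collection: #{diff.dig(:collection, :name)}"]

  added   = diff.fetch(:added_fields, [])
  removed = diff.fetch(:removed_fields, [])
  pairs   = ->(fields) { fields.map { |f| "#{f[:name]}:#{f[:type]}" }.join(", ") }

  lines << "+ Added fields: #{pairs.call(added)}" unless added.empty?
  lines << "- Removed fields: #{pairs.call(removed)}" unless removed.empty?

  diff.fetch(:changed_fields, {}).each do |field, attrs|
    attrs.each do |attr, (compiled, live)|
      lines << "~ Changed fields: #{field}.#{attr} #{compiled}→#{live}"
    end
  end

  diff.fetch(:collection_options, {}).each do |option, (compiled, live)|
    lines << "~ Collection options: #{option} #{compiled}→#{live}"
  end

  # Only the header was emitted, so nothing differs.
  lines << "No changes" if lines.size == 1
  lines.join("\n")
end

puts summarize_diff({ collection: { name: "products" } })
# Collection: products
# No changes
```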

Lifecycle (Blue/Green with retention)

Physical name format: "#{logical}_YYYYMMDD_HHMMSS_###" (3-digit zero-padded sequence).
Alias equals the logical name (e.g., products). Swap is performed via a single upsert call, which the server handles atomically.
Idempotent: if alias already points to the new physical, swap is a no-op.
Reindexing is required. Provide a block to apply! or implement klass.reindex_all_to(physical_name) to perform bulk import. On failure, no alias swap occurs and the new physical remains for inspection.
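
The ordering guarantees above (reindex before swap, no swap on failure, idempotent swap) can be demonstrated with an in-memory stand-in. Everything here is hypothetical: `FakeCluster`, `blue_green_rebuild`, and the placeholder physical names are for illustration and do not represent the gem's client or naming.

```ruby
# In-memory stand-in for a Typesense cluster; every name here is hypothetical.
class FakeCluster
  attr_reader :collections, :aliases

  def initialize
    @collections = []
    @aliases = {}
  end

  def create_collection(name)
    @collections << name
  end

  # Idempotent: pointing the alias at its current target changes nothing.
  def upsert_alias(logical, physical)
    return false if @aliases[logical] == physical

    @aliases[logical] = physical
    true
  end
end

# Create the new physical, run the caller's reindex block, then swap the alias.
# If the block raises, the swap is skipped and the new physical is kept.
def blue_green_rebuild(cluster, logical, new_physical)
  cluster.create_collection(new_physical)
  yield new_physical
  cluster.upsert_alias(logical, new_physical)
  new_physical
end

cluster = FakeCluster.new
blue_green_rebuild(cluster, "products", "products_v1") { |name| puts "importing into #{name}" }

begin
  blue_green_rebuild(cluster, "products", "products_v2") { raise "import failed" }
rescue RuntimeError => e
  puts "rebuild aborted: #{e.message}"
end

puts cluster.aliases["products"]  # products_v1
```

After the failed second run the alias still points at products_v1, and products_v2 remains in the collection list for inspection.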

Creating a physical collection manually, importing into it, and calling Client#upsert_alias directly will NOT trigger retention cleanup. Old physical collections remain until removed explicitly. Retention cleanup only runs as part of Schema.apply! (and the search_engine:schema:apply[…] task), after a successful alias swap.

Retention

Global default: keep none.

SearchEngine.configure { |c| c.schema.retention.keep_last = 0 }

Per-collection override:

class SearchEngine::Book < SearchEngine::Base
  schema_retention keep_last: 2
end

After a successful swap, older physicals that match the naming pattern and are not the alias target are ordered by embedded timestamp (desc). Everything beyond the first keep_last is deleted. The alias target is never deleted.

Typical operational pattern:

Run schema:apply (or Schema.apply!) to create a new physical, import, swap alias, then drop old physicals per retention.
Avoid manual create/import/alias routines in production unless you also implement a cleanup step; otherwise, old physicals will accumulate.
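
The retention rule can be sketched as a selection over candidate physicals. `physicals_to_drop` is a hypothetical helper; it takes [name, created_at] pairs that are assumed to have already been matched against the naming pattern, instead of parsing the timestamp out of each name.

```ruby
# Hypothetical retention pass. Each candidate is [physical_name, created_at];
# in the gem the timestamp is embedded in the physical name itself.
def physicals_to_drop(candidates, alias_target:, keep_last:)
  candidates
    .reject { |name, _| name == alias_target } # the alias target is never deleted
    .sort_by { |_, created_at| created_at }
    .reverse                                   # newest first
    .drop(keep_last)                           # keep the newest keep_last
    .map(&:first)
end

candidates = [["books_a", 1], ["books_b", 2], ["books_c", 3], ["books_d", 4]]

p physicals_to_drop(candidates, alias_target: "books_d", keep_last: 2)
# ["books_a"]
p physicals_to_drop(candidates, alias_target: "books_d", keep_last: 0)
# ["books_c", "books_b", "books_a"]
```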

Rollback

SearchEngine::Schema.rollback(klass) will swap the alias back to the most recent retained physical (behind the current). If no previous physical exists, it raises an error (e.g., when keep_last is 0). No collections are deleted during rollback.

See also: Client, Configuration, and Compiler.
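
The target selection for a rollback can be sketched as follows. `rollback_target` is a hypothetical helper that assumes the retained physical names are supplied newest first; the error class and message are also assumptions.

```ruby
# Hypothetical selection of the rollback target: the most recent retained
# physical that is not the current alias target.
def rollback_target(retained_newest_first, current_target)
  previous = retained_newest_first.find { |name| name != current_target }
  raise "no previous physical retained; rollback is not possible" if previous.nil?

  previous
end

puts rollback_target(%w[books_c books_b books_a], "books_c")  # books_b

begin
  # With keep_last set to 0 only the current target survives.
  rollback_target(%w[books_c], "books_c")
rescue RuntimeError => e
  puts e.message
end
```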

Troubleshooting

Reindex step missing: Provide a block to apply! or implement klass.reindex_all_to(name).
Retention errors: Ensure keep_last is set appropriately; rollback requires a previous retained physical.

Backlinks: README, Indexer
