- Audience: operators and engineers running indexing in production
- Scope: engine APIs and CLI tasks; network‑safe, redaction‑aware
Overview
Blue/green keeps downtime minimal and rollbacks safe:
- Apply schema changes to a new physical collection (versioned name)
- Rebuild data (full or partitioned) into that new physical
- Validate correctness and performance
- Atomically swap the alias to promote the new physical
- Retain prior physicals for a short window; drop beyond retention
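The steps above can be sketched end to end with an in-memory stand-in. This is a minimal illustration, not the engine's API: FakeEngine, blue_green_apply, the keep_last parameter, and the physical names are all hypothetical, and validation is reduced to a single non-empty check.

```ruby
# In-memory stand-in for a search engine; NOT the real client API.
class FakeEngine
  attr_reader :collections, :aliases

  def initialize
    @collections = {} # physical name => array of documents
    @aliases = {}     # logical name  => physical name
  end

  def create_collection(name)
    @collections[name] = []
  end

  def import(name, docs)
    @collections.fetch(name).concat(docs)
  end

  def upsert_alias(logical, physical)
    @aliases[logical] = physical
  end

  def drop_collection(name)
    @collections.delete(name)
  end
end

# Hypothetical blue/green apply: build, validate, swap, then prune old physicals.
def blue_green_apply(engine, logical:, physical:, docs:, keep_last: 1)
  previous = engine.aliases[logical]
  engine.create_collection(physical)
  engine.import(physical, docs)

  # Validation gate: raising here leaves the alias pointing at the old physical.
  raise "validation failed: empty rebuild" if engine.collections[physical].empty?

  engine.upsert_alias(logical, physical)

  # Retention: keep the newest `keep_last` prior physicals and drop the rest.
  # The new alias target is excluded, so it can never be dropped.
  prior = engine.collections.keys
                .select { |n| n.start_with?("#{logical}_") && n != physical }
                .sort
  dropped = prior.first([prior.size - keep_last, 0].max)
  dropped.each { |name| engine.drop_collection(name) }

  { previous: previous, current: physical, dropped: dropped }
end

engine = FakeEngine.new
blue_green_apply(engine, logical: "products", physical: "products_001", docs: [{ id: 1 }])
blue_green_apply(engine, logical: "products", physical: "products_002", docs: [{ id: 1 }])
summary = blue_green_apply(engine, logical: "products", physical: "products_003", docs: [{ id: 2 }])
```

Retention relies on physical names sorting chronologically, which is why a timestamped or sequenced suffix matters in this sketch.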
If you create a physical, import into it, and call Client#upsert_alias directly, retention cleanup will NOT run. Old physicals remain until explicitly deleted. Use Schema.apply! (or search_engine:schema:apply[…]) to run the full blue/green lifecycle including retention.
Schema lifecycle: diff → apply → validate → swap → rollback
- Physical naming: “<logical>YYYYMMDD_HHMMSS###” (timestamp + sequence). The logical name is stable and exposed via an alias (e.g., products).
- Diff compares the compiled schema with the live one (the aliased physical): fields, types, and selected options. Intentionally immutable aspects (e.g., field type narrowing) are flagged for safe migration.
- Apply creates a new physical collection; reindex populates it; failures leave the alias untouched for inspection.
- Validation before swap: sample queries, mapping sanity, counts/order checks on key paths. Prefer dry_run! and the offline client (or stub client) for exploratory checks.
- Alias swap: traffic moves atomically to the new physical via an upsert; old physicals remain per retention.
- Rollback: re‑point the alias to a prior retained physical if one is available; account for documents indexed after the swap, since the prior physical will not contain them.
- Observability hooks (predictive): search_engine.schema.diff, search_engine.schema.apply, search_engine.alias.swap (the swap often occurs within schema.apply). See Observability.
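The diff step can be pictured as a comparison of two field maps. The sketch below assumes each schema is a plain hash of field name to options; schema_diff and in_sync? are hypothetical helpers, and the real diff output may be shaped differently.

```ruby
# Compare a compiled schema with the live one. Both arguments are hypothetical
# hashes of field name => { type:, ...options }.
def schema_diff(compiled, live)
  added   = compiled.keys - live.keys
  removed = live.keys - compiled.keys
  changed = (compiled.keys & live.keys).each_with_object({}) do |name, acc|
    next if compiled[name] == live[name]

    acc[name] = { from: live[name], to: compiled[name] }
  end
  { added: added, removed: removed, changed: changed }
end

# True when there is nothing to apply.
def in_sync?(diff)
  diff.values.all?(&:empty?)
end

live = {
  "name"  => { type: "string" },
  "price" => { type: "float" }
}
compiled = {
  "name"  => { type: "string" },
  "price" => { type: "int32" },
  "brand" => { type: "string", facet: true }
}

diff = schema_diff(compiled, live)
```

A type change such as float to int32 shows up under :changed, which is the kind of entry that would call for a new physical and a rebuild rather than an in-place update.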
Indexer overview: sources, Mapper DSL, partitions, hooks, dispatcher
- Sources: feed batched rows
  - ActiveRecord (ORM) for in‑app queries and scopes
  - SQL for tuned reads and keyset iteration
  - Lambda/stream for external APIs and services
- Mapper DSL: transforms rows to documents (field renames, coercions, join lookups, defaults), validates required fields, and emits mapping metrics.
- Partitions: choose a stable key (e.g., shop, shard, updated_at bucket). A full rebuild runs once over all data; a partitioned rebuild reduces risk and isolates failures.
- Hooks: before_partition/after_partition for enrichment, throttling, metrics; must be idempotent.
- Dispatcher: fans out partitions inline or via background jobs, with back‑pressure and retries; batch size and timeouts are tunable.
- Events (predictive): search_engine.indexer.enqueue, search_engine.indexer.batch_import, search_engine.indexer.retry, search_engine.indexer.complete. Also see the existing events indexer.partition_start, indexer.partition_finish, indexer.batch_import.
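How rows flow from a source through mapping, partitioning, and hooks can be sketched in plain Ruby. Nothing here is the gem's Mapper DSL or dispatcher: map_row, index_partitions, REQUIRED_FIELDS, and the row shape are hypothetical, and dispatch is inline only.

```ruby
# Hypothetical mapper: renames title to name, coerces types, applies a default
# price, and validates required fields.
REQUIRED_FIELDS = %i[id name].freeze

def map_row(row)
  doc = {
    id: row[:id].to_s,
    name: row[:title],
    price: Float(row.fetch(:price, 0)),
    shop_id: row[:shop_id]
  }
  missing = REQUIRED_FIELDS.select { |key| doc[key].nil? || doc[key] == "" }
  raise ArgumentError, "missing required fields: #{missing.join(', ')}" unless missing.empty?

  doc
end

# Inline dispatcher sketch: one partition at a time, rows mapped in batches.
# Returns partition key => list of mapped batches instead of calling an engine.
def index_partitions(rows, batch_size:, before_partition: nil, after_partition: nil)
  imported = Hash.new { |hash, key| hash[key] = [] }
  rows.group_by { |row| row[:shop_id] }.each do |partition, partition_rows|
    before_partition&.call(partition)
    partition_rows.each_slice(batch_size) do |batch|
      imported[partition] << batch.map { |row| map_row(row) }
    end
    after_partition&.call(partition)
  end
  imported
end

rows = [
  { id: 1, title: "A", price: "9.5", shop_id: 1 },
  { id: 2, title: "B", shop_id: 1 },
  { id: 3, title: "C", price: 3, shop_id: 2 }
]
started = []
result = index_partitions(rows, batch_size: 1, before_partition: ->(key) { started << key })
```

Because a hook may run again when a partition is retried, the hook in this sketch only appends to a list; a real hook would need to tolerate repeated calls for the same partition.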
CLI tasks & expected outcomes
Use these tasks to drive the lifecycle. They respect configured timeouts, batch sizes, redaction, and dispatch mode. See CLI.
- Schema apply (search_engine:schema:apply[…])
  - What: create new physical → reindex → alias swap → retention cleanup
  - Preconditions: collection registered; API key valid; network reachable
  - Expected: summary with created/updated fields, new/previous physical, retention results
- Index rebuild
  - What: full rebuild or partition fan‑out via dispatcher; batches streamed with retries
  - Preconditions: schema exists and alias resolves; sources accessible
  - Expected: summary of batches/partitions processed; retry counts, durations
  - Tip: the model API supports an optional dependency preflight:
    - SearchEngine::Book.index_collection(pre: :ensure) (ensure presence only)
    - SearchEngine::Book.index_collection(pre: :index) (ensure + fix drift)
- Delete‑stale
  - What: delete documents not in the current source‑of‑truth snapshot via your stale rules (declared inside index and OR‑merged into a filter_by)
  - Preconditions: at least one stale rule resolves to a non‑empty filter; strict mode or dry‑run configured as needed
  - Expected: summary of candidates and deletions (or preview in dry‑run)
Apply flow
Partitioned rebuild flow
Retention & delete‑stale
- Retention: keep a limited number of recent physicals (configurable globally and per collection) for debug/rollback. Old physicals beyond the window are dropped; the alias target is never dropped. See Schema.
- Delete‑stale: identify documents that should no longer exist based on a source‑of‑truth snapshot. Declare stale rules that compile into a filter_by string (e.g., archived flag, partition + archived, date thresholds). Strict mode prevents catch‑alls.
- Guardrails: dry‑run preview, maximum delete thresholds, and sampling pre‑checks before destructive steps. Logs are redacted and include a short filter hash.
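The delete‑stale guardrails can be illustrated with three small helpers. This is a sketch under assumptions: compile_stale_filter, filter_hash, check_delete_threshold!, and the max_ratio parameter are hypothetical names, the rules are shown as ready-made filter fragments joined with ||, and the hash length of 8 characters is arbitrary.

```ruby
require "digest"

# OR-merge hypothetical stale rules into one filter_by string. In strict mode
# an empty result is refused, since an empty filter would not narrow anything.
def compile_stale_filter(rules, strict: true)
  fragments = rules.map { |rule| rule.to_s.strip }.reject(&:empty?)
  if strict && fragments.empty?
    raise ArgumentError, "no stale rule resolved to a non-empty filter"
  end

  fragments.map { |fragment| "(#{fragment})" }.join(" || ")
end

# Short, stable hash so logs can identify a filter without printing it.
def filter_hash(filter)
  Digest::SHA256.hexdigest(filter)[0, 8]
end

# Guardrail: refuse to delete more than max_ratio of the collection.
def check_delete_threshold!(candidates:, total:, max_ratio:)
  ratio = total.zero? ? 0.0 : candidates.to_f / total
  raise "delete-stale threshold exceeded: #{(ratio * 100).round(1)}%" if ratio > max_ratio

  ratio
end

filter = compile_stale_filter(["archived:=true", "", "updated_at:<1700000000"])
ratio = check_delete_threshold!(candidates: 5, total: 100, max_ratio: 0.1)
```

Running the threshold check on an estimated candidate count before issuing the delete is what turns a mistaken rule into a failed task instead of a purge.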
Safety thresholds
Configure strict mode and optional estimation to avoid large, accidental purges. Under strict settings, the task exits with a non‑success status when thresholds are violated.
Safety, performance & tuning
- Idempotency: imports and hooks must tolerate retries and duplicate batches.
- Retries & back‑pressure: transient failures are retried with backoff; an HTTP 413 (payload too large) response splits the batch; timeouts are configurable. Avoid hot partitions by balancing partition keys.
- Tuning knobs: batch size, concurrency, dispatch mode, and request timeouts.
- Query behavior: schema changes can alter ranking/filtering; coordinate with presets and curation for consistent results.
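The retry and batch-splitting behavior can be sketched as follows. The error classes, import_with_retries, and its parameters are hypothetical stand-ins; the actual indexer's retry policy and defaults are not shown here.

```ruby
class PayloadTooLarge < StandardError; end # stands in for an HTTP 413
class TransientError < StandardError; end  # stands in for a timeout or 5xx

# Import a batch via the given block. A 413 splits the batch in half and
# recurses; a transient error is retried with exponential backoff.
def import_with_retries(batch, max_retries: 3, base_delay: 0.1,
                        sleeper: ->(seconds) { sleep(seconds) }, &import)
  attempts = 0
  begin
    import.call(batch)
  rescue PayloadTooLarge
    raise if batch.size <= 1 # a single document cannot be split further

    mid = batch.size / 2
    [batch[0...mid], batch[mid..-1]].each do |half|
      import_with_retries(half, max_retries: max_retries, base_delay: base_delay,
                                sleeper: sleeper, &import)
    end
  rescue TransientError
    attempts += 1
    raise if attempts > max_retries

    sleeper.call(base_delay * (2**(attempts - 1)))
    retry
  end
end

sent = []
calls = 0
import_with_retries((1..5).to_a, sleeper: ->(_seconds) {}) do |batch|
  calls += 1
  raise TransientError if calls == 1   # first call fails transiently
  raise PayloadTooLarge if batch.size > 2

  sent << batch
end
```

A retried call may resend documents the engine already accepted, which is why the import itself has to be idempotent (for example, an upsert keyed on document id).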
Debugging & DX
- Use dry_run! on representative relations to validate mapping and query readiness against the new schema without I/O.
- Use explain to surface selection, grouping, joins, and potential conflicts; use to_curl to reproduce requests (redacted).
- Doctor task: search_engine:doctor validates environment and connectivity.
Troubleshooting
- Insufficient privileges for schema tasks → see Schema permissions
- Alias not found or mispointed → see Alias swap
- Indexer job failure (batch too large, timeout) → see Indexer and CLI
- Delete‑stale threshold exceeded → see Safety thresholds
Schema permissions
Ensure the API key can create collections, update aliases, and delete collections. Validate with search_engine:doctor; confirm host/port/protocol and timeouts. For CI, inject keys via ENV. See CLI and Troubleshooting.
Related links: Schema, Indexer, CLI, Observability, DX, Troubleshooting, Deferred Typesense Features