Upserting documents
Use the model-level helpers on SearchEngine::Base to insert or replace documents without hand-rolling JSONL payloads or touching the indexer. These helpers reuse the same schema and mapper validations as .create, ensuring that documents remain consistent with the compiled Typesense schema.
| Helper | Purpose | Returns |
|---|---|---|
| SearchEngine::Book.upsert(record: …) | Map a single source record and upsert it | Integer (0 or 1) |
| SearchEngine::Book.upsert(data: …) | Validate pre-mapped data and upsert it | Integer (0 or 1) |
| SearchEngine::Book.upsert_bulk(records: …) | Map many source records and import in one JSONL batch | Hash summary |
| SearchEngine::Book.upsert_bulk(data: …) | Validate an array of mapped hashes and import once | Hash summary |
Single-document flow
book = Book.find(42)
SearchEngine::Book.upsert(record: book) # mapper runs
mapped = SearchEngine::Book.mapped_data_for(book)
SearchEngine::Book.upsert(data: mapped) # assume already mapped
- Provide either record: or data: (passing both raises InvalidParams).
- The helper ensures an id is present by running the model’s identify_by strategy when necessary.
- doc_updated_at and hidden flags (*_empty, *_blank) are always refreshed before import.
- Returns 1 when the Typesense import endpoint reports success, 0 when no document was sent.
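The either/or rule for record: and data: can be sketched in plain Ruby. This is a minimal illustration, not the gem's implementation: the UpsertArgs module, its InvalidParams class, and the resolve method are hypothetical names invented for this example.

```ruby
# Hypothetical stand-in for the argument check described above. None of these
# names come from the gem; they only illustrate the rule.
module UpsertArgs
  class InvalidParams < ArgumentError; end

  # Returns [:record, value] or [:data, value].
  # Raises InvalidParams unless exactly one keyword is supplied.
  def self.resolve(record: nil, data: nil)
    if !record.nil? && !data.nil?
      raise InvalidParams, "pass either record: or data:, not both"
    end
    if record.nil? && data.nil?
      raise InvalidParams, "pass one of record: or data:"
    end

    record.nil? ? [:data, data] : [:record, record]
  end
end
```

Rejecting both the "both given" and the "neither given" case up front keeps the mapper and the import call from ever seeing an ambiguous input.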
Bulk flow
records = Book.where(publisher_id: publisher.id).limit(100)
summary = SearchEngine::Book.upsert_bulk(records: records)
# => {
# collection: "books",
# docs_count: 100,
# success_count: 100,
# failure_count: 0,
# bytes_sent: 12_480,
# response: "..." # raw Typesense response (string or array)
# }
payloads = records.map { |book| SearchEngine::Book.mapped_data_for(book) }
SearchEngine::Book.upsert_bulk(data: payloads)
Bulk helpers stream a single JSONL payload through SearchEngine::Client#import_documents with action: :upsert. Input can be any enumerable: Arrays, batch enumerators, or ActiveRecord scopes. Internally the helper normalizes the enumerable, validates each document against the schema, and computes doc_updated_at before encoding.
The returned summary mirrors the indexer’s import statistics:
- collection – physical collection chosen via alias resolution (respects into: and partition: overrides when provided).
- docs_count – number of documents encoded into the JSONL payload.
- success_count / failure_count – counts reported by the Typesense client (success_count is currently assumed to equal docs_count, and failure_count to be 0, when no per-document errors are surfaced).
- bytes_sent – size of the JSONL payload in bytes.
- response – raw response object from Typesense (string of status lines or an array of hashes).
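To make the summary fields concrete, here is a rough sketch of how such a hash could be assembled from a list of mapped documents and a raw import response. build_summary is a hypothetical helper, not part of the gem. The sketch assumes each status in the response carries a "success" flag, as Typesense import responses do, and it tallies successes from the response, whereas the helper described above currently assumes success when no per-document errors are surfaced.

```ruby
require "json"

# Illustrative only: `build_summary` is a hypothetical helper, not gem API.
# It encodes documents as JSONL and derives the summary fields listed above.
def build_summary(collection, docs, raw_response)
  # One JSON object per line is the JSONL payload sent to the import endpoint.
  payload = docs.map { |doc| JSON.generate(doc) }.join("\n")

  # A String response holds one JSON status per line; an Array already holds
  # parsed hashes.
  statuses =
    if raw_response.is_a?(String)
      raw_response.each_line.map(&:strip).reject(&:empty?).map { |line| JSON.parse(line) }
    else
      raw_response
    end

  # Accept string or symbol keys, since parsed hashes may use either.
  successes = statuses.count { |s| s["success"] == true || s[:success] == true }

  {
    collection: collection,
    docs_count: docs.size,
    success_count: successes,
    failure_count: statuses.size - successes,
    bytes_sent: payload.bytesize,
    response: raw_response
  }
end
```

Using bytesize rather than length matters here: titles with multi-byte characters would otherwise under-report the payload size.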
Wrap bulk imports in background jobs or maintenance tasks when you need to backfill a handful of documents without triggering a full indexer run.
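A maintenance task along these lines might walk its input in slices and accumulate the per-slice summaries. The sketch below is an assumption about how one could structure that, not something the gem ships: backfill_in_slices and its slice_size: keyword are hypothetical, and the block stands in for whatever performs the import.

```ruby
# Hypothetical maintenance-task helper (not part of the gem): walks the input
# in slices, hands each slice to the block, and sums the returned summaries.
def backfill_in_slices(records, slice_size: 100)
  totals = { docs_count: 0, success_count: 0, failure_count: 0 }

  records.each_slice(slice_size) do |slice|
    # The block is expected to return a Hash shaped like the bulk summary.
    summary = yield(slice)
    totals.each_key { |key| totals[key] += summary.fetch(key, 0) }
  end

  totals
end
```

In an application the block would typically be `{ |slice| SearchEngine::Book.upsert_bulk(records: slice) }`, so the helper itself stays independent of Typesense and is easy to exercise in isolation.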
Validation & mapping
- Records are mapped via the compiled mapper (SearchEngine::Mapper.for(klass)) so the same schema rules as full indexation apply (required fields, coercions, hidden flags).
- Pre-mapped data must be provided as Hashes with string or symbol keys; any other shape raises InvalidParams.
- When both records: and data: are omitted, an InvalidParams error is raised.
- Invalid document shapes (missing required fields, unknown keys when strict mode is on, invalid types) raise the same SearchEngine::Errors::InvalidParams or InvalidField exceptions you see during indexing.
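The shape rule for pre-mapped data (Hashes with string or symbol keys) could look roughly like the check below. MappedShape, its InvalidParams class, and check! are hypothetical names used only for this sketch; the gem's real validation also covers required fields, types, and strict-mode key checks, which are omitted here.

```ruby
# Hypothetical shape check for pre-mapped documents. This is a sketch of the
# rule described above, not the gem's validator.
module MappedShape
  class InvalidParams < ArgumentError; end

  # Returns the document with string keys, or raises InvalidParams.
  def self.check!(doc)
    raise InvalidParams, "expected a Hash, got #{doc.class}" unless doc.is_a?(Hash)

    bad_keys = doc.keys.reject { |key| key.is_a?(String) || key.is_a?(Symbol) }
    unless bad_keys.empty?
      raise InvalidParams, "keys must be Strings or Symbols: #{bad_keys.inspect}"
    end

    # Normalizing to string keys is a choice made for this sketch.
    doc.transform_keys(&:to_s)
  end
end
```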
Options
Both single and bulk helpers accept the same optional keywords:
- into: – override the physical collection (defaults to alias resolution or configured partition resolver).
- partition: – forward partition metadata to the resolver (pairs nicely with blue/green apply flows).
When to choose upsert helpers
- Small batches: perfect for synchronising a handful of documents during admin workflows, background jobs, or diagnostics.
- Ad-hoc replays: retry a small set of failed records without replaying the entire indexer.
- Hybrid flows: combine with .mapped_data_for when your application already builds the mapped document (e.g., for audit snapshots).
For mapping source data without upserting to Typesense, use .from(data, mode: :hash) to get mapped hashes, or .from(data, mode: :instance) to get hydrated model instances. See Models → Mapping source data for details.
Comparison: Create vs Upsert
| Feature | .create | .upsert / .upsert_bulk |
|---|---|---|
| API call | /documents (single) | /documents/import?action=upsert |
| Document count | 1 at a time | 1 (single helper) or many (bulk) |
| Mapper usage | Optional (hash accepted) | Mandatory for record: inputs; always validates |
| doc_updated_at | auto-set | auto-set |
| Return value | Hydrated model instance | Status counts / raw response |