fix: RecordPreparer incorrectly injected __typename into embedded concrete-typed fields by marcdaniels-toast · Pull Request #1109 · block/elasticgraph

marcdaniels-toast · 2026-04-05T14:48:39Z

Summary

RecordPreparer was incorrectly injecting __typename into embedded fields typed as a concrete object type. The bug arose because @types_requiring_typename — a set that includes concrete types whose shared index mapping contains __typename (e.g. Person indexed alongside Company in a union index) — was being used both at the root level (correct) and during embedded field processing (incorrect).

When a type like Person is indexed in a shared union index, it legitimately needs __typename at the root document level so the datastore can distinguish subtypes. However, when Person appears as the type of an embedded field (e.g. Manufacturer.ceo), it is a concrete type and __typename is neither needed nor valid there. Injecting it caused a KeyError in CountAccumulator when traversing the prepared document, since __typename is not present in the embedded object's index mapping properties.

Fix: Split @types_requiring_typename into two sets:

@types_with_typename_required_in_json_schema — types where the JSON schema marks __typename as required (i.e. abstract types that can hold multiple subtypes). Used when deciding whether to preserve __typename on embedded fields during prepare_value_for_indexing.
@types_requiring_typename — a superset that also includes concrete types inheriting a shared index. Used only at the root level in prepare_for_index to inject __typename when absent.

The test schema is also updated to give Person and Company their own dedicated indexes (people and companies), making them types that are both directly indexed and embedded in another indexed type (Manufacturer.ceo). This exercises the previously untested case Myron noted in the issue. Once index inheritance is fully implemented, the index can be moved up to the NamedInventor interface, at which point the acceptance test will also directly exercise the bug fix path.

Test Plan

Unit test (record_preparer_spec.rb): A new example directly demonstrates the bug and fix. It builds a schema where Person shares a union index (requiring __typename at root) and is also embedded as Manufacturer.ceo (where __typename must be absent). The test verifies that preparing a root Person record injects __typename, while preparing a Manufacturer record with an embedded Person ceo does not.
Acceptance test (search_spec.rb): The existing "supports fetching interface fields" test is extended to index a Manufacturer with a ceo: Person and query it end-to-end via named_entities. This confirms the full round-trip works correctly for a type that is both a root indexed type and embedded in another indexed type.

Fixes #1097

…crete-typed fields When a concrete type (e.g. Person) was indexed in a shared union index, it was added to @types_requiring_typename. The old prepare_value_for_indexing used that set for both root and embedded contexts, causing __typename to be injected into embedded fields like Manufacturer.ceo even though no __typename is needed or expected there. Fix by splitting the set into @types_with_typename_required_in_json_schema (used for embedded processing) and @types_requiring_typename (superset, used only at root level in prepare_for_index). Also adds Person and Company as directly indexed types in the test schema, exercising the case where a type is both a root indexed type and embedded in another indexed type. Generated with Claude Code

…company Now that Person and Company have their own indexes, the validation in common_project_specs picks them up and tries to validate them as indexed records -- which they aren't. They're embedded-only factories used for union fields (Widget.inventor) and concrete embedded fields (Manufacturer.ceo). Rename them to :embedded_person/:embedded_company to make the intent clear, and add them to ignored_factories in common_project_specs_spec.rb. Generated with Claude Code

myronmarston · 2026-04-07T16:06:44Z

Rakefile

      # have at most 1 widget, so we use `components.shift` here to ensure that a component isn't re-assigned.
      widget_components = Array.new(rand(3)) { components.shift }.compact
-      FactoryBot.build(:widget, components: widget_components)
+      FactoryBot.build(:widget, components: widget_components, inventor: inventors.sample)


Suggested change

FactoryBot.build(:widget, components: widget_components, inventor: inventors.sample)

FactoryBot.build(:widget, components: widget_components, inventor: Faker::Base.sample(inventors))

Our faker usage in tests is designed to be deterministic:

elasticgraph/spec_support/lib/elastic_graph/spec_support/factories.rb

Lines 26 to 29 in 14cb7cf

# Make faker usage deterministic, based on the `full_description` string of the

# RSpec example. `String#sum` is being used purely for speed, as we do not

# need a particularly even distribution in random number seeds.

Faker::Config.random = Random.new(ex.full_description.sum)

...but using Array#sample (which uses the standard RNG) rather than Faker::Base.sample (which uses the RNG we assign based on the test description) defeats that.

myronmarston · 2026-04-07T16:35:43Z

elasticgraph-indexer/lib/elastic_graph/indexer/record_preparer.rb

+        # Superset of the above: also includes concrete types that inherit an index from an abstract
+        # supertype (e.g. `Person` indexed in the `inventors` index). Used only at the root level
+        # in `prepare_for_index` to inject `__typename` into root documents when needed.
+        @types_requiring_typename = @types_with_typename_required_in_json_schema | types_with_typename_in_index_mapping


Your bug report (and this PR) has helped me realize something: it's fundamentally unsound to model "is __typename required?" at the type level. As you've demonstrated, whether or not __typename is required is not just a property of the type--it's also contextual. A type that is both embedded within another indexed type and indexed on its own may have require __typename in one context but not another. So I'd like to move away from modeling it this way.

I think a better source of truth is the index mapping: if we're at a position that has a __typename field in the mapping, we should populate it. If we're at a position that has no __typename field in the mapping, then we should omit it.

Claude helped me validate this hypthesis. Here's the diff that it came up with:

diff --git a/elasticgraph-indexer/lib/elastic_graph/indexer/operation/update.rb b/elasticgraph-indexer/lib/elastic_graph/indexer/operation/update.rb index 56bcf116..a2503756 100644 --- a/elasticgraph-indexer/lib/elastic_graph/indexer/operation/update.rb +++ b/elasticgraph-indexer/lib/elastic_graph/indexer/operation/update.rb @@ -27,7 +27,11 @@ module ElasticGraph update_target:, destination_index_mapping: ) - prepared_record = record_preparer.prepare_for_index(event["type"], event["record"] || {"id" => event["id"]}) + prepared_record = record_preparer.prepare_for_index( + event["type"], + event["record"] || {"id" => event["id"]}, + mapping_properties: destination_index_mapping.dig("properties") + ) Support::HashUtil .fetch_leaf_values_at_path(prepared_record, update_target.id_source.split(".")) diff --git a/elasticgraph-indexer/lib/elastic_graph/indexer/record_preparer.rb b/elasticgraph-indexer/lib/elastic_graph/indexer/record_preparer.rb index 347b4fc2..749a903f 100644 --- a/elasticgraph-indexer/lib/elastic_graph/indexer/record_preparer.rb +++ b/elasticgraph-indexer/lib/elastic_graph/indexer/record_preparer.rb @@ -21,18 +21,10 @@ module ElasticGraph hash[type_name] = scalar_types_by_name[type_name]&.load_indexing_preparer&.extension_class end # : ::Hash[::String, SchemaArtifacts::RuntimeMetadata::extensionClass?] - # Derive the set of types that have __typename in their index mappings to supplement the set of types that - # explicitly require __typename in the JSON schema - mappings = schema_artifacts.index_mappings_by_index_def_name - types_with_typename_in_index_mapping = schema_artifacts.runtime_metadata.object_types_by_name.filter_map do |type_name, meta| - type_name if meta.index_definition_names.any? { |idx| mappings.dig(idx, "properties", "__typename") } - end.to_set - @preparers_by_json_schema_version = ::Hash.new do |hash, version| hash[version] = RecordPreparer.new( indexing_preparer_by_scalar_type_name, - build_type_metas_from(@schema_artifacts.json_schemas_for(version)), - types_with_typename_in_index_mapping + build_type_metas_from(@schema_artifacts.json_schemas_for(version)) ) end end @@ -58,10 +50,6 @@ module ElasticGraph {} # : ::Hash[::String, untyped] end # : ::Hash[::String, untyped] - required_fields = type_def.fetch("required") do - [] # : ::Array[::String] - end # : ::Array[::String] - eg_meta_by_field_name = properties.filter_map do |prop_name, prop| eg_meta = prop["ElasticGraph"] [prop_name, eg_meta] if eg_meta @@ -69,7 +57,6 @@ module ElasticGraph TypeMetadata.new( name: type, - requires_typename: required_fields.include?("__typename"), eg_meta_by_field_name: eg_meta_by_field_name ) end @@ -84,27 +71,16 @@ module ElasticGraph # still build operations for it, and the operations require a `RecordPreparer`, but we do # not send them to the datastore. module Identity - def self.prepare_for_index(type_name, record) + def self.prepare_for_index(type_name, record, mapping_properties: nil) record end end - def initialize(indexing_preparer_by_scalar_type_name, type_metas, types_with_typename_in_index_mapping) + def initialize(indexing_preparer_by_scalar_type_name, type_metas) @indexing_preparer_by_scalar_type_name = indexing_preparer_by_scalar_type_name @eg_meta_by_field_name_by_concrete_type = type_metas.to_h do |meta| [meta.name, meta.eg_meta_by_field_name] end - - # Types where the JSON schema has `__typename` in `required` -- meaning the field can hold - # multiple subtypes and `__typename` is needed to identify which one. - @types_with_typename_required_in_json_schema = type_metas.filter_map do |meta| - meta.name if meta.requires_typename - end.to_set - - # Superset of the above: also includes concrete types that inherit an index from an abstract - # supertype (e.g. `Person` indexed in the `inventors` index). Used only at the root level - # in `prepare_for_index` to inject `__typename` into root documents when needed. - @types_requiring_typename = @types_with_typename_required_in_json_schema | types_with_typename_in_index_mapping end # Prepares the given payload for being indexed into the named index. @@ -121,18 +97,13 @@ module ElasticGraph # # Note: this method does not mutate the given `record`. Instead it returns a # copy with any updates applied to it. - def prepare_for_index(type_name, record) - result = prepare_value_for_indexing(record, type_name) # : ::Hash[::String, untyped] - # Add __typename if required but absent (e.g. for a mixed-type index). - if result.is_a?(::Hash) && @types_requiring_typename.include?(type_name) && !result.key?("__typename") - result["__typename"] = type_name - end - result + def prepare_for_index(type_name, record, mapping_properties: nil) + prepare_value_for_indexing(record, type_name, mapping_properties) # : ::Hash[::String, untyped] end private - def prepare_value_for_indexing(value, type_name) + def prepare_value_for_indexing(value, type_name, mapping_properties) type_name = type_name.delete_suffix("!") return nil if value.nil? @@ -144,7 +115,7 @@ module ElasticGraph case value when ::Array element_type_name = type_name.delete_prefix("[").delete_suffix("]") - value.map { |v| prepare_value_for_indexing(v, element_type_name) } + value.map { |v| prepare_value_for_indexing(v, element_type_name, mapping_properties) } when ::Hash # `@eg_meta_by_field_name_by_concrete_type` does not have abstract types in it (e.g. type unions). # Instead, it'll have each concrete subtype in it. @@ -153,15 +124,27 @@ module ElasticGraph # what the concrete subtype is. `__typename` is required on abstract types and indicates that. eg_meta_by_field_name = @eg_meta_by_field_name_by_concrete_type.fetch(value["__typename"] || type_name) - value.filter_map do |field_name, field_value| + # Determine whether __typename belongs at this position by checking the index mapping. + typename_in_mapping = mapping_properties&.key?("__typename") + + prepared_fields = value.filter_map do |field_name, field_value| if field_name == "__typename" - # We only want to include __typename if we're dealing with a type that requires it - # (i.e. a type that can represent multiple subtypes, so __typename can differentiate between them). - [field_name, field_value] if @types_with_typename_required_in_json_schema.include?(type_name) + # Only include __typename if the index mapping has it at this position. + [field_name, field_value] if typename_in_mapping elsif (eg_meta = eg_meta_by_field_name[field_name]) - [eg_meta.fetch("nameInIndex"), prepare_value_for_indexing(field_value, eg_meta.fetch("type"))] + name_in_index = eg_meta.fetch("nameInIndex") + nested_mapping_properties = mapping_properties&.dig(name_in_index, "properties") + [name_in_index, prepare_value_for_indexing(field_value, eg_meta.fetch("type"), nested_mapping_properties)] end end.to_h + + # Inject __typename if the mapping requires it but it's absent from the record + # (e.g. for a concrete type indexed in a mixed-type index). + if typename_in_mapping && !value.key?("__typename") + prepared_fields["__typename"] = type_name + end + + prepared_fields else # We won't have a registered preparer for enum types, since those aren't dumped in # runtime metadata `scalar_types_by_name`, and we can just return the value as-is in @@ -173,8 +156,6 @@ module ElasticGraph TypeMetadata = ::Data.define( # The name of the type this metadata object is for. :name, - # Indicates if this type requires a `__typename` field. - :requires_typename, # The per-field ElasticGraph metadata, keyed by field name. :eg_meta_by_field_name )

myronmarston · 2026-04-07T16:38:07Z

elasticgraph-indexer/spec/unit/elastic_graph/indexer/record_preparer_spec.rb

Following up on my comment above about it being better to include __typename based on the index mapping, I suspect there's a scenario which would fail with your implementation and pass with an implementation that consults the index mapping. Can you come up with a test that covers that?

myronmarston · 2026-04-07T16:58:27Z

spec_support/lib/elastic_graph/spec_support/factories/widgets.rb

  end

-  factory :person, parent: :hash_base do
+  factory :embedded_person, parent: :hash_base do


I'd prefer not to have duplicate factory definitions of :person and :company, and force callers to specify :embedded_xyz vs :indexed_xyz.

Instead, I think we can make the :person and :company factories use :indexed_type as their parent, and then we can offer an .as_embedded refinement:

diff --git a/spec_support/lib/elastic_graph/spec_support/factories.rb b/spec_support/lib/elastic_graph/spec_support/factories.rb index 371b0c55..798fae7a 100644 --- a/spec_support/lib/elastic_graph/spec_support/factories.rb +++ b/spec_support/lib/elastic_graph/spec_support/factories.rb @@ -17,6 +17,20 @@ require "faker" require "elastic_graph/indexer/test_support/converters" require "elastic_graph/support/hash_util" +module ElasticGraph + module SpecSupport + module HashAsEmbedded + refine Hash do + # Strips indexed-type-only keys, converting an indexed factory hash + # to one suitable for embedding in another record. + def as_embedded + except(:__version, :__typename, :__json_schema_version) + end + end + end + end +end + # A counter that we increment for the `__version` value on each new factory-generated record. version_counter = 0 diff --git a/spec_support/lib/elastic_graph/spec_support/factories/widgets.rb b/spec_support/lib/elastic_graph/spec_support/factories/widgets.rb index a692d477..91c0af50 100644 --- a/spec_support/lib/elastic_graph/spec_support/factories/widgets.rb +++ b/spec_support/lib/elastic_graph/spec_support/factories/widgets.rb @@ -8,6 +8,8 @@ require "date" +using ElasticGraph::SpecSupport::HashAsEmbedded + # Note: it is *essential* that all factories defined here generate records # deterministically, in order for the request bodies to (and responses from) # the datastore to not change between VCR cassettes being recorded and replayed. @@ -38,27 +40,13 @@ FactoryBot.define do is_draft { false } end - factory :embedded_person, parent: :hash_base do - __typename { "Person" } - id { Faker::Alphanumeric.alpha(number: 20) } - name { Faker::Name.name } - nationality { Faker::Nation.nationality } - end - - factory :indexed_person, parent: :indexed_type do + factory :person, parent: :indexed_type do __typename { "Person" } name { Faker::Name.name } nationality { Faker::Nation.nationality } end - factory :embedded_company, parent: :hash_base do - __typename { "Company" } - id { Faker::Alphanumeric.alpha(number: 20) } - name { Faker::Company.name } - stock_ticker { name[0..3].upcase } - end - - factory :indexed_company, parent: :indexed_type do + factory :company, parent: :indexed_type do __typename { "Company" } name { Faker::Company.name } stock_ticker { name[0..3].upcase } @@ -119,7 +107,7 @@ FactoryBot.define do components.map { |c| c.fetch(:id) } end - inventor { build Faker::Base.sample([:embedded_person, :embedded_company]) } + inventor { build(Faker::Base.sample([:person, :company])).as_embedded } named_inventor { inventor } weight_in_ng { Faker::Number.between(from: 2**51, to: (2**53) - 1) } weight_in_ng_str { Faker::Number.between(from: 2**60, to: 2**61) } @@ -154,7 +142,7 @@ FactoryBot.define do __typename { "Manufacturer" } name { Faker::Company.name } created_at { Faker::Time.between(from: recent_date - 30, to: recent_date).utc.iso8601 } - ceo { build(:embedded_person) } + ceo { build(:person).as_embedded } end factory :address, parent: :indexed_type do

marcdaniels-toast requested review from BrianSigafoos-SQ, ayousufi, bsorbo, jwils, jwondrusch and myronmarston as code owners April 5, 2026 14:48

myronmarston requested changes Apr 7, 2026

View reviewed changes

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

fix: RecordPreparer incorrectly injected __typename into embedded concrete-typed fields#1109

fix: RecordPreparer incorrectly injected __typename into embedded concrete-typed fields#1109
marcdaniels-toast wants to merge 2 commits intoblock:mainfrom
marcdaniels-toast:fix/record-preparer-typename-injection

marcdaniels-toast commented Apr 5, 2026

Uh oh!

myronmarston Apr 7, 2026

Uh oh!

myronmarston Apr 7, 2026

Uh oh!

myronmarston Apr 7, 2026

Uh oh!

myronmarston Apr 7, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

	FactoryBot.build(:widget, components: widget_components, inventor: inventors.sample)
	FactoryBot.build(:widget, components: widget_components, inventor: Faker::Base.sample(inventors))

	# Make faker usage deterministic, based on the `full_description` string of the
	# RSpec example. `String#sum` is being used purely for speed, as we do not
	# need a particularly even distribution in random number seeds.
	Faker::Config.random = Random.new(ex.full_description.sum)

Conversation

marcdaniels-toast commented Apr 5, 2026

Summary

Test Plan

Uh oh!

myronmarston Apr 7, 2026

Choose a reason for hiding this comment

Uh oh!

myronmarston Apr 7, 2026

Choose a reason for hiding this comment

Uh oh!

myronmarston Apr 7, 2026

Choose a reason for hiding this comment

Uh oh!

myronmarston Apr 7, 2026

Choose a reason for hiding this comment

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants