fix: RecordPreparer incorrectly injected __typename into embedded concrete-typed fields#1109
Conversation
…crete-typed fields When a concrete type (e.g. Person) was indexed in a shared union index, it was added to @types_requiring_typename. The old prepare_value_for_indexing used that set for both root and embedded contexts, causing __typename to be injected into embedded fields like Manufacturer.ceo even though no __typename is needed or expected there. Fix by splitting the set into @types_with_typename_required_in_json_schema (used for embedded processing) and @types_requiring_typename (superset, used only at root level in prepare_for_index). Also adds Person and Company as directly indexed types in the test schema, exercising the case where a type is both a root indexed type and embedded in another indexed type. Generated with Claude Code
…company Now that Person and Company have their own indexes, the validation in common_project_specs picks them up and tries to validate them as indexed records -- which they aren't. They're embedded-only factories used for union fields (Widget.inventor) and concrete embedded fields (Manufacturer.ceo). Rename them to :embedded_person/:embedded_company to make the intent clear, and add them to ignored_factories in common_project_specs_spec.rb. Generated with Claude Code
| # have at most 1 widget, so we use `components.shift` here to ensure that a component isn't re-assigned. | ||
| widget_components = Array.new(rand(3)) { components.shift }.compact | ||
| FactoryBot.build(:widget, components: widget_components) | ||
| FactoryBot.build(:widget, components: widget_components, inventor: inventors.sample) |
There was a problem hiding this comment.
| FactoryBot.build(:widget, components: widget_components, inventor: inventors.sample) | |
| FactoryBot.build(:widget, components: widget_components, inventor: Faker::Base.sample(inventors)) |
Our faker usage in tests is designed to be deterministic:
elasticgraph/spec_support/lib/elastic_graph/spec_support/factories.rb
Lines 26 to 29 in 14cb7cf
...but using Array#sample (which uses the standard RNG) rather than Faker::Base.sample (which uses the RNG we assign based on the test description) defeats that.
| # Superset of the above: also includes concrete types that inherit an index from an abstract | ||
| # supertype (e.g. `Person` indexed in the `inventors` index). Used only at the root level | ||
| # in `prepare_for_index` to inject `__typename` into root documents when needed. | ||
| @types_requiring_typename = @types_with_typename_required_in_json_schema | types_with_typename_in_index_mapping |
There was a problem hiding this comment.
Your bug report (and this PR) has helped me realize something: it's fundamentally unsound to model "is __typename required?" at the type level. As you've demonstrated, whether or not __typename is required is not just a property of the type--it's also contextual. A type that is both embedded within another indexed type and indexed on its own may have require __typename in one context but not another. So I'd like to move away from modeling it this way.
I think a better source of truth is the index mapping: if we're at a position that has a __typename field in the mapping, we should populate it. If we're at a position that has no __typename field in the mapping, then we should omit it.
Claude helped me validate this hypthesis. Here's the diff that it came up with:
diff --git a/elasticgraph-indexer/lib/elastic_graph/indexer/operation/update.rb b/elasticgraph-indexer/lib/elastic_graph/indexer/operation/update.rb
index 56bcf116..a2503756 100644
--- a/elasticgraph-indexer/lib/elastic_graph/indexer/operation/update.rb
+++ b/elasticgraph-indexer/lib/elastic_graph/indexer/operation/update.rb
@@ -27,7 +27,11 @@ module ElasticGraph
update_target:,
destination_index_mapping:
)
- prepared_record = record_preparer.prepare_for_index(event["type"], event["record"] || {"id" => event["id"]})
+ prepared_record = record_preparer.prepare_for_index(
+ event["type"],
+ event["record"] || {"id" => event["id"]},
+ mapping_properties: destination_index_mapping.dig("properties")
+ )
Support::HashUtil
.fetch_leaf_values_at_path(prepared_record, update_target.id_source.split("."))
diff --git a/elasticgraph-indexer/lib/elastic_graph/indexer/record_preparer.rb b/elasticgraph-indexer/lib/elastic_graph/indexer/record_preparer.rb
index 347b4fc2..749a903f 100644
--- a/elasticgraph-indexer/lib/elastic_graph/indexer/record_preparer.rb
+++ b/elasticgraph-indexer/lib/elastic_graph/indexer/record_preparer.rb
@@ -21,18 +21,10 @@ module ElasticGraph
hash[type_name] = scalar_types_by_name[type_name]&.load_indexing_preparer&.extension_class
end # : ::Hash[::String, SchemaArtifacts::RuntimeMetadata::extensionClass?]
- # Derive the set of types that have __typename in their index mappings to supplement the set of types that
- # explicitly require __typename in the JSON schema
- mappings = schema_artifacts.index_mappings_by_index_def_name
- types_with_typename_in_index_mapping = schema_artifacts.runtime_metadata.object_types_by_name.filter_map do |type_name, meta|
- type_name if meta.index_definition_names.any? { |idx| mappings.dig(idx, "properties", "__typename") }
- end.to_set
-
@preparers_by_json_schema_version = ::Hash.new do |hash, version|
hash[version] = RecordPreparer.new(
indexing_preparer_by_scalar_type_name,
- build_type_metas_from(@schema_artifacts.json_schemas_for(version)),
- types_with_typename_in_index_mapping
+ build_type_metas_from(@schema_artifacts.json_schemas_for(version))
)
end
end
@@ -58,10 +50,6 @@ module ElasticGraph
{} # : ::Hash[::String, untyped]
end # : ::Hash[::String, untyped]
- required_fields = type_def.fetch("required") do
- [] # : ::Array[::String]
- end # : ::Array[::String]
-
eg_meta_by_field_name = properties.filter_map do |prop_name, prop|
eg_meta = prop["ElasticGraph"]
[prop_name, eg_meta] if eg_meta
@@ -69,7 +57,6 @@ module ElasticGraph
TypeMetadata.new(
name: type,
- requires_typename: required_fields.include?("__typename"),
eg_meta_by_field_name: eg_meta_by_field_name
)
end
@@ -84,27 +71,16 @@ module ElasticGraph
# still build operations for it, and the operations require a `RecordPreparer`, but we do
# not send them to the datastore.
module Identity
- def self.prepare_for_index(type_name, record)
+ def self.prepare_for_index(type_name, record, mapping_properties: nil)
record
end
end
- def initialize(indexing_preparer_by_scalar_type_name, type_metas, types_with_typename_in_index_mapping)
+ def initialize(indexing_preparer_by_scalar_type_name, type_metas)
@indexing_preparer_by_scalar_type_name = indexing_preparer_by_scalar_type_name
@eg_meta_by_field_name_by_concrete_type = type_metas.to_h do |meta|
[meta.name, meta.eg_meta_by_field_name]
end
-
- # Types where the JSON schema has `__typename` in `required` -- meaning the field can hold
- # multiple subtypes and `__typename` is needed to identify which one.
- @types_with_typename_required_in_json_schema = type_metas.filter_map do |meta|
- meta.name if meta.requires_typename
- end.to_set
-
- # Superset of the above: also includes concrete types that inherit an index from an abstract
- # supertype (e.g. `Person` indexed in the `inventors` index). Used only at the root level
- # in `prepare_for_index` to inject `__typename` into root documents when needed.
- @types_requiring_typename = @types_with_typename_required_in_json_schema | types_with_typename_in_index_mapping
end
# Prepares the given payload for being indexed into the named index.
@@ -121,18 +97,13 @@ module ElasticGraph
#
# Note: this method does not mutate the given `record`. Instead it returns a
# copy with any updates applied to it.
- def prepare_for_index(type_name, record)
- result = prepare_value_for_indexing(record, type_name) # : ::Hash[::String, untyped]
- # Add __typename if required but absent (e.g. for a mixed-type index).
- if result.is_a?(::Hash) && @types_requiring_typename.include?(type_name) && !result.key?("__typename")
- result["__typename"] = type_name
- end
- result
+ def prepare_for_index(type_name, record, mapping_properties: nil)
+ prepare_value_for_indexing(record, type_name, mapping_properties) # : ::Hash[::String, untyped]
end
private
- def prepare_value_for_indexing(value, type_name)
+ def prepare_value_for_indexing(value, type_name, mapping_properties)
type_name = type_name.delete_suffix("!")
return nil if value.nil?
@@ -144,7 +115,7 @@ module ElasticGraph
case value
when ::Array
element_type_name = type_name.delete_prefix("[").delete_suffix("]")
- value.map { |v| prepare_value_for_indexing(v, element_type_name) }
+ value.map { |v| prepare_value_for_indexing(v, element_type_name, mapping_properties) }
when ::Hash
# `@eg_meta_by_field_name_by_concrete_type` does not have abstract types in it (e.g. type unions).
# Instead, it'll have each concrete subtype in it.
@@ -153,15 +124,27 @@ module ElasticGraph
# what the concrete subtype is. `__typename` is required on abstract types and indicates that.
eg_meta_by_field_name = @eg_meta_by_field_name_by_concrete_type.fetch(value["__typename"] || type_name)
- value.filter_map do |field_name, field_value|
+ # Determine whether __typename belongs at this position by checking the index mapping.
+ typename_in_mapping = mapping_properties&.key?("__typename")
+
+ prepared_fields = value.filter_map do |field_name, field_value|
if field_name == "__typename"
- # We only want to include __typename if we're dealing with a type that requires it
- # (i.e. a type that can represent multiple subtypes, so __typename can differentiate between them).
- [field_name, field_value] if @types_with_typename_required_in_json_schema.include?(type_name)
+ # Only include __typename if the index mapping has it at this position.
+ [field_name, field_value] if typename_in_mapping
elsif (eg_meta = eg_meta_by_field_name[field_name])
- [eg_meta.fetch("nameInIndex"), prepare_value_for_indexing(field_value, eg_meta.fetch("type"))]
+ name_in_index = eg_meta.fetch("nameInIndex")
+ nested_mapping_properties = mapping_properties&.dig(name_in_index, "properties")
+ [name_in_index, prepare_value_for_indexing(field_value, eg_meta.fetch("type"), nested_mapping_properties)]
end
end.to_h
+
+ # Inject __typename if the mapping requires it but it's absent from the record
+ # (e.g. for a concrete type indexed in a mixed-type index).
+ if typename_in_mapping && !value.key?("__typename")
+ prepared_fields["__typename"] = type_name
+ end
+
+ prepared_fields
else
# We won't have a registered preparer for enum types, since those aren't dumped in
# runtime metadata `scalar_types_by_name`, and we can just return the value as-is in
@@ -173,8 +156,6 @@ module ElasticGraph
TypeMetadata = ::Data.define(
# The name of the type this metadata object is for.
:name,
- # Indicates if this type requires a `__typename` field.
- :requires_typename,
# The per-field ElasticGraph metadata, keyed by field name.
:eg_meta_by_field_name
)There was a problem hiding this comment.
Following up on my comment above about it being better to include __typename based on the index mapping, I suspect there's a scenario which would fail with your implementation and pass with an implementation that consults the index mapping. Can you come up with a test that covers that?
| end | ||
|
|
||
| factory :person, parent: :hash_base do | ||
| factory :embedded_person, parent: :hash_base do |
There was a problem hiding this comment.
I'd prefer not to have duplicate factory definitions of :person and :company, and force callers to specify :embedded_xyz vs :indexed_xyz.
Instead, I think we can make the :person and :company factories use :indexed_type as their parent, and then we can offer an .as_embedded refinement:
diff --git a/spec_support/lib/elastic_graph/spec_support/factories.rb b/spec_support/lib/elastic_graph/spec_support/factories.rb
index 371b0c55..798fae7a 100644
--- a/spec_support/lib/elastic_graph/spec_support/factories.rb
+++ b/spec_support/lib/elastic_graph/spec_support/factories.rb
@@ -17,6 +17,20 @@ require "faker"
require "elastic_graph/indexer/test_support/converters"
require "elastic_graph/support/hash_util"
+module ElasticGraph
+ module SpecSupport
+ module HashAsEmbedded
+ refine Hash do
+ # Strips indexed-type-only keys, converting an indexed factory hash
+ # to one suitable for embedding in another record.
+ def as_embedded
+ except(:__version, :__typename, :__json_schema_version)
+ end
+ end
+ end
+ end
+end
+
# A counter that we increment for the `__version` value on each new factory-generated record.
version_counter = 0
diff --git a/spec_support/lib/elastic_graph/spec_support/factories/widgets.rb b/spec_support/lib/elastic_graph/spec_support/factories/widgets.rb
index a692d477..91c0af50 100644
--- a/spec_support/lib/elastic_graph/spec_support/factories/widgets.rb
+++ b/spec_support/lib/elastic_graph/spec_support/factories/widgets.rb
@@ -8,6 +8,8 @@
require "date"
+using ElasticGraph::SpecSupport::HashAsEmbedded
+
# Note: it is *essential* that all factories defined here generate records
# deterministically, in order for the request bodies to (and responses from)
# the datastore to not change between VCR cassettes being recorded and replayed.
@@ -38,27 +40,13 @@ FactoryBot.define do
is_draft { false }
end
- factory :embedded_person, parent: :hash_base do
- __typename { "Person" }
- id { Faker::Alphanumeric.alpha(number: 20) }
- name { Faker::Name.name }
- nationality { Faker::Nation.nationality }
- end
-
- factory :indexed_person, parent: :indexed_type do
+ factory :person, parent: :indexed_type do
__typename { "Person" }
name { Faker::Name.name }
nationality { Faker::Nation.nationality }
end
- factory :embedded_company, parent: :hash_base do
- __typename { "Company" }
- id { Faker::Alphanumeric.alpha(number: 20) }
- name { Faker::Company.name }
- stock_ticker { name[0..3].upcase }
- end
-
- factory :indexed_company, parent: :indexed_type do
+ factory :company, parent: :indexed_type do
__typename { "Company" }
name { Faker::Company.name }
stock_ticker { name[0..3].upcase }
@@ -119,7 +107,7 @@ FactoryBot.define do
components.map { |c| c.fetch(:id) }
end
- inventor { build Faker::Base.sample([:embedded_person, :embedded_company]) }
+ inventor { build(Faker::Base.sample([:person, :company])).as_embedded }
named_inventor { inventor }
weight_in_ng { Faker::Number.between(from: 2**51, to: (2**53) - 1) }
weight_in_ng_str { Faker::Number.between(from: 2**60, to: 2**61) }
@@ -154,7 +142,7 @@ FactoryBot.define do
__typename { "Manufacturer" }
name { Faker::Company.name }
created_at { Faker::Time.between(from: recent_date - 30, to: recent_date).utc.iso8601 }
- ceo { build(:embedded_person) }
+ ceo { build(:person).as_embedded }
end
factory :address, parent: :indexed_type do
Summary
RecordPreparerwas incorrectly injecting__typenameinto embedded fields typed as a concrete object type. The bug arose because@types_requiring_typename— a set that includes concrete types whose shared index mapping contains__typename(e.g.Personindexed alongsideCompanyin a union index) — was being used both at the root level (correct) and during embedded field processing (incorrect).When a type like
Personis indexed in a shared union index, it legitimately needs__typenameat the root document level so the datastore can distinguish subtypes. However, whenPersonappears as the type of an embedded field (e.g.Manufacturer.ceo), it is a concrete type and__typenameis neither needed nor valid there. Injecting it caused aKeyErrorinCountAccumulatorwhen traversing the prepared document, since__typenameis not present in the embedded object's index mapping properties.Fix: Split
@types_requiring_typenameinto two sets:@types_with_typename_required_in_json_schema— types where the JSON schema marks__typenameas required (i.e. abstract types that can hold multiple subtypes). Used when deciding whether to preserve__typenameon embedded fields duringprepare_value_for_indexing.@types_requiring_typename— a superset that also includes concrete types inheriting a shared index. Used only at the root level inprepare_for_indexto inject__typenamewhen absent.The test schema is also updated to give
PersonandCompanytheir own dedicated indexes (peopleandcompanies), making them types that are both directly indexed and embedded in another indexed type (Manufacturer.ceo). This exercises the previously untested case Myron noted in the issue. Once index inheritance is fully implemented, the index can be moved up to theNamedInventorinterface, at which point the acceptance test will also directly exercise the bug fix path.Test Plan
Unit test (
record_preparer_spec.rb): A new example directly demonstrates the bug and fix. It builds a schema wherePersonshares a union index (requiring__typenameat root) and is also embedded asManufacturer.ceo(where__typenamemust be absent). The test verifies that preparing a rootPersonrecord injects__typename, while preparing aManufacturerrecord with an embeddedPersonceo does not.Acceptance test (
search_spec.rb): The existing "supports fetching interface fields" test is extended to index aManufacturerwith aceo: Personand query it end-to-end vianamed_entities. This confirms the full round-trip works correctly for a type that is both a root indexed type and embedded in another indexed type.Fixes #1097