parse_inline_*_annotation fails on multibyte content (since #2863) #2944

@pjocprac

Repro

require "pathname"
require "prism"
require "rbs"
require "rbs/inline_parser"

src = <<~RUBY
class C
  # 日本語
  #: () -> void
  def m; end
end
RUBY

buffer = RBS::Buffer.new(name: Pathname("t.rb"), content: src)
prism = Prism.parse(src)
result = RBS::InlineParser.parse(buffer, prism)
result.diagnostics.each { |d| puts "#{d.class.name.split('::').last}: #{d.message}" }

Expected

No AnnotationSyntaxError for the #: () -> void annotation.

Actual

AnnotationSyntaxError: Syntax error: unexpected token for inline leading annotation

Replacing # 日本語 with an ASCII-only comment (e.g. # comment) parses cleanly without diagnostics.

Environment

  • rbs: 4.0.2 (also reproduces on master, 4.1.0.pre.1)
  • ruby: 3.4.7
  • platform: arm64-darwin24

Hypothesis

Parser.parse_inline_leading_annotation and parse_inline_trailing_annotation in lib/rbs/parser_aux.rb#L122-L130 pass the range argument directly to the C parser:

def self.parse_inline_leading_annotation(source, range, variables: [])
  buf = buffer(source)
  _parse_inline_leading_annotation(buf, range.begin || 0, range.end || buf.last_position, variables)
end

RBS::AST::Ruby::CommentBlock builds comment_buffer.ranges using character offsets. After #2863, the C parser expects byte offsets. Other parse_* methods in the same file use byte_range(range, buf.content) to convert character offsets to byte offsets, but parse_inline_*_annotation were not updated. With ASCII-only input the two coincide, hiding the bug; multibyte content makes the C parser start at an invalid byte position.
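
To illustrate why the two kinds of offsets diverge, here is a minimal sketch in plain Ruby with no rbs dependency. The helper char_range_to_byte_range is hypothetical and written only for this illustration; it is not the byte_range implementation in lib/rbs/parser_aux.rb, and the actual conversion there may differ.

```ruby
# Hypothetical helper for illustration only (not the rbs implementation):
# converts a range of character offsets into a range of byte offsets.
def char_range_to_byte_range(range, content)
  start_char = range.begin || 0
  end_char = range.end || content.length
  # The byte offset of a character position is the byte size of the prefix before it.
  start_byte = content[0, start_char].bytesize
  end_byte = content[0, end_char].bytesize
  start_byte...end_byte
end

src = "# 日本語\n#: () -> void\n"

# Character offsets of the annotation line, excluding the trailing newline.
char_start = src.index("#:")                 # => 6
char_range = char_start...(src.length - 1)   # => 6...19

byte_range = char_range_to_byte_range(char_range, src)  # => 12...25

# Slicing by bytes with the converted range recovers the annotation.
puts src.byteslice(byte_range.begin, byte_range.size)   # prints "#: () -> void"

# Using the character offsets as if they were byte offsets starts inside
# the multibyte character "本", so the slice is not valid UTF-8.
broken = src.byteslice(char_range.begin, char_range.size)
puts broken.valid_encoding?                  # prints "false"
```

Each of the three Japanese characters occupies three bytes in UTF-8, so the annotation starts at character 6 but at byte 12. With an ASCII-only comment the two ranges would be identical, which matches the observation that the ASCII variant parses cleanly.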

I'd like to confirm whether this was intentional or an oversight. I've prepared a proposed fix in #2945.
