Repro
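require "pathname"
require "prism"
require "rbs"
require "rbs/inline_parser"

# A multibyte comment directly above the inline annotation triggers the error.
src = <<~RUBY
  class C
    # 日本語
    #: () -> void
    def m; end
  end
RUBY

buffer = RBS::Buffer.new(name: Pathname("t.rb"), content: src)
prism = Prism.parse(src)
result = RBS::InlineParser.parse(buffer, prism)

# Print each diagnostic as "ClassName: message".
result.diagnostics.each { |d| puts "#{d.class.name.split('::').last}: #{d.message}" }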
Expected
No AnnotationSyntaxError for the #: () -> void annotation.
Actual
AnnotationSyntaxError: Syntax error: unexpected token for inline leading annotation
Replacing # 日本語 with an ASCII-only comment (e.g. # comment) parses cleanly without diagnostics.
Environment
- rbs: 4.0.2 (also reproduces on master, 4.1.0.pre.1)
- ruby: 3.4.7
- platform: arm64-darwin24
Hypothesis
Parser.parse_inline_leading_annotation and parse_inline_trailing_annotation in lib/rbs/parser_aux.rb#L122-L130 pass the range argument directly to the C parser:
def self.parse_inline_leading_annotation(source, range, variables: [])
  buf = buffer(source)
  _parse_inline_leading_annotation(buf, range.begin || 0, range.end || buf.last_position, variables)
end
RBS::AST::Ruby::CommentBlock builds comment_buffer.ranges using character offsets. After #2863, the C parser expects byte offsets. Other parse_* methods in the same file use byte_range(range, buf.content) to convert character offsets to byte offsets, but parse_inline_*_annotation were not updated. With ASCII-only input the two coincide, hiding the bug; multibyte content makes the C parser start at an invalid byte position.
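To illustrate the mismatch in isolation, here is a minimal sketch in plain Ruby that does not touch RBS. The helper `char_range_to_byte_range` is a hypothetical stand-in for the kind of conversion `byte_range` performs; its name and implementation are assumptions for illustration, not the code in `parser_aux.rb`.

```ruby
# Hypothetical helper (not the RBS implementation): converts a range of
# character offsets into the matching range of byte offsets for `content`.
def char_range_to_byte_range(range, content)
  start_char = range.begin || 0
  end_char = range.end || content.length
  # The byte offset of character N is the byte size of the first N characters.
  start_byte = content[0, start_char].bytesize
  end_byte = content[0, end_char].bytesize
  start_byte...end_byte
end

comment = "# 日本語\n#: () -> void\n"

# Character offsets of the annotation line, as a character-based scanner sees it.
char_start = comment.index("#:")
char_range = char_start...(char_start + "#: () -> void".length)

byte_range = char_range_to_byte_range(char_range, comment)

# Using the character offsets as if they were byte offsets lands inside a
# multibyte character, so the slice is not valid UTF-8.
wrong = comment.byteslice(char_range.begin, char_range.size)
right = comment.byteslice(byte_range.begin, byte_range.size)

puts "char range: #{char_range}, byte range: #{byte_range}"
puts "misread slice valid? #{wrong.valid_encoding?}"
puts "converted slice: #{right.inspect}"
```

With an ASCII-only comment the two ranges are identical, which is consistent with the bug staying hidden until multibyte text appears before the annotation.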
I'd like to confirm whether this was intentional or an oversight. I've prepared a proposed fix in #2945.