fix(dsl): prevent lexer panic on string ending in trailing backslash#3007
Conversation
π WalkthroughWalkthroughThe fix adds an end-of-input guard inside ChangesLexer panic fix for trailing backslash
Estimated code review effortπ― 2 (Simple) | β±οΈ ~5 minutes Poem
π₯ Pre-merge checks | β 5β Passed checks (5 passed)
βοΈ Tip: You can configure your own custom pre-merge checks in the settings. β¨ Finishing Touchesπ§ͺ Generate unit tests (beta)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
|
@omer-topal @ucatbas please have a look and lmk for any required changes |
There was a problem hiding this comment.
Actionable comments posted: 1
π§Ή Nitpick comments (1)
pkg/dsl/lexer/lexer_test.go (1)
1193-1207: π€ Low valueConsider asserting on token properties for more comprehensive validation.
The test correctly verifies that no panic occurs, which addresses the primary objective. However, it could be enhanced by also asserting:
- The token type (expected:
token.STRING)- The literal value (expected: empty string
"")- The number of tokens before EOF (expected: 1)
These additions would document the expected behavior more explicitly and catch any regression in how unterminated strings are tokenized.
π Example enhancement
It("Case 11 - Unterminated string ending in a trailing backslash should not panic", func() { // Regression test: a string literal that ends right after a backslash // (no character left to escape) used to overshoot the input bounds // and panic with a slice-out-of-range error. l := NewLexer("\"\\") - Expect(func() { - for { - tok := l.NextToken() - if tok.Type == token.EOF { - break - } - } - }).ShouldNot(Panic()) + var tokens []token.Token + Expect(func() { + for { + tok := l.NextToken() + tokens = append(tokens, tok) + if tok.Type == token.EOF { + break + } + } + }).ShouldNot(Panic()) + + Expect(tokens).To(HaveLen(2)) // STRING token + EOF token + Expect(tokens[0].Type).To(Equal(token.STRING)) + Expect(tokens[0].Literal).To(Equal("")) // Unterminated string yields empty literal })π€ Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@pkg/dsl/lexer/lexer_test.go` around lines 1193 - 1207, The test for the unterminated string with trailing backslash currently only validates that no panic occurs, but should also verify the correct tokenization behavior. Add assertions within the token iteration loop to capture and verify: the token type should be token.STRING before reaching token.EOF, the token literal value should be an empty string, and there should be exactly one token before the EOF token. Store the tokens in a slice or counter variable during the loop, then assert these properties after the loop completes to document the expected behavior and catch any regression in how unterminated strings are tokenized by the Lexer.
π€ Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@pkg/dsl/lexer/lexer_test.go`:
- Around line 1193-1207: Add additional regression test cases to the test file
following the same pattern as the existing "Case 11" test. Create separate test
cases for valid escape sequences at string end (such as "\n and "\t), invalid
escape sequences at string end (such as "\x and "\z), and multiple backslashes
at string end (such as "\\ and "\\\). Each new test case should follow the same
structure as Case 11: create a NewLexer with the unterminated string input,
iterate through tokens with NextToken() until reaching token.EOF, and verify
that the operation does not panic. This ensures comprehensive regression
coverage for all unterminated string patterns with escape sequences.
---
Nitpick comments:
In `@pkg/dsl/lexer/lexer_test.go`:
- Around line 1193-1207: The test for the unterminated string with trailing
backslash currently only validates that no panic occurs, but should also verify
the correct tokenization behavior. Add assertions within the token iteration
loop to capture and verify: the token type should be token.STRING before
reaching token.EOF, the token literal value should be an empty string, and there
should be exactly one token before the EOF token. Store the tokens in a slice or
counter variable during the loop, then assert these properties after the loop
completes to document the expected behavior and catch any regression in how
unterminated strings are tokenized by the Lexer.
πͺ Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
βΉοΈ Review info
βοΈ Run configuration
Configuration used: defaults
Review profile: CHILL
Plan: Pro
Run ID: 5a2c45ea-e34e-4ca5-a6ac-619f66a5facb
π Files selected for processing (2)
pkg/dsl/lexer/lexer.gopkg/dsl/lexer/lexer_test.go
| It("Case 11 - Unterminated string ending in a trailing backslash should not panic", func() { | ||
| // Regression test: a string literal that ends right after a backslash | ||
| // (no character left to escape) used to overshoot the input bounds | ||
| // and panic with a slice-out-of-range error. | ||
| l := NewLexer("\"\\") | ||
|
|
||
| Expect(func() { | ||
| for { | ||
| tok := l.NextToken() | ||
| if tok.Type == token.EOF { | ||
| break | ||
| } | ||
| } | ||
| }).ShouldNot(Panic()) | ||
| }) |
There was a problem hiding this comment.
π§© Analysis chain
π Script executed:
#!/bin/bash
# Description: Search for existing tests covering escape sequences and unterminated strings
# Search for test cases involving escape sequences in strings
rg -n -A 5 -B 2 'NewLexer.*\\\\' pkg/dsl/lexer/lexer_test.go
# Search for test cases with unterminated strings
rg -n -A 5 -B 2 'unterminated|Unterminated' pkg/dsl/lexer/lexer_test.goRepository: Permify/permify
Length of output: 680
π Script executed:
# Get all test cases in the file to see the scope
rg -n 'It\(' pkg/dsl/lexer/lexer_test.go | head -20Repository: Permify/permify
Length of output: 615
π Script executed:
# Search for tests with actual escape sequences in test inputs
rg -n '\\\\n|\\\\t|\\\\x|\\\\z|\\\\\\\\' pkg/dsl/lexer/lexer_test.go | head -20Repository: Permify/permify
Length of output: 41
π Script executed:
# Get context around Case 11 and nearby test cases to see what else is tested
sed -n '1150,1250p' pkg/dsl/lexer/lexer_test.goRepository: Permify/permify
Length of output: 1448
π Script executed:
# Find and examine the lexer implementation to understand escape sequence handling
fd lexer.go pkg/dsl/lexer/ --type fRepository: Permify/permify
Length of output: 82
π Script executed:
# Search for escape sequence handling in the lexer
rg -n -B 3 -A 10 'escape|backslash' pkg/dsl/lexer/lexer.go | head -40Repository: Permify/permify
Length of output: 823
π Script executed:
# Get more of the lexString function to see complete escape handling
sed -n '205,280p' pkg/dsl/lexer/lexer.goRepository: Permify/permify
Length of output: 1836
Add regression tests for other escape-sequence edge cases in unterminated strings.
The trailing backslash case (Case 11) is properly tested, but similar edge cases with escape sequences lack coverage. The lexer implementation has a safe if l.ch == 0 check that prevents panics, but the following scenarios should be explicitly tested:
- Valid escape sequences at string end:
"\n,"\t - Invalid escape sequences at string end:
"\x,"\z - Multiple backslashes at string end:
"\\,"\\\
Adding these tests ensures regression protection and documents the expected behavior for all unterminated string patterns.
π€ Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@pkg/dsl/lexer/lexer_test.go` around lines 1193 - 1207, Add additional
regression test cases to the test file following the same pattern as the
existing "Case 11" test. Create separate test cases for valid escape sequences
at string end (such as "\n and "\t), invalid escape sequences at string end
(such as "\x and "\z), and multiple backslashes at string end (such as "\\ and
"\\\). Each new test case should follow the same structure as Case 11: create a
NewLexer with the unterminated string input, iterate through tokens with
NextToken() until reaching token.EOF, and verify that the operation does not
panic. This ensures comprehensive regression coverage for all unterminated
string patterns with escape sequences.
Codecov Reportβ
All modified and coverable lines are covered by tests. β Your project check has failed because the head coverage (74.69%) is below the target coverage (75.00%). You can increase the head coverage or adjust the target coverage.
Additional details and impacted files@@ Coverage Diff @@
## master #3007 +/- ##
==========================================
- Coverage 82.58% 74.69% -7.89%
==========================================
Files 74 83 +9
Lines 8300 9215 +915
==========================================
+ Hits 6854 6882 +28
- Misses 910 1798 +888
+ Partials 536 535 -1 β View full report in Codecov by Harness. π New features to boost your workflow:
|
Bug
The schema DSL lexer panics with a slice-out-of-range error when lexing an unterminated string that ends in a trailing backslash.
Minimal repro:
NewParser(""\").Parse()
panic: runtime error: slice bounds out of range [:3] with length 2
pkg/dsl/lexer/lexer.go:231
Reached from the SchemaWrite API via Parse(); the gRPC recovery interceptor catches it, so it errors the request instead of crashing the server, but it's still an uncontrolled panic on untrusted input.
Root cause
In lexString, when a \ is the last character of the input, l.readChar() (skipping the backslash) sets l.ch = 0 but the code still executes position = l.position + 1, advancing position one past len(input).
The loop then exits on l.ch == 0, and the final str += l.input[position:l.position] slices with a start index greater than the string length, panicking.
Fix
Detect l.ch == 0 immediately after skipping the backslash and break early with position = l.position, treating it as an unterminated string instead of continuing to advance past the input bounds.
Testing
Added a regression test in lexer_test.go asserting the lexer doesn't panic on ""\", and verified the exact reporter NewParser(""\").Parse()) now returns cleanly. Full pkg/dsl/... suite still passes.
Closes #3004
Summary by CodeRabbit
Bug Fixes
Tests