-
Notifications
You must be signed in to change notification settings - Fork 85
Add span-aware AST and lossy parser for language server #690
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: master
Are you sure you want to change the base?
Conversation
Add lossy parsing capabilities to recover valid declarations from partially-invalid Candid programs, along with API for parsing from custom token iterators. - Add IDLProgLossy parser rule with RecoverDef and MainActorLossy recovery points to collect errors while parsing valid declarations - Add parse_prog_lossy() public API returning partial AST and error list - Add parse_idl_prog_from_tokens() for custom token iterator support - Make ParserError public to enable error handling in external code - Reorganize token imports for consistency (alphabetical ordering) - Add comprehensive tests for both new parsing modes
Split AST into spanned and spanless representations to allow different use cases to choose the appropriate level of detail. - Create syntax::spanned module with all original Span-aware types - Add spanless types in syntax root for consumers that don't need spans - Implement From<spanned::T> conversions for compatibility - Simplify pattern matching in code generators by matching directly on IDLType enum variants instead of extracting .kind field - Update grammar to produce spanned types, convert to spanless at API boundaries - Update all tests to work with new type structure
ilbertt
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would move the code refactoring in another PR to keep this PR simpler. Based on my review, the files whose changes can be reverted/moved to a new PR are:
rust/candid_parser/src/bindings/motoko.rsrust/candid_parser/src/bindings/rust.rsrust/candid_parser/src/bindings/typescript.rsrust/candid_parser/src/typing.rsrust/candid_parser/tests/parse_type.rs
Additionally, if we remove the IDLTypeKind type alias, we can also revert these files:
rust/candid_parser/tests/test_doc_comments.rsrust/candid_parser/src/syntax/pretty.rs
This will make the PR way smaller and easier to review. What do you think?
rust/candid_parser/src/syntax/mod.rs
Outdated
| PrincipalT, | ||
| } | ||
|
|
||
| pub type IDLTypeKind = IDLType; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why do we need this alias?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I added the alias to mirror IDLTypeKind in the spanned module, but looking again, it doesn’t actually add any value...
| } | ||
|
|
||
| #[derive(Debug, Clone, PartialEq, Eq)] | ||
| pub struct IDLType { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would call this struct IDLTypeWithSpan or so, so that it doesn't get confused with the existing syntax::IDLType
rust/candid_parser/src/typing.rs
Outdated
| if let Dec::TypD(Binding { id, typ, .. }) = dec { | ||
| let t = check_type(env, typ)?; | ||
| env.te.0.insert(id.to_string(), t); | ||
| } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Even though this makes the code a bit cleaner, I would do it in a separate PR to keep the scope of this PR as close as possible to its goal. Feel free to open another PR!
rust/candid_parser/src/typing.rs
Outdated
| if let Some((file, include_serv)) = match dec { | ||
| Dec::ImportType(file) => Some((file, false)), | ||
| Dec::ImportServ(file) => Some((file, true)), | ||
| _ => None, | ||
| } { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Same as above, I would avoid refactoring the code if it's not in the scope of the current PR
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We have the next branch in which we are collecting all the breaking changes. There we can change the syntax::IDLType and all the related structs to have the new span field. What do you think? I feel like repeating the code of the IDLType here is not a good approach in terms of maintainability
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I placed the traditional module alongside the new one due to a lack of understanding of the destructive change's impact. However, as you suggested, unifying it into a structure that includes span is indeed a better approach. I will redesign it accordingly.
| match ty.as_ref() { | ||
| TypeInner::Record(ref fields) => { | ||
| if let Some(IDLTypeKind::RecordT(syntax_fields)) = syntax { | ||
| return pp_record(env, fields, Some(syntax_fields), is_ref); | ||
| } | ||
| pp_record(env, fields, None, is_ref) | ||
| } | ||
| (TypeInner::Variant(ref fields), Some(IDLType::VariantT(syntax_fields))) => { | ||
| pp_variant(env, fields, Some(syntax_fields), is_ref) | ||
| TypeInner::Variant(ref fields) => { | ||
| if let Some(IDLTypeKind::VariantT(syntax_fields)) = syntax { | ||
| return pp_variant(env, fields, Some(syntax_fields), is_ref); | ||
| } | ||
| pp_variant(env, fields, None, is_ref) | ||
| } | ||
| (TypeInner::Service(ref serv), Some(IDLType::ServT(syntax_serv))) => { | ||
| pp_service(env, serv, Some(syntax_serv)) | ||
| TypeInner::Service(ref serv) => { | ||
| if let Some(IDLTypeKind::ServT(syntax_serv)) = syntax { | ||
| return pp_service(env, serv, Some(syntax_serv)); | ||
| } | ||
| pp_service(env, serv, None) | ||
| } | ||
| (TypeInner::Opt(ref t), Some(IDLType::OptT(syntax_inner))) => { | ||
| pp_opt(env, t, Some(syntax_inner), is_ref) | ||
| TypeInner::Opt(ref t) => { | ||
| if let Some(IDLTypeKind::OptT(syntax_inner)) = syntax { | ||
| return pp_opt(env, t, Some(syntax_inner), is_ref); | ||
| } | ||
| pp_opt(env, t, None, is_ref) | ||
| } | ||
| (TypeInner::Vec(ref t), Some(IDLType::VecT(syntax_inner))) => { | ||
| pp_vec(env, t, Some(syntax_inner), is_ref) | ||
| TypeInner::Vec(ref t) => { | ||
| if let Some(IDLTypeKind::VecT(syntax_inner)) = syntax { | ||
| return pp_vec(env, t, Some(syntax_inner), is_ref); | ||
| } | ||
| pp_vec(env, t, None, is_ref) | ||
| } | ||
| (_, _) => pp_ty(env, ty, is_ref), | ||
| _ => pp_ty(env, ty, is_ref), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Similar to the typing.rs file, I would avoid refactoring code that is not in scope with the PR's goal
|
Thank you for your review. I plan to revert the refactoring-related changes and consolidate them into a separate PR. Also, I will change the |
- Remove separate `spanned.rs` module in favor of inline type definitions - Introduce `IDLTypeWithSpan` wrapper to track source spans alongside type kinds - Revert `IDLType` to direct enum definition - Move span information directly into data structures (TypeField, Binding, etc.) - Update all bindings (Motoko, Rust, TypeScript) to access `typ.kind` for type information - Simplify type parsing by eliminating conversion between spanned and spanless types
|
I've made changes to keep it as close to the |
Overview
Introduce a span-aware AST that the parser now builds first so experimental
candid-language-servercan attach precise locations to every declaration, field, and service item.The new
syntax::spannedmodule defines the source-of-truth structures and parsing helpers (rust/candid_parser/src/syntax/spanned.rs:13),grammar.lalrpopdrives lalrpop to emit those structures (rust/candid_parser/src/grammar.lalrpop:4), and the existing public AST insyntaxconverts from the spanned version to keep the current API stable (rust/candid_parser/src/syntax/mod.rs:49).Parser entry points and new regression tests (
rust/candid_parser/tests/parse_prog.rs:16) exercise the updated pipeline so doc comments and services survive the round-trip.A lossy parser variant was added so the language server can recover as much structure as possible while still collecting tokens.
Requirements
candid_parser::syntaxtypes for downstream crates while offering a lossless path to spanned data.rust/candid_parser/tests/parse_prog.rs:16).Considered Solutions
Recommended Solution
Adopt the spanned AST internally and convert to the existing structs for callers that rely on them. Bindings/generators were updated where they destructure AST nodes, the lossy parser entry point (
parse_prog_lossy) now rides on the spanned types, and new tests cover both exact and lossy parsing scenarios.This keeps the public surface stable while supplying the language server with the spans it needs.
Considerations
candid-language-server; other consumers interact with the familiar API but pay a small overhead because span-less structs are now derived from the spanned ones on every parse.syntax/mod.rsvs.syntax/spanned.rs), so future changes must modify both in lockstep to prevent drift.