Add span-aware AST and lossy parser for language server #690

wiyota · 2025-11-10T03:29:58Z

Overview

Introduce a span-aware AST that the parser now builds first, so that the experimental candid-language-server can attach precise locations to every declaration, field, and service item.

The new syntax::spanned module defines the source-of-truth structures and parsing helpers (rust/candid_parser/src/syntax/spanned.rs:13), grammar.lalrpop drives lalrpop to emit those structures (rust/candid_parser/src/grammar.lalrpop:4), and the existing public AST in syntax converts from the spanned version to keep the current API stable (rust/candid_parser/src/syntax/mod.rs:49).

Parser entry points and new regression tests (rust/candid_parser/tests/parse_prog.rs:16) exercise the updated pipeline so doc comments and services survive the round-trip.

A lossy parser variant was added so the language server can recover as much structure as possible while still collecting tokens.

Requirements

Provide source spans for every AST node so the language server can power go-to-definition, hover, and diagnostics without reparsing.
Preserve the existing span-less candid_parser::syntax types for downstream crates while offering a lossless path to spanned data.
Ensure token-recording and lossy parsing entry points keep working with the new AST (covered by the new parser tests in rust/candid_parser/tests/parse_prog.rs:16).
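
To illustrate why per-node spans matter for hover and go-to-definition, here is a minimal sketch of an offset lookup over a spanned tree. The `Node` type, its `label` field, and `node_at` are hypothetical names invented for this example; they are not part of candid_parser, and the real AST has typed nodes rather than labels.

```rust
use std::ops::Range;

/// Byte range into the source text (assumed representation for this sketch).
type Span = Range<usize>;

/// Hypothetical spanned node: a label plus nested children.
#[derive(Debug)]
struct Node {
    label: &'static str,
    span: Span,
    children: Vec<Node>,
}

/// Return the innermost node whose span contains `offset`, if any.
fn node_at(node: &Node, offset: usize) -> Option<&Node> {
    if !node.span.contains(&offset) {
        return None;
    }
    // Prefer the deepest matching child; fall back to this node.
    node.children
        .iter()
        .find_map(|child| node_at(child, offset))
        .or(Some(node))
}
```

A language server could call something like `node_at` with the cursor offset and answer the request from the node it gets back, without reparsing the file.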

Considered Solutions

Extending the existing AST structs with optional spans would have been a breaking change for every consumer.
Maintaining a separate parser just for the language server risked grammar drift and duplicated maintenance.
The spanned-AST-plus-conversion approach keeps a single grammar and minimizes API churn.

Recommended Solution

Adopt the spanned AST internally and convert to the existing structs for callers that rely on them. Bindings/generators were updated where they destructure AST nodes, the lossy parser entry point (parse_prog_lossy) now rides on the spanned types, and new tests cover both exact and lossy parsing scenarios.

This keeps the public surface stable while supplying the language server with the spans it needs.
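
The conversion step can be sketched roughly as follows. This is a simplified illustration, not the code in this PR: the variant set is reduced to three cases, `PrimT` carries a `String` as a stand-in for the real primitive type, and the `span` field is assumed to be a byte range.

```rust
mod spanned {
    use std::ops::Range;

    /// Shape of a type, without location (simplified variant set).
    #[derive(Debug, Clone, PartialEq, Eq)]
    pub enum IDLTypeKind {
        PrimT(String),
        OptT(Box<IDLType>),
        VecT(Box<IDLType>),
    }

    /// Spanned type: the kind plus where it appeared in the source.
    #[derive(Debug, Clone, PartialEq, Eq)]
    pub struct IDLType {
        pub span: Range<usize>,
        pub kind: IDLTypeKind,
    }
}

/// Span-less type, standing in for the existing public AST.
#[derive(Debug, Clone, PartialEq, Eq)]
pub enum IDLType {
    PrimT(String),
    OptT(Box<IDLType>),
    VecT(Box<IDLType>),
}

impl From<spanned::IDLType> for IDLType {
    /// Drop the span at every level and keep only the structure.
    fn from(t: spanned::IDLType) -> Self {
        match t.kind {
            spanned::IDLTypeKind::PrimT(name) => IDLType::PrimT(name),
            spanned::IDLTypeKind::OptT(inner) => IDLType::OptT(Box::new((*inner).into())),
            spanned::IDLTypeKind::VecT(inner) => IDLType::VecT(Box::new((*inner).into())),
        }
    }
}
```

The recursive `into()` call is where the allocation and copy cost mentioned under Considerations comes from: every nested node is rebuilt once per conversion.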

Considerations

This work primarily benefits the experimental candid-language-server. Other consumers keep the familiar API but pay a small allocation/copy cost, because the span-less structs are now converted from the spanned ones on every parse. We’ll monitor that overhead.
We now have two AST definitions (syntax/mod.rs vs. syntax/spanned.rs), so future changes must modify both in lockstep to prevent drift.

Add lossy parsing capabilities to recover valid declarations from partially-invalid Candid programs, along with API for parsing from custom token iterators.

- Add IDLProgLossy parser rule with RecoverDef and MainActorLossy recovery points to collect errors while parsing valid declarations
- Add parse_prog_lossy() public API returning partial AST and error list
- Add parse_idl_prog_from_tokens() for custom token iterator support
- Make ParserError public to enable error handling in external code
- Reorganize token imports for consistency (alphabetical ordering)
- Add comprehensive tests for both new parsing modes
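
The idea behind lossy parsing (keep every declaration that parses, record an error for each one that does not) can be shown with a toy sketch. This is not the lalrpop-based implementation from the commit: the grammar here is just `type <name> = <body>;`, recovery happens by splitting on `;`, and `TypeDec`, `ParseError`, `parse_dec`, and `parse_lossy` are hypothetical names for illustration only.

```rust
/// A successfully parsed `type <name> = <body>` declaration (toy grammar).
#[derive(Debug, PartialEq, Eq)]
struct TypeDec {
    name: String,
    body: String,
}

/// A recoverable error, tagged with the byte offset of the bad declaration.
#[derive(Debug, PartialEq, Eq)]
struct ParseError {
    offset: usize,
    message: String,
}

/// Strict parser for a single declaration without its trailing `;`.
fn parse_dec(src: &str) -> Result<TypeDec, String> {
    let rest = src.strip_prefix("type ").ok_or("expected `type`")?;
    let (name, body) = rest.split_once('=').ok_or("expected `=`")?;
    let (name, body) = (name.trim(), body.trim());
    if name.is_empty() || body.is_empty() {
        return Err("missing name or body".to_string());
    }
    Ok(TypeDec { name: name.to_string(), body: body.to_string() })
}

/// Lossy parser: returns the partial result together with all errors.
fn parse_lossy(src: &str) -> (Vec<TypeDec>, Vec<ParseError>) {
    let mut decs = Vec::new();
    let mut errors = Vec::new();
    let mut offset = 0;
    for chunk in src.split(';') {
        let trimmed = chunk.trim_start();
        let start = offset + (chunk.len() - trimmed.len());
        let text = trimmed.trim_end();
        if !text.is_empty() {
            match parse_dec(text) {
                Ok(dec) => decs.push(dec),
                // Record the error and continue with the next declaration.
                Err(message) => errors.push(ParseError { offset: start, message }),
            }
        }
        offset += chunk.len() + 1; // step over the `;` separator
    }
    (decs, errors)
}
```

Given `type A = nat; typ B = text; type C = bool;`, this sketch keeps `A` and `C` and reports one error at the offset of the misspelled `typ`, which is the kind of partial result a language server can still use for diagnostics and navigation.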

Split AST into spanned and spanless representations to allow different use cases to choose the appropriate level of detail.

- Create syntax::spanned module with all original Span-aware types
- Add spanless types in syntax root for consumers that don't need spans
- Implement From<spanned::T> conversions for compatibility
- Simplify pattern matching in code generators by matching directly on IDLType enum variants instead of extracting .kind field
- Update grammar to produce spanned types, convert to spanless at API boundaries
- Update all tests to work with new type structure

ilbertt

I would move the code refactoring into a separate PR to keep this one simpler. Based on my review, the files whose changes can be reverted or moved to a new PR are:

rust/candid_parser/src/bindings/motoko.rs
rust/candid_parser/src/bindings/rust.rs
rust/candid_parser/src/bindings/typescript.rs
rust/candid_parser/src/typing.rs
rust/candid_parser/tests/parse_type.rs

Additionally, if we remove the IDLTypeKind type alias, we can also revert these files:

rust/candid_parser/tests/test_doc_comments.rs
rust/candid_parser/src/syntax/pretty.rs

This will make the PR way smaller and easier to review. What do you think?

ilbertt · 2025-11-11T09:42:19Z

rust/candid_parser/src/syntax/mod.rs

    PrincipalT,
 }

+pub type IDLTypeKind = IDLType;


Why do we need this alias?

I added the alias to mirror IDLTypeKind in the spanned module, but looking again, it doesn’t actually add any value...

ilbertt · 2025-11-11T09:43:35Z

rust/candid_parser/src/syntax/spanned.rs

+}
+
+#[derive(Debug, Clone, PartialEq, Eq)]
+pub struct IDLType {


I would call this struct IDLTypeWithSpan or so, so that it doesn't get confused with the existing syntax::IDLType

ilbertt · 2025-11-11T09:48:16Z

rust/candid_parser/src/typing.rs

+        if let Dec::TypD(Binding { id, typ, .. }) = dec {
+            let t = check_type(env, typ)?;
+            env.te.0.insert(id.to_string(), t);
        }


Even though this makes the code a bit cleaner, I would do it in a separate PR to keep the scope of this PR as close as possible to its goal. Feel free to open another PR!

ilbertt · 2025-11-11T09:48:45Z

rust/candid_parser/src/typing.rs

+        if let Some((file, include_serv)) = match dec {
+            Dec::ImportType(file) => Some((file, false)),
+            Dec::ImportServ(file) => Some((file, true)),
+            _ => None,
+        } {


Same as above, I would avoid refactoring the code if it's not in the scope of the current PR

ilbertt · 2025-11-11T09:52:04Z

rust/candid_parser/src/syntax/spanned.rs

We have the `next` branch, in which we are collecting all the breaking changes. There we can change syntax::IDLType and all the related structs to carry the new span field. What do you think? I feel that duplicating the IDLType code here is not a good approach in terms of maintainability.

I kept the existing module alongside the new one because I did not fully understand the impact of a breaking change. However, as you suggested, unifying them into a single structure that includes the span is indeed a better approach. I will redesign it accordingly.

ilbertt · 2025-11-11T09:53:45Z

rust/candid_parser/src/bindings/typescript.rs

+    match ty.as_ref() {
+        TypeInner::Record(ref fields) => {
+            if let Some(IDLTypeKind::RecordT(syntax_fields)) = syntax {
+                return pp_record(env, fields, Some(syntax_fields), is_ref);
+            }
+            pp_record(env, fields, None, is_ref)
        }
-        (TypeInner::Variant(ref fields), Some(IDLType::VariantT(syntax_fields))) => {
-            pp_variant(env, fields, Some(syntax_fields), is_ref)
+        TypeInner::Variant(ref fields) => {
+            if let Some(IDLTypeKind::VariantT(syntax_fields)) = syntax {
+                return pp_variant(env, fields, Some(syntax_fields), is_ref);
+            }
+            pp_variant(env, fields, None, is_ref)
        }
-        (TypeInner::Service(ref serv), Some(IDLType::ServT(syntax_serv))) => {
-            pp_service(env, serv, Some(syntax_serv))
+        TypeInner::Service(ref serv) => {
+            if let Some(IDLTypeKind::ServT(syntax_serv)) = syntax {
+                return pp_service(env, serv, Some(syntax_serv));
+            }
+            pp_service(env, serv, None)
        }
-        (TypeInner::Opt(ref t), Some(IDLType::OptT(syntax_inner))) => {
-            pp_opt(env, t, Some(syntax_inner), is_ref)
+        TypeInner::Opt(ref t) => {
+            if let Some(IDLTypeKind::OptT(syntax_inner)) = syntax {
+                return pp_opt(env, t, Some(syntax_inner), is_ref);
+            }
+            pp_opt(env, t, None, is_ref)
        }
-        (TypeInner::Vec(ref t), Some(IDLType::VecT(syntax_inner))) => {
-            pp_vec(env, t, Some(syntax_inner), is_ref)
+        TypeInner::Vec(ref t) => {
+            if let Some(IDLTypeKind::VecT(syntax_inner)) = syntax {
+                return pp_vec(env, t, Some(syntax_inner), is_ref);
+            }
+            pp_vec(env, t, None, is_ref)
        }
-        (_, _) => pp_ty(env, ty, is_ref),
+        _ => pp_ty(env, ty, is_ref),


Similar to the typing.rs file, I would avoid refactoring code that is not in scope with the PR's goal

wiyota · 2025-11-13T13:53:15Z

Thank you for your review.

I plan to revert the refactoring-related changes and consolidate them into a separate PR.

Also, I will rework the change so that the span-aware types replace the existing syntax module instead of living in a separate spanned module.

- Remove separate `spanned.rs` module in favor of inline type definitions
- Introduce `IDLTypeWithSpan` wrapper to track source spans alongside type kinds
- Revert `IDLType` to direct enum definition
- Move span information directly into data structures (TypeField, Binding, etc.)
- Update all bindings (Motoko, Rust, TypeScript) to access `typ.kind` for type information
- Simplify type parsing by eliminating conversion between spanned and spanless types
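
A rough sketch of the unified layout this commit describes, where a single AST carries spans and bindings read `typ.kind`. The field names `span` and `kind`, the byte-range span type, the reduced variant set, and the `pp_type`/`pp_binding` helpers are assumptions made for illustration; the actual definitions in the PR may differ.

```rust
use std::ops::Range;

/// The type enum stays a plain enum; nested types carry their own spans.
#[derive(Debug, Clone, PartialEq, Eq)]
enum IDLType {
    PrimT(String),
    OptT(Box<IDLTypeWithSpan>),
    VecT(Box<IDLTypeWithSpan>),
}

/// Wrapper pairing a type with the source range it was parsed from.
#[derive(Debug, Clone, PartialEq, Eq)]
struct IDLTypeWithSpan {
    span: Range<usize>,
    kind: IDLType,
}

/// Declarations hold the span directly instead of a parallel spanned copy.
#[derive(Debug, Clone, PartialEq, Eq)]
struct Binding {
    id: String,
    typ: IDLTypeWithSpan,
    span: Range<usize>,
}

/// A binding generator only needs the shape, so it matches on `typ.kind`.
fn pp_type(typ: &IDLTypeWithSpan) -> String {
    match &typ.kind {
        IDLType::PrimT(name) => name.clone(),
        IDLType::OptT(inner) => format!("opt {}", pp_type(inner)),
        IDLType::VecT(inner) => format!("vec {}", pp_type(inner)),
    }
}

fn pp_binding(b: &Binding) -> String {
    format!("type {} = {};", b.id, pp_type(&b.typ))
}
```

With this layout there is no conversion pass: code generators ignore `span` and read `kind`, while a language server reads both from the same nodes.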

wiyota · 2025-11-16T00:43:47Z

I've made changes to keep it as close to the master branch as possible. How does it look?

wiyota added 3 commits November 5, 2025 12:15

feat: carry spans through IDL types for AST

2ba2e04

cla-idx-bot bot added the external-contributor label Nov 10, 2025

ilbertt requested changes Nov 11, 2025
