wiyota commented Nov 10, 2025

Overview

Introduce a span-aware AST that the parser now builds first, so that the experimental candid-language-server can attach precise source locations to every declaration, field, and service item.

The new syntax::spanned module defines the source-of-truth structures and parsing helpers (rust/candid_parser/src/syntax/spanned.rs:13), grammar.lalrpop now has lalrpop emit those structures (rust/candid_parser/src/grammar.lalrpop:4), and the existing public AST in syntax is built by converting from the spanned version, which keeps the current API stable (rust/candid_parser/src/syntax/mod.rs:49).
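
To make the spanned-AST-plus-conversion idea concrete, here is a minimal sketch under heavy simplifying assumptions: the Span alias, the four variants and the PrimT(String) payload are hypothetical stand-ins, not the real candid_parser definitions. The spanned type is the source of truth, and the span-less type is derived from it through a From impl.

```rust
// Hypothetical, heavily simplified types: the real candid_parser AST has many
// more variants and fields. This only illustrates deriving a span-less AST
// from a spanned one.
use std::ops::Range;

/// Byte range in the source text (assumed representation).
pub type Span = Range<usize>;

pub mod spanned {
    use super::Span;

    #[derive(Debug, Clone, PartialEq, Eq)]
    pub struct IDLType {
        pub kind: IDLTypeKind,
        pub span: Span,
    }

    #[derive(Debug, Clone, PartialEq, Eq)]
    pub enum IDLTypeKind {
        PrimT(String),
        VarT(String),
        OptT(Box<IDLType>),
        VecT(Box<IDLType>),
    }
}

/// Span-less type kept for existing callers.
#[derive(Debug, Clone, PartialEq, Eq)]
pub enum IDLType {
    PrimT(String),
    VarT(String),
    OptT(Box<IDLType>),
    VecT(Box<IDLType>),
}

impl From<spanned::IDLType> for IDLType {
    fn from(t: spanned::IDLType) -> Self {
        use spanned::IDLTypeKind as K;
        // No wildcard arm: adding a variant to the spanned enum without
        // updating this conversion is a compile error.
        match t.kind {
            K::PrimT(p) => IDLType::PrimT(p),
            K::VarT(v) => IDLType::VarT(v),
            // Spans are dropped recursively as nested nodes are converted.
            K::OptT(inner) => IDLType::OptT(Box::new((*inner).into())),
            K::VecT(inner) => IDLType::VecT(Box::new((*inner).into())),
        }
    }
}
```

Leaving out a wildcard arm in the conversion is one way to make the compiler help with keeping two AST definitions in lockstep, at least in the spanned-to-span-less direction.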

The parser entry points and new regression tests (rust/candid_parser/tests/parse_prog.rs:16) exercise the updated pipeline and check that doc comments and services survive the round trip.

A lossy parser variant was added so that the language server can recover as much structure as possible from partially invalid input while still collecting tokens.
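
The recovery idea can be illustrated without lalrpop. The sketch below is not the PR's RecoverDef/MainActorLossy grammar rules; it is a hand-written toy, assuming a made-up language of "type Name = ident;" declarations, that applies panic-mode recovery: when a declaration fails to parse, it records the error, skips to the next ";", and keeps going, so every well-formed declaration survives. The names parse_lossy, parse_decl, Decl and ParseError are hypothetical.

```rust
// Hypothetical toy: a "program" is a list of `type <Name> = <ident>;` declarations.
#[derive(Debug, PartialEq)]
pub struct Decl {
    pub name: String,
    pub typ: String,
}

#[derive(Debug, PartialEq)]
pub struct ParseError {
    /// Byte offset of the `;`-terminated chunk that failed to parse.
    pub offset: usize,
    pub message: String,
}

fn is_ident(s: &str) -> bool {
    !s.is_empty() && s.chars().all(|c| c.is_ascii_alphanumeric() || c == '_')
}

fn parse_decl(text: &str) -> Result<Decl, String> {
    let rest = text.trim().strip_prefix("type ").ok_or("expected `type`")?;
    let (name, typ) = rest.split_once('=').ok_or("expected `=`")?;
    let (name, typ) = (name.trim(), typ.trim());
    if !is_ident(name) || !is_ident(typ) {
        return Err(format!("malformed declaration `{}`", text.trim()));
    }
    Ok(Decl { name: name.to_string(), typ: typ.to_string() })
}

/// Panic-mode recovery: on error, resynchronize at the next `;` and continue,
/// returning the partial AST together with every error encountered.
pub fn parse_lossy(src: &str) -> (Vec<Decl>, Vec<ParseError>) {
    let mut decls = Vec::new();
    let mut errors = Vec::new();
    let mut offset = 0;
    for chunk in src.split_inclusive(';') {
        let body = chunk.trim_end_matches(';');
        if !body.trim().is_empty() {
            match parse_decl(body) {
                Ok(d) => decls.push(d),
                Err(message) => errors.push(ParseError { offset, message }),
            }
        }
        offset += chunk.len();
    }
    (decls, errors)
}
```

For "type A = nat; type B = ; type C = text;" this keeps A and C and reports one error for B, which mirrors the "partial AST plus error list" contract described for parse_prog_lossy.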

Requirements

  • Provide source spans for every AST node so the language server can power go-to-definition, hover, and diagnostics without reparsing.
  • Preserve the existing span-less candid_parser::syntax types for downstream crates while offering a lossless path to spanned data.
  • Ensure token-recording and lossy parsing entry points keep working with the new AST (covered by the new parser tests in rust/candid_parser/tests/parse_prog.rs:16).
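
As an illustration of why spans on every node remove the need to reparse, the hypothetical helper below walks a tree of span-annotated nodes and returns the innermost one containing a cursor offset, which is the basic lookup behind hover and go-to-definition. The Node type and node_at function are invented for this sketch, and spans are assumed to be half-open byte ranges.

```rust
use std::ops::Range;

// Hypothetical node type; real AST nodes would carry their own payloads.
#[derive(Debug)]
pub struct Node {
    pub label: String,
    pub span: Range<usize>,
    pub children: Vec<Node>,
}

/// Returns the innermost node whose span contains `offset`, or None if the
/// offset lies outside `node`. Only stored spans are consulted.
pub fn node_at(node: &Node, offset: usize) -> Option<&Node> {
    if !node.span.contains(&offset) {
        return None;
    }
    // Prefer the deepest match: descend into the first child that contains it.
    for child in &node.children {
        if let Some(hit) = node_at(child, offset) {
            return Some(hit);
        }
    }
    Some(node)
}
```

For "type T = opt nat;", an offset inside "nat" resolves to the primitive node, an offset inside "opt" resolves to the option node, and an offset on "type" falls back to the declaration itself.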

Considered Solutions

  • Extending the existing AST structs with optional spans would have been a breaking change for every consumer.
  • Maintaining a separate parser just for the language server risked grammar drift and duplicated maintenance.
  • The spanned-AST-plus-conversion approach keeps a single grammar and minimizes API churn.

Recommended Solution

Adopt the spanned AST internally and convert to the existing structs for callers that rely on them. Bindings and generators were updated where they destructure AST nodes, the lossy parser entry point (parse_prog_lossy) is now built on the spanned types, and new tests cover both exact and lossy parsing.

This keeps the public surface stable while supplying the language server with the spans it needs.

Considerations

  • This work primarily benefits the experimental candid-language-server; other consumers keep the familiar API but pay a small overhead, because the span-less structs are now derived from the spanned ones on every parse.
  • When consumers need the old AST, we convert from the spanned version, which adds a slight allocation/copy cost that we will monitor.
  • We now have two AST definitions (syntax/mod.rs vs. syntax/spanned.rs), so future changes must modify both in lockstep to prevent drift.

Add lossy parsing capabilities to recover valid declarations from
partially-invalid Candid programs, along with API for parsing from
custom token iterators.

- Add IDLProgLossy parser rule with RecoverDef and MainActorLossy
  recovery points to collect errors while parsing valid declarations
- Add parse_prog_lossy() public API returning partial AST and error list
- Add parse_idl_prog_from_tokens() for custom token iterator support
- Make ParserError public to enable error handling in external code
- Reorganize token imports for consistency (alphabetical ordering)
- Add comprehensive tests for both new parsing modes
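
The commit describes parse_idl_prog_from_tokens only as "custom token iterator support". As a hedged sketch of what such an entry point can look like, the toy parser below is generic over any IntoIterator of Result<(start, token, end), error> items, the triple shape lalrpop expects from external lexers. The Token, Error and Decl types, the parse_from_tokens name, and the one-rule grammar are all hypothetical.

```rust
// Hypothetical token set for a one-rule toy grammar: `type <Id> = <Id> ;`
#[derive(Debug, Clone, PartialEq)]
pub enum Token {
    Type,
    Equals,
    Semi,
    Id(String),
}

#[derive(Debug, PartialEq)]
pub enum Error {
    /// Error reported by whatever produced the token stream.
    Lex(String),
    /// The declaration starting at this byte offset is malformed or truncated.
    Malformed(usize),
}

#[derive(Debug, PartialEq)]
pub struct Decl {
    pub name: String,
    pub typ: String,
    /// Byte range (start, end) covering the whole declaration.
    pub span: (usize, usize),
}

/// Pulls the next token triple, mapping lexer errors and end-of-input.
fn next_tok<I>(it: &mut I, decl_start: usize) -> Result<(usize, Token, usize), Error>
where
    I: Iterator<Item = Result<(usize, Token, usize), String>>,
{
    match it.next() {
        Some(Ok(triple)) => Ok(triple),
        Some(Err(e)) => Err(Error::Lex(e)),
        None => Err(Error::Malformed(decl_start)),
    }
}

/// Accepts any token source (a custom lexer, cached tokens, a recording
/// wrapper) as long as it yields (start, token, end) triples.
pub fn parse_from_tokens<I>(tokens: I) -> Result<Vec<Decl>, Error>
where
    I: IntoIterator<Item = Result<(usize, Token, usize), String>>,
{
    let mut it = tokens.into_iter();
    let mut decls = Vec::new();
    loop {
        // Each declaration must start with `type`; end of input here is fine.
        let start = match it.next() {
            None => return Ok(decls),
            Some(Err(e)) => return Err(Error::Lex(e)),
            Some(Ok((start, Token::Type, _))) => start,
            Some(Ok((start, _, _))) => return Err(Error::Malformed(start)),
        };
        let (_, name, _) = next_tok(&mut it, start)?;
        let (_, eq, _) = next_tok(&mut it, start)?;
        let (_, typ, _) = next_tok(&mut it, start)?;
        let (_, semi, end) = next_tok(&mut it, start)?;
        match (name, eq, typ, semi) {
            (Token::Id(name), Token::Equals, Token::Id(typ), Token::Semi) => {
                decls.push(Decl { name, typ, span: (start, end) });
            }
            _ => return Err(Error::Malformed(start)),
        }
    }
}
```

Taking a generic iterator rather than a source string is what lets a caller such as a language server feed in tokens it has already produced or wrapped for recording.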

Split AST into spanned and spanless representations to allow different
use cases to choose the appropriate level of detail.

- Create syntax::spanned module with all original Span-aware types
- Add spanless types in syntax root for consumers that don't need spans
- Implement From<spanned::T> conversions for compatibility
- Simplify pattern matching in code generators by matching directly on
  IDLType enum variants instead of extracting .kind field
- Update grammar to produce spanned types, convert to spanless at API
  boundaries
- Update all tests to work with new type structure

ilbertt (Member) left a comment

I would move the code refactoring into another PR to keep this one simpler. Based on my review, the changes to these files can be reverted or moved to a new PR:

  • rust/candid_parser/src/bindings/motoko.rs
  • rust/candid_parser/src/bindings/rust.rs
  • rust/candid_parser/src/bindings/typescript.rs
  • rust/candid_parser/src/typing.rs
  • rust/candid_parser/tests/parse_type.rs

Additionally, if we remove the IDLTypeKind type alias, we can also revert these files:

  • rust/candid_parser/tests/test_doc_comments.rs
  • rust/candid_parser/src/syntax/pretty.rs

This will make the PR way smaller and easier to review. What do you think?

```rust
    PrincipalT,
}

pub type IDLTypeKind = IDLType;
```

Member

Why do we need this alias?

Author

I added the alias to mirror IDLTypeKind in the spanned module, but looking again, it doesn’t actually add any value...

```rust
}

#[derive(Debug, Clone, PartialEq, Eq)]
pub struct IDLType {
```

Member

I would call this struct IDLTypeWithSpan or something similar, so that it isn't confused with the existing syntax::IDLType.

Comment on lines 139 to 142
```rust
if let Dec::TypD(Binding { id, typ, .. }) = dec {
    let t = check_type(env, typ)?;
    env.te.0.insert(id.to_string(), t);
}
```

Member

Even though this makes the code a bit cleaner, I would do it in a separate PR to keep the scope of this PR as close as possible to its goal. Feel free to open another PR!

Comment on lines 225 to 229
```rust
if let Some((file, include_serv)) = match dec {
    Dec::ImportType(file) => Some((file, false)),
    Dec::ImportServ(file) => Some((file, true)),
    _ => None,
} {
```

Member

Same as above: I would avoid refactoring code that is outside the scope of the current PR.

Member

We have the next branch, where we are collecting all the breaking changes. There we can change syntax::IDLType and all the related structs to carry the new span field. What do you think? I feel that duplicating the IDLType code here is not a good approach in terms of maintainability.

Author

I kept the existing module alongside the new one because I wasn't sure how much impact a breaking change would have. However, as you suggest, unifying everything into a single structure that includes spans is the better approach. I will redesign it accordingly.

Comment on lines 32 to 63
```diff
+match ty.as_ref() {
+    TypeInner::Record(ref fields) => {
+        if let Some(IDLTypeKind::RecordT(syntax_fields)) = syntax {
+            return pp_record(env, fields, Some(syntax_fields), is_ref);
+        }
+        pp_record(env, fields, None, is_ref)
     }
-    (TypeInner::Variant(ref fields), Some(IDLType::VariantT(syntax_fields))) => {
-        pp_variant(env, fields, Some(syntax_fields), is_ref)
+    TypeInner::Variant(ref fields) => {
+        if let Some(IDLTypeKind::VariantT(syntax_fields)) = syntax {
+            return pp_variant(env, fields, Some(syntax_fields), is_ref);
+        }
+        pp_variant(env, fields, None, is_ref)
     }
-    (TypeInner::Service(ref serv), Some(IDLType::ServT(syntax_serv))) => {
-        pp_service(env, serv, Some(syntax_serv))
+    TypeInner::Service(ref serv) => {
+        if let Some(IDLTypeKind::ServT(syntax_serv)) = syntax {
+            return pp_service(env, serv, Some(syntax_serv));
+        }
+        pp_service(env, serv, None)
     }
-    (TypeInner::Opt(ref t), Some(IDLType::OptT(syntax_inner))) => {
-        pp_opt(env, t, Some(syntax_inner), is_ref)
+    TypeInner::Opt(ref t) => {
+        if let Some(IDLTypeKind::OptT(syntax_inner)) = syntax {
+            return pp_opt(env, t, Some(syntax_inner), is_ref);
+        }
+        pp_opt(env, t, None, is_ref)
     }
-    (TypeInner::Vec(ref t), Some(IDLType::VecT(syntax_inner))) => {
-        pp_vec(env, t, Some(syntax_inner), is_ref)
+    TypeInner::Vec(ref t) => {
+        if let Some(IDLTypeKind::VecT(syntax_inner)) = syntax {
+            return pp_vec(env, t, Some(syntax_inner), is_ref);
+        }
+        pp_vec(env, t, None, is_ref)
     }
-    (_, _) => pp_ty(env, ty, is_ref),
+    _ => pp_ty(env, ty, is_ref),
```

Member

As with typing.rs, I would avoid refactoring code that is outside the scope of this PR's goal.

wiyota (Author) commented Nov 13, 2025

Thank you for your review.

I plan to revert the refactoring-related changes and consolidate them into a separate PR.

Also, I will change the design so that the spanned types replace the existing syntax module instead of living in a separate one.

- Remove separate `spanned.rs` module in favor of inline type definitions
- Introduce `IDLTypeWithSpan` wrapper to track source spans alongside type kinds
- Revert `IDLType` to direct enum definition
- Move span information directly into data structures (TypeField, Binding, etc.)
- Update all bindings (Motoko, Rust, TypeScript) to access `typ.kind` for type information
- Simplify type parsing by eliminating conversion between spanned and spanless types
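
A rough sketch of the data layout these commit notes describe, assuming Range<usize> spans and a trimmed-down enum: the names IDLTypeWithSpan, IDLType and Binding and the kind and span fields come from the text above, but the variants, field types and the render function are illustrative only. A generator that only cares about the type shape matches on typ.kind and ignores the span.

```rust
use std::ops::Range;

// Illustrative, trimmed-down shapes; the real enum has many more variants.
#[derive(Debug, Clone, PartialEq, Eq)]
pub enum IDLType {
    PrimT(String),
    OptT(Box<IDLTypeWithSpan>),
    VecT(Box<IDLTypeWithSpan>),
}

/// Wrapper pairing a type's shape (`kind`) with its source location.
#[derive(Debug, Clone, PartialEq, Eq)]
pub struct IDLTypeWithSpan {
    pub kind: IDLType,
    pub span: Range<usize>,
}

/// A binding carries its own span in addition to its type's span.
#[derive(Debug, Clone, PartialEq, Eq)]
pub struct Binding {
    pub id: String,
    pub typ: IDLTypeWithSpan,
    pub span: Range<usize>,
}

/// Hypothetical toy renderer that prints Candid-like syntax; it looks only at
/// `kind`, so spans never affect generated output.
pub fn render(t: &IDLTypeWithSpan) -> String {
    match &t.kind {
        IDLType::PrimT(p) => p.clone(),
        IDLType::OptT(inner) => format!("opt {}", render(inner)),
        IDLType::VecT(inner) => format!("vec {}", render(inner)),
    }
}
```

Keeping a single enum and wrapping it, rather than maintaining parallel spanned and span-less enums, is the trade-off discussed in the review: one definition to update, at the cost of touching every site that now reaches through .kind.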

wiyota (Author) commented Nov 16, 2025

I've made changes to keep it as close to the master branch as possible. How does it look?
