feat(extraction): add ArkTS language support#656

Open
DXX5678 wants to merge 5 commits into colbymchenry:main from DXX5678:feat/add-arkts-support

DXX5678 commented Jun 3, 2026

feat(extraction): add ArkTS language support

Summary

Adds tree-sitter-based extraction support for ArkTS — the TypeScript superset used in Huawei's HarmonyOS / ArkUI application development. ArkTS files use the .ets extension and introduce the struct keyword for component definitions (@Component struct X { ... }), along with ArkUI-specific decorators (@State, @Prop, @Link, @Builder, etc.).

Changes

| File | Change |
|------|--------|
| src/types.ts | Register 'arkts' in LANGUAGES |
| src/extraction/grammars.ts | Add .ets → 'arkts' extension mapping, WASM grammar registration, local-wasm loading path, and display name |
| src/extraction/languages/arkts.ts | New: ArkTS LanguageExtractor (extends TypeScript extractor, adds structTypes: ['struct_declaration']) |
| src/extraction/languages/index.ts | Import + register arktsExtractor |
| src/extraction/wasm/tree-sitter-arkts.wasm | New: tree-sitter-arkts grammar (ABI 15, sourced from tree-sitter-arkts-open) |

No changes were needed to the core extractor (tree-sitter.ts, parse-worker.ts, tree-sitter-types.ts). The existing LanguageExtractor interface and structTypes dispatch in visitNode handle ArkTS out of the box. ArkTS-specific decorators (@Component, @State, etc.) are already captured by the shared extractDecoratorsFor path.
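As a rough illustration of the extension mapping listed in the changes, the sketch below routes a file path to a language by its extension. The names `EXTENSION_TO_LANGUAGE` and `detectLanguage` are hypothetical and are not taken from src/extraction/grammars.ts; only the .ets → 'arkts' pairing comes from this PR.

```typescript
// Hypothetical sketch: these names are illustrative, not the repository's API.
type Language = "typescript" | "arkts" | "unknown";

// Only the ".ets" -> "arkts" entry is taken from the PR description.
const EXTENSION_TO_LANGUAGE: Record<string, Language> = {
  ".ts": "typescript",
  ".tsx": "typescript",
  ".ets": "arkts",
};

function detectLanguage(filePath: string): Language {
  const dot = filePath.lastIndexOf(".");
  if (dot < 0) return "unknown"; // no extension at all
  const ext = filePath.slice(dot).toLowerCase();
  return EXTENSION_TO_LANGUAGE[ext] ?? "unknown";
}

const etsLanguage = detectLanguage("entry/src/main/ets/pages/Index.ets");
const tsLanguage = detectLanguage("src/types.ts");
const noExtLanguage = detectLanguage("README");
```

Keeping the mapping in a single table means a new language only needs one extra entry plus its grammar registration.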

Architecture

ArkTS is treated as a distinct language (not a TypeScript variant) because its grammar produces unique AST node types (struct_declaration). The extractor extends the TypeScript extractor via object spread and overrides only structTypes:

export const arktsExtractor: LanguageExtractor = {
  ...typescriptExtractor,
  structTypes: ['struct_declaration'],
};

The dispatch chain in TreeSitterExtractor.visitNode flows:

struct_declaration
  → structTypes.includes(nodeType)      ✅ matches
  → extractStruct(node)                 ✅ produces Node + containment edges
  → extractDecoratorsFor(node, nodeId)  ✅ captures @Component, @State, etc.
  → visitNode children                  ✅ extracts methods (build(), aboutToAppear(), etc.)
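The dispatch chain above can be approximated with a minimal, self-contained sketch. The `SyntaxNode`, `GraphNode`, and `methodTypes` shapes here are simplified assumptions standing in for the real tree-sitter and extractor types; only the spread-and-override of `structTypes` mirrors the PR.

```typescript
// Simplified stand-ins for the real tree-sitter node and extractor types (assumptions).
interface SyntaxNode {
  type: string;
  name?: string;
  children: SyntaxNode[];
}

interface LanguageExtractor {
  structTypes: string[];
  methodTypes: string[]; // hypothetical field, used only by this sketch
}

interface GraphNode {
  kind: "struct" | "method";
  name: string;
  parent: string | undefined; // containment: name of the enclosing struct, if any
}

const typescriptExtractor: LanguageExtractor = {
  structTypes: [],
  methodTypes: ["method_definition"],
};

// Same shape as the PR's extractor: inherit everything, override structTypes only.
const arktsExtractor: LanguageExtractor = {
  ...typescriptExtractor,
  structTypes: ["struct_declaration"],
};

function visitNode(
  node: SyntaxNode,
  extractor: LanguageExtractor,
  out: GraphNode[],
  parent?: string,
): GraphNode[] {
  let current = parent;
  if (extractor.structTypes.includes(node.type)) {
    current = node.name ?? "<anonymous>";
    out.push({ kind: "struct", name: current, parent });
  } else if (extractor.methodTypes.includes(node.type)) {
    out.push({ kind: "method", name: node.name ?? "<anonymous>", parent });
  }
  // Children are visited with the innermost struct as their container.
  for (const child of node.children) visitNode(child, extractor, out, current);
  return out;
}

const tree: SyntaxNode = {
  type: "struct_declaration",
  name: "MainPage",
  children: [
    { type: "method_definition", name: "build", children: [] },
    { type: "method_definition", name: "aboutToAppear", children: [] },
  ],
};

const arktsNodes = visitNode(tree, arktsExtractor, []);
const tsNodes = visitNode(tree, typescriptExtractor, []);
```

In this sketch the same tree yields a `struct` node plus two contained methods with `arktsExtractor`, while `typescriptExtractor` sees only two uncontained methods, which is the reason for overriding `structTypes`.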

Verification

Tested against a real HarmonyOS application (233 files, 228 .ets):

Files:  233 (arkts: 228, typescript: 2, xml: 3)
Nodes:  6,085 (121 struct, 4,106 method, 454 interface, 842 import, …)
Edges:  11,902

All .ets files parse without errors. struct nodes (e.g. SpecialDetailScreen, MainPage) are correctly classified with their build() and lifecycle methods, decorators, and containment edges.

DXX5678 commented Jun 3, 2026

Resolves #648

@clipsheep6

I think ArkUI framework-aware route optimization is also needed. Tree-sitter is only the first step for code search in ArkTS repos, since closures are not easy for agents working with ArkUI to follow.

DXX5678 commented Jun 5, 2026

I agree – tree-sitter is just the first step for ArkTS repos. In ArkUI, most of the “logic” lives in @State/@Prop fields and @Builder closures, and UI event handlers (onClick, onChange, etc.) form chains that are really hard for agents to follow if we only have plain call edges.

I plan to add ArkUI framework-aware routes and state/event chains, roughly:
- Add arkui_page / arkui_route nodes for @Page/@Component structs and router_map / main_pages.json entries;
- Add arkui_state_dep and arkui_event_chain edges to connect @State → @Builder → UI event handlers;
- Expose these in codegraph_search / codegraph_explore so agents can trace “page → builder → event → state” in one go.

DXX5678 added 2 commits June 5, 2026 18:28
  Add full ArkUI (HarmonyOS declarative UI) support across the detection,
  routing, resolution, and synthesis layers.

  ## What's new

  ### Framework resolver (`src/resolution/frameworks/arkui.ts`)
  - `detect()`: identify projects via `build-profile.json5` or `@Entry` in .ets files
  - `extract()`: 3-pass scanning for `@Entry` pages, router.pushUrl/replaceUrl
    navigation refs, and `@Component` structs — with decorator param support
    (e.g. `@Entry({ routeName: 'main' })`)
  - `resolve()`: 3-tier match strategy (filePath → qualifiedName → name suffix)
    with path-prefix anchoring to prevent false matches
  - `postExtract()`: ingest `main_pages.json` for HarmonyOS 5.0+ route declarations
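One way to read the three-tier `resolve()` strategy is sketched below. The `PageNode` shape, the `resolveRoute` name, and the exact meaning of each tier are assumptions made for illustration; the commit only states the tier order and that path matches are anchored.

```typescript
// Hypothetical node shape; the real graph node type is not shown in this PR.
interface PageNode {
  id: string;
  filePath: string;
  qualifiedName: string;
  name: string;
}

function resolveRoute(ref: string, nodes: PageNode[]): PageNode | undefined {
  const target = ref.replace(/^\/+/, "").replace(/\.ets$/, "");

  // Tier 1: file path. The match is anchored at a "/" boundary so that
  // "pages/Index" cannot match ".../pages/MyIndex".
  const byPath = nodes.find((n) => {
    const p = n.filePath.replace(/\.ets$/, "");
    return p === target || p.endsWith("/" + target);
  });
  if (byPath) return byPath;

  // Tier 2: exact qualified name.
  const byQualified = nodes.find((n) => n.qualifiedName === ref);
  if (byQualified) return byQualified;

  // Tier 3: the last path segment equals the node name.
  const suffix = target.split("/").pop();
  return nodes.find((n) => n.name === suffix);
}

const pages: PageNode[] = [
  { id: "2", filePath: "entry/src/main/ets/pages/MyIndex.ets", qualifiedName: "pages/MyIndex::MyIndex", name: "MyIndex" },
  { id: "1", filePath: "entry/src/main/ets/pages/Index.ets", qualifiedName: "pages/Index::Index", name: "Index" },
  { id: "3", filePath: "entry/src/main/ets/views/Detail.ets", qualifiedName: "views/Detail::Detail", name: "Detail" },
];

const byPathHit = resolveRoute("pages/Index", pages);
const byQualifiedHit = resolveRoute("views/Detail::Detail", pages);
const bySuffixHit = resolveRoute("feature/Detail", pages);
const miss = resolveRoute("pages/Missing", pages);
```

Ordering the tiers from most to least specific keeps the loose name-suffix match as a last resort, so it only fires when neither the path nor the qualified name identifies a page.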

  ### New NodeKind: `arkui_page` (1 addition — no other kinds changed)
  - Added to `NODE_KINDS`, `HIGH_VALUE_NODE_KINDS`, and `kindBonus` (weight 9)
  - Pages are treated as aggregation units; components, functions, and properties
    reuse existing `component`, `function`, and `property` kinds

  ### Synthesis edges (3 phases, 0 new EdgeKinds)
  All use `kind: 'calls'` + `provenance: 'heuristic'` + `metadata.synthesizedBy`:

  | Phase | synthesizedBy | Source → Target | Purpose |
  |-------|---------------|-----------------|---------|
  | A | `arkui-state-chain` | sibling method → `build()` | State change → re-render bridge |
  | B | `arkui-state-dep` | method → `@State` property | Marks which methods read reactive state |
  | C | `arkui-event-chain` | `build()` → handler method | `.onClick(this.handler)` → handler wiring |

  All edges are transparent to standard traversal tools (callers, callees, explore).
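To make the edge shape concrete, here is a minimal sketch of Phase C under stated assumptions: the `SynthEdge` interface, the `synthesizeEventEdges` helper, and the text-based scan of `build()` are illustrative, and `EVENT_NAMES` is only a subset of the 11 event types the commit mentions. The `kind`, `provenance`, and `metadata.synthesizedBy` values are the ones listed above.

```typescript
// Assumed edge shape; field names follow the commit text, the rest is illustrative.
interface SynthEdge {
  source: string;
  target: string;
  kind: "calls";
  provenance: "heuristic";
  metadata: { synthesizedBy: string; event: string };
}

// Assumed subset of event names; the commit says its pre-filter covers 11 event types.
const EVENT_NAMES = ["onClick", "onChange", "onTouch", "onDragStart", "onDrop"];

function synthesizeEventEdges(
  structName: string,
  buildBody: string,
  methods: Set<string>,
): SynthEdge[] {
  // Matches `.onClick(this.handler` and captures the event and handler names.
  const pattern = new RegExp(
    `\\.(${EVENT_NAMES.join("|")})\\(\\s*this\\.(\\w+)`,
    "g",
  );
  const edges: SynthEdge[] = [];
  let match: RegExpExecArray | null;
  while ((match = pattern.exec(buildBody)) !== null) {
    const event = match[1] ?? "";
    const handler = match[2] ?? "";
    if (!methods.has(handler)) continue; // only wire handlers that exist on the struct
    edges.push({
      source: `${structName}.build`,
      target: `${structName}.${handler}`,
      kind: "calls",
      provenance: "heuristic",
      metadata: { synthesizedBy: "arkui-event-chain", event },
    });
  }
  return edges;
}

const buildBody = [
  "Button('OK').onClick(this.handleOK)",
  "TextInput().onChange((v) => { this.text = v })", // inline arrow: not matched by this sketch
  "Button('X').onClick(this.missing)", // no such method: skipped
].join("\n");

const eventEdges = synthesizeEventEdges("Counter", buildBody, new Set(["handleOK", "build"]));
```

Because the edge is an ordinary `calls` edge, existing traversal code needs no changes; the `metadata.synthesizedBy` tag is what lets annotation output distinguish it from a statically resolved call.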

  ### Annotation output
  - `codegraph_node` trail: `[dynamic: Arkui state chain via handleOK @file:42]`
  - `codegraph_explore` flow spine and dynamic-dispatch links: labeled as
    `Arkui Click → handleClick`, `Arkui reads @State count`, etc.

  ## Key fixes included
  - Query both `class` and `struct` kinds in synthesis phases (ArkUI structs are
    tree-sitter kind `struct`, not `class`)
  - Phase C pre-filter covers all 11 event types (including drag events)
  - Decorator regex supports params: `@Entry({ routeName: 'main' })`
  - Removed dead qualifiedName dedup path in `postExtract`
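For the decorator fix, a pattern along the following lines would accept both bare and parameterized decorators. This is a hedged sketch, not the regex used in the commit: `parseDecorators` is a hypothetical helper, and the argument match deliberately ignores nested parentheses.

```typescript
interface ParsedDecorator {
  name: string;
  routeName?: string;
}

function parseDecorators(source: string): ParsedDecorator[] {
  // Matches "@Name" or "@Name(...)". The argument group stops at the first ")",
  // so nested parentheses are out of scope for this sketch.
  const pattern = /@([A-Z]\w*)(?:\(([^)]*)\))?/g;
  const result: ParsedDecorator[] = [];
  let match: RegExpExecArray | null;
  while ((match = pattern.exec(source)) !== null) {
    const name = match[1] ?? "";
    const args = match[2] ?? ""; // undefined at runtime when there are no parentheses
    const route = /routeName\s*:\s*['"]([^'"]+)['"]/.exec(args);
    result.push(route ? { name, routeName: route[1] } : { name });
  }
  return result;
}

const sample = [
  "@Entry({ routeName: 'main' })",
  "@Component",
  "struct MainPage {",
  "  @State count: number = 0",
  "}",
].join("\n");

const decorators = parseDecorators(sample);
```

A text-level regex like this is only a pre-filter; where the parsed tree is available, reading decorator nodes from it is the more robust route.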

  ## Tests
  - `__tests__/arkui-framework.test.ts`: 29 cases covering extract, postExtract,
    resolve, and all 3 synthesis phases
  - Full regression: 1159 pass / 0 fail / 60 test files