Skip to content

New pattern matching compiler#322

Open
rodrigogribeiro wants to merge 11 commits intomainfrom
new-matching-compiler
Open

New pattern matching compiler#322
rodrigogribeiro wants to merge 11 commits intomainfrom
new-matching-compiler

Conversation

@rodrigogribeiro
Copy link
Collaborator

This PR implements the pattern matching compiler proposed by the following papers:

Now, incomplete patterns are considered a compiler error and redundant patterns generate a warning.

@mbenke
Copy link
Collaborator

mbenke commented Mar 5, 2026

This seems to fail for

data Bool = False | True;

  function second(x, y) {
    match x, y {
    | True, z => return z;
    | False, z => return z;
    }
  }

contract Second {
  function main() -> word {
    second(True, 42)
  }
}

we get

> Match compilation result:
data SecondCxt = SecondCxt  ;
data Bool = False  | True  ;
forall $4 . function second (x<Bool> : Bool, y<$4> : $4) -> $4 {
   match (x<Bool>) {
   | True<Bool> =>
      return z<$4>;
   | False<Bool> =>
      return z<$4>;
   }
}
contract Second {
   function main () -> word {
      return second<Bool -> word -> word>(True<Bool>, 42);
   }
}

with unbound variable z :(

Apparently, when compileMatrix reaches a row where all patterns are variables (line 125-135), it returns Leaf action directly. The treeToStmt conversion then emits the action body with no enclosing Match - the variable bindings from the pattern are silently dropped.

@rodrigogribeiro
Copy link
Collaborator Author

This seems to fail for

data Bool = False | True;

  function second(x, y) {
    match x, y {
    | True, z => return z;
    | False, z => return z;
    }
  }

contract Second {
  function main() -> word {
    second(True, 42)
  }
}

we get

> Match compilation result:
data SecondCxt = SecondCxt  ;
data Bool = False  | True  ;
forall $4 . function second (x<Bool> : Bool, y<$4> : $4) -> $4 {
   match (x<Bool>) {
   | True<Bool> =>
      return z<$4>;
   | False<Bool> =>
      return z<$4>;
   }
}
contract Second {
   function main () -> word {
      return second<Bool -> word -> word>(True<Bool>, 42);
   }
}

with unbound variable z :(

Apparently, when compileMatrix reaches a row where all patterns are variables (line 125-135), it returns Leaf action directly. The treeToStmt conversion then emits the action body with no enclosing Match - the variable bindings from the pattern are silently dropped.

Thanks for spotting this problem! I have fixed it and added the example to the test suite.

@rodrigogribeiro rodrigogribeiro marked this pull request as ready for review March 6, 2026 11:09
@rodrigogribeiro rodrigogribeiro requested review from Y-Nak and mbenke March 6, 2026 11:10
@rodrigogribeiro rodrigogribeiro force-pushed the new-matching-compiler branch from 4f2536b to 61382a4 Compare March 6, 2026 11:17
Copy link
Collaborator

@mbenke mbenke left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The problem is only partially fixed, consider

data Bool = False | True;                                                 

contract CatchAll {
    function catchAll(x, y) -> Bool{                                                 
      match x, y {                                                            
      | True, True => return True;                                               
      | z, w      => return z;                                                
      }                                                                       
    }

  function main() -> Bool {
    catchAll(True, False)
  }
}

we get

$ esolc catchAll.solc --dump-ds -s -g
> Match compilation result:
data CatchAllCxt = CatchAllCxt  ;
data Bool = False  | True  ;
contract CatchAll {
   function catchAll (x<Bool> : Bool, y<Bool> : Bool) -> Bool {
      match (x<Bool>) {
      | True<Bool> =>
         match (y<Bool>) {
         | True<Bool> =>
            return True<Bool>;
         | v1<Bool> =>
            return z<Bool>;
         }
      | v2<Bool> =>
         return z<Bool>;
      }
   }
   function main () -> Bool {
      return catchAll<Bool -> Bool -> Bool>(True<Bool>, False<Bool>);
   }
}

where z is undefined again.

The problem arises when specialize expands a PVar in the selected column
(the constructor column) and a different row's action references the variable
from a different column that was already consumed.
Consider a case where the default branch is taken:

data Bool = False | True;

function catchAll(x, y) {
  match x, y {
  | True, True => return True;
  | z, w      => return z;
  }
}

Column 0: [PCon True, PVar z] — necessity 1. Column 1: [PCon True, PVar w]
— necessity 1. Tie at depth 1, pick column 0.

Head constructors: [True]. buildConSwitch on occ [0]:

Branch True: specialize produces [[PCon True []], [PVar w]] (row 0
matches True directly; row 1's PVar z expands to fresh patterns — 0 fields,
so just [PCon True []] from the remaining column and [PVar w]). Wait, let
me be precise:

  • Row 0: [PCon True [], PCon True []] → first col matches True →
    [] ++ [PCon True []] = [PCon True []]
  • Row 1: [PVar z, PVar w] → first col is PVar → fresh pats (0 for True) →
    [] ++ [PVar w] = [PVar w]

specMat = [[PCon True []], [PVar w]]. specActs = [return True, return z].
Recurse on column 0 (the only column), occ [1]. headCons = [True].

Inner branch True: specialize → [[], []], acts = [return True, return z].
Recurse with empty cols: firstRow [] is all-vars. Leaf [] [return True].
Fine.

Inner default (True incomplete): defaultMatrix [[PCon True []], [PVar w]] =
[[]] (from the PVar w row). defaultActions = [return z].
Recurse: compileMatrix [] [] [[]] [return z]. firstRow [], all-vars.
Leaf [] [return z].

z is unbound. The varBinds is [] (empty row zipped with empty occs).
The action return z goes through with no substitution. But z should be
bound to x — it was the pattern variable in column 0, row 1.

The binding was lost at buildConSwitch's default path (line 202): defaultMatrix
strips the PVar z from the first column and defaultActions keeps [return z],
but no one records that z was bound to occurrence [0].

Similarly via specialize: In the True branch, row 1 had PVar z in column
0. specialize replaced it with fresh patterns (discarding z), but the
action return z survives with z still referencing the now-lost variable.

Root cause

The Maranget algorithm tracks pattern variables through a substitution
environment
that maps pattern variables to occurrence paths. This environment
must be extended every time a column is consumed — whether by a constructor
switch, a literal switch, or the all-wildcards case. The current fix only
extends this environment at the Leaf node, which only captures variables from
columns that happen to still be present when the leaf is reached.

Suggested approach

Instead of carrying bindings only in Leaf, accumulate them throughout
compilation. One way: extend compileMatrix to also return a list of
bindings, so that each recursive call can propagate bindings from consumed
columns upward. Alternatively, perform the substitution eagerly: when
specialize or defaultMatrix consumes a column, immediately substitute the
PVar's variable in the corresponding action with the scrutinee expression for
that occurrence. This could be done by augmenting specializedActions and
defaultActions to also rewrite the actions.

@mbenke
Copy link
Collaborator

mbenke commented Mar 9, 2026

Other issues

mismatch in inhabitsConCol (warnings)

At line 383, the missing-constructor case passes the wrong types to
inhabitsM:

mw <- inhabitsM (defaultMatrix matrix) (fieldTys ++ restTys)

defaultMatrix matrix has length restTys columns, but fieldTys ++ restTys
has length fieldTys + length restTys elements. The first length fieldTys
types are matched against columns that actually correspond to restTys,
misaligning the inhabitance check.

Contrast with searchBranch at line 397 which correctly passes
inhabitsM specMat (fieldTys ++ restTys) — the specialized matrix there has
the right number of columns.

Suggested fix

-- Line 383, change:
mw <- inhabitsM (defaultMatrix matrix) (fieldTys ++ restTys)
-- to:
mw <- inhabitsM (defaultMatrix matrix) restTys

-- And adjust the witness construction (lines 386-388):
Just rest -> do
  sub <- mapM freshPat fieldTys
  pure . Just $ PCon k sub : rest

This bug primarily affects warning quality (wrong witnesses, possible false
redundancy results) rather than code generation.

Minor issues

  • scrutineeType is partial (line 655). The catch-all error will crash
    on expression forms like FieldAccess, Indexed, or Call (Just _) that
    can appear after type inference.

  • Redundancy reporting seems to be inverted for catch-all-first patterns. When the
    first row is all-variables and the remaining rows are exhaustive, the compiler
    warns that the first row is redundant (line 133). Formally correct per
    Maranget's definition, but confusing from the user's perspective — the first
    row is the one that actually executes; the later rows are dead code. Consider
    warning about unreachable clauses instead, as GHC does.

  • inrConInfo uses inlTy (line 546). Both inlConInfo and inrConInfo
    use inlTy at bt as their return type. Works correctly for sibling lookup
    but the naming is misleading; sumTy or eitherTy would be clearer.

Test coverage gaps

The 10 unit tests cover decision tree structure well but have blind spots:

  • No test runs treeToStmt and checks that pattern variables are correctly
    substituted in the output statements. The existing tests only inspect the
    DecisionTree shape.
  • No multi-column test mixes constructor and variable patterns in a way that
    exercises variable binding through specialize or defaultMatrix (test 7
    uses constructors in both columns).
  • No test exercises the inhabitsConCol missing-constructor path with
    non-empty restTys.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants