
Shekharrajak commented Jan 17, 2026

Which issue does this PR close?

Closes #3187

Rationale for this change

The aes_encrypt function itself was already fully implemented; the problem was in how it was serialized.

CometScalarFunction serialized expressions to protobuf without including the return type. As a result, the native planner tried to look up the function in DataFusion's built-in UDF registry, which failed with "There is no UDF named 'aes_encrypt' in the registry."

What changes are included in this PR?

  1. Fix scalar function serialization (CometScalarFunction.scala):
    - Changed to use scalarFunctionExprToProtoWithReturnType() instead of scalarFunctionExprToProto()
    - Now passes expr.dataType to avoid native registry lookup
    - Enables native execution for all Comet-specific scalar functions
  2. Add comprehensive test suite (CometStaticInvokeSuite.scala)
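To make the serialization change concrete, here is a minimal Scala model of it. `ChildMsg`, `ScalarFuncMsg`, `SerdeModel`, `toProto`, and `toProtoWithReturnType` are hypothetical names, not Comet's protobuf classes or serde helpers, and data types are plain strings. It mirrors the shape of the `convert()` code, where each child is serialized to an `Option` and any `None` makes the whole call unsupported; the only difference between the two paths is whether the Spark-side return type is attached.

```scala
// Hypothetical, simplified stand-ins; not Comet's protobuf or serde classes.
final case class ChildMsg(repr: String)
final case class ScalarFuncMsg(funcName: String, args: Seq[ChildMsg], returnType: Option[String])

object SerdeModel {
  // Before: name + args only. Each child may have failed to serialize (None),
  // in which case the whole call cannot be serialized.
  def toProto(name: String, args: Option[ChildMsg]*): Option[ScalarFuncMsg] =
    if (args.forall(_.isDefined)) Some(ScalarFuncMsg(name, args.flatten, None)) else None

  // After: same, but the result type resolved on the Spark side
  // (expr.dataType in the real code) travels with the call.
  def toProtoWithReturnType(
      name: String,
      returnType: String,
      args: Option[ChildMsg]*): Option[ScalarFuncMsg] =
    toProto(name, args: _*).map(_.copy(returnType = Some(returnType)))
}
```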

How are these changes tested?

  1. Unit tests: CometStaticInvokeSuite validates:
    - Native execution via physical plan inspection (assert(plan.contains("CometProject")))
  2. Benchmark results: CometEncryptionBenchmark shows native execution, with speedups of up to 1.5X in several modes (full results in the comments below).
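The plan check in the unit tests only asks whether `CometProject` appears somewhere in the plan. One possible stricter variant is sketched below, assuming that explain output prints Comet's projection as `CometProject` and Spark's as `Project`; it also flags a Spark `Project` left elsewhere in the plan. `PlanCheck` and its regex are illustrative and not part of CometStaticInvokeSuite.

```scala
// Illustrative helper, not part of CometStaticInvokeSuite. Assumes explain output
// names Comet's projection "CometProject" and Spark's fallback "Project".
object PlanCheck {
  // Skip tree-drawing characters and whole-stage codegen markers such as "*(1) ",
  // then capture the operator name at the start of each line.
  private val NodeName = """^[\s:|+\-*()0-9]*([A-Za-z]+)""".r

  def operatorNames(plan: String): List[String] =
    plan.split("\n").toList.flatMap(line => NodeName.findFirstMatchIn(line).map(_.group(1)))

  // True only if a native projection exists and no Spark Project node remains.
  def projectionsAreNative(plan: String): Boolean = {
    val ops = operatorNames(plan)
    ops.contains("CometProject") && !ops.contains("Project")
  }
}
```

A plan with a `CometProject` on top of a Spark `Project` passes a plain `contains("CometProject")` check but fails `projectionsAreNative`.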

Shekharrajak commented:

Findings from the first version, which showed no benchmark improvement:

  1. aes_encrypt is a RuntimeReplaceable expression in Spark
  2. Spark converts it to a StaticInvoke that calls the JVM's built-in crypto libraries before Comet sees it
  3. Even with our StaticInvoke registration, Comet's physical planner falls back to Spark for the entire projection operator
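The first and third findings can be illustrated with a toy model. All case classes here are hypothetical stand-ins (Spark's real `StaticInvoke` for aes_encrypt carries more arguments), and the `canConvert` predicate is a placeholder: the model shows the RuntimeReplaceable rewrite and the all-or-nothing conversion of a projection, not why the first version's StaticInvoke handling still fell back.

```scala
// Toy model only: hypothetical stand-ins, not Spark's Expression classes or Comet's planner.
sealed trait ExprModel
final case class Col(name: String) extends ExprModel
final case class Lit(value: String) extends ExprModel
// Stands in for Spark's AesEncrypt, a RuntimeReplaceable expression.
final case class AesEncryptModel(input: ExprModel, key: ExprModel, mode: String) extends ExprModel
// Stands in for StaticInvoke: a call to a static JVM method.
final case class StaticInvokeModel(method: String, args: Seq[ExprModel]) extends ExprModel

object ReplaceAndPlan {
  // Spark swaps a RuntimeReplaceable node for its replacement before physical
  // planning, so a downstream planner never sees AesEncryptModel itself.
  def replace(e: ExprModel): ExprModel = e match {
    case AesEncryptModel(in, key, mode) =>
      StaticInvokeModel("aesEncrypt", Seq(replace(in), replace(key), Lit(mode)))
    case StaticInvokeModel(m, args) => StaticInvokeModel(m, args.map(replace))
    case leaf => leaf
  }

  // All-or-nothing: the projection becomes native only if every expression in
  // the project list can be converted; otherwise the whole operator stays in Spark.
  def planProject(projectList: Seq[ExprModel], canConvert: ExprModel => Boolean): String =
    if (projectList.forall(canConvert)) "CometProject" else "Project (Spark)"
}
```

For example, a converter that only recognizes the original `AesEncryptModel` node never matches the rewritten plan, so the entire projection, including a plain column next to it, stays in Spark.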


Shekharrajak commented Jan 18, 2026

After the fix, the benchmarks show improvements:


  SQL:
  SELECT hex(aes_encrypt(
    cast(data as binary),
    cast(key as binary),
    'CBC'
  ))
  FROM t1;

  Benchmark Results:
  aes_encrypt_cbc:                          Best Time(ms)   Relative
  ------------------------------------------------------------------
  Spark                                               260        1.0X
  Comet (Scan)                                        260        1.0X
  Comet (Scan + Exec)                                 174        1.5X 
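The query above uses CBC mode. For reference, here is a javax.crypto sketch of what that mode produces on the JVM, assuming (from our reading of Spark, not from this PR) that Spark's output is a random 16-byte IV followed by AES/CBC/PKCS5Padding ciphertext. `AesCbcSketch` is a hypothetical name; a native implementation would presumably need to produce the same layout so that Spark's aes_decrypt can read it.

```scala
import java.nio.charset.StandardCharsets
import java.security.SecureRandom
import javax.crypto.Cipher
import javax.crypto.spec.{IvParameterSpec, SecretKeySpec}

// Sketch of AES-CBC under the assumed layout:
// output = random 16-byte IV ++ AES/CBC/PKCS5Padding ciphertext.
// Illustration only; not Spark's or Comet's code.
object AesCbcSketch {
  private val IvLength = 16 // AES block size in bytes
  private val Transformation = "AES/CBC/PKCS5Padding"
  private val random = new SecureRandom()

  def encrypt(data: Array[Byte], key: Array[Byte]): Array[Byte] = {
    val iv = new Array[Byte](IvLength)
    random.nextBytes(iv)
    val cipher = Cipher.getInstance(Transformation)
    cipher.init(Cipher.ENCRYPT_MODE, new SecretKeySpec(key, "AES"), new IvParameterSpec(iv))
    iv ++ cipher.doFinal(data) // prepend the IV so decryption can recover it
  }

  def decrypt(input: Array[Byte], key: Array[Byte]): Array[Byte] = {
    val (iv, body) = input.splitAt(IvLength)
    val cipher = Cipher.getInstance(Transformation)
    cipher.init(Cipher.DECRYPT_MODE, new SecretKeySpec(key, "AES"), new IvParameterSpec(iv))
    cipher.doFinal(body)
  }

  // Uppercase hex, like the hex() wrapper in the benchmark query.
  def hex(bytes: Array[Byte]): String = bytes.map(b => f"${b & 0xff}%02X").mkString
}
```

With a 16-byte key, a 5-byte input becomes 32 bytes (16 bytes of IV plus one padded 16-byte block), so its hex string is 64 characters long.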

  override def convert(expr: T, inputs: Seq[Attribute], binding: Boolean): Option[Expr] = {
    val childExpr = expr.children.map(exprToProtoInternal(_, inputs, binding))
    val optExpr = scalarFunctionExprToProto(name, childExpr: _*)
    // Pass return type to avoid native lookup in DataFusion registry
Shekharrajak commented on this diff, Jan 18, 2026:

When expr.return_type is None, the native planner tries to look up the function in DataFusion's built-in UDF registry, which doesn't contain aes_encrypt because it is a Comet-specific function registered later via create_comet_physical_fun.
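The lookup described above can be modeled as follows. This is a Scala toy, not Comet's Rust planner: `ScalarFuncCall`, `NativeResolutionModel`, the registry contents, and the Arrow type names written as strings are all hypothetical. With a return type present, no registry lookup is needed to determine the result type; without one, a Comet-only function such as aes_encrypt fails with the error quoted in the PR description.

```scala
// Toy model of the native resolution path; all names and contents are hypothetical.
final case class ScalarFuncCall(name: String, argTypes: Seq[String], returnType: Option[String])

object NativeResolutionModel {
  // A built-in registry that can infer a return type from argument types.
  private val builtInRegistry: Map[String, Seq[String] => String] =
    Map("upper" -> ((_: Seq[String]) => "Utf8"))
  // Comet-specific functions, reachable only by name dispatch after typing.
  private val cometSpecific: Set[String] = Set("aes_encrypt")

  // Returns (implementation, return type) or the lookup error.
  def resolve(call: ScalarFuncCall): Either[String, (String, String)] = {
    val dataType: Either[String, String] = call.returnType match {
      case Some(t) => Right(t) // provided by the JVM side: no registry lookup
      case None =>
        builtInRegistry
          .get(call.name)
          .map(infer => infer(call.argTypes))
          .toRight(s"There is no UDF named '${call.name}' in the registry")
    }
    dataType.map { t =>
      val impl = if (cometSpecific(call.name)) "comet" else "datafusion"
      (impl, t)
    }
  }
}
```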

Shekharrajak commented:

Benchmark results:

➜  datafusion-comet git:(feature/aes-encrypt-support) ✗ cat spark/benchmarks/CometEncryptionBenchmark-jdk17-results.txt 
================================================================================================
Encryption expressions
================================================================================================

================================================================================================
aes_encrypt_gcm_basic
================================================================================================

OpenJDK 64-Bit Server VM 17.0.13+11 on Mac OS X 26.2
Apple M4 Max
aes_encrypt_gcm_basic:                    Best Time(ms)   Avg Time(ms)   Stdev(ms)    Rate(M/s)   Per Row(ns)   Relative
------------------------------------------------------------------------------------------------------------------------
Spark                                               274            279           3          0.4        2740.6       1.0X
Comet (Scan)                                        270            275           7          0.4        2703.9       1.0X
Comet (Scan + Exec)                                 230            238           8          0.4        2300.4       1.2X


================================================================================================
aes_encrypt_gcm_with_mode
================================================================================================

OpenJDK 64-Bit Server VM 17.0.13+11 on Mac OS X 26.2
Apple M4 Max
aes_encrypt_gcm_with_mode:                Best Time(ms)   Avg Time(ms)   Stdev(ms)    Rate(M/s)   Per Row(ns)   Relative
------------------------------------------------------------------------------------------------------------------------
Spark                                               266            276          11          0.4        2661.5       1.0X
Comet (Scan)                                        264            269           4          0.4        2638.5       1.0X
Comet (Scan + Exec)                                 228            234           9          0.4        2279.5       1.2X


================================================================================================
aes_encrypt_cbc
================================================================================================

OpenJDK 64-Bit Server VM 17.0.13+11 on Mac OS X 26.2
Apple M4 Max
aes_encrypt_cbc:                          Best Time(ms)   Avg Time(ms)   Stdev(ms)    Rate(M/s)   Per Row(ns)   Relative
------------------------------------------------------------------------------------------------------------------------
Spark                                               260            263           2          0.4        2598.5       1.0X
Comet (Scan)                                        260            266           6          0.4        2595.3       1.0X
Comet (Scan + Exec)                                 174            176           1          0.6        1736.1       1.5X


================================================================================================
aes_encrypt_ecb
================================================================================================

OpenJDK 64-Bit Server VM 17.0.13+11 on Mac OS X 26.2
Apple M4 Max
aes_encrypt_ecb:                          Best Time(ms)   Avg Time(ms)   Stdev(ms)    Rate(M/s)   Per Row(ns)   Relative
------------------------------------------------------------------------------------------------------------------------
Spark                                               221            224           6          0.5        2206.5       1.0X
Comet (Scan)                                        223            228           4          0.4        2227.1       1.0X
Comet (Scan + Exec)                                 154            160           6          0.7        1537.5       1.4X


================================================================================================
aes_encrypt_gcm_with_iv
================================================================================================

OpenJDK 64-Bit Server VM 17.0.13+11 on Mac OS X 26.2
Apple M4 Max
aes_encrypt_gcm_with_iv:                  Best Time(ms)   Avg Time(ms)   Stdev(ms)    Rate(M/s)   Per Row(ns)   Relative
------------------------------------------------------------------------------------------------------------------------
Spark                                               236            240           4          0.4        2360.4       1.0X
Comet (Scan)                                        240            243           5          0.4        2395.7       1.0X
Comet (Scan + Exec)                                 226            236          20          0.4        2264.4       1.0X


================================================================================================
aes_encrypt_gcm_with_aad
================================================================================================

OpenJDK 64-Bit Server VM 17.0.13+11 on Mac OS X 26.2
Apple M4 Max
aes_encrypt_gcm_with_aad:                 Best Time(ms)   Avg Time(ms)   Stdev(ms)    Rate(M/s)   Per Row(ns)   Relative
------------------------------------------------------------------------------------------------------------------------
Spark                                               246            254          11          0.4        2460.1       1.0X
Comet (Scan)                                        247            252           4          0.4        2474.7       1.0X
Comet (Scan + Exec)                                 230            235           7          0.4        2300.4       1.1X


================================================================================================
aes_encrypt_with_base64
================================================================================================

OpenJDK 64-Bit Server VM 17.0.13+11 on Mac OS X 26.2
Apple M4 Max
aes_encrypt_with_base64:                  Best Time(ms)   Avg Time(ms)   Stdev(ms)    Rate(M/s)   Per Row(ns)   Relative
------------------------------------------------------------------------------------------------------------------------
Spark                                               251            255           2          0.4        2513.7       1.0X
Comet (Scan)                                        254            263           9          0.4        2543.9       1.0X
Comet (Scan + Exec)                                 259            268          10          0.4        2587.7       1.0X


================================================================================================
aes_encrypt_long_data
================================================================================================

OpenJDK 64-Bit Server VM 17.0.13+11 on Mac OS X 26.2
Apple M4 Max
aes_encrypt_long_data:                    Best Time(ms)   Avg Time(ms)   Stdev(ms)    Rate(M/s)   Per Row(ns)   Relative
------------------------------------------------------------------------------------------------------------------------
Spark                                              2264           2327          89          0.0       22641.7       1.0X
Comet (Scan)                                       2283           2394         156          0.0       22833.5       1.0X
Comet (Scan + Exec)                                2635           2711         108          0.0       26346.5       0.9X
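As a sanity check on the tables above, the "Relative" column can be recomputed from the Best Time(ms) values. The sketch assumes Relative = Spark best time / Comet best time, rounded to one decimal; with the rounded times copied from the tables this reproduces every reported "Comet (Scan + Exec)" value. It also estimates rows per iteration from Per Row(ns), which comes out at roughly 100,000 (inferred from the numbers, not stated in the PR). `RelativeSpeedup` is a hypothetical name.

```scala
import java.util.Locale

// Best Time(ms) values copied from the results above.
object RelativeSpeedup {
  // (case, Spark best ms, Comet (Scan + Exec) best ms)
  val results: Seq[(String, Int, Int)] = Seq(
    ("aes_encrypt_gcm_basic", 274, 230),
    ("aes_encrypt_gcm_with_mode", 266, 228),
    ("aes_encrypt_cbc", 260, 174),
    ("aes_encrypt_ecb", 221, 154),
    ("aes_encrypt_gcm_with_iv", 236, 226),
    ("aes_encrypt_gcm_with_aad", 246, 230),
    ("aes_encrypt_with_base64", 251, 259),
    ("aes_encrypt_long_data", 2264, 2635))

  // Assumed definition of the Relative column; Locale.ROOT keeps "." as the separator.
  def relative(sparkMs: Int, cometMs: Int): String =
    "%.1fX".formatLocal(Locale.ROOT, sparkMs.toDouble / cometMs)

  // Per Row(ns) = best time in ns / rows, so rows = best ms * 1e6 / per-row ns.
  def impliedRows(bestMs: Double, perRowNs: Double): Long =
    math.round(bestMs * 1e6 / perRowNs)
}
```

On these numbers the gains concentrate in CBC (1.5X) and ECB (1.4X), while aes_encrypt_long_data regresses to 0.9X.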

Shekharrajak changed the title from "feat: [DRAFT] aes encrypt support" to "feat: aes encrypt support" on Jan 18, 2026

Successfully merging this pull request may close these issues.

[Feature] Support Spark expression: aes_encrypt

2 participants