Skip to content

Conversation

@sisyphuswastaken
Copy link

Pertaining to issue #7354

I noticed that cube() can lead to errors when .SDcols is supplied as an evaluative or callable expression (for example - helpers like patterns() or measure()), since it may be forwarded unevaluated and used before being resolved.
To address this, I updated cube() to handle .SDcols using non-standard evaluation in the following way:
the argument is captured with substitute(), if it is callable then it is resolved via eval_with_cols() prior to being passed to groupingsets().
While testing this change, I encountered cases where applying numeric aggregations through j resulted in errors when non-numeric columns were present in .SD. To address this, j is captured as an expression to check if it references .SD, if it does then .SD is restricted to numeric values thereby avoiding unintended application of numeric aggregations to non-numeric data.

During development, I’m seeing two existing test failures (1378.2, 1378.3). I haven’t added new unit tests yet. I’m happy to add targeted tests if there are specific cases that need to be covered, and would appreciate guidance on whether additional test coverage is expected here. I’d also welcome any help on how best to approach the currently failing tests.

@codecov
Copy link

codecov bot commented Dec 28, 2025

Codecov Report

❌ Patch coverage is 90.90909% with 1 line in your changes missing coverage. Please review.
✅ Project coverage is 98.96%. Comparing base (291a711) to head (b6adef9).

Files with missing lines Patch % Lines
R/groupingsets.R 90.90% 1 Missing ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##           master    #7543      +/-   ##
==========================================
- Coverage   98.97%   98.96%   -0.01%     
==========================================
  Files          87       87              
  Lines       16733    16744      +11     
==========================================
+ Hits        16561    16571      +10     
- Misses        172      173       +1     

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.

@jangorecki jangorecki marked this pull request as draft December 29, 2025 07:54
usesSD = any(all.vars(jj) == ".SD")
if (usesSD) {
if (missing(.SDcols)) {
.SDcols = names(x)[vapply(x, is.numeric, logical(1L))]
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How do you know SDcols needs to be numeric, for example there are aggregate functions that works on character.

@jangorecki
Copy link
Member

I think the proper way to address this is to isolate SDcols logic from [ into helper and reuse this helper here.

@sisyphuswastaken
Copy link
Author

That’s a fair point. The numeric-only approach was a defensive measure against a specific aggregation failure I encountered. I agree that reimplementing this logic in cube() isn’t ideal and I’ll work towards isolating and reusing the existing .SDcols handling from [.data.table instead

@aitap
Copy link
Member

aitap commented Dec 29, 2025 via email

@sisyphuswastaken
Copy link
Author

Sure.
This is the sessionInfo-

> sessionInfo()
R version 4.5.1 (2025-06-13)
Platform: aarch64-apple-darwin20
Running under: macOS Sequoia 15.5

Matrix products: default
BLAS:   /Library/Frameworks/R.framework/Versions/4.5-arm64/Resources/lib/libRblas.0.dylib 
LAPACK: /Library/Frameworks/R.framework/Versions/4.5-arm64/Resources/lib/libRlapack.dylib;  LAPACK version 3.12.1

locale:
[1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8

time zone: Asia/Kolkata
tzcode source: internal

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base     

loaded via a namespace (and not attached):
[1] compiler_4.5.1
> 

Errors:


Running the tests in ‘tests/main.R’ failed.
Last 13 lines of output:
  Observed: Taking input= as a system command because it contains a space ('file:///Users/username/Desktop/Code folder/data.table/data.table.Rcheck/data.table/tests/russellCRCRLF.csv'). If it's a filename please remove the space, or use file= explicitly. A variable is being passed to input= and when this is taken as a system command there is a security concern if you are creating an app, the app could have a malicious user, and the app is not running in a secure environment; e.g. the app is running as root. Please read item 5 in the NEWS file for v1.11.6 for more information and for the option to suppress this message.
  
  V1
  1
  V1
  0,1
  0,2
  a,b
  a
  Error in test.data.table() : 
    2 errors out of 12070. Search tests/tests.Rraw for test numbers 1378.2, 1378.3. Duration: 18.1s elapsed (17.5s cpu).
  Calls: test.data.table -> stopf -> raise_condition -> signal
  Execution halted
* checking for unstated dependencies in vignettes ... NOTE
'::' or ':::' import not declared from: ‘ns’
* checking package vignettes ... OK
* checking re-building of vignette outputs ... OK
* checking PDF version of manual ... WARNING
LaTeX errors when creating PDF version.
This typically indicates Rd problems.
* checking PDF version of manual without index ... ERROR
Re-running with no redirection of stdout/stderr.
* DONE
Status: 2 ERRORs, 1 WARNING, 2 NOTEs


@MichaelChirico
Copy link
Member

Run test.data.table() interactively to see the full output, otherwise run _R_CHECK_TESTS_NLINES_=0 R CMD check to show all output lines of the failed tests.

@sisyphuswastaken
Copy link
Author

Sure, here's the output:

username@192 data.table % _R_CHECK_TESTS_NLINES_=0 R CMD check .
* using log directory ‘/Users/username/Desktop/Code-folder/data.table/..Rcheck’
* using R version 4.5.1 (2025-06-13)
* using platform: aarch64-apple-darwin20
* R was compiled by
    Apple clang version 16.0.0 (clang-1600.0.26.6)
    GNU Fortran (GCC) 14.2.0
* running under: macOS Sequoia 15.5
* using session charset: UTF-8
* checking for file ‘./DESCRIPTION’ ... ERROR
Required fields missing or empty:
  ‘Author’ ‘Maintainer’
* DONE

Status: 1 ERROR
See
  ‘/Users/username/Desktop/Code-folder/data.table/..Rcheck/00check.log’
for details.

username@192 data.table % 

@MichaelChirico
Copy link
Member

Sorry, I should have been more complete, R CMD check . won't work because Authors@R format isn't accepted. You'd need to check the built binary: R CMD check data.table_1.18.99.tar.gz.

@sisyphuswastaken
Copy link
Author

Oh my bad, I'll share that too. I'm new to contributing so I'm still learning the ropes. I appreciate the help! :)

username@192 data.table % R CMD check data.table_1.18.99.tar.gz                    
* using log directory ‘/Users/username/Desktop/Code folder/data.table/data.table.Rcheck’
* using R version 4.5.1 (2025-06-13)
* using platform: aarch64-apple-darwin20
* R was compiled by
    Apple clang version 16.0.0 (clang-1600.0.26.6)
    GNU Fortran (GCC) 14.2.0
* running under: macOS Sequoia 15.5
* using session charset: UTF-8
* checking for file ‘data.table/DESCRIPTION’ ... OK
* this is package ‘data.table’ version ‘1.18.99’
* package encoding: UTF-8
* checking package namespace information ... OK
* checking package dependencies ... OK
* checking if this is a source package ... OK
* checking if there is a namespace ... OK
* checking for executable files ... OK
* checking for hidden files and directories ... OK
* checking for portable file names ... OK
* checking for sufficient/correct file permissions ... OK
* checking whether package ‘data.table’ can be installed ... OK
* used C compiler: ‘Apple clang version 17.0.0 (clang-1700.0.13.5)’
* used SDK: ‘MacOSX15.5.sdk’
* checking installed package size ... INFO
  installed size is  6.8Mb
  sub-directories of 1Mb or more:
    doc     1.1Mb
    po      1.6Mb
    tests   2.1Mb
* checking package directory ... OK
* checking ‘build’ directory ... OK
* checking DESCRIPTION meta-information ... OK
* checking top-level files ... OK
* checking for left-over files ... OK
* checking index information ... OK
* checking package subdirectories ... OK
* checking code files for non-ASCII characters ... OK
* checking R files for syntax errors ... OK
* checking whether the package can be loaded ... OK
* checking whether the package can be loaded with stated dependencies ... OK
* checking whether the package can be unloaded cleanly ... OK
* checking whether the namespace can be loaded with stated dependencies ... OK
* checking whether the namespace can be unloaded cleanly ... OK
* checking whether startup messages can be suppressed ... OK
* checking dependencies in R code ... OK
* checking S3 generic/method consistency ... OK
* checking replacement functions ... OK
* checking foreign function calls ... OK
* checking R code for possible problems ... OK
* checking Rd files ... OK
* checking Rd metadata ... OK
* checking Rd cross-references ... OK
* checking for missing documentation entries ... OK
* checking for code/documentation mismatches ... OK
* checking Rd \usage sections ... OK
* checking Rd contents ... OK
* checking for unstated dependencies in examples ... OK
* checking line endings in shell scripts ... OK
* checking line endings in C/C++/Fortran sources/headers ... OK
* checking line endings in Makefiles ... OK
* checking compilation flags in Makevars ... OK
* checking for GNU extensions in Makefiles ... OK
* checking for portable use of $(BLAS_LIBS) and $(LAPACK_LIBS) ... OK
* checking use of PKG_*FLAGS in Makefiles ... OK
* checking compiled code ... NOTE
File ‘data.table/libs/data_table.so’:
  Found non-API calls to R: ‘IS_GROWABLE’, ‘LEVELS’, ‘SETLENGTH’,
    ‘SET_GROWABLE_BIT’, ‘SET_TRUELENGTH’, ‘TRUELENGTH’, ‘XTRUELENGTH’

Compiled code should not call non-API entry points in R.

See ‘Writing portable packages’ in the ‘Writing R Extensions’ manual,
and section ‘Moving into C API compliance’ for issues with the use of
non-API entry points.
* checking installed files from ‘inst/doc’ ... OK
* checking files in ‘vignettes’ ... OK
* checking examples ... OK
* checking for unstated dependencies in ‘tests’ ... OK
* checking tests ...
  Running ‘S4.R’
  Running ‘autoprint.R’
  Comparing ‘autoprint.Rout’ to ‘autoprint.Rout.save’ ... OK
  Running ‘froll.R’
  Running ‘litedown.R’
  Running ‘main.R’
 ERROR
Running the tests in ‘tests/main.R’ failed.
Last 13 lines of output:
  Observed: Taking input= as a system command because it contains a space ('file:///Users/username/Desktop/Code folder/data.table/data.table.Rcheck/data.table/tests/russellCRCRLF.csv'). If it's a filename please remove the space, or use file= explicitly. A variable is being passed to input= and when this is taken as a system command there is a security concern if you are creating an app, the app could have a malicious user, and the app is not running in a secure environment; e.g. the app is running as root. Please read item 5 in the NEWS file for v1.11.6 for more information and for the option to suppress this message.
  
  V1
  1
  V1
  0,1
  0,2
  a,b
  a
  
  Tue Dec 30 10:14:05 2025  endian==little, sizeof(long double)==8, capabilities('long.double')==FALSE, longdouble.digits==NULL, sizeof(pointer)==8, TZ==unset, Sys.timezone()=='Asia/Kolkata', Sys.getlocale()=='C/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8', l10n_info()=='MBCS=TRUE; UTF-8=TRUE; Latin-1=FALSE; codeset=UTF-8', getDTthreads()=='This installation of data.table has not been compiled with OpenMP support.; omp_get_num_procs()==1; R_DATATABLE_NUM_PROCS_PERCENT==unset (default 50); R_DATATABLE_NUM_THREADS==unset; R_DATATABLE_THROTTLE==unset (default 1024); omp_get_thread_limit()==1; omp_get_max_threads()==1; OMP_THREAD_LIMIT==2; OMP_NUM_THREADS==unset; RestoreAfterFork==true; data.table is using 1 threads with throttle==1024. See ?setDTthreads.', .libPaths()=='/Users/username/Desktop/Code folder/data.table/data.table.Rcheck','/Library/Frameworks/R.framework/Versions/4.5-arm64/Resources/library', zlibVersion()==1.2.12 ZLIB_VERSION==1.2.12
  Error in test.data.table() : 
    2 errors out of 12070. Search tests/tests.Rraw for test numbers 1378.2, 1378.3. Duration: 18.1s elapsed (17.5s cpu).
  Calls: test.data.table -> stopf -> raise_condition -> signal
  Execution halted
* checking for unstated dependencies in vignettes ... NOTE
'::' or ':::' import not declared from: ‘ns’
* checking package vignettes ... OK
* checking re-building of vignette outputs ... OK
* checking PDF version of manual ... WARNING
LaTeX errors when creating PDF version.
This typically indicates Rd problems.
* checking PDF version of manual without index ... ERROR
Re-running with no redirection of stdout/stderr.
Converting parsed Rd's to LaTeX .......
Creating pdf output from LaTeX ...
Error in texi2dvi(file = file, pdf = TRUE, clean = clean, quiet = quiet,  : 
  pdflatex is not available
Error in texi2dvi(file = file, pdf = TRUE, clean = clean, quiet = quiet,  : 
  pdflatex is not available
Error in running tools::texi2pdf()
You may want to clean up by 'rm -Rf /var/folders/lw/71h__bqx3s31msh5nnmvp68w0000gn/T//RtmpHHUKjF/Rd2pdf183eb5c89dc54'
* DONE

Status: 2 ERRORs, 1 WARNING, 2 NOTEs
See
  ‘/Users/username/Desktop/Code folder/data.table/data.table.Rcheck/00check.log’
for details.

username@192 data.table % 

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants