Skip to content

Comments

PR for 51 switch to datatable throughout#54

Merged
zhizhongpu merged 55 commits intomainfrom
51-switch-to-datatable-throughout
Aug 6, 2025
Merged

PR for 51 switch to datatable throughout#54
zhizhongpu merged 55 commits intomainfrom
51-switch-to-datatable-throughout

Conversation

@zhizhongpu
Copy link
Collaborator

closes #51

@zhizhongpu zhizhongpu self-assigned this Mar 8, 2025
@zhizhongpu zhizhongpu linked an issue Mar 8, 2025 that may be closed by this pull request
@zhizhongpu
Copy link
Collaborator Author

@santiagohermo just picking this up. There are a few more things to do from #51 (comment). When it's ready for your review I'll add you as a reveiwer.

@zhizhongpu
Copy link
Collaborator Author

zhizhongpu commented Mar 9, 2025

@santiagohermo Just an update on the work here:

By 2511786 I think we've reached a milestone:

  • Now I believe data.table is consistently used inside the package.
    • I already see the speed gain when I ran tests in this branch vs in the master branch.
  • Previously, if input is data.table, then the eventstudy estimation modifies the original dataset. Now the original data is not modified:
    • The solution is that, upon the input data's entry, we take a shallow copy, such that any modifications of this data are not applied to the input data object (as you said below):

Making an internal shallow copy sounds great to me.

    • I built a test (e.g. c588170) to make sure that this problem is spotted in future versions of the package.
      • I believe that this addressed your comment, with the caveat that I did not build the test for the case where input data is not a data.table object. My motivation is that the issue did and should not apply to other objects since they don't support modification by reference.
  • We can also have a test where we input different types of dataset objects into EventStudy and check that the original dataset is modified (or not) as intended
    • I have not yet built the option you described below, on the grounds that I expect the use case to be rare. But I'm on the fence and it's certainly something we could do. So I am deferring to you.

Edit: there has been updates i.e. 15ef116 to the two threads above.

We can also have an option copy_data or something, which defaults to TRUE, so that we allow users to avoid the copying altogether if they are fine with the potential consequences.

I think now it might be a good time for you to review the PR so I'm adding you as a reviewer. Note that all tests have passed on my local.

@zhizhongpu zhizhongpu requested a review from santiagohermo March 9, 2025 00:11
@zhizhongpu
Copy link
Collaborator Author

NTS: build test for avoid_internal_copy

@zhizhongpu
Copy link
Collaborator Author

@santiagohermo forgot to mention that yesterday a few lines in a test were messing up with my switch from df to dt grammar. I dropped those lines in 936cc0b. I think the problem is that the expect_equal was expecting something that's mechanically true, because the number of rows where a column is NA + the number of rows where the column is not NA must equal the total number of rows. I'm flagging because I sensed that the intention of the test might not have been to assert 1+1==2, so maybe you should take a look there.


@santiagohermo I replied to your comments.

@santiagohermo
Copy link
Collaborator

Thanks for the latest updates @zhizhongpu! I ran the devtools::check() on my end and things work fine.

In the commits above:

  • I updated from main. Importantly, this increased the minor version number
  • I made a small change to phrasing of a warning message

Regarding the test you mention here. I agree that the lines you removed were going to mechanically result in TRUE. But what remains of the test is even more vacuous now, since we are comparing to integers that are defined to be the same. Could you re-write the test to make it more meaningful?

We can also increase the minor number in DESCRIPTION by 1.

Other than the point above, I think we are ready to close this one. Almost there!

@zhizhongpu
Copy link
Collaborator Author

zhizhongpu commented Aug 4, 2025

@santiagohermo thanks for the commits!

Regarding the test you mention here. I agree that the lines you removed were going to mechanically result in TRUE. But what remains of the test is even more vacuous now, since we are comparing to integers that are defined to be the same. Could you re-write the test to make it more meaningful?

My recommendation now is to remove this test altogether. I have 3 reasons:

  1. It's a bit difficult to fully recover the original intention when the test was built in Prepare EventStudyFHS() #11 and updated in Revise functions PrepareLeads, PrepareLags, and GetFirstDifferences #31. In 02384ef I built my best interpretation of the intention into the test.
    • From your comment above I got the impression that you might not be familiar with it either (correct me if I'm wrong).
    • If we think it's worth it to investigate the original intention, I suggest that we open a new issue for that task so that the effort does not block this PR.
  2. It's been years since the vacuous test has been running and we've seen no sign elsewhere that indicates there's a problem
  3. The test does not test on a particular outcome (i.e. a function produces X, which matches our expected result, X'). Instead it seems to test whether a particular approach is adopted. This seems slightly trivial to me as we have very good knowledge (under R/*.Rabout what approach is adopted.

@santiagohermo
Copy link
Collaborator

You are correct, I do not recall when this test was introduced nor its original purpose. So thanks for the exploration @zhizhongpu!

After some closer review, your suggestion makes sense to me. Let's remove the test.

I think we are ready to squash and merge, so I'll approve. Before that we should:

  1. remove the test
  2. increase minor number in DESCRIPTION (from 1.1.4 to 1.1.5)
  3. merge, add usual summary comment in Switch to data.table throughout #51

Let me know if you'd like me to handle any of these steps.

I don't think we need to submit to CRAN just yet, we can do that after #42. We can also create a new release then.

@zhizhongpu zhizhongpu merged commit 73026b7 into main Aug 6, 2025
5 checks passed
@zhizhongpu zhizhongpu deleted the 51-switch-to-datatable-throughout branch August 6, 2025 13:00
@zhizhongpu
Copy link
Collaborator Author

#51 (comment)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Switch to data.table throughout

2 participants