Conversation
|
Thanks @Erasdna for the PR! I fixed the merge conflicts and made some changes. Would be great if you could have a look an see if it make sense? I am also not sure about the changes in the @cdaversin could you take a look? |
|
Thanks! Part of my thinking was that it could be useful to be able to specify which statistics you want to compute, and then store them as a pivot table, i.e, one value column and multiple grouping columns (which I think is often an advantage for later processing). Maybe the two generate dataframe functions should operate in a more similar way? Computing an unnecessary number of statistics can sometimes take too much time imo (but the specific implementation of this is maybe not optimal as it stands) I also think it would be useful to decide on whether functions should take a Path or an MRIdata/Segmentation object as input (this applies several places). I would like to add some more functionality to modify segmentations etc, which would make passing the class object a bit more practical, I think. I am also curious about the regex stuff @cdaversin, to me it seems a little bit convoluted sometimes. I added a way to directly read patient info assuming that files follow the bids convention (which is sota I believe). Would maybe be nice to stick to that? We should probably also make some way to generate filenames according to bids? |
|
Thanks @Erasdna ! Looks good to me :) I see the point of the pivot table and I agree that it is useful, but the way you create your pivot table (i.e. what is the information you are looking for) can vary from one user to another. What about generating all the statistics (more or less as we do now), and export it as a I agree that the input type (input file, MRIData, np.array) should be more consistent through the different functions in the code. This is something we should do. Regarding the regex, I agree that it might look a bit complicated. But I don't think we can assume that all the data we are (and will be) working with follows the BIDS convention. So we might need to be a bit more flexible regarding the input files / data. But I agree that for the processed files that we generate, we should be following the BIDS convention as much as possible. |
|
One issue now is that there are two separate functions that does similar things (e.g generate_stats_dataframe and generate_stats_dataframes_rois). In the original PR the latter just replaced the former, so I was not sure what to do here. I know @cdaversin implemented the first one, but would be nice to somehow unify these, or perhaps implement some common helper functions that can be used in both. I agree with @Erasdna that we should probably not compute more stats than the user need. Perhaps it is possible to make it default to compute a minimum number of stats, and add some flags if you want to compute more stats? For For the regex stuff, I don't really know why we need this. Is it just to use the correct method in |
My point was a little bit simpler I think. I wanted to store the data in a long format (one column with statistic name and one value column) rather than in a wide format (with one column for each statistic, as was done intiially). The long is nice because it is easier to reformat with a Imo regex is maybe most useful for checking that information is consistent (i.e segmentation and data is from the same patient). I wonder if this is something that should be handled independent of computing the statistics themselves. My thinking was that the To merge the two functions I don't think it should be easy to simply provide an option in |
Add Segmentation class with FreeSurfer labels, make io go through MRIData, rework of stats dataframe function.
Tests are passing.