Skip to content

Segmentations #31

Open
Erasdna wants to merge 7 commits intoscientificcomputing:mainfrom
Erasdna:main
Open

Segmentations #31
Erasdna wants to merge 7 commits intoscientificcomputing:mainfrom
Erasdna:main

Conversation

@Erasdna
Copy link

@Erasdna Erasdna commented Mar 13, 2026

Add Segmentation class with FreeSurfer labels, make io go through MRIData, rework of stats dataframe function.

Tests are passing.

@Erasdna Erasdna requested a review from finsberg March 13, 2026 12:33
@Erasdna Erasdna closed this Mar 13, 2026
@Erasdna Erasdna reopened this Mar 13, 2026
@finsberg finsberg requested a review from cdaversin March 16, 2026 00:48
@finsberg
Copy link
Member

Thanks @Erasdna for the PR! I fixed the merge conflicts and made some changes. Would be great if you could have a look an see if it make sense? I am also not sure about the changes in the compute_stats module. Would be good to get an opinion from @cdaversin as well.

@cdaversin could you take a look?

@Erasdna
Copy link
Author

Erasdna commented Mar 16, 2026

Thanks!

Part of my thinking was that it could be useful to be able to specify which statistics you want to compute, and then store them as a pivot table, i.e, one value column and multiple grouping columns (which I think is often an advantage for later processing). Maybe the two generate dataframe functions should operate in a more similar way? Computing an unnecessary number of statistics can sometimes take too much time imo (but the specific implementation of this is maybe not optimal as it stands)

I also think it would be useful to decide on whether functions should take a Path or an MRIdata/Segmentation object as input (this applies several places). I would like to add some more functionality to modify segmentations etc, which would make passing the class object a bit more practical, I think.

I am also curious about the regex stuff @cdaversin, to me it seems a little bit convoluted sometimes. I added a way to directly read patient info assuming that files follow the bids convention (which is sota I believe). Would maybe be nice to stick to that? We should probably also make some way to generate filenames according to bids?

@cdaversin
Copy link
Collaborator

Thanks @Erasdna ! Looks good to me :)

I see the point of the pivot table and I agree that it is useful, but the way you create your pivot table (i.e. what is the information you are looking for) can vary from one user to another. What about generating all the statistics (more or less as we do now), and export it as a csv file that one could load using pandas and generate any possible pivot table.

I agree that the input type (input file, MRIData, np.array) should be more consistent through the different functions in the code. This is something we should do.

Regarding the regex, I agree that it might look a bit complicated. But I don't think we can assume that all the data we are (and will be) working with follows the BIDS convention. So we might need to be a bit more flexible regarding the input files / data. But I agree that for the processed files that we generate, we should be following the BIDS convention as much as possible.

@finsberg
Copy link
Member

One issue now is that there are two separate functions that does similar things (e.g generate_stats_dataframe and generate_stats_dataframes_rois). In the original PR the latter just replaced the former, so I was not sure what to do here. I know @cdaversin implemented the first one, but would be nice to somehow unify these, or perhaps implement some common helper functions that can be used in both.

I agree with @Erasdna that we should probably not compute more stats than the user need. Perhaps it is possible to make it default to compute a minimum number of stats, and add some flags if you want to compute more stats?

For Path vs MRIData, I think the underlying function should take MRIData, but the command line interface should take a path. If you want to use the library for python, it is a bit silly if you need to save the data to a file first in order to use the function.

For the regex stuff, I don't really know why we need this. Is it just to use the correct method in nibabel to load the data? Have we tried just using the nibabel.load function? This seem to allready have some logic implemented to handle this?

@Erasdna
Copy link
Author

Erasdna commented Mar 16, 2026

I see the point of the pivot table and I agree that it is useful, but the way you create your pivot table (i.e. what is the information you are looking for) can vary from one user to another. What about generating all the statistics (more or less as we do now), and export it as a csv file that one could load using pandas and generate any possible pivot table.

My point was a little bit simpler I think. I wanted to store the data in a long format (one column with statistic name and one value column) rather than in a wide format (with one column for each statistic, as was done intiially). The long is nice because it is easier to reformat with a pivot_table like function and often easier to query. The data would still be stored as a .csv and read with `pandas (This is the way I did it).

Imo regex is maybe most useful for checking that information is consistent (i.e segmentation and data is from the same patient). I wonder if this is something that should be handled independent of computing the statistics themselves. My thinking was that the generate_stats function assumes that the metdata is correct, and then computes various quantities. I think maybe providing regex patterns makes the most sense for the command line interface, we can then provide some functionality to parse the metadata as necessary, with BIDS as one option? This also applies to the timetable etc?

To merge the two functions I don't think it should be easy to simply provide an option in generate_stats_dataframe_rois to also provide an optional list of rois. You could then just mask the segmentation array.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants