Refactor artifacts detection + add saturation detection#4297
Refactor artifacts detection + add saturation detection#4297samuelgarcia wants to merge 18 commits intoSpikeInterface:mainfrom
Conversation
|
@samuelgarcia do not forget, in this PR, to change DetectThresholdCrossing(recording, ...) to DetectThresholdCrossing(envelope, ...) in the detec_period_artefacts_by_envelope |
|
Hi @oliche @JoeZiminski I think we do not need to scaled traces at each chunk we could only could the unscaled threhold in the init and then we could avoid the costly convertion to float32 or float64. Also I put the unit in uV to be more SI friendly. |
|
@alejoe91 @chrishalcrow ready to review on my side |
| self, recording, periods=artifacts, mode=mode, noise_levels=noise_levels, seed=seed, **noise_levels_kwargs | ||
| ) | ||
| # note self._kwargs["periods"] is done by SilencedPeriodsRecording and so the computaion is done once | ||
|
|
There was a problem hiding this comment.
add detect_artifact_method detect_artifact_kwargs
|
Hello, trying this out on real data with saturation. I get this output artifact_periods=array([(0, 637721, 638769), (0, 638955, 639206), (0, 923219, 923602),
(0, 923605, 923606), (0, 923607, 923835)],Are these periods inclusive? And if they are, should the last pair be merged? |
|
Hey @samuelgarcia thanks for this! will check this out tomorrow. I accidentally did not push two tidy-up commits to #4301, do you mind if I push to your branch to add some changes? I think it is mostly superficial stuff like docstrings etc. |
Yes, the last pair should be merged. |
yes of course. push anything you want |
|
Hey @samuelgarcia thanks for this, I added the content of those commits I missed. I ran into some issues with the (now renamed) If possible I'd prefer to cast and scale the data to avoid such issues, then we always know we are solid whatever the user inputs. I get |
| max_ = np.max(np.r_[data_seg_1.flatten(), data_seg_2.flatten()]) | ||
| gain = max_ / 2**15 | ||
| offset = 0 | ||
| offset = 50 |
There was a problem hiding this comment.
@JoeZiminski : why did you put 50 as offset here ?
There was a problem hiding this comment.
Apologies forgot to say, just to ensure it is tested (if it was zero and was accidentally not applied in the function, the test would not flag it)
There was a problem hiding this comment.
(that was just a quick test, see update below)
NP1 is 12bits, so I think that the lsb is 5000 / 2**11 = 2.4uV |
|
Hey @samuelgarcia I didn't realize the bit depth of the ADC is not 16-bit! I see here ADC resolution is given as 10 bits. In any case, the resolution from the gains/offsets used to scale the data won't be able to represent the values used for the |
|
Hey @samuelgarcia @alejoe91 thanks for the feedback, I'm running into the below issue calculating the derivative on See below for the test data in float and int. When taking the derivative, information is lost due to overflow.
The above is the data converted to int, each timeseries is the data from one channel. In many instances, the data after the saturation is negative in int space, resulting in overflow e.g. by definition the saturation point is at the max of the int16 range, |
|
@JoeZiminski I think the test data is not realistic. Usually the maximum int recorded on neuropixel is 512 on NP1 and 8192 on NP2 so you wouldn't have the issues above as the max diff would be 8192 * 2 (2 ** 14). However in general I think it is a poor practice to compute in Anyways for this specific saturation detection problem, you just have to fix your test data by staying in the maxint ranges above. |
|
Cheers @oliche! That's good to know for NP probes. For other providers like Nexus, run their ADC at 16 bit-depth (e.g. here), in this case it might be a problem? That being said I don't know how often these probes saturate and whether people are likely to run this function for non-NP1 probes. But yes I think the easiest solution to avoid these worries is to perform computation in float. |
|
If you think you may exceed half the the range of int16 (ie higher than 2 ** 14 or lower than -2 ** 14), then you can check for the max(abs()) values. If they are above or below, you may throw a warning and only perform the absolute saturation detection and skip the one based on the derivative ? That would prevent too many false positives. |
|
Thanks @oliche, yes in this case I would definitely suggest @samuelgarcia we perform operations in float here to avoid having to worry about the unpredictable bit-depth of the data. |
|
Hi guys, I've debugged this a bit and I think it's much simpler to specify the diff threshold in Thoughts? |
|
Hey @alejoe91 thanks I completely agree that is much more intuitive and easy to use. Thanks for finalising this! My only comments are:
|
Done! Just casting to float at the top of the compute!
I think I agree with this. We can keep it simple, have the 2 functions and users can directly choose what they want to do. @samuelgarcia ok to remove |
|
Another note: the last commits also add the option to annotate a recording with |
| # Do not remove this, this is to remenber that in previous version the first node needed to be | ||
| # a detectot but not anymore | ||
| # node0 = nodes[0] | ||
| # if not isinstance(node0, PeakSource): | ||
| # raise ValueError( | ||
| # "Peak pipeline graph must have as first element a PeakSource (PeakDetector or PeakRetriever or SpikeRetriever" | ||
| # ) |
There was a problem hiding this comment.
I suggest making this optional and have a check_for_peak_source flag
| sample-to-sample voltage change exceeds this value in the required | ||
| fraction of channels are flagged as saturation. Pass ``None`` to | ||
| disable derivative-based detection and rely solely on | ||
| ``saturation_threshold_uV``. IBL use **300 μV/sample** for NP1 probes. |
| # The saturation threshold can be specified in the recording annotations and loaded automatically | ||
| # for some acquisition systems (e.g., Neuropixels) | ||
| if "saturation_threshold_uV" in recording.get_annotation_keys() and saturation_threshold_uV is None: | ||
| saturation_threshold_uV = recording.get_annotation("saturation_threshold_uV") |
There was a problem hiding this comment.
I think we should check in case this is None. Maybe:
| saturation_threshold_uV = recording.get_annotation("saturation_threshold_uV") | |
| saturation_threshold_uV = recording.get_annotation("saturation_threshold_uV") | |
| if saturation_threshold_uV is None: | |
| raise ValueError("Cannot read `saturation_threshold_uV` from recording. Please pass `saturation_threshold_uV` manually.") |
|
EDIT: Think I'm wrong. I think the Hello, when I run this on real data I still get periods that I think should be merged (highlighted below) si.detect_saturation_periods(rec, saturation_threshold_uV=1200, diff_threshold_uV=300)
>>> array([(0, 61450219, 61450586), (0, 61450751, 61451231),
(0, 61451294, 61451329), (0, 61451332, 61451334),
(0, 61451335, 61451342), (0, 61451351, 61451358),
**(0, 61451362, 61451363), (0, 61451364, 61451802),**
(0, 61451959, 61452412), (0, 61452428, 61452429),
(0, 61452617, 61452619), (0, 61452622, 61452630),
(0, 61452631, 61453012), (0, 61453207, 61453536),
(0, 61453828, 61454147), (0, 61454493, 61454716),
(0, 61455076, 61455294), (0, 61455296, 61455297),
(0, 61455719, 61455722), (0, 61455725, 61455885)],I could be misunderstanding (maybe the end sample isn't inclusive??), but I think the starred ones should be a single period? |


@JoeZiminski @oliche
This refactor the artifact detection and blanking
detect_artifact_periods()time to be used in preprocess dict (@chrishalcrow this is for you)
Could also be done in this PR:
To be done in 2 futures PR: