-
bahamat: xmerlin: We have been able to reproduce it with the reproducer script, but it triggers much less frequently for us than other systems seem to be encountering. The most recent release, from last week, includes the fix.
-
xmerlin: bahamat: As far as I can see, you have integrated the patch without the mutex, but the patch in illumos and openzfs uses a mutex... why?
-
bahamat: xmerlin: We imported it early. Was there an additional issue found that necessitated the mutex, or is it just extra precaution?
-
bahamat: Let me check with Dan... we might respin the release.
-
bahamat: Ok, so the mutex was put into openzfs over two years ago for a completely unrelated issue, where lseek(2) would fail, seemingly at random.
-
bahamat: But it doesn't seem to be related to the data corruption issue.
-
bahamat: So we're not respinning the release, but the mutex will be in our next release (and in the current dev build, assuming it finished).
-
bahamat: And yeah, that build has finished.
-
bahamat: That image is master-20231204T135840Z
-
danmcd: Worst case, xmerlin, is that without the mutex the window for 16087 only gets much smaller. With the mutex it should be eliminated.
-
danmcd: I ran 100 runs of the brute-force parallel reproducer on the release build and it didn't trigger at all.
-
danmcd: (it == the bug)