03:37:51 a bit random, but does anyone know offhand if the kernel ever updates /etc/path_to_inst, or is that always done from userland?
03:37:59 (or vice versa)
03:39:21 (if userland, i wanted to experiment with adding it as a boot module from the local disk and then lofi mount in the running os)
03:39:37 so it can be persistent even with a ramdisk based setup
06:17:27 [illumos-gate] 15613 SMB2 Large MTU support -- Gordon Ross
14:32:55 *sigh* one thing that would be nice is if standards included a history for a feature (e.g. 'first appears in version X') so you don't have to chase down older versions to answer that
15:44:17 indeed. The python docs at python.org seem to get this right (plus you can pull up the older versions in a couple clicks)
16:03:00 (among many other things), I'm hoping to work on some improvements with how ZFS talks to disks
16:03:36 e.g. there's an existing 'don't cache' flag for buf(9S) that zfs sets, but doesn't actually get used by anything
16:03:59 (B_DONTNEED IIRC)
16:04:26 err B_NOCACHE
16:36:07 could also set a flag in the READ/WRITE CDBs (at least for SCSI disks) to tell the disk to not cache
16:36:43 as well as when zfs says don't retry, have sd.c not retry the failed I/O
16:37:32 (our timeout/retry defaults here are horrible and were probably marginally ok for disks 25-30 years ago)
16:38:15 Unfortunately standards are often that way; it's just the nature of the beast.
16:38:29 yeah.. the scsi ones are far from the only ones that do it
16:38:41 I'll be curious to see what you come up with here.
16:40:33 looks like B_NOCACHE does get used in bio.c (leads to the buf getting freed sooner). I'd worry about dumb firmware misinterpreting it.
16:40:53 ("it" being the SCSI equivalent of B_NOCACHE)
16:42:02 i'd probably add in a hook to disable it via sd.conf like some of the other behaviors..
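
The "flag in the READ/WRITE CDBs" idea above can be sketched roughly as follows. This is only an illustration, not code from sd.c: the helper `build_rw10_cdb`, the `CDB10_*` constants, and the `dpo_allowed` switch (standing in for a per-device sd.conf override) are all hypothetical names. It assumes the SCSI hint in question is the DPO (Disable Page Out) bit, bit 4 of byte 1 in the 10-byte READ and WRITE commands; the chat does not say which bit was meant.

```c
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical names for illustration; not taken from illumos headers. */
#define	CDB10_OP_READ	0x28	/* READ(10) opcode */
#define	CDB10_OP_WRITE	0x2A	/* WRITE(10) opcode */
#define	CDB10_DPO	0x10	/* byte 1, bit 4: Disable Page Out */

/*
 * Fill in a 10-byte READ/WRITE CDB. When the caller asked for "don't
 * cache" (the B_NOCACHE analogue) and the per-device override permits
 * it, set DPO so the drive is hinted not to retain the data in its cache.
 */
static void
build_rw10_cdb(uint8_t cdb[10], bool is_write, uint32_t lba,
    uint16_t nblocks, bool nocache, bool dpo_allowed)
{
	memset(cdb, 0, 10);
	cdb[0] = is_write ? CDB10_OP_WRITE : CDB10_OP_READ;
	if (nocache && dpo_allowed)
		cdb[1] |= CDB10_DPO;

	/* Logical block address, big-endian, bytes 2..5. */
	cdb[2] = (uint8_t)(lba >> 24);
	cdb[3] = (uint8_t)(lba >> 16);
	cdb[4] = (uint8_t)(lba >> 8);
	cdb[5] = (uint8_t)lba;

	/* Transfer length in blocks, big-endian, bytes 7..8. */
	cdb[7] = (uint8_t)(nblocks >> 8);
	cdb[8] = (uint8_t)nblocks;
}
```

Keeping the override as a separate argument mirrors the concern about firmware misinterpreting the bit: the hint can be switched off per device without touching the caller's intent.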
16:42:32 the bigger one though is if the disk has an uncorrectable media error, it returns EIO to zfs, which AFAICT never attempts self-healing in that instance
16:42:47 but sd.c could tell the drive to attempt to remap that block
16:43:09 at which point self healing could be used to repair it
16:43:22 (or on a write, just retry the write after remapping the block)
16:45:59 (also, might be nice to enable background scanning via fmd or such.. it's supposed to at least be non-impacting)
16:48:11 hard part would be if as a result of the scan the disk says, unsolicited, "block 8675309 is unreadable". would be hard to find the block pointer that has the checksum that covers that block..
16:48:25 scrub at least has that context.
16:50:48 yeah.. for bad blocks discovered asynchronously, it might just be to use that to trigger a scrub
16:50:55 presumably you could tell if it's allocated or free. (free -> just remap it to zeros; alloc -> add to suspect list and schedule a scrub?)
16:51:27 maybe.. i need to look at it more to see how easily that can be discerned
16:55:20 mapping back from lba -> slice -> metaslab might be messy but should presumably be doable, though it might involve a bunch of io to read the metaslab metadata.
16:57:23 at minimum, fma could at least know from a scrub 'yeah, i'm expecting errors on these blocks' based on the bad block list after a media scan
16:57:50 (at some point, I'd also like to make the disk & zfs modules a little smarter... right now a bad disk can cause them to step on top of each other)
16:58:50 i don't know if it's possible, but like if the disk module thinks the disk is bad, but is a whole disk zfs disk, have a way to tell and defer to the zfs module
16:59:05 right now when it retires a disk, it can kinda pull the rug out from zfs
16:59:13 and the zfs module might be trying to do things too
18:11:49 yep, I've noticed it's quick to mark a disk "removed" requiring what looks like a full rebuild after it returns (and likely messing with the increased robustness that the DTL should give you)
18:12:53 yeah, at least some disk manuf have told us it's normal/ok to have a certain # of defects
18:14:06 if it's offline for minutes or hours, a DTL-pruned resilver should complete quickly but i don't see that happening. I'd still run a scrub afterwards just to be sure but...
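
The triage floated at 16:50:55 (free -> remap to zeros; alloc -> suspect list and schedule a scrub) could look something like the sketch below. Everything here is hypothetical: `extent_t`, `bb_action_t`, and `triage_bad_lba` are made-up names, and the sorted, non-overlapping array of allocated extents is only a stand-in for whatever the lba -> slice -> metaslab lookup would actually yield. No real ZFS or fmd interface is implied.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical types for illustration; not ZFS structures. */
typedef struct extent {
	uint64_t start;		/* first allocated LBA */
	uint64_t len;		/* number of blocks, > 0 */
} extent_t;

typedef enum bb_action {
	BB_REMAP_ZERO,		/* block is free: remap, write zeros */
	BB_SUSPECT_SCRUB	/* block is allocated: note it, scrub later */
} bb_action_t;

/*
 * Decide what to do with a bad LBA the disk reported asynchronously.
 * 'alloc' must be sorted by start and non-overlapping; the lookup is
 * a plain binary search over those extents.
 */
static bb_action_t
triage_bad_lba(const extent_t *alloc, size_t n, uint64_t lba)
{
	size_t lo = 0, hi = n;

	while (lo < hi) {
		size_t mid = lo + (hi - lo) / 2;

		if (lba < alloc[mid].start)
			hi = mid;
		else if (lba - alloc[mid].start >= alloc[mid].len)
			lo = mid + 1;
		else
			return (BB_SUSPECT_SCRUB);
	}
	return (BB_REMAP_ZERO);
}
```

The lookup itself is cheap; as noted in the discussion, the expensive part would be the I/O needed to build the allocated-extent view from metaslab metadata in the first place.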