01:08:45 <yuripv> looks like my subscription to advocates list is "awaiting approval" since March, could someone approve it, pretty please? 
01:09:06 <yuripv> (and allow an rti message through)
05:01:04 <tozhu> hello all, a simple question regarding the illumos compile process, I see there is no ‘-O2’ option, can we add ‘-O2’ for illumos compile?
11:23:26 <andyf> In general, we are careful to favour observability over optimisation, which is why -O is used - and even then we turn off some of the optimisations that -O enables. It's quite likely that some of the things that -O2 adds would be beneficial, but we'd have to validate that they don't break things like DTrace fbt probes.
11:25:37 <yuripv> looking at nightly.log, there are a lot of "-O2" in gcc invocations
11:34:12 <yuripv> so it looks like we build 32bit object with -O and 64bit ones with -O2
11:37:52 <andyf> Ah right - my searching for -O\d was not working, but they're specified with the studio flags, and converted by cw
11:38:03 <andyf> Makefile.master - amd64_COPTFLAG=           -xO3
11:39:35 <yuripv> yeah, I didn't look at Makefile.master as there's just too much magic involved in translation
11:40:15 <yuripv> oh, tozhu isn't here
15:10:44 <yuripv> rmustacc: is having both debug/non-debug builds a new requirement for rti? i'm pretty sure i didn't do both previously
15:11:36 <rmustacc> No, not new.
15:12:23 <rmustacc> https://illumos.org/docs/contributing/#submitting-a-patch
15:13:36 <yuripv> it doesn't say anything about having both though
15:14:01 <yuripv> (it would be good to have that mentioned explicitly)
15:14:38 <rmustacc> Happy to improve it. That's what full has always meant to me.
15:15:03 <yuripv> my understanding was "non-incremental" :)
15:15:14 <yuripv> and "with all checks enabled"
15:15:52 <rmustacc> Feel free to make a pr with verbiage you'd find clarifying.
15:16:57 <rmustacc> Or if not, I'll try to get to it at some point. But I can't promise you'll be happier. ;)
15:18:23 <rmustacc> Given there are different warnings that are checked and one can make one build work but not the other, that's why it's there.
15:25:33 <yuripv> understandable, it's just first time I was asked for both
15:31:32 <rmustacc> Not everyone always checks, especially when we end up doing builds before pushing anyways. Sorry for the confusion.
15:39:14 <tsoome_> cmpldev() in getaudit_addr() is failing for 32-bit consumers:(
15:41:07 <tsoome_> cmpldev:entry int64_t 0x21600000006b3  cmpldev:return int64_t 0 -- value passed on call and return code. 
15:46:53 <rmustacc> What's the exact path you're going?
15:50:05 <tsoome_> I found that getaudit_addr() call in userland is failing (like with auditconfig -getaudit); so I did trace a bit and found that getaudit_addr() syscall in kernel is returning getaudit_addr:return int64_t 0x4f (EOVERFLOW), and there are exactly 2 cases where we can return it -- the argument len is correct, but we do get error from cmpldev().
15:52:52 <tsoome_> so cmpldev() does dev >> L_BITSMINOR to get major (getting value 0x21600) and as L_MAXMAJ32 is 0x3fff, the if (major > L_MAXMAJ32) is true and we return 0 (and NODEV32).
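The check described above can be modelled in a few lines. This is a minimal sketch in Python standing in for the kernel's C, not the illumos source. Only the input value, the resulting major of 0x21600, and L_MAXMAJ32 = 0x3fff come from the log. The other constants are assumptions recalled from sys/mkdev.h, and the tuple return is a convenience of this model.

```python
# Toy model of the 64-bit -> 32-bit dev_t compression discussed in the log.
L_BITSMINOR = 32         # consistent with 0x21600000006b3 >> 32 == 0x21600
L_MAXMAJ32 = 0x3FFF      # quoted in the log
L_BITSMINOR32 = 18       # assumption: minor bits in a 32-bit dev_t
L_MAXMIN32 = 0x3FFFF     # assumption: largest 32-bit minor
L_MAXMIN = 0xFFFFFFFF    # assumption: mask for the 64-bit minor
NODEV32 = 0xFFFFFFFF     # assumption: (dev32_t)-1


def cmpldev(dev):
    """Return (ok, dev32); ok is 0 when dev does not fit in 32 bits."""
    major = dev >> L_BITSMINOR
    minor = dev & L_MAXMIN
    if major > L_MAXMAJ32 or minor > L_MAXMIN32:
        return 0, NODEV32
    return 1, (major << L_BITSMINOR32) | minor


# The value traced at cmpldev:entry in the log.
ok, dev32 = cmpldev(0x21600000006B3)
```

With the traced value the major comes out as 0x21600, which exceeds 0x3fff, so the model takes the failure branch, matching the `cmpldev:return 0` seen in the trace.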
15:54:06 <tsoome_> so, the interesting question is, is the 0x21600 something we have to accept as is, or can it be further translated....
15:55:41 <tsoome_> or... should getaudit_addr() syscall handle the NODEV32 better than just erroring out...
15:56:05 <rmustacc> What does the device correspond to?
15:59:54 <tsoome_> huh... we are translating cmpldev(&dev, ainfo->ai_termid.at_port) ---  should this be pts?
16:00:01 <tsoome_> hm... 
16:04:49 <tsoome_> it does not really make sense as such, need to see how they are getting this value in first place....
17:12:31 <tsoome_> eh:D well. this terminal id is not getting translated to dev because it is not a dev, it is a network address with port numbers and remote ip. as simple as that.
17:14:25 <gitomat> [illumos-gate] 15261 "ExpDataSN mismatch in SCSI Response" error on FreeBSD 13.1 initiators -- Yuri Pankov <ypankov⊙dc>
17:15:05 <tsoome_> uint_t at_type = 0x10 (that's AU_IPv6)
19:32:26 <gitomat> [illumos-gate] 16729 disambiguate generic & pci memlist functions -- Luqman Aden <luqman⊙oc>
19:56:15 <gitomat> [illumos-gate] 14916 ehci_qh_pool_size is probably too low -- Joshua M. Clulow <josh⊙so>
21:26:30 <richlowe> I realized I recently said I'd never seen the zfs 'bn > dn_maxblkid' issue be more than two blocks over.   That's not true.  I just saw 3
21:40:18 <tsoome_> got it with zfs send?
21:45:00 <richlowe> yes
21:45:04 <richlowe> it's always send/recv
21:45:21 <richlowe> I just don't want to have talked about the bug and left wrong info in the IRC logs without a correction :)
21:47:01 <richlowe> it feels like it's using the large...thingy feature without turning it on? or without checking if it's off?
21:47:51 <richlowe> is it large_blocks or large_dnode?
21:49:41 <richlowe> if I understand `zpool upgrade` properly, I don't have either enabled on the source pool
21:50:11 <richlowe> so it's almost like `zfs send` is inventing a big something-or-other, or `zfs recv` is deciding to create one when it should not
21:50:36 <hadfl> there is also another zfs issue that causes even more havoc than just printing a warning message. if there are labels left at the end of the device. zpool expansion will blow up.
21:53:14 <richlowe> yes, that one too!
21:56:49 <hadfl> it's particularly annoying as for aarch64 we are obviously dd'ing new images to the same device over and over again for testing. the only workaround so far is to do a `zpool create -f ... <dev>`; `zpool destroy ...`; `zpool labelclear <dev>`
22:07:35 <tsoome_> "if there are labels left at the end of the device" -- left how? overwriting pool without doing zpool destroy first?
22:09:50 <richlowe> labelclear is the important part
22:10:12 <richlowe> zfs puts labels at the start and end, if the end one still exists (we think), zfs expansion chokes irrecoverably.
22:10:27 <richlowe> I say "we think", the workaround I'm pretty sure is 100% reliable
22:11:19 <richlowe> so in this case, you have an sd card, you dd your new image to it, plug it back in, and assume it'll autoexpand, instead it'll take irrecoverable errors the moment it tries.
22:11:26 <richlowe> iff the trailing label still exists
22:12:53 <hadfl> zpool create -f; zpool destroy; is just to make labelclear work. as rich pointed out this is the thing that helps
22:13:41 <hadfl> tsoome_, if you want to test it, use a device, dd a zpool to it, expand it. dd the same zpool to the larger device again and try to expand it again
22:19:15 <tsoome_> so that in this scenario you would overwrite labels at the beginning of the disk with new image, but labels at the end are left intact as your image will hope to use its own label copies from the end of the image?
22:21:29 <hadfl> yeah, the end of the image (i.e. end of the new zpool) is not the end of the device where there are still old labels left
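Why the trailing labels survive a re-image can be seen from where the four labels sit. The sketch below assumes the usual ZFS layout of four 256 KiB labels, two at the front and two at the end of the device after aligning its size down to the label size. The image and card sizes are hypothetical.

```python
# Where the four vdev labels live for a device of a given size.
LABEL_SIZE = 256 * 1024   # assumption: size of one vdev label
VDEV_LABELS = 4           # assumption: two leading, two trailing


def label_offsets(psize):
    """Byte offsets of labels L0..L3 on a device of psize bytes."""
    psize -= psize % LABEL_SIZE  # align down to a whole label
    offsets = []
    for l in range(VDEV_LABELS):
        # Labels in the second half are placed relative to the device end.
        base = 0 if l < VDEV_LABELS // 2 else psize - VDEV_LABELS * LABEL_SIZE
        offsets.append(base + l * LABEL_SIZE)
    return offsets


GIB = 1024 ** 3
image = label_offsets(4 * GIB)    # hypothetical image size
card = label_offsets(32 * GIB)    # hypothetical SD card size

# Offsets written by dd of the image vs. trailing offsets of the full card.
stale = [off for off in card[2:] if off >= 4 * GIB]
```

Writing the image with dd touches only the first 4 GiB, so both trailing label slots of the full-size card end up in `stale`: whatever an earlier, expanded pool wrote there is still present when the new pool is expanded.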
22:25:07 <tsoome_> there are a few problems with leftover labels --- first one, the last two are searched based on the size of the device. Now, if you are overwriting the same image again and again, it means the pool, device etc guids in those labels will match. And worse, as the source of the pool is from the same image, the transaction group numbers on "old" image are larger than ones in the "new" image, therefore the "old" labels are more recent than the ones from the image....  and you get very confused pool:)
22:34:12 <hadfl> well, it does not necessarily need to be the same image. but i guess any assembled image has fewer transactions than a pool that was in use. since this issue seems to be known and "explainable" what's the best workaround?
22:34:39 <hadfl> i doubt that re-imaging a device is a non-standard use-case
22:35:28 <richlowe> it's also immediately and terminally fatal to the pool
22:37:57 <tsoome_> well, if it is not the same image, you have 2 first labels from your new image and 2 last ones from old pool and thats kind of worst case because which ones we should believe to be true? at least "good old" SVM was using majority based voting with metadb, so 50% available metadb replicas did stop the boot, and you had to fix it manually.
22:39:07 <hadfl> the interesting part is that everything is happy as long as we don't try to expand the pool
22:39:14 <jclulow> In this case you do actually have four labels that make sense, FWIW, they're just not in the right spot according to the slice
22:39:18 <jclulow> Right
22:39:24 <richlowe> so ok, but what happens in practice is everything works (it read a start-of-disk label), until you try to expand, when if it sees an end-of-disk label _then_ it shits the bed.
22:39:24 <hadfl> i can dd a new pool with old labels on the device as many times as i want
22:39:30 <hadfl> everything works
22:39:44 <hadfl> as long as i don't want to expand the pool
22:39:45 <jclulow> As far as I know, the pool knows how big the used region of the slice is; I'm sure it's just getting confused on re-open
22:40:18 <richlowe> right and then it uses the labels that don't even agree with the labels it was using a second ago
22:40:31 <jclulow> So I think it would be pretty legitimate to have ZFS locate the labels where it expects them to be, as it seems to be doing (except for expand) and just trust those -- and to not screw up if there's random data at the end of the slice
22:41:15 <tsoome_> richlowe and that is a bug, it must make sure the newly found labels are not poisoned.
22:41:21 <jclulow> I suspect what we really ought to do, prior to expansion, is not even look -- just erase the target label region
22:42:07 <jclulow> if that fails, we don't try to do anything else
22:42:28 <jclulow> but otherwise we know that we can't then become confused
22:43:38 <tsoome_> yep, the fact that we are in process of expanding means that we should not care about what is there.
22:43:47 <jclulow> yeah.
23:14:02 <tozhu> andyf, yuripv  thank you for the answer regarding the ‘-O2’ compile option, thank you