-
rzezeski
but with that it will save a few hundred million syscalls
-
rzezeski
I think it dropped something like 34 seconds on one of my machines
-
rzezeski
I think tsoome_ was working on getting us upgraded? I forget. I worked on this a bit ago but life stuff has been happening and I didn't follow up.
-
rzezeski
But I also wonder if newer versions will just add more checks and negate any wins, haha.
-
jbk
is that the bug where sometimes it starts looking at like every file on your build system?
-
rzezeski
no
-
rzezeski
I don't think I ran across that one
-
jbk
i've yet to be able to reproduce it consistently
-
jbk
but sometimes if there's a bug, it gets into this thing where it looks at like every header file it can find
-
jbk
or something like that
-
jbk
which of course means the build just sits there for hours
-
gitomat
[illumos-gate] 16485 smbios chassis SKU logic is accidentally required -- Robert Mustacchi <rm⊙fo>
-
rzezeski
I have another patch I want to make, it's a follow up to one pmooney made years ago, but we missed a few spots. That also would reduce a bunch of redundant syscalls IIRC. But I don't think I've seen the one you are talking about.
-
jbk
e.g. if there's a bug in the code you're building
-
sommerfeld
while we're talking about build perf: I'm still tinkering with it but it looks the changes I'm doing for 16497 is saving me 2-3% wall clock time on builds while keeping the load average relatively low.
-
jbk
is that the dmake M2 change?
-
sommerfeld
Yeah.
-
sommerfeld
Fixed the algorithm to more closely resemble what I remember seeing GNU make and the NetBSD make do.
-
jbk
nice.. i had discovered it way back and tried it, but didn't have the time to dig into the problem
-
jbk
after it blew up rather spectacularly
-
sommerfeld
I haven't root-caused the failure - just changed the M2 limiter to interact with the rest of make the way the M1 limiter does.
-
sommerfeld
yeah, when it starts trying to build things in the midst of make clobber it goes off the rails very quickly.
-
rzezeski
What is "M2"?
-
rzezeski
never mind, read the issue
-
jbk
i have to wonder how much of the time savings is just from not going nuts under usr/src/man :)
-
nwilkens
Agnar : we have a build of minio for SmartOS that jperkins worked on recently.. He can fill in further details.
us-central.manta.mnx.io/pkgsrc/publ…ic/test-packages/minio-20240321.tgz for reference
-
nwilkens
s/jperkins/jperkin/
-
jperkin
also I've been upstreaming the fixes required to golang so hopefully you should be able to build minio without patching on your distribution at some point
-
tsoome_
hm, 504 Gateway Time-out from src.illumos.org .(
-
gitomat
[illumos-gate] 16484 libnvme should take care of unaligned log page sizes -- Robert Mustacchi <rm⊙fo>
-
gitomat
[illumos-gate] 16387 CLSET_NODELAYONERR not set if udp transport is used. -- Toomas Soome <tsoome⊙mc>
-
danmcd
@jbk ==> fenix illumos 16502
-
fenix
BUG 16502: panic in zfs:abd_iter_map (New)
-
fenix
-
danmcd
(I can't believe your small change would've caused this, but since you were the last person in abd-land, thought you should know.)
-
jbk
it's unfortunate there's no dump there
-
jbk
i can't see anything obvious from the stack that'd suggest it could be related
-
jbk
i'd imagine if there was any problems w/ that change, it'd be freeing into the wrong arena...
-
jbk
-
danmcd
Yeah... why I can't believe it.
-
jbk
which suggests the abd is bad (would a use after free potentially cause a page fault like that?)
-
danmcd
2nd panic is different, but also called by abd_iterate_func().
-
richlowe
given the pc and fault addr in nils' report you should be able to pin it down to what it's trying to read at least
-
richlowe
if someone has a dump
-
richlowe
also I had no idea nils was around, that's cool
-
danmcd
Yeah... look at both dumps. They are called from vdev_indirect_io_done() through a checksum error path and erport.
-
sommerfeld
jbk: agree with you re: the line at issue.
-
danmcd
SOrry... both panic stacks.
-
richlowe
I think jbk's theory that his work is safe is not necessarily true, iff we got corruption onto the disk that nils is now choking on
-
richlowe
but I want to be clear I'm not sayying it's _unsafe_
-
jbk
the second looks like
github.com/illumos/illumos-gate/blo…sr/src/uts/common/fs/zfs/abd.c#L838 -- I think the first arg to memcpy is probably bogus
-
richlowe
jbk: yeah, that's why I was saying about the fault addr and the dump, given the arguments, you can look at the fault address in terms of its offset from them
-
richlowe
like, private->arg_buf there
-
jbk
unfortunately, it doesn't appear there's a dump
-
rmustacc
It's hard to get, but I may be able to work to get one.
-
rmustacc
I think from what I can understand easy to get into kmdb and get data, but not a dump.
-
rmustacc
I think we were able to get a dump.
-
richlowe
rmustacc: you mean you can't dump raw memory from your firmware, and produce a valid dump offline?
-
richlowe
amateurs ;)
-
rmustacc
It's on a commodity PC.
-
richlowe
I was just joking anyway
-
rmustacc
All good.
-
rmustacc
I'll see if we can get it somewhere.
-
rmustacc
But I'm a bit backed up personally.
-
jbk
yeah, i'd like to take a look if I can... i really tried to go through and verify that either you allocated a new abd_t w/ the metadata flag set appropriately, or if you (more or less) got an abd_t from another abd_t, that the metadata flag would match the source -- and couldn't find any way the flag could 'flip'
-
jbk
so if I did miss something, I want to know what :)
-
rmustacc
Yeah, just figuring out upload mechanics. I miss manta.
-
sommerfeld
looking further up the stack, it's logging an error after repairing a checksum error encountered while handling i/o to a indirect vdev (which is something that is created when you remove a top-level device from a pool).
-
sommerfeld
if Nils can recall it would likely be good to capture the history of the pool configuration.
-
sommerfeld
was there an accidental "zpool attach" when "zpool add" was meant?
-
sommerfeld
err, other way around
-
sommerfeld
zpool add when zpool attach was meant.
-
jbk
rmustacc: me too.. i wish we had it here for uploading diag stuff (incl dumps) from customers
-
jbk
the thing we use now really can't handle large files well at all
-
jbk
(e.g. crash dumps)
-
NilsN
The dump file has been uploaded to the URL Dan provided.
-
richlowe
thanks nils
-
danmcd
Nice when vmdump.N is < 1GB large. :)
-
danmcd
I have a copy in my dump area on kebecloud, and I'll put it somewhere for public download unless you're worried about making it public @NilsN
-
richlowe
I wouldn't make it _public_ either way, but maybe letting "us" see if nils is ok
-
danmcd
I can put it somewhere for specific values of, "us". :)
-
NilsN
I doubt there is anything sensitive in the dump, since it's coming from an ISO-booted system, but it's probably not ideal to make it world readable. Feel free to give access to whoever ends up digging into the bug.
-
sommerfeld
Given the pool history (removed one of two striped disks) I'd be looking at the vdev_indirect code very close.
-
sommerfeld
closely
-
sommerfeld
openzfs/zfs #14804 might be relevant
-
danmcd
Corrupted abd map for sure.
-
danmcd
fffffe007cac84b0 abd_copy_to_buf_off_cb+0x30(dc6003a8fb040179, e9f, fffffe007cac8550)
-
danmcd
That first parameter is supposed to be a pointer and it sure the &*%^^$ isn't.
-
danmcd
So in the dump I have... theres' a call to annotate_ecksum()
-
danmcd
It takes a "goodabd" and a "badabd" and the "badabd" is so bad mdb has "no mapping for address" for its pointer.
-
jbk
well at least the name is right then :)
-
jbk
if i'm looking at the right stack in the ticket,
-
jbk
the 'bad' abd should be coming from vdev_indirect_checksum_error
-
jbk
ic->ic_data
-
danmcd
I think sommerfeld might be on to something...
-
danmcd
-
sommerfeld
so I have a couple thoughts - 1) I wonder if this particular sequence of events has been tested (zpool remove in a non-redundant pool)
-
sommerfeld
2) the failure happens when it's trying to log data around the corruption. if we just turn off the ereport generation can it import the pool w/o crashing?
-
NilsN
Happy to give that a try.
-
danmcd
Put some data in the ticket. @NilsN I've at most an hour until dinner time here. Anything you want me to do with the dump, please let me know.
-
jbk
i need to dig in more, but IIUC (from what I have looked at so far), for the code in vdev_indirect.c to have identified on of the indirect_child_t's as bad, it would have to actually attempt to checksum the 'bad' ic_data
-
jbk
oh wait, duh
-
NilsN
I don't see any easy way to suppress ereports from ZFS, but I think I could do it by setting vdev_checksum_rl to 0.
-
jbk
danmcd: the no mapping should actually be expected
-
jbk
because it's zfs data
-
danmcd
Ahh.
-
NilsN
That would trigger the ratelimitter.
-
jbk
by default, that's explicitly excluded from dumps
-
jbk
that was in fact part of the motivation for my change -- since we (already) exclude zfs data from dumps normally (IIRC, you can modify the behavior w/ dumpadm to include it)
-
jbk
we were seeing 200+gb dumps
-
jbk
with most of it in abd_ts
-
jbk
well or at least large percentages
-
jbk
since abd's came later, the thought was that since i think only illumos cares about such distinctions, not using the data arena on them was just an oversight
-
jbk
(on abds that weren't marked as metadata)
-
jbk
though.. it's the _good_ abd data that seems like it's the problem
-
jbk
unless the mdb output dan pasted is off w/ args
-
jbk
look at the value to abd_borrow_buf_copy
-
jbk
and then look at where it is in the call to zfs_ereport_post_checksum
-
jbk
though either way, i'm not sure how it could (prior to trying to generate the ereport) determine which abd was bad and which was good without accessing both of them
-
jbk
which suggests (need to look through the code to confirm this) that the contents of both should have been mapped/readable/accessible at the time it figured out which was good and which was bad
-
jbk
(let me note some of that in the ticket)
-
sommerfeld
ok, I think I reproduced something similar in a VM.
-
sommerfeld
with the advantage of not having done it to the root pool.
-
NilsN
Go ahead. Rub it in.
-
jbk
i think both stacks (in the ticket) are related
-
sommerfeld
sorry...
-
sommerfeld
if it's any consolation, the vm is boot looping
-
sommerfeld
not exactly the same traceback
-
jbk
abd_iter_map() is what computes the address prior to invoking the callback (abd_copy_to_buff_off_cb) which in the second stack gets a bogus address (created by abd_iter_map)
-
jbk
since the abd_t is segmented, it's trying to grab linear chunks of it to copy into the new buffer (to concatenate the bits into a linear buffer)
-
sommerfeld
NielsN: I was thinking something more along the lines of commenting out/patching out the call to vdev_indirect_checksum_error()
-
sommerfeld
patching out the call to vdev_indirect_checksum_error in vdev_indirect_repair (overwriting the 5 bytes of the call instruction with 0x90 (amd64 NOP)) lets me import the pool.