-
sommerfeld
looks like a reboot between runs is advisable to clear stuff that got mounted by tests and not unmounted.
-
andyf
Yes, and I've also had occasions where the dump device is left set to a volume on the testpool, preventing it being destroyed.
-
rmustacc
Is that specific to the ZFS tets that you're seeing that?
-
sommerfeld
yes.
-
jbk
you can also clean up /var/tmp between runs without having to reboot
-
sommerfeld
I saw at least one left-over mount point.
-
jclulow
It really seems like the kind of test suite that you don't want to run on a machine with things you care about
-
jclulow
i.e., it should probably run in a throw-away VM
-
jbk
haha no..
-
jbk
which reminds me.. at some point i need to finish setting that up
-
jbk
i managed to get pkg(5) working in a smartos zone and i was going to create a dedicated test vm separate from the one used to build/dev
-
jbk
it's silly, but i don't want to give up my current omni vm because by some cosmic conincidence, the UUID ended up with the first 5 digits of pi (hence the name) :)
-
sommerfeld
jclulow: indeed, but these issues should be tracked down and fixed because they cause damage to other tests in the same suite!
-
jclulow
oh, to be clear, I am also a broken record on the subject of needing someone to make a complete catalogue of flakes in the test suite
-
jclulow
I just think, even if it's 100% reliable, that I would still only expect to run it in a throw-away location
-
jclulow
Unlike, say, the libc test suite or whatever
-
richlowe
yeah, josh has been... vocal about how this needs fixing
-
richlowe
the challenge is that it requires such a throwaway setup to run, and many of us don't have one
-
richlowe
or have one unique enough it's hard to tell if the problem is the setup, or the tests
-
richlowe
jbk: it _is_ silly, but whenever I get a "good" commit hash, I'm sad to lose it
-
tsoome
if zfs tests are 100% success, then it must be safe to run; failing tests are obviously potentially harmful.
-
richlowe
you need to make that more obviously deadpan
-
tsoome
in my test vm, I have list of 9 failing tests, but it may not be 100% true.
-
jbk
yeah, i have my list -- though I thought I filed tickets for all of them (to help others) even if I wasn't able to root cause all of them
-
tsoome
for example, in my vm, I have FAIL l2arc tests, while andyf had them PASS.
-
richlowe
yeah, some are unreliable
-
richlowe
danmcd runs the ZFS tests as part of the smartos release process
-
richlowe
so is a good person to talk to
-
tsoome
yep, that too.
-
richlowe
A think that rmustacc did with nvme, which is wonderful, is split his test suite into "destructive" and "not"
-
richlowe
that would help the zfs tests in so many ways (even if there are not that many non-destructive)
-
danmcd
^^^
-
danmcd
I have a VMware Fusion VM specifically tailored for taking the abuse of such a run. It has 6 virtual SATA drives just for ZFS test, a virtual NVMe drive that appears to pass all but one corner-case test I'm chalking up to VMware madness...
-
danmcd
... and it's called "new-from-scratch-SmartOS" so its usage is right there on the label.
-
danmcd
Some are very unreliable.
-
tsoome
I was using iscsi luns to get trim tests running in vmware fusion vm.
-
tsoome
for trim tests, this one is only one failing: Test: /opt/zfs-tests/tests/functional/cli_root/zpool_trim/zpool_trim_verify_trimmed (run as root) [00:02] [FAIL]
-
sommerfeld
I have a test setup in a bhyve vm with multiple host-zvols imported as additional disks.
-
richlowe
yeah, I have an illumos-image-builder setup that did it (though not with me)
-
richlowe
it's a bit hinky to use, but jclulow's image builder set up to build a fresh disk out of your proto area is a _wonderful_ test setup.
-
sommerfeld
and I'm queuing up a couple test fixes for some of the tests that are failing for me to go along with the vdev_indirect fix I'm testing.
-
tsoome
thats nice.
-
richlowe
I feel confident they will be appreciated as much as, if not somehow more than, the fix
-
sommerfeld
yeah, with all due respect to the work that'I don't think I'd ever put myself in a position where I'd want to use zpool remove on a system I cared about.
-
sommerfeld
err, that escaped early.
-
sommerfeld
with all due respect to the folks who did the work on it, I don't think I'd ever put myself in a position where I'd want to use zpool remove on a system I cared about.
-
danmcd
Yeah, in 20240418 there were 24 skips all ZFS trim related.
-
sommerfeld
It's good that the capability exists.
-
danmcd
49 failures (not counting fenix OS-8542 ):
-
fenix
OS-8542: SmartOS bash's built-in echo breaks illumos#16437 (Open)
-
fenix
-
danmcd
4 are BHYVE ones that wont' work due to nested-virt stuff
-
danmcd
1 mapfile due to the bizarre way we seem to build libgcc
-
sommerfeld
but it's not needed often enough that it is exercised enough to be dependable compared to alternatives like zfs send|zfs recv
-
danmcd
2 aforementioned NVMe test failures
-
danmcd
2 ksensor failures (also likely VMware-caused)
-
danmcd
and 40 ZFS failures.
-
rmustacc
ksensor should work despite vmware.
-
rmustacc
But likely
illumos.org/issues/15873 which I saw had been filed the other day.
-
fenix
→
BUG 15873: ksensor tests should accept the ksensor_test driver already being installed (New)
-
rmustacc
That is the ksensor tests don't rely on any actual hardware thing.
-
danmcd
Oh, and it's one ksensor failu7re not 2 The other non-zfs failure is svr4 packaging, which we don't ship.
-
tsoome
40 still? huh, could you mail me a link?
-
danmcd
Hang on...
-
danmcd
@rmustacc /opt/os-tests/tests/ksensor/ksensor_basic.32
-
tsoome
I was kind of hoping that since we both run tests on fusion vm, the resulting list should be similar:D
-
danmcd
```
-
danmcd
@rmustacc
-
danmcd
ksensor_basic.32: TEST FAILED: failed to open /dev/sensors/test/test.volt.0.1: No such file or directory
-
danmcd
tsoome: list forthcoming...
-
richlowe
danmcd: the ksensor failures are a race condition
-
richlowe
I'd talked to robert about this on arm, the problem is that the pre-test hook loads the test module, but doesn't (can't?) wait until they're ready
-
richlowe
so if you're unlucky, the actual tests then run before the test module is initialized
-
danmcd
Oh yes!!! That's right, I have to do that DIFFERENTLY becuayse it assumes it's there.
-
danmcd
(Perhaps I should put a delay in?)
-
richlowe
I'd have thought about this harder, but I thought it was unique to emulated arm, I'm sorry.
-
danmcd
I even filed a bug for it I think...
-
sommerfeld
danmcd: if there's something that you can poll for, a timed polling loop would be good. but a fixed sleep is going to be either too long (in which case it wastes test time) or too short (in which case it's flaky..)
-
sommerfeld
and if there isn't something you can poll for for driver init, maybe there's a missing interface...
-
richlowe
the devices existing, would be the obvious thing.
-
danmcd
fenix illumos#14886
-
fenix
BUG 14886: vmm_drv_test needs to be less IPS-dependent (New)
-
fenix
-
richlowe
but I'm not sure if that comprimises what the tests are testing
-
danmcd
(And it was an inconsistency...)
-
danmcd
ANYWAY... I owe tsoome a list of tests.
-
richlowe
danmcd: that's vmm _not_ doing what ksensor does (14886)
-
richlowe
what you want in 14886 is what ksensor does, what you want in ksensor is something to poll for readyness of the test driver _because_ it does what you want
-
richlowe
(I hope that made sense)
-
rmustacc
I had forgotten the bit on ARM and a race there. I can try to look at fixing it up. Sorry about that. It is unfortunate, but there is probably a cheap way to guarantee it's ready with devfsadm.
-
richlowe
me too, really, it didn't occur to me anyone else could lose that race
-
rmustacc
But I expect a devfsadm call should block on that much, but hard to say.
-
rzezeski
is there any reason our various kernel printf functions don't support '%#x'?
-
rmustacc
Probably just a lack of traditional support for the alternate form / need.
-
richlowe
rzezeski: I have been asked before in code review not to use it, because people didn't immediately know what it meant
-
richlowe
perhaps related
-
richlowe
(and to be fair, outside of %x, I don't know what it means for any other format)
-
rzezeski
Okay, well FWIW it would help to have it because stuff like the cxgbe common code uses it all over the place. And it means instead of getting useful data in the log you just get the character 'x'.
-
richlowe
Yeah, to be clear, I'm very in favour of it existing and working and being used
-
rzezeski
We may find it confusing, but for better or worse we got other's code in our gate that like to use it.
-
rzezeski
richlowe: How hard is this to support (I know nothing about printf stuff, I thought it was part of the compiler)? I could just add it as part of this T7 wad.
-
sommerfeld
rzezeski: %#x goes back at least as far as C89 (I have the ansi spec in dead tree form).
-
sommerfeld
rzezeski: it's implemented in C as part of the library in userland and in the kernel for kernel.
-
andyf
rzezeski - if the kernel printf functions you're using are tagged with __KPRINTFLIKE, you will also need to teach gcc that it's ok
-
tsoome
rzezeski it would be nice to have:) we use it in few places in loader ;)
-
sommerfeld
rzezeski: take a look at usr/src/uts/common/os/printf.c
-
andyf
and unfortunately then wait for everyone to upgrade their gcc
-
rzezeski
sommerfeld: k, well maybe I'll see if I can get it working
-
richlowe
rzezeski: it should be easy for simple cases like that
-
richlowe
but I've been wrong in _ndprnt-like places before
-
jbk
sommerfeld: though i don't think that's used in userland, libc has it's own (slightly convoluted) _doprnt
-
jbk
or whatever it's called
-
richlowe
it does, userland supports %#x obviously, because standards.
-
richlowe
the kernel has a different non-standard printf
-
richlowe
which is also where andyf's comment about possibly having to teach GCC that # is supported on %x comes from (on the other hand, there's a better than 50% chance that it thinks it's supported already...)
-
andyf
Well, yes, but I hoped the same about %h and %z (or something in that area)
-
rzezeski
I mean, if this has been around since C89, wouldn't everyone already know about it :)
-
andyf
I think they were supported by the kernel, but gcc didn't believe so
-
richlowe
Well, I'm placing the bet because ryan says that his code is already using the format and it's just not working.
-
richlowe
if gcc were grumpy, it would presumably be grumpy _now_ :)
-
rzezeski
Yea, this is code shared across FreeBSD/Linux/us
-
rzezeski
so I'm guessing they have it
-
richlowe
no, the kprint stuff is illumos-specific
-
richlowe
but it'd still be firing on your format strings already, if it were upset
-
richlowe
sorry I can't words good
-
sommerfeld
rzezeski: actually you want to fix vsnprintf in usr/src/common/util/string.c
-
rzezeski
Haha, yea, all I mean is, this code is compiled/running on other platforms. And I figure it works there since Chelsio choose to useit.
-
andyf
richlowe - good point. I don't see it in
github.com/illumos/gcc/blob/il-10_4_0/gcc/config/sol2-c.c#L43-L75 but it might be somehow silently allowed
-
sommerfeld
digging around in ancient history: looks like 7th edition unix doesn't document %#x while 4.2BSD documents it.
-
jbk
so it's probably established enough to be safe to use, i mean we're not postgres :)
-
richlowe
we have absolute control of our compilers jbk, we can use all _kinds_ of things we want to, when they work.
-
richlowe
the downsides are annoying, and I see why folks complain, but when you get the upside it sure is nice :)
-
sommerfeld
and for the kernel/standlone printf which doesn't do floating point types, the # modifier only has to deal with #x and #o
-
rzezeski
computers are so much fun
-
richlowe
it's a laugh a minute
-
andyf
../../common/inet/ip/keysock.c:2457:26: error: unknown conversion type character '#' in format [-Werror=format=]
-
andyf
cmn_err(CE_NOTE, "Test %#x\n", keystack);
-
andyf
It looks like an easy patch to gcc though, if we end up adding support.
-
rzezeski
andyf: help me out here because I'm dumb, the cxgbe code is already using it and compiles just fine, it just results in bogus output. What am I not understanding?
-
andyf
Which printf-like functions is it using with that?
-
andyf
and is the error (-Wformat) possible turned off?
-
andyf
*possibly
-
rzezeski
snprintf and then passing that to vcmn_err
-
rzezeski
I see nothing about turning off the format warning
-
rzezeski
cxgb_printf()
-
rzezeski
that's what it calls
-
andyf
Perhaps the compiler can't follow the chain?
-
sommerfeld
Unless cxgb_printfs is declared as printf-like, the compiler won't follow the chain.
-
richlowe
KPRINTFLIKE in this case
-
andyf
If you were to mark cxgb_printf as __KPRINTFLIKE(3) or whatever, you'd probably get the error
-
richlowe
(for it to fail)
-
richlowe
but yeah, we should add support to the illumos gcc, even if you're safe right now
-
sommerfeld
yeah, KPRINTFLIKE for kernel printf rules.
-
richlowe
because then we can all use it
-
rzezeski
richlowe: yea, I mean I can go and update all of the cxgbe common code, but then the next person to update this driver might not realize we don't support # and reintroduce this issue. Or we could add the KPRINTFLIKE so it's impossible to miss. But it does add additional noise to the delta between us and upstream. It just seems like we should allow #.
-
andyf
Oh, definitely, we should allow it, and update our gcc branches.
-
richlowe
yeah, I'm apparently being super unclear, I absolutely thoroughly want you to allow it (though it's not really up to me, obviously, it's up to you). We're just saying where the rest of the bear trap is hiding
-
richlowe
unfortunately, the best person we have at translating what I'm trying to say has already said no for today :)
-
rzezeski
richlowe: I was also being unclear in that I was agreeing with you. haha. Consider this 3 votes for adding # support.
-
sommerfeld
I may be too obsessive about this archaeology because I found and was just looking at 4.1BSD's doprnt.s, which has # support that isn't mentioned in the manpage.
-
sommerfeld
(and, yes, it's in VAX assembler...)
-
richlowe
the unix history repo does that to a person
-
sommerfeld
and there is a doprnt.s.old that doesn't have it. so that looks very much like at least one version of %# was born in or around 4.1bsd
-
sommerfeld
rzezeski: be glad that our printf core is in C, not assembler.
-
neirac
after my first build for debug and non-debug I could start doing incremental builds in nightly?
-
rmustacc
Or just use bldenv.
-
richlowe
if you did debug _and_ non-debug, and don't have multi_proto set, incrementals are weird
-
richlowe
in general if you're developing and what incrementals and convenience, only build one way or the other, until your RTI build
-
sommerfeld
I'd just use bldenv after a nightly. incrementals might work ok if the nightly is either debug or non-debug but not both at the same time in the same tree
-
richlowe
s/what/want/
-
neirac
oh ok, thanks for the pointers. I'll read about bldenv and the whole workflow again, I usually was doing full rebuilds (omnios bi) but is taking around 3 hrs.
-
richlowe
`bldenv <the same env file you use with nightly>` will spawn a shell you can work in where 'make install' will work to put things into the proto area
-
richlowe
much more convenient, I find, is `bldenv <env> <command>`, so like `bldenv ../foo 'make -wej 28 install'`
-
richlowe
(the documentation is probably clearer and better at explaining)
-
neirac
richlowe thanks!, I'll spend more time in the docs
-
rmustacc
-
neirac
rmustacc thanks, I was looking at the illumos.org/docs/developers/build. I 'll check the workflow next to have the whole picture.