-
richlowe
tsoome_: oh, so we and lld do what it expects, but binutils doesn't now, and won't?
-
richlowe
despite both of us having implemented it based on what gnu used to do
-
gitomat
[illumos-gate] 16317 SHA2Update() is wrong for 512 MiB or bigger blocks -- Bill Sommerfeld <sommerfeld⊙ho>
-
andyf
It was fixed by illumos 13970 (fenix), so 'startstop.patch' is no longer required for building illumos.
-
fenix
BUG 13970: loader: BIOS loader ld script needs to use KEEP statement with linker sets (Closed)
-
andyf
I think it's been long enough now that we can drop that patch in omnios - I don't think we're building any gate versions from 2021 any more.
-
igork
hi all, i have panic at nvme with latest illumos updates
-
igork
1 panic[cpu1]/thread=fffffe000aaebc20: programming error: invalid NS/format in cmd fffffe065806ee00
-
igork
it is panic under vmware 6.7 with nvme controller
-
rmustacc
With VMware's virtualized controller or passthrough?
-
rmustacc
So, uh, that's an identify namespace command with nsid 1 set. So at first glance seems pretty legal? What's the state of the namespace?
-
igork
rmustacc: added NVMe virtual controller to vm
-
igork
no drives attached to this controller yet, one drive attached to SATA virtual hba
-
wacki
@andyf: Thanks for the info I removed the patch from my commit again.
-
igork
rmustacc: no panic if i have one nvme attached to controller.
-
igork
i have panic if i have just NVMe virtual controller without attached drives
-
rmustacc
I guess we'll need to look at the virtual controller state then to understand why asking for this is crashing. Is it actually advertising a namespace being present and active?
-
jbk
it wouldn't surprise me if VMware is perhaps playing loose with the NVMe standards
-
jbk
IIRC, it advertises trim support (whatever the NVMe term is.. every protocol uses a different name), but then returns an error if you actually issue the request
-
jbk
... though I wonder now if maybe instead on VMware we should do write zeros
-
jbk
that's what you have to do with virtual scsi disls
-
jbk
err disks
-
jbk
(which reminds me, I need to upstream that here at some point)
-
jbk
basically on vmware virtual disks, it uses WRITE SAME to write out 0s instead of UNMAP
-
jbk
which will actually shrink a thin provisioned image
-
jbk
because vmware doesn't support unmap for virtual disks
-
sommerfeld
while zfs can compress an all-zero block to a null blockpointer..
-
jbk
this is in lieu of issuing a SCSI UNMAP command
-
jbk
so below zfs
-
jbk
(for VMware disks)
-
jbk
so you can zpool trim a pool on vmware and shrink the vmdk size
-
andyf
Beats having to shut down the VM and use the hole punch thing from the ESX CLI
-
andyf
Or maybe both are necessary?
-
jbk
i mean, internally, when i issued the ioctl to a virtual disk, i could see the size of the vmdk shrink without any other intervention
-
jbk
so I don't think so
-
jbk
it doesn't help there's actually two different ways to unmap in the scsi standard
-
jbk
(because of course why not)
-
jbk
there's the actual UNMAP op
-
jbk
but you can also set a bit in a WRITE SAME op to UNMAP
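
A minimal sketch of the second mechanism jbk mentions, assuming the WRITE SAME(16) CDB layout from the SCSI Block Commands spec (opcode 0x93, UNMAP flag in bit 3 of byte 1, big-endian LBA in bytes 2-9 and block count in bytes 10-13). The helper name `build_write_same16` is hypothetical and is not taken from the illumos sd driver. With `unmap` clear and a data buffer holding one block of zeros, the same CDB gives the zero-writing variant described for VMware virtual disks.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define	SCSI_OP_WRITE_SAME_16	0x93	/* WRITE SAME(16) opcode */
#define	WRITE_SAME_UNMAP_BIT	0x08	/* UNMAP flag: byte 1, bit 3 */

/*
 * Fill in a 16-byte WRITE SAME(16) CDB (hypothetical helper).
 * lba and nblocks are stored big-endian, as SCSI requires.
 */
static void
build_write_same16(uint8_t cdb[16], uint64_t lba, uint32_t nblocks, int unmap)
{
	assert(cdb != NULL);

	memset(cdb, 0, 16);
	cdb[0] = SCSI_OP_WRITE_SAME_16;
	if (unmap)
		cdb[1] |= WRITE_SAME_UNMAP_BIT;

	/* Bytes 2-9: logical block address, most significant byte first. */
	for (int i = 0; i < 8; i++)
		cdb[2 + i] = (uint8_t)(lba >> (56 - 8 * i));

	/* Bytes 10-13: number of logical blocks. */
	for (int i = 0; i < 4; i++)
		cdb[10 + i] = (uint8_t)(nblocks >> (24 - 8 * i));
}
```

The dedicated UNMAP command is a different opcode (0x42) and carries a list of LBA ranges in its data-out buffer instead of a single LBA and count in the CDB.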
-
andyf
Things may have improved since I last used VMware in earnest, which was mostly 5.5
-
jbk
i just need to pull out a few bits -- mostly because I don't think they'd be wanted upstream (for disks that support absolutely nothing of unmap/etc., it resorts to WRITE SAME w/ a block of zeros to basically erase those areas of a disk... which turned out to be useful for other stuff for us, but probably not something that'd be useful more generally)
-
jbk
andyf: they haven't
-
jbk
their SCSI support is still pretty shit IMO
-
jbk
and for some reason, i suspect that's unlikely to change any time soon :)
-
jbk
just call it a hunch
-
andyf
:)
-
sommerfeld
jbk: vaguely related in terms of thin provisioning -- I've wondered for some time how hard it would be to get swap to use UNMAP or equivalent.
-
jbk
ISTR some OSes (can't remember if it's Linux or FreeBSD) attempt to trim the swap device when it's added
-
jbk
though I guess we'd probably want to make it an additional flag to swap -- since people could still be using the swap device as a dump device as well
-
jbk
in terms of while the system is running, that I don't know -- i mean it just needs to call ldi_ioctl(), but haven't looked at the swap code too closely to know how problematic that might be in practice
-
igork
rmustacc: i have no ideas and it was working before latest refactoring of nvme driver
-
gitomat
[illumos-gate] 16020 Borrowed abds ignore metadata flag during allocation -- Jason King <jason.brian.king⊙gc>
-
Agnar
good evening. assuming you are an nvidia graphics driver for solaris, which kernel function would you use to reserve memory to copy your vbios into? I dtraced on :cmn_err:entry and printed the stack(), however the deepest call I see is from :nvidia:nv_printf:entry - and this is pretty surely not the call that produces the error...
-
alanc
If I were such a driver, I might call one of the functions in github.com/illumos/illumos-gate/blo…sr/src/uts/common/sys/gfx_private.h since they were specifically created for me and me alone to use
-
alanc
but I don't know what the actual driver does
-
Agnar
alanc: thanks for the pointer, I have no clue about driver programming, so I just poke around and hope to touch a nerve :)
-
alanc
and I was just reminded by the comment in the file that while it was originally created for the nvidia driver it was later extended to use by the drm graphics drivers as well
-
Agnar
ah! I'll browse through that in a minute
-
sommerfeld
jbk: one of the things I've noticed is that the swap block allocator is a clock algorithm, which means that, over time, your originally-sparsely-provisioned swap vdev gets fully populated.
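
A toy model of the effect sommerfeld describes, not the actual illumos swap allocator: the type `swapmap_t`, the 16-slot size, and both function names are made up for illustration. The allocation hand always moves forward, so even a workload that only ever holds one slot at a time ends up writing to every slot of the backing store.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define	NSLOTS	16	/* illustrative size, not a real swap geometry */

typedef struct {
	bool in_use[NSLOTS];	/* slot currently allocated */
	bool touched[NSLOTS];	/* slot has been written at least once */
	size_t hand;		/* next position the clock hand examines */
} swapmap_t;

/* Allocate the first free slot at or after the hand, wrapping around. */
static int
alloc_clock(swapmap_t *m)
{
	for (size_t n = 0; n < NSLOTS; n++) {
		size_t i = (m->hand + n) % NSLOTS;
		if (!m->in_use[i]) {
			m->in_use[i] = true;
			m->touched[i] = true;
			m->hand = (i + 1) % NSLOTS;
			return ((int)i);
		}
	}
	return (-1);
}

/* Run alloc/free cycles, then count how many slots were ever written. */
static size_t
touched_after_cycles(swapmap_t *m, size_t cycles)
{
	size_t count = 0;

	for (size_t c = 0; c < cycles; c++) {
		int i = alloc_clock(m);
		assert(i >= 0);
		m->in_use[i] = false;	/* freed immediately, never trimmed */
	}
	for (size_t i = 0; i < NSLOTS; i++) {
		if (m->touched[i])
			count++;
	}
	return (count);
}
```

Starting from a zeroed map, `NSLOTS` alloc/free cycles touch all `NSLOTS` slots even though at most one is in use at any moment. Without a trim on free, a thin-provisioned backing store has no way to learn that those slots are empty again.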
-
sommerfeld
trim on swapoff/shutdown would also be good for transient vm's (no need to retain swap contents while the thing is off).
-
copec
sommerfeld, that used to be classically well known that Solaris swap was the only *nix whose swap worked well
-
copec
sommerfeld, you should check out Solaris Internals 2nd edition, it really hammers home how much it is a highly engineered <s>Boeing</s> Airbus, compared to Linux (which I still love Linux)
-
copec
Every class of hypervisor still has to implement all the core kernel components to share the machine. As good as Xen is now, it never made sense to me to discard all the hard earned knowledge
-
Agnar
alanc: after reading through that I would suspect gfxp_alloc_kernel_space() being called, would you agree?
-
sommerfeld
copec: I'm not sure what your point is. TRIM basically didn't exist in the timeframe you're talking about. Illumos should evolve to use new features of both physical and virtual hardware.
-
copec
sorry, I tangent-ed off of the Solaris swap
-
copec
I agree, although modern SSDs' internal allocation seems to manage just fine with or without TRIM
-
Agnar
but it takes a size_t as an argument, and it should be no problem if you ask for x +size_t, you get it. maybe something in that function
-
copec
I have many SATA SSDs in SAS enclosures that don't use TRIM and I compare their performance to the same disks in enclosures with SATA, and the performance difference after some time is indistinguishable
-
copec
These are all "enterprise" level disks though
-
sommerfeld
what's more, illumos by default swaps on a zvol rather than a dedicated slice of physical disk; a "trim" (via the ldi_ioctl mentioned above) would free blocks to the pool for reuse even if there was no TRIM going to the physical hardware.
-
copec
Yeah, TRIM in that regard makes total sense. The swap zvol shouldn't be sparse though, and at least for me that zvol is a drop in every zpool I have, so it doesn't really make the difference in terms of a zpool getting towards full and slowing allocations
-
Agnar
ok, it's not fbt:gfx_private:gfxp_alloc_kernel_space
-
jbk
sommerfeld: it looks like maybe if we added an ldi handle to struct swapent (and allocated a flag to indicate if trim should be done), we could maybe issue the ioctl in swap_phys_free()
-
Agnar
ok, different approach.
paste.ec/paste/eIj27fAE#4cWppNTJaVyAcqLy6W0wB3MkGJSc6dnyCsyL4WuuVjl shows a call to nvidia`_nv001009rm, however dtrace -l doesn't find that one
-
sommerfeld
jbk: yeah, but can you do I/O from where swap_phys_free is called without creating bigger problems?
-
Agnar
I thought fbt are generic probes?
-
sommerfeld
IIRC the fbt provider will decline to create a probe for a function if the prologue or epilogue are not what it's expecting.
-
Agnar
sommerfeld: what does that mean?
-
sommerfeld
Agnar: the function prologue is the set of instructions that first execute on entry to a function that do things like set up the stack frame, save registers, etc.; epilogue undoes that on the way out.
-
sommerfeld
fbt has some assumptions about the structure of the prologue and if it's not shaped the way it expects it doesn't create fbt probes for that function.
-
Agnar
ah, understood
-
Agnar
so for "private" functions in closed source code, this could happen and we have no clue of course.
-
sommerfeld
yes, there are compiler options that affect how the prologue is generated and if you're going for performance above all other considerations you probably will step outside the line of what dtrace can cope with.
-
Agnar
which makes sense for a graphics driver
-
Agnar
heck, I need to sleep...thanks so far for your help, alanc and sommerfeld
-
richlowe
if you disassemble that function, we should be able to tell whether dtrace would like it
-
richlowe
if I remember, the most important part is that it begins by pushing a frame
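
A simplified sketch of the kind of check being discussed, assuming amd64 and the usual frame setup encoding (`push %rbp` is 0x55, and `mov %rsp,%rbp` is 0x48 0x89 0xe5 or 0x48 0x8b 0xec). The function name `starts_with_frame_push` is hypothetical, and the real fbt provider's matching is more involved than this, so treat it only as a way to eyeball the first bytes of a disassembled function.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Return 1 if the code begins with the classic amd64 frame setup:
 *	push %rbp		0x55
 *	mov  %rsp,%rbp		0x48 0x89 0xe5  (or 0x48 0x8b 0xec)
 * and 0 otherwise (hypothetical helper, not the fbt implementation).
 */
static int
starts_with_frame_push(const uint8_t *code, size_t len)
{
	assert(code != NULL);

	if (len < 4 || code[0] != 0x55 || code[1] != 0x48)
		return (0);

	return ((code[2] == 0x89 && code[3] == 0xe5) ||
	    (code[2] == 0x8b && code[3] == 0xec));
}
```

A function built without a frame pointer typically starts with something like `sub $0x18,%rsp` (0x48 0x83 0xec 0x18) instead, which this check rejects.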
-
alanc
if I remember correctly, the nvidia drivers are cross-compiled using gcc on Linux, and thus likely without the stack-frame patches in the illumos gcc
-
jbk
sommerfeld: good question, unfortunately comments don't really answer that.. an alternative would be more complicated
-
jbk
e.g. you'd probably need to track both 'needs trim' and 'free/ready for reuse' so you're not holding the swapinfo_lock while trimming (since that'd block allocation/freeing)
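
A userland sketch of the bookkeeping jbk outlines, with every name (`slot_map_t`, `slot_release`, `slot_trim_pass`, the 8-slot size) invented for illustration; it is not the swap code and takes no locks. The point is only the extra state: a freed slot is parked as "needs trim" and cannot be handed out again until a later pass has trimmed it.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define	NSLOTS	8	/* illustrative size */

enum slot_state { SLOT_FREE = 0, SLOT_IN_USE, SLOT_NEEDS_TRIM };

typedef struct {
	uint8_t state[NSLOTS];
} slot_map_t;

/* Hand out the first slot that is free and already trimmed, or -1. */
static int
slot_alloc(slot_map_t *m)
{
	for (int i = 0; i < NSLOTS; i++) {
		if (m->state[i] == SLOT_FREE) {
			m->state[i] = SLOT_IN_USE;
			return (i);
		}
	}
	return (-1);
}

/* Freeing only queues the slot for trimming; no I/O happens here. */
static void
slot_release(slot_map_t *m, int i)
{
	assert(i >= 0 && i < NSLOTS);
	assert(m->state[i] == SLOT_IN_USE);
	m->state[i] = SLOT_NEEDS_TRIM;
}

/*
 * Trim pass: in a real implementation the trim I/O for each queued
 * slot would be issued here, outside the allocation lock. This sketch
 * just marks the slots reusable and returns how many it processed.
 */
static int
slot_trim_pass(slot_map_t *m)
{
	int trimmed = 0;

	for (int i = 0; i < NSLOTS; i++) {
		if (m->state[i] == SLOT_NEEDS_TRIM) {
			m->state[i] = SLOT_FREE;
			trimmed++;
		}
	}
	return (trimmed);
}
```

The cost of the extra state is that a slot stays unavailable between release and the next trim pass, which is the trade for not holding the allocation lock across I/O.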
-
jbk
(side note, it looks like the swap code is using its own homegrown bitmap implementation to track free/in use... wonder if it'd be nicer to convert it to use sys/bitmap.h)
-
jclulow
copec: It's actually fine for the swap zvol to be sparse, FWIW.
-
jclulow
You can make a reservation for it, obviously, to make it thick
-
jclulow
But it is not required. And pre-zeroing it does not improve performance; if anything it makes it worse because of COW.
-
richlowe
I think that's what sommerfeld mentioned, they can be sparse but algorithmically stop being sparse as fast as possible
-
copec
jclulow, I mean you'll also have other problems at that point, but what happens if the zpool fills while the sparse zvol can't allocate?
-
jclulow
You can't page more things out
-
jclulow
It is not ideal, but it is not necessarily immediately fatal
-
richlowe
you'd probably want a reservation, at least.
-
jclulow
I mean I think you want _some_ reservation. But the tension created by never-overcommit + fork() leads to some hard bargains
-
richlowe
I can't believe it's the future, and fork v. vfork still matters
-
richlowe
memory big, software bigger
-
jclulow
I mean really we should posix_spawn(3C) wherever we can -- and probably look at moving it into the kernel
-
pfr
Is anyone here using Tribblix? would love some pointers with setting up wifi (wificonfig)
-
jbk
yeah, if no one else gets to it first, it's something I'd like to look at (though seems like there's a lot of pieces to get right).. IIUC Solaris made it a separate syscall at some point
-
jclulow
pfr: I don't see ptribble here right now. Is "wificonfig" a Tribblix tool?
-
jclulow
Oh, no, that's part of the base OS
-
jclulow
I can't imagine it is Tribblix specific
-
pfr
jclulow: not sure tbh.
-
pfr
I have read the man page but I cannot figure out how or where to add my password
-
pfr
'wificonfig connect' doesn't have a password flag
-
jclulow
pfr: I fear you may be in for some disappointment
-
richlowe
wificonfig is in the base OS, but I think old
-
jclulow
pfr: I would actually look at the dladm page
-
richlowe
I think you want dladm *-wifi
-
jclulow
yeah
-
pfr
I didn't find wpa_supplicant in any of the tribblix overlays
-
richlowe
and we need to work out why wificonfig is (still?) there?
-
jclulow
richlowe: Yes I am unclear.
-
jclulow
pfr: The details are managed for you by "dladm *-wifi" I believe
-
pfr
richlowe: yes, I also looked at dladm, but was also unclear how to connect. The tool looks pretty comprehensive for complex networks. I just need to connect to a private home network
-
jclulow
You would use "dladm connect-wifi"
-
pfr
and I'll presumably be prompted for the SSID and password?
-
pfr
Sorry, I'm at work procrastinating so not on my illumos machine rn
-
jclulow
I think to stick a password in there you want to create a secobj, so "dladm create-secobj"
-
jclulow
e.g., see "Example 3 Creating a WiFi Key" and then "Example 4 Connecting to a Specified Encrypted WiFi Link" I guess
-
jclulow
in dladm(8)
-
pfr
thanks. I'll look into that
-
rmustacc
igork: Yeah, I get it. Just the more you can provide me, the more I can try to help. I don't understand how that's currently an illegal request without more context about the state of the controller.
-
jclulow
It does look like "dladm create-secobj -c wpa somenameforkey" would prompt for the password yeah
-
gitomat
[illumos-gate] 16324 Want Micron 7300, 7400, 7450 log support -- Robert Mustacchi <rm⊙fo>
-
gitomat
[illumos-gate] 16325 nvmeadm: want ability to write log page to raw file -- Robert Mustacchi <rm⊙fo>
-
gitomat
[illumos-gate] 16326 Update NVMe error status codes for 2.x -- Robert Mustacchi <rm⊙fo>
-
gitomat
[illumos-gate] 16327 wdc nvme assertion clearing support -- Robert Mustacchi <rm⊙fo>