-
sjorge
"zrepl status" always errors out for me, on two hosts
-
sjorge
while on the latest bloody bits but is fine on the previous ones
-
andyf
That's strange, but at least it's a narrow window to look at!
-
andyf
Since it's happening on OI too, it must be some change in gate, and I think richlowe narrowed it down to between june 9th and 30th
-
andyf
Well, not "must", but likely..
-
sjorge
just booted back to my previous be as it was getting annoying
-
andyf
I am not seeing anything strange in vim. Are you seeing this on the console or is it an ssh session or something?
-
sjorge
ssh session
-
sjorge
vim was a mixed bag
-
sjorge
zrepl status triggered it 100% of the time for me
-
sjorge
(And I am debugging a sync issue so it was a problem)
-
andyf
That looks fine for me too. iterm2 and mac terminal, various sizes
-
sjorge
weird
-
sjorge
Where is your primary console?
-
andyf
serial
-
sjorge
Mine is on serial, where zrepl/vim have always been funky for me
-
sjorge
client side I tried iterm2 terminal.app and putty
-
sjorge
So it must be something in combination with a configuration I have I guess
-
sjorge
But what
-
andyf
Well, you and yuripv (who saw it on OI too)
-
sjorge
Once I have fixed the zrepl issue, I'll update again, assuming there will be another bloody build soonish with the openssh stuff fixed.
-
sjorge
I'll only do one then so it's easier to compare A and B
-
andyf
The updated openssh is already in bloody.
-
andyf
But yes, somebody who is seeing this problem will have to bisect packages and try and find where it was introduced
-
sjorge
Sadly neither of mine are easy to reboot, it takes a long time
-
sjorge
-
sjorge
That's the panic zrepl throws from last night
-
sjorge
I still had a terminal window open
-
sjorge
Doesn't seem to be any changes between now and the working be that would come even close to terminal handling I think
-
sjorge
5.11-151047.0:20230609T142524Z
-
sjorge
Is the exact ts from the working build
-
sjorge
At least I found the zrepl not-syncing problem now that I can actually run zrepl status: my certs expired
-
sjorge
I guess it's time to move that to wireguard + tcp transport instead as I had the same issue last year too
-
sjorge
I'll switch back to the new be on one of the nodes after
-
sjorge
re-updating one of my hosts now
-
sjorge
Done and I can trigger it again with zrepl
-
sjorge
Let me do a truss on both I guess
-
sjorge
OMGFG
-
sjorge
when I run it through truss it works
-
sjorge
without truss it fails -_-
-
sjorge
Aha it's not truss
-
sjorge
I guess like yuripv I can't trigger it all the time I just got unlucky ;(
-
sjorge
I did get a truss output now from when it fails
-
sjorge
-
sjorge
It seems to panic shortly after a TCSETS (L1003)
-
sjorge
Based on the lines above it started to draw its box layout and then calls that ioctl and it poofs out of existence
-
sjorge
That would also explain why andyf could not reproduce it as it's indeed 'random' I must have just gotten unlucky last night to repeatedly hit it
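For context, a minimal sketch of the kind of terminal-mode switch a full-screen TUI performs before drawing. This is not zrepl's code: the helper names are hypothetical, and it assumes a platform (such as illumos or Linux) where `tcsetattr` with `TCSANOW` is typically issued as a `TCSETS`-style ioctl, which is what would show up in truss.

```python
import os
import pty
import termios

def enter_tui_mode(fd):
    """Disable echo and line buffering on fd; return the saved settings."""
    saved = termios.tcgetattr(fd)   # read current settings (a TCGETS-style call)
    attrs = termios.tcgetattr(fd)
    # Index 3 is the local-modes flag word (lflag).
    attrs[3] &= ~(termios.ECHO | termios.ICANON)
    # Apply immediately; assumed to surface as a TCSETS-style ioctl in a trace.
    termios.tcsetattr(fd, termios.TCSANOW, attrs)
    return saved

def leave_tui_mode(fd, saved):
    """Restore the settings captured by enter_tui_mode."""
    termios.tcsetattr(fd, termios.TCSANOW, saved)

def demo_on_pty():
    """Exercise both helpers on a throwaway pty and return the lflag values."""
    master, slave = pty.openpty()
    try:
        saved = enter_tui_mode(slave)
        during = termios.tcgetattr(slave)[3]
        leave_tui_mode(slave, saved)
        after = termios.tcgetattr(slave)[3]
        return saved[3], during, after
    finally:
        os.close(master)
        os.close(slave)
```

Running something like this in a loop over the same ssh session may be a cheaper way to check whether the mode switch itself is what intermittently misbehaves, independent of zrepl or vim.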
-
papertigers
sjorge: im doing zrepl over tailscale with the tls certs. I think I set them to expire in a year so I will likely be puzzled in a year's time when things break haha. I should add a cron job to alert me when they are close to expiring
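A minimal sketch of such an expiry check, using only the Python standard library. The function names and the 30-day threshold are assumptions for illustration, not anything zrepl provides; the input is the date string that `openssl x509 -noout -enddate` prints after `notAfter=`.

```python
import ssl
import time

def days_until_expiry(not_after, now=None):
    """Return whole days until the certificate's notAfter time.

    not_after looks like "Jan  5 09:34:43 2018 GMT".
    The result is negative once the certificate has expired.
    """
    expires = ssl.cert_time_to_seconds(not_after)
    now = time.time() if now is None else now
    return int((expires - now) // 86400)

def needs_renewal(not_after, warn_days=30, now=None):
    """True when fewer than warn_days remain (warn_days is an assumed default)."""
    return days_until_expiry(not_after, now) < warn_days
```

A cron job could feed this the end date of each certificate and send mail whenever `needs_renewal` returns true.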
-
nahamu
Does zrepl support using certs that you provide? I wonder if you could use tailscale-provided certs...
-
nahamu
I think "tailscale serve" might even auto-renew them for you...
-
sjorge
papertigers I was like I can either rotate them or... push it over wireguard
-
sjorge
I wanted to setup a tunnel between the tiny backup box at my mom and home anyway
-
sjorge
Since it's encrypted over the internet via wg I just went to tcp transport
-
sjorge
No point in double encryption tbh
-
sjorge
Seems to work well enough, with nahamu's work on getting wireguard-go into omnios-extra
-
nahamu
Is the wg connection using my bits, or is the tunnel handled by a non-illumos kernel?
-
sjorge
See above :p
-
nahamu
asked and answered
-
nahamu
nice!
-
sjorge
I wrote my own wg-quick manifest... then found the one from the omnios package
-
sjorge
So i dropped mine
-
nahamu
If your old one had any improvements over mine, please let me know.
-
sjorge
It did not
-
sjorge
Only downside is that the interfaces are named tunX tbh
-
nahamu
yeah, that's a shortcoming of the tun driver.
-
nahamu
but at least you can name your config file however you want.
-
sjorge
Yeah I noticed that
-
sjorge
I had it called tun0.conf first but that got messy so now I'm back to wgX.conf naming
-
sjorge
And let wg-quick handle it
-
nahamu
the tun driver works just well enough that no one has enough drive to write a better one.
-
sjorge
Having it fold into ipadm some day would rock, but I guess that means we need a non-wireguard-go implementation first
-
sjorge
Exactly :D
-
sjorge
Eh it's a bit quirky but it works well enough -> setup and stop caring about the quirkiness
-
nahamu
uh, my tailscale port does use ipadm. does my wg-quick script not??
-
sjorge
It does for the ip stuff
-
sjorge
but something like ipadm create-wg -l wg0 ... would be nice too
-
sjorge
Well I guess you need a dladm create-wg + ipadm create-addr technically
-
nahamu
Ah, yeah, we'd need to port wireguard into the kernel. Someone smart enough might be able to port the FreeBSD kernel port over...
-
sjorge
We can probably get the wg dev to look over a port before integration if one ever gets made
-
sjorge
How the freebsd one went was a mess, but he did end up reviewing and fixing a ton of bugs in the first attempt, ideally it would just be reviewing but the port for freebsd has a whole history about it -_-
-
sjorge
It was the hallway talk at the EuroBSDcon I went to before covid
-
nahamu
Yeah, I think the current version is fine, but yeah, the first version was a mess.
-
nbjoerg
we still don't know what's wrong with the netbsd version :)
-
nahamu
nbjoerg: is it based on the FreeBSD one or something else?
-
nbjoerg
written from scratch
-
nbjoerg
("no, you are not supposed to do that, you are not smart enough!")
-
rmustacc
Wrong as in doesn't work or wrong because it's not 'approved'
-
nbjoerg
rmustacc: both
-
nbjoerg
according to him
-
sjorge
He does seem very opinionated on the crypto stuff used
-
sjorge
But that's not necessarily bad (assuming he knows what he's doing and is not a total ass about it)
-
jbk
the amusing part is if his choices ever prove to be wrong, the protocol is designed such that you can't practically do any sort of rolling upgrade
-
jbk
there's really no way to specify a 'v2' set of mechanisms... basically upgrade everything everywhere at once or stand up entirely parallel infrastructure and move things over
-
nbjoerg
yeah, that's my pet peeve with the design
-
jbk
i also find the whole 'you must assign every client a static IP' thing annoying
-
nbjoerg
that and the lack of a configuration protocol
-
jbk
like that's fine for tiny deployments
-
jbk
i don't see how that really works well in large deployments
-
nbjoerg
it doesn't, that's why some vpn providers ship shrinkwrapped wireguard
-
nbjoerg
kind of like the various commercial vpn tools on windows
-
nbjoerg
...so compared e.g. to openvpn, it can actually be a step back
-
sjorge
Yeah I do hate that it needs static assignment
-
sjorge
I like my pipes encrypted and setup, addressing should not be part of the pipe laying so to speak
-
sjorge
But the performance is a lot better for me than openvpn (which I also use for mobile devices -> home connections)
-
jbk
isn't openvpn basically vpn over http[s] ?
-
sjorge
Yes
-
sjorge
Well TLS
-
sjorge
I run mine on port 443 for 'random wifi that is not mine' compat reasons
-
papertigers
jbk: I think that's tailscale's value add. Management layer on top of wg that seems to work really well
-
jbk
i had (back at joyent) started on the _very_ beginning bits for kernel support
-
jbk
starting with chacha20-poly1305 support
-
jbk
though the openbsd code i was using to try to add to our existing chacha20 support was headache inducing
-
jbk
because of the rampant type punning happening everywhere
-
jbk
which by extension made it rely on the implicit C struct layout and padding for correctness, which made me uncomfortable
-
jbk
at least from my recollection
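To illustrate the layout concern in general terms (this is not the OpenBSD code, and the record below is hypothetical): a compiler-style native layout inserts padding that an explicit byte-level encoding does not, so code that reinterprets raw bytes as a struct silently depends on that padding. A minimal sketch with Python's `struct` module:

```python
import struct

# Hypothetical record: a 1-byte tag followed by a 32-bit counter.
NATIVE = struct.Struct("@BI")  # native alignment, as a C compiler would lay it out
PACKED = struct.Struct("<BI")  # explicit little-endian layout with no padding

def encode(tag, counter):
    """Serialize field by field so the byte layout is spelled out, not implied."""
    return PACKED.pack(tag, counter)

def decode(buf):
    """Inverse of encode; returns (tag, counter)."""
    return PACKED.unpack(buf)
```

On common platforms `NATIVE.size` is larger than `PACKED.size` because of alignment padding before the counter, which is the kind of implicit detail that punning through a struct pointer relies on.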
-
gitomat
[illumos-gate] 15793 When preferred_dc is set, DNS outage can cause smb/server and idmap startup to timeout -- Matt Barden <mbarden⊙rc>
-
jbk
in retrospect, it probably would have been easier to not even try to reuse the existing chacha20 code and just have a completely separate chacha20-poly1305 implementation
-
jbk
(I remember being kind of shocked with the quality of the code given openbsd's vaunted reputation for security)
-
jbk
but the thought was to add the needed mechanisms to the crypto api, then could implement the protocol in the kernel using those
-
gitomat
[illumos-gate] 15794 Want file name in oplock break timeout messages -- Gordon Ross <gwr⊙rc>
-
jbk
and to maybe extend pf_key
-
jbk
since IIRC at least openbsd (and maybe freebsd) basically requires the kernel to hold all the info for every client in kernel mem
-
jbk
which (again) seems bad for larger deployments
-
jbk
and might allow for more flexibility on how you manage client keys
-
richlowe
andyf: I didn't narrow it down, I thought I had, but it isn't happening to me in general.
-
richlowe
I thought I narrowed it based on me being fine on the 9th and yuri broken on the 30th
-
richlowe
but I'm also fine in july
-
richlowe
or I'm mis-organizing the repro steps.
-
sjorge
i had a 100% reproducible rate yesterday but like 1/25 ish now
-
sjorge
i hope it's not timing based
-
sjorge
i did capture it with truss
-
yuripv
for me it's like `sudo vim ...` is broken, and then everything else starts to behave weirdly
-
sjorge
did the openssh update and reboot, it's now close to a 100% again for zrepl status :/
-
Agnar
moin, assuming I want to understand what goes wrong in illumos when I get: NOTICE: NVRM: GPU 0000:01:00.0: Failed to copy vbios to system memory. - how would I start debugging it?
-
Agnar
ej tsoome!
-
rmustacc
Agnar: So I can't find the string vbios in illumos.
-
rmustacc
The nvrm makes me think this maybe is nvidia related. Is that the case?
-
Agnar
rmustacc: yes, sorry I forgot to mention the important part. It's the /kernel/drv/amd64/nvidia driver version 470 that ships with OI. It should work, the card IS supported but it seems we have some issues in OI with it
-
Agnar
rmustacc: the driver from freebsd works, so does Linux
-
rmustacc
OK. I'm not really sure how to suggest debugging that. I guess try to see what APIs it's calling around when it emits that message.
-
Agnar
rmustacc: the question is, can I somehow get more information about the driver with kmdb? ::modinfo seems not to provide more information
-
Agnar
rmustacc: I also think of trying Ghidra on the module...
-
Agnar
It would be very helpful if we had some illumos folks at nvidia :/
-
rmustacc
There are some folks who used to work on illumos there.
-
rmustacc
Anyways, I would use kmdb to step through stuff until you find where the message is coming from or similar.
-
rmustacc
I don't have great suggestions here.
-
richlowe
I don't think the people we know at nvidia are at the part of nvidia that would be interesting to you.
-
rmustacc
Well, actually, if you try to load that module again after boot can you get that message to appear again?
-
rmustacc
If so, then you can start doing a bit of DTrace. Though is there even a symbol table?
-
richlowe
nvidia used to provide "source" within the package they themselves provided, to the non-juicy bits.
-
richlowe
back when it was an SVR4 package, anyway.
-
richlowe
it's unlikely to tell you anything about NVRM, but might get you a little somewhere.
-
Agnar
richlowe: I guess starting Xorg will trigger that again. good idea.
-
Agnar
richlowe: I'll play with that tomorrow, thanks so far!
-
rmustacc
If you can get it to reproduce the message then you can use DTrace to get a stack trace for what it's worth.
-
Agnar
rmustacc: yes! I'll get my dtrace book tomorrow and try to get a stack trace and maybe I get an idea why the copy fails :)
-
rmustacc
I mean, if you can get that message to reproduce reliably after boot then you could do something like `fbt::cmn_err:entry{ stack(); }`
-
Agnar
there are even fbt probes in the nvidia module
-
nahamu
I was under the impression that fbt can automatically find function boundaries to trace at runtime rather than needing those probes to be configured at build time.
-
rmustacc
That's correct. fbt is just looking at the symbol table basically.
-
rmustacc
sdt is build time static probes.
-
Agnar
ah!
-
Agnar
well, it's late, I'll try that tomorrow. thank you all so far