-
danmcd
Smithx10: pfiles $PID_OF_PROCESS will give you ports, which you can then use as arguments to snoop/tcpdump/whatever.
-
danmcd
(Unless your NAT is on the same machine, in which case you'll have to be careful to snoop on the process's zone's netstack)
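
A minimal sketch of the pfiles-to-capture-filter step, assuming pfiles prints local socket lines shaped like `sockname: AF_INET 0.0.0.0  port: 5432`. The regex and the helper names `local_ports` and `capture_filter` are illustrative and not part of any tool; the resulting string is meant as the filter expression passed to snoop or tcpdump.

```python
import re

# Assumed shape of a pfiles local-socket line, for example:
#   "sockname: AF_INET 0.0.0.0  port: 5432"
# "peername:" lines are deliberately not matched, so only local ports are kept.
SOCKNAME_RE = re.compile(r"sockname:\s+AF_INET6?\s+\S+\s+port:\s+(\d+)")


def local_ports(pfiles_output: str) -> list[int]:
    """Return the sorted, de-duplicated local ports found in pfiles output."""
    return sorted({int(m.group(1)) for m in SOCKNAME_RE.finditer(pfiles_output)})


def capture_filter(ports: list[int]) -> str:
    """Build a 'port A or port B' filter expression for snoop or tcpdump."""
    return " or ".join(f"port {p}" for p in ports)
```

Feeding it the text of `pfiles $PID_OF_PROCESS` and handing the returned string to snoop (run in the process's zone if NAT is local, per the caveat above) is the intended use.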
-
Smithx10
I noticed that a few machines rebooted unexpectedly but nothing in /var/crash/volatile
-
Smithx10
Anything else to check to maybe get a hint at what happened?
-
jbk
maybe /var/adm/messages -- if it's a matter of the dump volume being too small, that should get spit out to there during boot
-
jbk
as long as it hasn't been too long
-
Smithx10
2 days
-
jbk
if you can still see the normal boot output in one of the messages files, it'd be sometime shortly after that
-
danmcd
Only time I've seen unexpected machine boots are when I'm doing disk-intensive ops on my HDC (which I now suspect to be a cooling problem with the HDDs).
-
copec
[<- this guy, I know] Going back to messing around with this machine. I hardware booted an opnsense on it, and it ran into roughly the same limit of routed traffic at ~700Mbit. I've got a replacement on order, and then I'm going to figure out what's up with this hardware
-
copec
I realize it's old, but this CPU should be able to route 1Gb of traffic
intel.com/content/www/us/en/product…-cache-2-40-ghz/specifications.html
-
Smithx10
-
danmcd
This looks very familiar. Hang on...
-
danmcd
jinni illumos#13700
-
jinni
-
danmcd
Smithx10: ^^^
-
danmcd
First fixed in 20220811
-
pmooney
well, that one and 14982
-
pmooney
I believe postgres was specifically tripping over 14982
-
Smithx10
Added another, dump..
-
Smithx10
looks like it's the same thing on the other node
-
danmcd
jinni illumos#14982
-
jinni
-
pmooney
Smithx10: is that a post 14982 PI?
-
danmcd
pmooney: got the bug id right?
-
danmcd
14982 is a ZFS fix.
-
pmooney
14892
-
danmcd
Stack shows pollhead_delete issues.
-
pmooney
sorry
-
danmcd
jinni ilumos#14892
-
danmcd
jinni illumos#14892
-
jinni
-
pmooney
(should be marked as related to 13700)
-
pmooney
one of those "A problem, but not _the_ problem" situations
-
Smithx10
so... with this and the nvme bug, I should be good to pull in the latest platform so if / when they fault again
-
Smithx10
the CN will boot with the fixes*
-
danmcd
Smithx10's dump is from 202205xx which is pre-13700
-
pmooney
*phew*
-
danmcd
Smithx10: don't forget we drop 20221215 this week.
-
danmcd
(Which is ALSO a Triton release.)
-
pmooney
I didn't immediately recognize 13700 as the pollhead stuff, but the number raised my hackles for sure
-
danmcd
joyent_20220505T001410Z says the gist.
-
Smithx10
Alright I'll just wait for 20221215
-
danmcd
Thanks.
-
Smithx10
Cool, glad these are already known
-
Smithx10
./back to b33rs
-
danmcd
Smithx10: note that 14892 is NOT YET FIXED.
-
Smithx10
:(
-
Smithx10
What's pmooney waiting for :P
-
danmcd
If you see PG invoking a panic after you jump to a post-13700-fixed PI, we'll need to deep-dive into 14892.
-
danmcd
I think 13700 closes the 14892 window BY A LOT.
-
pmooney
oh, bother
-
pmooney
I'm totally mixed up about what I've fixed, and what I haven't
-
Smithx10
This bug is actually helping all of the folks using our PG service to learn how to handle failover
-
Smithx10
Bonus
-
pmooney
lol
-
pmooney
I should charge extra for that
-
Smithx10
You know you could do this? psql 'postgres://postgres:$password@1928,10.91.197.149,10.91.209.157/postgres?target_session_attrs=primary&sslrootcert=root.crt&sslkey=server.key&sslcert=server.crt&sslmode=verify-ca'
-
Smithx10
target_session_attrs along with multiple peer addrs
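
A rough sketch of assembling that kind of multi-host URI programmatically, assuming a libpq-based client that accepts a comma-separated host list and `target_session_attrs`. The helper name `multi_host_uri` and the example addresses are hypothetical; only the URI shape comes from the command above.

```python
from urllib.parse import quote, urlencode


def multi_host_uri(user, password, hosts, dbname, **params):
    """Build a libpq-style URI that lists several hosts in one authority part.

    hosts is a list of "addr" or "addr:port" strings (hypothetical values);
    params become the query string, e.g. target_session_attrs="primary".
    """
    if not hosts:
        raise ValueError("at least one host is required")
    # Percent-encode credentials so characters like '@' or ':' cannot
    # be mistaken for URI delimiters.
    auth = f"{quote(user, safe='')}:{quote(password, safe='')}"
    uri = f"postgres://{auth}@{','.join(hosts)}/{quote(dbname, safe='')}"
    query = urlencode(params)
    return f"{uri}?{query}" if query else uri
```

With `target_session_attrs="primary"` the client tries the listed hosts in order until one reports itself as the primary, which is what makes failover transparent to callers whose library supports it.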
-
pmooney
so, I have a branch drafted for 14892
-
pmooney
but it involved a bunch of changes, which spurred me to write more tests
-
pmooney
and that's where it stalled
-
toasterson
Smithx10: This is good to know :)
-
Smithx10
toasterson: yeesss
-
Smithx10
Not every Postgres client library has support tho
-
toasterson
yep, sadly
-
Smithx10
-
Smithx10
I think I recall you doing stuff with the go