05:20:09 Smithx10: pfiles $PID_OF_PROCESS will give you ports, which you can then use as arguments to snoop/tcpdump/whatever.
05:20:39 (Unless your NAT is on the same machine, in which case you'll have to be careful to snoop on the process's zone's netstack)
12:59:17 I noticed that a few machines rebooted unexpectedly but nothing in /var/crash/volatile
12:59:48 Anything else to check to maybe get a hint at what happened?
14:31:18 maybe /var/adm/messages -- if it's a matter of the dump volume being too small, that should get spit out to there during boot
14:31:24 as long as it hasn't been too long
14:40:04 2 days
14:40:46 if you can still see the normal boot output in one of the messages files, it'd be sometime shortly after that
15:06:47 Only time I've seen unexpected machine boots are when I'm doing disk-intensive ops on my HDC (which I now suspect to be a cooling problem with the HDDs).
20:58:33 [<- this guy, I know] Going back to messing around with this machine. I hardware booted a opnsense on it, and it ran into roughly the same limit of routed traffic at ~700Mbit. I've got a replacement on order, and then I'm going to figure out what's up with this hardware
21:02:27 I realize it's old, but this CPU should be able to route 1Gb of traffic https://www.intel.com/content/www/us/en/products/sku/77988/intel-atom-processor-c2758-4m-cache-2-40-ghz/specifications.html
21:47:44 jbk: danmcd think this might be something "_" https://gist.github.com/Smithx10/5a740c3e7022a099fb79d78acac9fa99
21:48:24 This looks very familiar. Hang on...
21:49:25 jinni illumos#13700
21:49:26 https://www.illumos.org/issues/13700
21:49:44 Smithx10: ^^^
21:49:54 First fixed in 20220811
21:50:41 well, that one and 14982
21:51:29 I believe postgres was specifically tripping over 14982
21:52:12 Added another, dump..
21:52:17 looks like its the same thing on the other node
21:52:28 jinni illumos#14982
21:52:29 https://www.illumos.org/issues/14982
21:52:43 Smithx10: is that a post 14982 PI?
21:53:10 pmooney: got the bug id right?
21:53:30 14982 is a ZFS fix.
21:53:45 14892
21:53:45 Stack shows pollhead_delete issues.
21:53:47 sorry
21:53:57 jinni ilumos#14892
21:54:09 jinni illumos#14892
21:54:10 https://www.illumos.org/issues/14892
21:54:35 (should be marked as related to 13700)
21:54:52 one of those "A problem, but not _the_ problem" situations
21:54:54 so... this and the nvme bug the latest platform I should be good to pull in so if / when they fault again
21:55:05 the CN will boot with the fixes*
21:55:05 Smithx10's dump is from 202205xx which is pre-13700
21:55:12 *phew*
21:55:23 Smithx10: don't forget we drop 20221215 this week.
21:55:33 (Which is ALSO a Triton release.)
21:55:42 I didn't immediately recognize 13700 as the pollhead stuff, but the number raised my hackles for sure
21:56:10 joyent_20220505T001410Z says the gist.
21:56:29 Alright Ill just wait for 20221215
21:56:35 Thanks.
21:57:20 Cool, glad these are already known
21:57:27 ./back to b33rs
21:59:52 Smithx10: note that 14892 is NOT YET FIXED.
22:00:24 :(
22:00:48 What's pmooney waiting for :P
22:01:02 If you see PG invoking a panic after you jump to a post-13700-fixed PI, we'll need to deep-dive into 14892.
22:01:14 I think 13700 closes the 14892 window BY A LOT.
22:01:59 oh, bother
22:02:11 I'm totally mixed up about what I've fixed, and what I haven't
22:03:43 This bug is actually helping all of the folks using our PG service to learn how to handle failover
22:03:47 Bonus
22:03:49 lol
22:03:56 I should charge extra for that
22:05:16 You know you could do this?
psql 'postgres://postgres:$password⊙1928,10.91.197.149,10.91.209.157/postgres?target_session_attrs=primary&sslrootcert=root.crt&sslkey=server.key&sslcert=server.crt&sslmode=verify-ca'
22:05:33 target_session_attrs along with multiple peer addrs
22:05:48 so, I have a branch drafted for 14892
22:06:02 but it involved a bunch of changes, which spurred me to write more tests
22:06:05 and that's where it stalled
22:39:14 Smithx10: This is good to know :)
22:39:30 toasterson: yeesss
22:39:45 Not every psql library tho has support
22:40:13 yep, sadly
22:41:26 toasterson: tho, https://github.com/jackc/pgx/blob/v3.6.2/conn.go#L144 this pgx lib does
22:41:45 I think I recall you doing stuff with the go