00:03:47 Yeah, IP blocking is difficult with AI crawlers. They're run out of large networks, so the crude ban-hammer's blast radius is going to hit some innocent bystanders.
00:04:20 Also not convinced the (free version of the) Anubis solution amounts to much more than security theater.
00:04:43 These crawlers are a serious issue with no easy solution right now.
00:05:44 No idea about the infra involved, but you could give Anubis a try to see if it alleviates some of the load.
17:48:17 hrm...
18:51:11 It's so bad even on my small static blog too.
18:52:09 Traffic has gone 10x in the past year and it's all AI crawlers, and most just ignore robots.txt or, worse, use it to crawl hidden stuff.
18:54:00 It is very much "or worse".
21:31:22 I've got a nasty one...
21:33:05 connect() is getting EINPROGRESS and doesn't seem to be sending out to the switch. Did a packet capture switch-side and didn't see anything coming in from dig on 2 CNs.
21:33:18 The process just loops.
21:33:31 https://gist.github.com/Smithx10/cf1b687a117a093ac199044ddbade28e#file-gistfile1-txt-L397
21:41:31 Smithx10: maybe check `connstat` and see what state the TCP connection is in; also run snoop on the host and see if SYN packets are going out or not.
21:44:07 Very strange that snoop on the server said we sent them, but the switch doesn't see them: https://github.com/Smithx10/debug-images/blob/main/pcap.png. The dig command intermittently gets stuck.
21:45:21 rzezeski: yeah, Fri Sep 12 20:08:34 UTC 2025 https://gist.github.com/Smithx10/740b9d61d28fbfb207a9b30e94a98815#file-dns-command-L140. When it times out (TCP, 10 seconds), this is what happens in the capture client-side: https://gist.github.com/Smithx10/740b9d61d28fbfb207a9b30e94a98815#file-gistfile1-txt-L1058
21:51:36 rzezeski: when it hangs, the latest one that was hung is in SYN_SENT from what I can tell: https://gist.github.com/Smithx10/e07c3de9dd5cb5915febac1de182da3d
22:00:23 Yep, SYN_SENT is the state I would expect if the host sent a SYN and is still waiting for its ACK. If you see it in snoop it should technically have hit the wire, but snoop intercepts before the driver, so it could be that the driver or the device refused to send it for some reason. What is the link it's going out on?
22:02:16 i40e in an aggr.
22:02:45 I have at least 2 CNs in this state; the behavior is happening in bhyve guests and OS zones.
22:03:04 I am gonna take the aggr out of the scenario in one of them.
22:03:38 Okay. One thing you might do is take two snapshots of kstat for any mac, aggr, and i40e stats, and look for any counters that are changing that look suspicious, like errors or drops.
22:04:01 Are other connections on the same host working on that link? Like other TCP connections going out over the same aggr?
22:04:21 Yeah, there are a bunch of virtual machines.
22:04:37 Internal customers.
22:04:43 So other VMs are getting new connections established over the link okay?
22:04:54 No, everyone is having intermittent issues.
22:05:21 TCP, UDP.
22:05:42 Which kstat dumps do you think have the most value?
22:05:43 Hmmm, okay. I mean, it kind of sounds like the i40e Tx freeze issues I fixed years ago, but those would grind the i40e device to a halt (no traffic for anyone) until a reset.
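The "take two kstat snapshots and diff them" suggestion above can also be scripted against libkstat rather than eyeballed. A minimal sketch, assuming the stock illumos libkstat API; the program name, the module list, and the "error"/"drop" name filter are illustrative choices, not taken from the gists in this log:

```c
/*
 * Walk the kstat chain and print named counters from the mac, aggr, and
 * i40e modules whose names suggest errors or drops. Run it twice a few
 * seconds apart and diff the output to spot counters that are moving.
 * Compile on illumos with: gcc -o kdrops kdrops.c -lkstat
 */
#include <stdio.h>
#include <string.h>
#include <kstat.h>

int
main(void)
{
	kstat_ctl_t *kc = kstat_open();

	if (kc == NULL) {
		perror("kstat_open");
		return (1);
	}

	for (kstat_t *ksp = kc->kc_chain; ksp != NULL; ksp = ksp->ks_next) {
		if (ksp->ks_type != KSTAT_TYPE_NAMED)
			continue;
		if (strcmp(ksp->ks_module, "mac") != 0 &&
		    strcmp(ksp->ks_module, "aggr") != 0 &&
		    strcmp(ksp->ks_module, "i40e") != 0)
			continue;
		if (kstat_read(kc, ksp, NULL) == -1)
			continue;

		kstat_named_t *kn = KSTAT_NAMED_PTR(ksp);

		for (uint_t i = 0; i < ksp->ks_ndata; i++) {
			/* Only counters that look like errors or drops. */
			if (strstr(kn[i].name, "error") == NULL &&
			    strstr(kn[i].name, "drop") == NULL)
				continue;
			if (kn[i].data_type == KSTAT_DATA_UINT64) {
				(void) printf("%s:%d:%s:%s\t%llu\n",
				    ksp->ks_module, ksp->ks_instance,
				    ksp->ks_name, kn[i].name,
				    (unsigned long long)kn[i].value.ui64);
			} else if (kn[i].data_type == KSTAT_DATA_UINT32) {
				(void) printf("%s:%d:%s:%s\t%u\n",
				    ksp->ks_module, ksp->ks_instance,
				    ksp->ks_name, kn[i].name,
				    kn[i].value.ui32);
			}
		}
	}

	(void) kstat_close(kc);
	return (0);
}
```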
22:06:13 I can't think of them off the top of my head, but anything with 'mac', 'i40e', or 'aggr' in the name.
22:06:55 connect(4, 0x005EAC28, 16, SOV_DEFAULT) Err#150 EINPROGRESS
22:07:10 errr
22:07:11 whoops
22:08:45 I have to step away for a bit.
23:10:30 rzezeski: kstats https://gist.github.com/Smithx10/822e28e4568bb67866939ae51bacf036 https://gist.github.com/Smithx10/9475909a1dd0b1eb90920caa7d950a21
23:19:58 Smithx10: i40e kstats might also be informative (there are a bunch of different reasons for a tx drop).
23:45:51 sommerfeld: I think they are in there on that gist; there should be 3 files: i40e, mac, and aggr.
23:46:19 Each gist is a server. I haven't bounced them, if you'd like to see anything more.
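The truss line above, `connect(4, 0x005EAC28, 16, SOV_DEFAULT) Err#150 EINPROGRESS`, is the normal return for a non-blocking connect. A minimal sketch of that pattern, with a placeholder address and port, showing where a missing SYN-ACK leaves the caller stuck in poll() while connstat reports the socket in SYN_SENT:

```c
/*
 * Non-blocking connect(): EINPROGRESS is the expected first return, after
 * which the caller polls for writability and checks SO_ERROR. If the SYN
 * is never answered, poll() sits here until the timeout and the connection
 * stays in SYN_SENT. Address and port are placeholders.
 */
#include <stdio.h>
#include <string.h>
#include <errno.h>
#include <fcntl.h>
#include <poll.h>
#include <unistd.h>
#include <sys/socket.h>
#include <netinet/in.h>
#include <arpa/inet.h>

int
main(void)
{
	int fd = socket(AF_INET, SOCK_STREAM, 0);

	if (fd < 0) {
		perror("socket");
		return (1);
	}
	(void) fcntl(fd, F_SETFL, fcntl(fd, F_GETFL, 0) | O_NONBLOCK);

	struct sockaddr_in sin;
	(void) memset(&sin, 0, sizeof (sin));
	sin.sin_family = AF_INET;
	sin.sin_port = htons(53);	/* placeholder: DNS over TCP */
	(void) inet_pton(AF_INET, "192.0.2.10", &sin.sin_addr); /* placeholder resolver */

	if (connect(fd, (struct sockaddr *)&sin, sizeof (sin)) < 0 &&
	    errno != EINPROGRESS) {
		perror("connect");
		return (1);
	}

	/* Wait up to 10 seconds for the handshake, like dig's TCP timeout. */
	struct pollfd pfd = { .fd = fd, .events = POLLOUT };
	int n = poll(&pfd, 1, 10 * 1000);

	if (n == 0) {
		/* No SYN-ACK came back: the socket is still in SYN_SENT. */
		(void) fprintf(stderr, "connect timed out (SYN_SENT)\n");
		return (1);
	}

	int err = 0;
	socklen_t len = sizeof (err);
	(void) getsockopt(fd, SOL_SOCKET, SO_ERROR, &err, &len);
	if (err != 0) {
		(void) fprintf(stderr, "connect failed: %s\n", strerror(err));
		return (1);
	}

	(void) printf("connected\n");
	(void) close(fd);
	return (0);
}
```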