15:35:51 so I ran into an odd bug with checksum offload yesterday. Turns out that the IPv6 raw sockets API includes IPV6_CHECKSUM to have the stack compute a checksum of the outgoing packet after source address selection (see section 3 of RFC 3542).
15:39:05 But it would appear that this is busted in our stack for i40e - packet gets dropped by the driver; looks like it's a path which bumps the tx_hck_nol4info kstat (haven't isolated further just yet)
15:40:28 patching dohwcksum to 0 with MDB forces IP to compute the checksum and things start working.
15:45:15 not sure if it's a missing piece of the mac_ether_offload_info infrastructure or a bug in i40e
15:46:25 I will be filing a bug but would appreciate it if anyone has any insight into what should be happening here..
15:47:00 someone (patrick?) did some work recently in that area... I don't know if that might address it (or maybe related depending on on current the code is you're running)
15:47:17 err on how current
15:47:41 Looks like that work landed in february and is in the bits I'm running.
15:52:24 or am i thinking of stuff in an rfd
15:56:08 there's IPD 55 & 56 which extend checksum and LSO to certain tunneling protocols
16:05:18 sommerfeld: So to understand, our expectation is that the hardware will calculate it or software will? Is this about calculating an ICMP checksum?
16:08:56 OSPF on IPv6, using IPV6_CHECKSUM to specify the packet offset where the checksum lands. The code in ip_output_cksum_v6() sets HCK_PARTIALCKSUM and leaves computing it to the driver.
16:09:13 so raw socket, not ICMP
16:10:06 ("nxge_cksum_workaround" doesn't apply because it's not ICMP..)
16:12:44 OK. Gotcha.
16:12:49 I expect it shouldn't be going to hardware then.
16:12:51 mac_ether_offload_info sees a proto it doesn't know, doesn't set l4hlen, and ignores DB_CKSUMSTUFF(mp).
16:12:53 And we need to do it in software.
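For reference, here is a minimal Python sketch of what "do it in software" amounts to for an IPV6_CHECKSUM raw socket: zero the checksum field at the given offset, sum the IPv6 pseudo-header plus the payload in 16-bit ones-complement arithmetic, and store the complement at that offset. It is an illustration only, not the illumos code path. The function names are hypothetical, and the OSPFv3 values used in the usage lines (protocol 89, checksum field at offset 12) are assumptions taken from the OSPFv3 header layout rather than from this log.

```python
import ipaddress
import struct


def ones_complement_sum(data: bytes) -> int:
    """16-bit ones-complement sum, as used by the Internet checksum."""
    if len(data) % 2:
        data += b"\x00"  # pad odd-length input with a zero byte
    total = sum(word for (word,) in struct.iter_unpack("!H", data))
    while total >> 16:  # fold carries back into the low 16 bits
        total = (total & 0xFFFF) + (total >> 16)
    return total


def pseudo_header(src: str, dst: str, length: int, next_header: int) -> bytes:
    """IPv6 pseudo-header: src, dst, 32-bit length, 3 zero bytes, next header."""
    return (ipaddress.IPv6Address(src).packed
            + ipaddress.IPv6Address(dst).packed
            + struct.pack("!I3xB", length, next_header))


def fill_checksum(src: str, dst: str, next_header: int,
                  payload: bytes, offset: int) -> bytes:
    """Return payload with the checksum stored at `offset` (hypothetical helper)."""
    if offset % 2 or offset + 2 > len(payload):
        raise ValueError("offset must be even and inside the payload")
    buf = bytearray(payload)
    buf[offset:offset + 2] = b"\x00\x00"  # checksum field counts as zero
    total = ones_complement_sum(
        pseudo_header(src, dst, len(buf), next_header) + bytes(buf))
    struct.pack_into("!H", buf, offset, ~total & 0xFFFF)
    return bytes(buf)


# Usage: a dummy 16-byte OSPFv3-sized header (assumed proto 89, offset 12).
packet = fill_checksum("fe80::1", "ff02::5", 89, bytes(range(16)), 12)
```

A receiver can check the result by summing the pseudo-header and the received payload; a correct packet sums to 0xFFFF.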
16:13:41 i40e sees HCK_PARTIALCKSUM without l4info and throws up its hands
16:13:47 Specifically the driver specifying HCKSUM_INET_PARTIAL indicates only TCP/UDP and we've seen this a bunch.
16:14:09 That is, there are other devices where they don't support ICMPv6.
16:17:55 sommerfeld: I wonder if I did the wrong checksum bit there. But i40e was a long time ago.
16:18:30 Given we don't actually use start and the descriptors don't really support partial configurations.
16:21:37 so perhaps the fix would go in ip_output_cksum_v6() and (approximately) change the "proto != ICMPV6" to "(proto == UDP || proto == TCP)"
16:23:32 looks like the DB_CKSUM* interface is only consumed by nxge
16:27:00 https://github.com/illumos/illumos-gate/blob/master/usr/src/uts/common/io/qede/qede_gld.c#L2059-L2064 is what I was thinking about.
16:32:24 I'd have to go back through the datasheet, but I'm not sure if we can do this checksum action with i40e.
16:33:23 sommerfeld: that's because nxge is a very old driver, the newer drivers use meoi to compute that stuff
16:33:59 yeah, but every packet pays the cost of filling in fields that no driver but nxge will ever look at..
16:50:29 sorry, it's been a while since I've been working on mac/drivers, I might have been wrong to point at meoi, I would have to refresh my memory on all the different pieces, but you can clearly see mac_provider APIs making use of DB_CKSUM. E.g., drivers make use of mac_hcksum_get() to help fill out Tx descriptors. Since nxge is so old it just uses DB_CKSUM* directly.
16:52:43 sorry, I misread "git grep" output. yes, I see that they are in fact still used.
16:59:28 There are some dragons in the checksum/capab handling stuff, and it can get confusing fast. I'll be ramping up on all this again in the next couple of weeks and can probably help review whatever fix comes out of this.
17:00:17 sommerfeld: Do you have IXAF_SET_RAW_CKSUM set in the ixa when you get to ip_output_cksum_v6?
17:00:51 I guess it's not clear to me which path you're in in that code.
17:03:24 yes, I believe IXAF_SET_RAW_CKSUM is set on the packet.
17:05:12 I've also noticed something broken with ipv6 router advertisements on igc that worked on other cards before, but didn't have time to debug yet. Starting snoop to switch the card into promisc mode makes it work
17:17:03 I filed https://www.illumos.org/issues/17593
17:17:04 → BUG 17593: We should not attempt to offload checksums for raw sockets with IPV6_CHECKSUM set (New)
17:19:07 rzezeski: thanks in advance for taking a look at this..
18:01:05 Feeling nostalgic about OpenSolaris, it's been more than 15 years since.. What would be the distribution to install today? OmniOS, OpenIndiana? A general purpose one is what I am after
18:02:01 I'll eventually try more than one, but advice for a first one is welcome
18:16:30 hrm..
18:16:49 at least on a few NICs, dladm show-linkprop appears to always show the current MTU as the default
18:16:58 e.g. you enable jumbo frames, the 'default' value becomes 9000
18:18:13 shouldn't the 'VALUE' column show 9000, 'DEFAULT' 1500 given pretty much every NIC does in fact default to 1500, and POSSIBLE show - ?
18:20:29 mlxcx seems to be one
18:21:58 it certainly seems odd
18:23:13 wiedi: If you get more info on that let me know and I'm happy to help as I can. I may have screwed something up there.
18:24:06 Well, let's just get a list of them, jbk, and we can file bugs and fix it.
18:25:56 thanks, will try to get some pcaps and open a ticket with more details when I have a moment :)
18:26:19 the only place I have a 9k default is vnics, but I don't have much actual hardware
18:26:31 so ok.. i just wanted to be sure that wasn't the intended behavior (sounds like it's not)...
18:27:42 (just noticed it from output from a customer, but waiting on feedback to determine if it's a renamed nic or a vnic since it's not obvious from the show-linkprop output they gave)
18:34:59 vnics and other things will be a different story
18:35:21 yeah, that's why I want the details for the other links shown in the output
18:35:42 the mlxcx ones are fairly obvious
18:37:36 hopefully I'll get that in a bit...
18:46:51 If I gcore(1) a process, is there a snappy way with mdb to locate *callers* of a function?
18:47:12 Something like "dis-all-the-functions ! grep $FUNCTION_NAME", or something more clever.
18:51:02 what is the function?
18:51:18 if it's non-local, there will be a relocation for every reference
18:51:25 and you can pull them out of .SUNW_reloc and match them in the symtab
19:02:45 isn't it easier to dig from source? or there is no source?
19:04:37 wiedi: I have a machine with some spare igc's I could experiment with. (currently only using one of them as a v4-only interface; a few months ago it successfully got a DHCPv6 address and default v6 route from my ISP)
19:04:38 tsoome: function pointers
19:04:47 tsoome: not that my solution helps much there either
19:25:16 untested WIP change at https://code.illumos.org/c/illumos-gate/+/4374
19:25:17 → CODE REVIEW 4374: 17593 Don't offload IPV6_CHECKSUM on raw sockets (NEW) | https://www.illumos.org/issues/17593
19:31:03 if Gallant is telling you something: https://cgit.freebsd.org/src/commit/?id=9e8c1ab0976c9a645a92ae45ad531ada3e4e6701 :)
19:52:50 haha wow
19:54:15 divlamir: If you mean a general purpose machine without graphics, I would say OmniOS. If you need a desktop environment, though, I think Tribblix or OpenIndiana are the options there generally.
20:03:33 IPA on the console, finally we can add pronunciations to our manual pages.
20:04:09 alanc can email the austin group asking for canonical pronunciations for standards things.
20:04:13 I'm sure it'll be good fun
20:12:47 @richlowe C_DecryptInit and it's in in.iked or libike (closed-source, sorry @tsoome )
20:13:19 ah. right, I already forgot about it...
20:13:39 * danmcd still runs a punchin server on kebe.com
20:14:04 dtrace with ustack() for some time?:)
20:15:11 Oh damn, and it's Function Pointers Everywhere (TM) in libike and in.iked. I'll just have to do literally what tsoome ^^^ just said.
20:18:07 Oh... NVM. I won't have to worry about my problem anyway... no support of AES-[GC]CM in in.iked.
20:24:21 time to look for something like https://www.openiked.org ?:)
20:24:39 * tsoome hides
20:30:57 i almost had transport mode working, but i never had a chance to debug it
20:31:30 tunnel mode after that wouldn't have been _too_ bad I don't think
20:32:18 though i guess transport mode doesn't get as much use which is unfortunate
20:33:58 basically you can manage over-the-wire encryption on a per-host basis
20:34:19 instead of having to worry about certs for every single app (which may or may not have bespoke ways of dealing with that)
20:36:32 IIRC, the problem was dealing with the incongruence between IKEv2 traffic selectors and how you represent those in the kernel, and trying to basically 'negotiate'
20:38:28 (kernel wants ADDRESS/MASK (or /PREFIXLEN); IKEv2 does START ADDR-END ADDR, so you have to figure out the intersection(s) that can be expressed as an address+mask)
20:43:05 danmcd: ustack would be the easiest way, if you can provoke it with sufficient coverage
21:32:36 [illumos-gate] 17554 Add -p flag for "smbadm lookup" for parsable output -- Chao Wang
21:32:37 [illumos-gate] 17556 SMB client test memory leak -- Gordon Ross
21:32:37 [illumos-gate] 17557 Memory leak in PKCS11 C_DecryptInit with AES_CCM -- Gordon Ross
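On the traffic-selector mismatch raised at 20:38:28: the range-to-prefix step can be sketched with Python's standard ipaddress module. This only illustrates the arithmetic (a START-END range becomes the smallest list of address/prefixlen blocks that cover it exactly); it is not how in.iked or the kernel handles it. The helper names and the sample addresses are hypothetical.

```python
import ipaddress


def selector_to_prefixes(start: str, end: str):
    """Cover the inclusive range start..end with the fewest CIDR blocks."""
    first = ipaddress.ip_address(start)
    last = ipaddress.ip_address(end)
    return list(ipaddress.summarize_address_range(first, last))


def intersect_selectors(a, b):
    """a and b are (start, end) pairs; return prefixes for their overlap."""
    lo = max(ipaddress.ip_address(a[0]), ipaddress.ip_address(b[0]))
    hi = min(ipaddress.ip_address(a[1]), ipaddress.ip_address(b[1]))
    if lo > hi:
        return []  # the two selectors do not overlap
    return list(ipaddress.summarize_address_range(lo, hi))


# A range that is not prefix-aligned needs several address/prefixlen entries.
for net in selector_to_prefixes("10.0.0.1", "10.0.0.6"):
    print(net)  # 10.0.0.1/32, 10.0.0.2/31, 10.0.0.4/31, 10.0.0.6/32
```

A range that is not aligned to a prefix boundary expands into several entries, which is where the awkwardness in the negotiation comes from: one IKEv2 selector may map to multiple address+mask entries.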