-
jbk
hrm.. it looks like the smbios3.0 spec includes records that should allow you to map a DIMM (from the smbios data) to a physical address.. is there any reason (other than just not implemented/ENOTIME) the imc driver couldn't use that (when present) to include the vendor/serial/etc. info?
-
rmustacc
jbk: Which piece?
-
rmustacc
Do you mean the bank locator?
-
rmustacc
Or Device locator?
-
jbk
the device locator..
-
rmustacc
So, I would probably have topo marry it up.
-
rmustacc
It's a free form string and varies with every vendor.
-
rmustacc
And will change from hw gen to gen.
-
rmustacc
At least, that was my experience when I looked at it back when I did the imc work.
-
jbk
it looks like you could take the physical address from a SMB_TYPE_MEMDEVICEMAP and match that record up with a SMB_TYPE_MEMDEVICE to get the device locator, manuf, serial, etc.
-
jbk
and use the physical address to map it back to the dimm from the mc
-
jbk
(asking the mc to return the DIMM for the physical address)
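A minimal sketch of the lookup jbk describes, modeled on plain Python records rather than the real SMBIOS library. The class and field names (`MemDevice`, `MemDeviceMap`, `dimm_for_address`) are hypothetical stand-ins for type 17 and type 20 records, and the sketch assumes each address range belongs to exactly one DIMM (no interleaving).

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class MemDevice:
    """Hypothetical stand-in for an SMBIOS type 17 (memory device) record."""
    handle: int
    device_locator: str
    manufacturer: str
    serial: str


@dataclass
class MemDeviceMap:
    """Hypothetical stand-in for an SMBIOS type 20 (mapped address) record."""
    device_handle: int  # handle of the type 17 record this range refers to
    start: int          # first physical address covered
    end: int            # last physical address covered (inclusive)


def dimm_for_address(pa: int, maps: List[MemDeviceMap],
                     devices: List[MemDevice]) -> Optional[MemDevice]:
    """Return the memory device whose mapped range contains pa, if any."""
    by_handle = {d.handle: d for d in devices}
    for m in maps:
        if m.start <= pa <= m.end:
            return by_handle.get(m.device_handle)
    return None


devices = [
    MemDevice(0x10, "DIMM_A1", "VendorA", "SN-0001"),
    MemDevice(0x11, "DIMM_B1", "VendorB", "SN-0002"),
]
maps = [
    MemDeviceMap(0x10, 0x00000000, 0x3FFFFFFF),
    MemDeviceMap(0x11, 0x40000000, 0x7FFFFFFF),
]
hit = dimm_for_address(0x40001000, maps, devices)
print(hit.device_locator if hit else "no match")
```

The memory controller driver would still have to report which DIMM a faulting physical address belongs to. This only covers the SMBIOS half of the match.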
-
jbk
(I'm happy to play around with this and see if I can get it working if there's no obvious flaws I'm missing)
-
rmustacc
Interesting. I'd probably start with marrying it up in topo.
-
rmustacc
Rather than in the kernel.
-
jbk
i guess for fm, it really doesn't matter since as long as it's getting the info from somewhere, it should be able to use it
-
jbk
doesn't matter how it gets the info
-
rmustacc
Though I think it may be more complicated. I'll need to see what's going on on one of these systems.
-
rmustacc
I think this may not be a slam dunk.
-
rmustacc
Due to interleaving, I see things that all have the same starting address on a system.
-
rmustacc
Maybe because it's two dpc.
-
jbk
the SMB_TYPE_MEMDEVICEMAP does have the interleaving info... though my system doesn't do interleaving, so i'll need to find one that does to see what that looks like
-
rmustacc
Mine all have a value of 255.
-
rmustacc
Which makes me wonder if something is going on with the decoding, but I dunno.
-
rmustacc
Anyways, promising area, but something we'll need to dig a bit more into.
-
rmustacc
Otherwise, you can always do something with a map for a known scheme with the bank locator stringf.
-
rmustacc
*string
-
jbk
does that suggest that on intel, we'll never fail dimms larger than 8gb?
-
jbk
because there's no serd engine defined for larger dimms?
-
jbk
err no..
-
jbk
i think it's just propagating a fault w/ different serd parameters..
-
jbk
rmustacc: interestingly enough, it appears mcelog on linux is using that approach (smbios type 20 records to map physical addr to type 17 record)
-
jbk
though annoyingly, it appears (from checking a few different systems) that at least some HP and Dell systems don't supply those records :(
-
jbk
(my supermicro server does though)
-
rmustacc
I would probably just do the manual map personally.
-
jbk
probably a dumb question, but if you have an existing TCP connection, and the interface associated with the source IP goes down (link down), is there a way to detect that aside from just waiting for the connection to time out (or looking for the link level sysevent and mapping the linkname back)? assume you don't care/want to wait to see if the link comes back
-
rmustacc
With just the tcp socket?
-
jbk
yeah, if there was an option or such that could be set... i didn't think there was, but sometimes there's stuff lurking that isn't immediately discoverable
-
sommerfeld
you can monitor for interface changes with the routing socket.
-
gitomat
[illumos-gate] 16060 tcp data loss with recv() -- Arne Jansen <arne⊙dd>
-
jbk
that might be the way to go
-
jbk
the context here is iSCSI.. the typical setup I see w/ multipathing is that you'll have separate vlans for each path which means multiple tcp sessions on different interfaces
-
jbk
(e.g. not using link aggrs)
-
jbk
and while I can see where a TCP timeout causes a path down, if the underlying link is down, we might as well immediately stop traffic across it at least until it comes back up
-
sommerfeld
(example of something using this - in.mpathd. see usr/src/cmd/cmd-inet/usr.lib/in.mpathd/mpd_main.c
-
sommerfeld
.. but be aware that this only catches some reasons why a destination might be unreachable..)
-
jbk
yeah.. that might be fine though... the best we can do if packets are lost in the ether is just time out, but when we know our link is down, we're better off not trying to send PDUs down it if we have an alternative
-
jbk
(I do think this sort of illustrates why trying to do SCSI over TCP isn't a great fit
-
jbk
but not much I can do about that...
-
sjorge
Does anybody know if the mariadb package in either omnios extra or pkgsrc have galera support?
-
sjorge
At first glance it looks like that is a no.
-
jbk
i don't even know what that is :)
-
danmcd
This comes from @bahamat:
-
danmcd
"The galera documentation says it's Linux only. Makes me think that maybe they know they have linuxisms that aren't/won't be supported elsewhere."
-
jbk
it looks like synchronous replication
-
jbk
it looks like c++
-
jperkin
sjorge: not the mariadb package (yet), but our percona-cluster packages do
-
jperkin
we have at least one customer using it so it seems to be pretty stable on illumos, there are occasional linuxisms I need to fix on updates but the rest is OS agnostic
-
jclulow
jbk: I don't think a link going down actually means a TCP connection is going to be useless, per se, right? It presumably depends on what's going on with the routing table and is almost certainly site dependent (e.g., if you're using BGP it might be a few seconds and then you'll have a new path to the end host)
-
jclulow
If you need TCP liveness you really should do it inside the protocol at the interval where you want liveness to be tested
-
jbk
in general yes
-
jclulow
iSCSI probably should have done that at a transport layer below the SCSI packets
-
jbk
but in the context of iSCSI
-
jbk
it really ends up being worse
-
jbk
and i've never seen a setup where anyone is insane enough to route iSCSI packets
-
jclulow
It's almost certainly happening in modern deployments where you isolate a subnet within a rack
-
jclulow
and only do L3 forwarding between racks
-
rmustacc
(Or even if you live in the IPv6 future with a /64 per server)
-
jbk
i don't think those places are using iSCSI
-
jclulow
Yeah
-
jclulow
They might not be, I guess.
-
jbk
every setup i've ever seen is each path to the storage has its own subnet (I suppose it could be routed within that)
-
sommerfeld
jbk: one approach that I've heard of injects individual server addresses into the local routing protocol so if there is a different path to the destination the connection can stay alive. But that is rarely done (requires host admins and network admins to trust each other too much, and I'm only a little sarcastic about that..)
-
jbk
well we all know the network is perfection incarnate, flawless even, and is _never_ the cause of problems :)
-
sommerfeld
for iSCSI multipathing, failover really should be driven by timeouts in iSCSI rather than relying on a signal from TCP.
-
jbk
yes, and it is, but you're still going to basically have stuff freeze until the timeout hits
-
jbk
but even then, if i have foo0 w/ 1.2.3.4/24 and bar0 with 1.2.3.5/24, once the TCP session is created, we're not going to send packets from 1.2.3.4 out bar0, are we?
-
jbk
if the link is down, we're just going to buffer until the link comes back or it times out, right?
-
jbk
if I have a CDB I need to send, and I have multiple paths to choose from, why would I ever want to pick the one that I know (or at least could know) is not going to be able to send it right now?
-
sommerfeld
jbk: depends on how routing is set up.
-
sommerfeld
both 1.2.3.4 and 1.2.3.5 are on 1.2.3.0/24
-
jbk
(and even that is pretty unusual... pretty much always foo0 would be on subnetA and bar0 would be on subnetB)
-
sommerfeld
.. and on which hostmodel the stack is configured to use (weak/strong/src-priority; see ipadm man page)
-
sommerfeld
weak host model would send packet out any interface if it believed it had the best route
-
sommerfeld
strong model sends it only out the configured interface
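A deliberately simplified sketch of the weak/strong distinction described above, using jbk's foo0/bar0 example. It is not how the IP stack is implemented: `egress_interface` and its arguments are hypothetical, the "best route" is passed in by the caller rather than computed, and the src-priority model is left out.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Interface:
    name: str
    address: str


def egress_interface(model: str, src_addr: str, interfaces: List[Interface],
                     best_route_ifname: str) -> Optional[str]:
    """Pick the interface a packet with source src_addr would leave on."""
    if model == "weak":
        # Weak: any interface may carry the packet if it has the best route.
        return best_route_ifname
    if model == "strong":
        # Strong: only the interface configured with the source address.
        for ifc in interfaces:
            if ifc.address == src_addr:
                return ifc.name
        return None
    raise ValueError(f"unknown host model: {model}")


ifaces = [Interface("foo0", "1.2.3.4"), Interface("bar0", "1.2.3.5")]
# Suppose the stack currently believes bar0 has the best route.
print(egress_interface("weak", "1.2.3.4", ifaces, "bar0"))
print(egress_interface("strong", "1.2.3.4", ifaces, "bar0"))
```

Under the strong model in this sketch, traffic sourced from 1.2.3.4 stays on foo0 even when another interface has the better route, which is the case jbk is asking about.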
-
sommerfeld
other signals you could look for distributing requests across multiple equivalent connections would be number of outstanding transactions and time of most recent reply from the other end.
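One way those signals could be combined, as a rough sketch only. The `Path` record and `choose_path` are hypothetical and not taken from any iSCSI initiator. Paths with a known-down link are skipped, then the path with the fewest outstanding transactions wins, with the most recent reply as the tie-breaker.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Path:
    name: str
    link_up: bool      # last known link state (e.g. from a routing socket)
    outstanding: int   # commands sent but not yet answered
    last_reply: float  # timestamp (seconds) of the most recent reply


def choose_path(paths: List[Path]) -> Optional[Path]:
    """Pick a path for the next command, or None if every link is down."""
    candidates = [p for p in paths if p.link_up]
    if not candidates:
        return None
    # Fewest outstanding transactions first; ties go to the freshest reply.
    return min(candidates, key=lambda p: (p.outstanding, -p.last_reply))


paths = [
    Path("foo0", link_up=False, outstanding=0, last_reply=100.0),
    Path("bar0", link_up=True, outstanding=3, last_reply=99.0),
    Path("baz0", link_up=True, outstanding=3, last_reply=99.5),
]
best = choose_path(paths)
print(best.name if best else "no usable path")
```

The ordering of the two signals is an arbitrary choice here. Timeouts in iSCSI itself would still be needed for failures that never show up as a link-down event.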
-
sjorge
jperkin aha thanks! I did not check the percona package, but percona is fine too.
-
gitomat
[illumos-gate] 15723 mbrtowc_l manpage omits "_l" from function name in prototype -- Bill Sommerfeld <sommerfeld⊙ame>
-
danmcd
While I'll be back for a period of Wed. night the 27th through Fri noon the 29th for the 20231228 release, and maybe Wed. the 3rd if Intel is intransigent, I officially enter my winter break at 5pmET today, and will be officially back on Monday the 8th. See you in 2024!
-
sommerfeld
enjoy your break!