-
sjorge
fwiw, I got OmniOS booting OKish on linode (nano 1g vms, paravirt mode)
-
sjorge
shutdown -i6 -y just halts but the watchdog from linode brings it back up
-
jbk
a bit of an into the weeds question, but is there typically anything wired up on an LPC bus where it'd make sense to add acc checking w/ fault management?
-
tsoome
sjorge runlevel 6 is reboot;)
-
tsoome
sjorge or you meant that the expected reboot did not work and instead you got the kick from watchdog
-
sjorge
tsoome: yes, I expect it to reboot but it doesn't
-
rmustacc
jbk: I wouldn't be doing much of anything to change the LPC bus, especially since it's gone on most new platforms.
-
rmustacc
jbk: Also, what mechanism in the protocol would you use? Just things that are checking for the ranges you send there or something else?
-
Smithx10
rmustacc: Not sure if you answered this already, if so my apologies. FM is offlining an NVMe device, which is resulting in me not being able to get the Serial Number / Location of the device that is offline now. I think I saw something in your IPD about caching those values... is it possible to get that info today, or do we need additional work?
-
jbk
i mean for a device on the LPC bus -- some drivers will include the DDI_FLAGERR_ACC in ddi_device_acc_attr.devacc_attr_access and then will call ddi_fm_acc_err_get presumably after one or more calls to ddi_{get,put}XX()
-
jbk
so maybe stating it differently, for a device that sits on an LPC bus (or behind a PCI-LPC bridge) will ddi_fm_acc_err_get actually be able to do anything?
-
jbk
since the isa module itself I don't think has any FM support, presumably it needs to be involved for any children that would want to use ddi_fm_acc_err_get()
-
gitomat
[illumos-gate] 16085 use modern libxml2 API -- Andy Fiddaman <illumos⊙fn>
-
rmustacc
jbk: So in this case you would expect that the nexus hierarchy would not set any fm capable flags in ddi_fm_init() and therefore the driver shouldn't actually be setting DDI_FLAGERR_ACC in anything or at least it won't actually happen.
-
rmustacc
But I'm not exactly sure.
-
rmustacc
The ddi_fm_init() request from children of the isa module should go to it, and it shouldn't be setting flags, I imagine.
-
rmustacc
Smithx10: I didn't really do anything about caching.
-
Smithx10
Sorry maybe I didnt understand this right
-
Smithx10
"Providing interfaces that make it easy to snapshot information and then consume it when the device is no longer present. For example, the smbios -w or pcieadm save-cfgspace commands make it so we can capture data on a target system in a way that it can be sliced and diced on an entirely different system later."
-
rmustacc
There was an ipd about adding similar logic to nvmeadm.
-
Smithx10
ahhh ok
-
rmustacc
But in this case, the question would be does your system have a topo map that reflects the locations?
-
rmustacc
If so, I'd expect that the FMRI was there.
-
Smithx10
I've noticed diskinfo -P doesn't show locations and grepping for serial numbers in fmtopo doesn't result in finding the device
-
rmustacc
Then that means your systems don't have a topo map that has those mappings together.
-
rmustacc
And likely therefore fma doesn't have that either.
-
Smithx10
I think i recall that it was because that info was nested and I don't think we walked down to get that info.
-
Smithx10
Probably gonna just have the guy stand next to the array and run a scrub and find the non blinking light lol
-
rmustacc
Do you have another system with an identical layout?
-
Smithx10
we have a few of the same SKUs yea
-
Smithx10
so far all the different SKUs we run have no luck getting info
-
rmustacc
OK, well you should be able to map that bridge to a slot.
-
rmustacc
Who makes the system?
-
Smithx10
HP, and SM
-
rmustacc
OK, so what we should be able to do is figure out what slot that is based on the /devices path that is retired.
-
rmustacc
Because it'll correspond to a particular bridge.
-
Smithx10
On the 3 servers, what should I find || grep for in those paths to find it?
-
gitomat
[illumos-gate] 16098 properly escape backslashes per mdoc -- Robert Mustacchi <rm⊙fo>
-
rmustacc
Smithx10: I guess you probably want to start with finding the /devices path and go from there.
-
Smithx10
the nvme device doesn't show up in /dev/rdsk since fm took it out
-
rmustacc
Right, but you'll have the parent.
-
rmustacc
And you should have the retired note in prtconf and related.
-
rmustacc
Sorry, I probably can't quite hand hold you through this right now. But the basic thing is each bridge has a slot and the same bridge should be in the same slot on both systems.
-
rmustacc
So you can get it that way.
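
A minimal sketch of that lookup, assuming you have hand-built a table on an identical reference system that maps each parent bridge's /devices path to a physical bay. The bridge path is the one that comes up later in this log; the `nvme@0` child name and the bay label are made up for illustration, and `parent_bridge` and `bay_for` are hypothetical helper names.

```python
# Hypothetical table recorded by hand on a reference system of the same SKU:
# parent bridge /devices path -> physical bay label (label is invented).
BRIDGE_TO_BAY = {
    "/pci@c1,0/pci1022,1483@3,4": "bay 5 (hypothetical)",
}


def parent_bridge(devices_path):
    """Return the parent of a /devices path by dropping its last component."""
    path = devices_path.rstrip("/")
    # Accept paths with or without the leading /devices mount point.
    if path.startswith("/devices/"):
        path = path[len("/devices"):]
    parent, _, leaf = path.rpartition("/")
    if not parent or not leaf:
        raise ValueError("no parent component in %r" % devices_path)
    return parent


def bay_for(devices_path, bridge_to_bay=BRIDGE_TO_BAY):
    """Look up the bay for a retired device via its parent bridge, or None."""
    return bridge_to_bay.get(parent_bridge(devices_path))


if __name__ == "__main__":
    # The child node name below is an assumption, not taken from a real system.
    print(bay_for("/devices/pci@c1,0/pci1022,1483@3,4/nvme@0"))
```

The table only holds as long as the same bridge sits in the same slot on every system of that SKU, which is the assumption stated above.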
-
Smithx10
ok, np
-
Smithx10
thanks for the point in the direction
-
Smithx10
-
jbk
i wonder where it's getting 'physical slot: 11' from
-
Smithx10
jbk: I believe that is the PCI slot number, not sure if it maps to the physical drive bay number
-
rmustacc
It doesn't per se, but it will be consistent.
-
jbk
the parent device /pci@c1,0/pci1022,1483@3,4 appears to be a bridge
-
Smithx10
When the DC hand gets there I'll power this box down and he can go through the 6 drives that are off and find the match
-
jbk
i know i've asked about it, but at some point I need to dig in and see if it's possible to override those labels based on the HW path so you can get something more useful than 'MB' if there's an actual label on the MB for the device (it does assume a system firmware update won't muck up the path, but that's at least something that can be checked and dealt with)
-
rmustacc
Depends on the enumerator.
-
rmustacc
But with a map you should be able to.
-
jbk
speaking of that, should dimm enumeration include the serial#, manuf info in the fm topo? (I see on mine it just shows the rank size, etc)
-
jbk
(wondering if something's broken and i need to dig in or not)
-
jbk
smbios shows it, so the info exists
-
jbk
one thing i was going to do for work was a small program that walks the fm topology to produce a HW inventory (sort of like fmtopo, but a bit more focused instead of the giant dump of everything)
-
jbk
though supplemented with some additional info (e.g. we use lots of aggrs, so it'd be nice to include the aggr config as well as IP config of the interfaces)
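
One possible starting point for such an inventory tool is splitting hc-scheme FMRI strings into their parts. This is only a sketch: it assumes the strings look like `hc://:key=value:key=value/name=instance/...`, that values contain no `:` or `/`, and that instances are numeric. The sample FMRI is invented, and `parse_hc_fmri` is a hypothetical helper, not part of any FM library.

```python
def parse_hc_fmri(fmri):
    """Split an hc-scheme FMRI string into (authority dict, list of nodes).

    Assumed layout: hc://:k=v:k=v/name=inst/name=inst
    """
    prefix = "hc://"
    if not fmri.startswith(prefix):
        raise ValueError("not an hc-scheme FMRI: %r" % fmri)
    authority, _, path = fmri[len(prefix):].partition("/")

    # Authority is a colon-separated list of key=value pairs.
    auth = {}
    for item in authority.split(":"):
        if item:
            key, _, value = item.partition("=")
            auth[key] = value

    # The path is a chain of name=instance pairs, outermost first.
    nodes = []
    for comp in path.split("/"):
        if comp:
            name, _, inst = comp.partition("=")
            nodes.append((name, int(inst)))
    return auth, nodes


if __name__ == "__main__":
    # Made-up sample string, for illustration only.
    sample = "hc://:product-id=EXAMPLE:server-id=host1:serial=SN123/chassis=0/bay=3/disk=0"
    auth, nodes = parse_hc_fmri(sample)
    print(auth.get("serial"), nodes)
```

A focused inventory could then keep only the node types of interest (disks, DIMMs, and so on) rather than dumping the whole tree.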
-
Smithx10
lol, this stuff seems rather complicated....
-
Smithx10
you'd think mapping the drive bay # to the disk inside it wouldn't be so brutal
-
jbk
you'd think
-
jbk
but then if it's a SCSI disk, usually there's a completely separate device you have to talk to
-
jbk
and then map back what it reports back to a disk
-
Smithx10
yea, its pretty brutal lol
-
jbk
and the standard for that is roughly 'we'll just kinda document what everyone does, but no guarantee some manufacturer won't do something different that'll break things'
-
Smithx10
one day i'll type diskinfo -P and it will just work :)
-
Smithx10
also, whether nvme information shows up in my OOB or not varies from SKU to SKU
-
jbk
that also goes for the indicator lights
-
jbk
then for SATA it's different
-
jbk
and I haven't looked, but I'm guessing NVMe is also different
-
jbk
SATA at least, the indicator lights (if supported) are (more or less) associated with the disk (you talk to the same 'thing' to turn the lights on/off as you do to read/write to the disk, unlike scsi)
-
jbk
but AFAIK, there's nothing in there to enumerate a disk to some physical location
-
jbk
you have to basically just manually say 'port 0 corresponds to slot X', etc.
-
Smithx10
yea, I tried checking the Manual for that SM and it has nothing documented "_"
-
Smithx10
Next time a SKU comes in, we're probably gonna make this a part of the process during first install
-
jbk
unfortunately, there's no easy way to add a topo map after the fact in smartos -- it more or less needs to be added to the repo
-
rmustacc
jbk: So the problem is that the memory controller doesn't know that information or provide you usually with a way to get it via i2c. So you need a way to map things together to get both the memory controller and smbios info.
-
jbk
is that something that could be done via a topo map? the one i'm using does reference the smbios enumerator, but no idea really if i'm specifying things correctly
-
rmustacc
You'd need additional code to glue it, but if you said you knew for certain that for example the location tag is guaranteed to map to a given location in the memory controller, probably.
-
rmustacc
A lot of these gotchas are why I've tried to experiment with designing the tree differently at Oxide, the PCIe tree we've prototyped, etc.
-
jbk
i guess i need to see what info i can get from the mc
-
jbk
the smbios info at least has enough to map a DIMM back to both a physical location and physical address
-
jbk
oh hrm.. this might be part of the problem.. for some reason.. x86gentopo_legacy is getting set to 1.. guess I need to figure out why
-
rmustacc
I'd expect that for everything?
-
rmustacc
Unless you had some random specific sun platform.