-
toasterson
Does anybody know what this means? `first of 2 errors: timed out waiting for /var/svc/provisioning to move for 416146bb-4d88-4834-b3f2-d5b15b4fa2cb`
-
bahamat
toasterson: It's almost always an issue internal to the zone. That file marker gets moved as part of the mdata:execute service and something is preventing that from happening. Another potential cause is when you have way too many ZFS datasets (often caused by heavy docker use, in which case you can run `imgadm vacuum`).
-
bahamat
Or if you have something other than docker making tons of snapshots.
-
toasterson
bahamat: that's trying to spawn base64@13 as a VM
-
bahamat
As in, inside of bhyve?
-
bahamat
Or joyent brand zone?
-
toasterson
joyent branded zone. it's on coal
-
bahamat
OK, well then it might be the performance of vmware
-
toasterson
it's on libvirt
-
toasterson
on a pretty beefy server
-
toasterson
well I'll try minimal-64-lts
-
bahamat
But ultimately, /var/svc/provisioning needs to be moved to /var/svc/provision_success
-
bahamat
It's not about "beefy". It's about whatever is causing latency.
-
toasterson
a qcow2 on nvme raid1 via virtio should not have a problem
-
bahamat
Well you're going to have to figure that part out.
-
bahamat
But I can tell you, the *only* way that happens is if the zone can't move that file before the timeout.
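-
[editor's note]
The failure mode described above can be sketched as a poll with a deadline. This is only an illustration of "the marker must move before the timeout", not the code vmadm actually runs; the function name and the polling interval are assumptions made for the example.

```python
import os
import time


def wait_for_marker_move(src, dst, timeout_s, poll_s=0.1):
    """Return True once src is gone and dst exists, False if the timeout passes first.

    src and dst stand in for /var/svc/provisioning and
    /var/svc/provision_success inside the zone (hypothetical helper).
    """
    deadline = time.monotonic() + timeout_s
    while True:
        # Success means the marker has been renamed to its final name.
        if not os.path.exists(src) and os.path.exists(dst):
            return True
        # Anything that delays the rename past the deadline is reported
        # as a timeout, whatever the underlying cause was.
        if time.monotonic() >= deadline:
            return False
        time.sleep(poll_s)
```

The point of the sketch is that the watcher only sees "moved" or "not moved in time"; it cannot tell slow I/O apart from a service inside the zone that is blocked on something else, such as networking.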
-
toasterson
Can the IO slowness happen if memory is insufficient? i.e. coal is swapping?
-
bahamat
Yes.
-
bahamat
Anything that causes latency might be the culprit.
-
bahamat
You might have better luck with standalone SmartOS so that you don't have the whole headnode worth of services competing for cpu cycles.
-
toasterson
True. I might just boot a smartos USB image instead and clean the disk
-
bahamat
coal defaults to 8G of RAM, but if you've got enough physical memory it will do much better with more.
-
toasterson
Well that sounds like a simple tweak :(
-
toasterson
:)
-
toasterson
wrong smiley
-
toasterson
ok, both enlarging RAM to 32 GB and stopping the CPU-hungry Arm VM did not help. I'll try smartos clean next.
-
toasterson
well we can rule out the headnode being too overwhelmed
-
toasterson
on smartos I now get `first of 1 error: first of 1 error: vminfod watchForChanges "VM.js startZone (e23a358d-08b5-6603-92c6-ea75655674df)" timeout exceeded`
-
toasterson
looks like we don't like something from virtio?
-
jperkin
toasterson: out of interest why running a zone image that was obsolete nearly 8 years ago?
-
toasterson
jperkin: because that's what the guide tells me to run :)
-
toasterson
is that the wrong image?
-
jperkin
which guide?
-
toasterson
-
jperkin
ok, so nothing current, just wanted to check
-
toasterson
what is the current guide to build images then?
-
jperkin
we don't have one as we don't ship per-product images any longer, they just aren't worth the time spent on them when you can effectively get the same thing from a base image + pkgin <whatever you want>
-
toasterson
fair. but what is the name of the base image in that case? base-64?
-
jperkin
base-64-lts for the yearly LTS releases, or base-64-trunk for the latest
-
toasterson
thanks
-
toasterson
and I again get `first of 2 errors: timed out waiting for /var/svc/provisioning to move for 0ec7acbd-36ed-cc97-b2e7-b1810e72c0b3`
-
toasterson
so it's not the image either :)
-
jperkin
there should be a log inside the image with the output from sm-prepare-image or whatever it is
-
wiedi_
toasterson: if you misconfigured the network this might also happen
-
toasterson
jperkin: grepping for sm-prepare-image in /var/log yields nothing. Is there a troubleshooting guide to get the log locations on where to start debugging?
-
jperkin
no, it's just a case of reading the scripts and figuring it out, these aren't really supported parts of the stack
-
wiedi_
toasterson: well if your dhcp server doesn't provide you with an ip on the admin nictag that might happen
-
toasterson
hmmmmm, smartos does not have dhcp by default?
-
jperkin
a server? no
-
bahamat
toasterson: there's no default networking. But if you specify dhcp, it will do it.
-
toasterson
it was the dhcp option :)
-
toasterson
wiedi_: your tool is not usable on the same smartos host you want to build on, I guess?
-
bahamat
Well, dhcp absolutely works.
-
toasterson
Well, not for me :) as I had no dhcp on that network
-
bahamat
Well if you told it to use dhcp, but you don't have a server to give it an address, it's not going to make up one on its own.
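-
[editor's note]
A small pre-flight check along these lines would have caught the problem. It is a sketch that assumes a vmadm-style manifest where a NIC sets `"ip": "dhcp"` (or lists `"dhcp"` under `ips`); the helper name, the aliases, and the addresses are invented for the example, and the manifests are trimmed to the fields that matter here.

```python
def nics_needing_dhcp(manifest):
    """Return the nic_tag of every NIC in the manifest that is set to use DHCP."""
    tags = []
    for nic in manifest.get("nics", []):
        # Collect the address from the single "ip" field and the "ips" list.
        addrs = [nic.get("ip")] + list(nic.get("ips", []))
        if "dhcp" in addrs:
            tags.append(nic.get("nic_tag"))
    return tags


# Hypothetical zone that depends on a DHCP server answering on the admin nictag.
dhcp_zone = {
    "brand": "joyent",
    "alias": "build-zone",
    "nics": [{"nic_tag": "admin", "ip": "dhcp"}],
}

# Hypothetical zone with a static address (documentation range), no DHCP needed.
static_zone = {
    "brand": "joyent",
    "alias": "build-zone-static",
    "nics": [
        {
            "nic_tag": "admin",
            "ip": "192.0.2.10",
            "netmask": "255.255.255.0",
            "gateway": "192.0.2.1",
        }
    ],
}

print(nics_needing_dhcp(dhcp_zone))
print(nics_needing_dhcp(static_zone))
```

Any nictag the check reports needs a DHCP server reachable on that network; otherwise the zone has no address and provisioning can stall until the timeout.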
-
toasterson
True that. Unfortunately there is no DHCP embedded in my coffee when I do these things late at night.
-
Smithx10
Ordered some CNs with Mellanox 100Gb NICs, hope all goes well lol
-
danmcd
Cool re: Mellanox.
-
wiedi_
toasterson: you could do that but it might be a bit more complicated to install the dependencies. The idea is to use it on your local laptop or so and it will connect to the smartos box you want to build on via ssh
-
toasterson
yeah but my smartos box is behind a jump server :)
-
Smithx10
danmcd: maybe @arekinath wants some MNX time to work on the driver ;p ?
-
wiedi_
it will respect your ~/.ssh/config where you can use ProxyJump
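-
[editor's note]
For reference, this is the shape of `~/.ssh/config` entry that makes a plain "ssh host" go through a jump server. The sketch just renders the stanza as text; `HostName`, `User`, and `ProxyJump` are standard OpenSSH client options, while the helper name, alias, address, and jump host name are placeholders.

```python
def ssh_host_stanza(alias, hostname, jump_host, user="root"):
    """Render an ssh_config Host block that reaches hostname via jump_host."""
    lines = [
        f"Host {alias}",
        f"    HostName {hostname}",
        f"    User {user}",
        # ProxyJump makes ssh connect to jump_host first, then on to HostName.
        f"    ProxyJump {jump_host}",
    ]
    return "\n".join(lines) + "\n"


# Placeholder names and a documentation-range address.
print(ssh_host_stanza("smartos-build", "192.0.2.20", "jumpserver"))
```

With an entry like this in place, the alias resolves through the SSH config alone, so no DNS record for the build host is required.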
-
nahamu
Tailscale can be run on a SmartOS GZ if that's something you want to try.
-
toasterson
wiedi_: hmmm, does it still need a dns entry to add the host?
-
toasterson
Because I get "host add failed" when I try to add it by the name in .ssh/config
-
wiedi_
it should not, as long as you can do "ssh host" it should be ok
-
danmcd
@Smithx10 Chelsio is another alternative, that gets actual illumos drops from the company.
-
Smithx10
Yea, I got screwed and had to go Mellanox because the 200 -> 100 splitter cables had to be qsfp56
-
Smithx10
the switch was 32 ports of 200Gb. Chelsio only had qsfp28 cards until the next gen comes out
-
Smithx10
I checked with you earlier and they should be supported. Luckily they are only for storage; we have known-working Intel NICs currently, so if we have an issue the local pool will still work fine and there just won't be storage available
-
danmcd
Ack.
-
arekinath
Smithx10: don't need paid MNX time, UQ will pay me for it since we use those parts extensively anyway
-
arekinath
but it should Just Work (TM)
-
arekinath
mellanox are pretty good at keeping the same base hardware interface at the moment
-
jbk
just don't start asking about RDMA :)
-
Smithx10
jbk: i promise.
-
Smithx10
after talking to rmustacc I learned a bunch
-
arekinath
do ask about memory usage and LSO though, I have plans for those ;)
-
jbk
haha
-
Smithx10
just going to be mounting nvme-tcp from within the guest atm
-
Smithx10
unless we can get nvme-tcp into the OS / bhyve
-
Smithx10
or Propolis which might eventually get into smartos
-
jbk
hopefully i'll get some testing of my vmxnet3 changes (once they're good they'll be going upstream)
-
jbk
since it'll hopefully provide at least some conceptual testing for dealing with jumbo frames better
-
jbk
though it seems it and i40e at least both end up hitting a similar problem (kernel memory fragmentation making 9k DMA allocations 'slow')
-
jbk
it might maybe be useful for other NICs -- at least as a guide
-
Smithx10
not sure what would be involved in adding an nvme-tcp client to illumos
-
arekinath
mlxcx parts can have SRQs, which let you share one big pool of RX buffers between a lot of much smaller rings at once, and the hardware splits them up as needed
-
arekinath
which I would like to use for machines with lots of vnics
-
jbk
oh that would be nice
-
arekinath
so they can have full depth rings without paying for all the buffers for all of them all the time
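-
[editor's note]
A back-of-the-envelope comparison shows why a shared pool helps on hosts with many vnics. Every number below (ring count, ring depth, buffer size, pool size) is made up for illustration and says nothing about what mlxcx actually allocates.

```python
def per_ring_bytes(rings, ring_depth, buf_size):
    """Memory needed when every ring owns a full set of RX buffers."""
    return rings * ring_depth * buf_size


def shared_pool_bytes(pool_buffers, buf_size):
    """Memory needed when all rings draw from one shared buffer pool."""
    return pool_buffers * buf_size


MIB = 1024 * 1024

# Hypothetical host: 64 vnics, 1024-entry rings, 9216-byte jumbo buffers.
dedicated = per_ring_bytes(rings=64, ring_depth=1024, buf_size=9216)
# Hypothetical shared pool sized for the expected aggregate load.
shared = shared_pool_bytes(pool_buffers=8192, buf_size=9216)

print(dedicated // MIB, "MiB with dedicated buffers")
print(shared // MIB, "MiB with a shared pool")
```

With these invented figures the dedicated layout needs 576 MiB while the shared pool needs 72 MiB, because the pool is sized for total traffic rather than for every ring being full at once.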
-
jbk
it looked like other OSes basically deal with the problem by just allocating PAGESIZE chunks and just let jumbo frames be segmented by the NIC on tx and rx
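-
[editor's note]
The page-sized approach trades one large contiguous allocation for a few small ones. A quick sketch of the arithmetic, assuming a 4096-byte page size (typical on x86, but an assumption here) and a hypothetical helper name:

```python
def pages_per_frame(frame_bytes, page_size=4096):
    """Number of page-sized buffers needed to hold one frame (ceiling division)."""
    return -(-frame_bytes // page_size)


# A 9000-byte jumbo frame spans several pages; a 1500-byte frame fits in one.
print(pages_per_frame(9000))
print(pages_per_frame(1500))
```

Under that assumption a 9000-byte frame is split across 3 pages, so the allocator never has to find a physically contiguous 9k region in fragmented kernel memory.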
-
Smithx10
we are going to use drbd, from a bunch of linux machines to replicate volumes, and just have users attach over the HVM linux nvme-tcp
-
Smithx10
but it could be smoother to add that into the OS
-
Smithx10
and add them at the vmadm layer vs within the guest. Not sure how much perf that would gain