00:11:31 No hockey tomorrow, just let me know what to do, I'm a bit confused
00:17:20 maybe I'll try zdb -dddd zones/$dataset
00:24:04 for that dataset, https://us-east-storage.solutions.iqvia.com/bruce_dev/public/zdb-output-dd and https://us-east-storage.solutions.iqvia.com/bruce_dev/public/zdb-output-dddd
00:30:35 so it looks like..... from that output.... there are a lot of ZFS plain file and ZFS directory objects that write 512 bytes but take up a 128k block "47087 1 128K 512 0 512 512 100.00 ZFS plain file"
00:31:43 Yeah, that's what it looks like.
00:32:17 Percent empty: 78.749621
00:32:36 So..... what is writing out the large amount of /system/contracts?
00:32:38 is that normal?
00:38:18 https://us-east-storage.solutions.iqvia.com/bruce_dev/public/inodes
00:38:46 has a list of du --inodes sorted
00:39:12 Well a contract is a group of processes
00:39:24 SMF, for example, each service runs in a contract.
00:40:33 What's that behavior like?
00:40:54 Does every process that spins up in lx create a directory / file in contracts?
00:42:09 No. Having 918k of them seems like way too many
00:42:19 This box has been up 00:42:06 up 224 days, 4:37, 4 users, load average: 0.23, 0.26, 0.25
00:42:51 I mean maybe...
00:43:16 but for example one of my LX zones, I've zlogin'd to it. The zlogin is under a contract
00:43:44 prstat shows Total: 33 processes, 301 lwps, load averages: 0.23, 0.26, 0.26
00:44:02 dhcpagent, which is running out of /native and is spawned by hypervisor, is in another. All other processes are grouped into a single contract.
00:44:25 20220505T001410Z
00:47:56 https://gist.github.com/Smithx10/ebbf058e264b079723aa8e2587fe98d3
00:48:44 Any ideas on what could change that behavior? Is it safe to rm -rf /system?
00:51:24 No
00:51:36 But if you reboot the zone I think it'll clean it all up
00:52:37 I'll ask them to bounce
00:53:16 Guessing /system/contracts doesn't clean up?
00:53:38 I think it should.
00:53:58 Any logs I should check to see if it's having issues doing that?
00:56:57 I don't think so. IIUC that whole directory is managed by the kernel.
00:57:44 /system/contract is backed by ctfs.
00:57:49 So it's entirely synthetic.
00:58:20 So I don't think there's anything to do from a user perspective there.
00:59:02 But it seems odd to me that there'd be 918K files in it.
00:59:15 Not disagreeing there.
00:59:45 But being ctfs backed, a reboot should clear that up, shouldn't it?
01:00:02 Probably, yes.
01:00:09 Is this causing a cascading problem?
01:00:29 Though probably I would try to figure out what's in there or take a dump if you're going to reboot.
01:00:48 Well, this stemmed from trying to find out where all the space went.
01:01:09 I do not believe that any zfs space is used by that file system.
01:01:10 The zone is using about 13G, but is consuming about 20G
01:01:25 But we can check the underlying impl.
01:01:34 But I think every ctfs node is anonymous kernel memory.
01:07:32 Above there is a link to the sorted inodes on the lx zone
01:08:30 But since you transition to another mount point I don't think that would be backed by zfs.
01:08:34 I guess we could just inode count x 128 k and see if it even makes sense
01:09:22 The ctfs_mount doesn't have a hold to the underlying file system that's immediately obvious.
01:09:52 The zdb output has a lot of empty looking files that don’t show the path
01:10:03 Errr not files *. Objects
01:10:07 I've spent very little time so could be something else ultimately.
01:10:40 was on my phone, just got back on the lt
01:44:57 zomg. since it's a very self-contained golang binary program, I can just pull in the package from nearly any existing zone:
01:44:59 pkg_add https://pkgsrc.smartos.org/packages/SmartOS/2022Q4/x86_64/All/tailscale-1.34.2.tgz
03:39:20 Got the paths of the blocks with zdb https://us-east-storage.solutions.iqvia.com/bruce_dev/public/zdb-output-dddddddd
03:56:20 Does a symlink take up a recordsize in zfs?
04:22:44 Yeah, it's a file
04:53:56 Still not 100% certain what's taking up all the space, heading to bed tho
13:37:21 Worked with some folks over in #openzfs and found out that a lot of space was stuck in the "on delete queue" via zdb
13:37:32 looked into that, and restarted "promtail"
13:37:44 which is some grafana logger rotator thingie
13:38:07 I'm guessing promtail was keeping them open
13:43:47 Not sure if it's LX, or just illumos, but some windows folks https://github.com/grafana/loki/issues/3668#issuecomment-1252441103 ran into this
13:44:54 sigh
13:45:42 I run grafana on native but not promtail, sorry, so can't vouch for it being an issue on illumos
14:31:27 Smithx10 - do you have the promtail config?
14:33:03 neuroserve: https://gist.github.com/Smithx10/e598adf082d366c4f4fd883b602fb9b0
14:35:45 that's basically two logs (/var/log/messages and /var/log/haproxy.log)
14:36:47 Smithx10: i wonder if it periodically would stat the fd it's tailing and if it sees the link count drop, if it'd stop tailing the file or not..
14:37:10 it seems like that would solve the issue... but also not something i've ever tested...
14:39:41 Smithx10 : do you know the promtail version, too?
15:05:11 version 1.6.1
15:11:43 ok - I've got 2.5.0 - current seems to be 2.7.1
15:36:55 What's the command to get the SN for a disk? SCSI c3t5000CCA27077CE29d0 HGST HUH721212AL4200 11176.00 GiB no no
15:37:21 The most reliable way is iostat -En
15:38:45 diskinfo -P ?
15:38:56 `diskinfo -c` can, but oftentimes can't find it. It was never clear to me why, but it's apparently because they enumerate disks in a different way.
15:39:08 Any way to light it up from within SmartOS?
15:39:14 trying to blink it so it can be replaced
15:39:34 Yeah, but it's not straightforward.
15:39:47 yeah, i notice with my internal nvme disk that diskinfo doesn't show the serial.. but i think that might be because i'm using a custom topo map that doesn't enumerate the location right now
15:41:15 Smithx10: Ok, so use iostat -En to get the serial number of the device
15:41:56 Then use /usr/lib/fm/fmd/fmtopo to dump devices. Find your serial number to get the enclosure and bay number
15:42:40 Then use /usr/lib/fm/fmd/fmtopo | grep indicator and look for your enclosure/bay to get the fmri of the device
15:43:28 Then use `/usr/lib/fm/fmd/fmtopo -P facility.mode=uint32:1 ?indicator=ok2rm` to turn the light on
15:43:54 to turn the light off, do the same command but use uint32:0 instead
15:44:00 "Then use /usr/lib/fm/fmd/fmtopo to dump devices. Find your serial number to get the enclosure and bay number" don't see the serial number, do I need to turn up the output?
15:45:37 https://gist.github.com/Smithx10/b514b351aa78fc9089331e8021503655
15:46:00 As far as I know, there's not a way to increase the verbosity.
15:46:11 The serial is usually part of the FMRI.
15:47:36 I have an m2 SSD in my headnode, and I don't even see anything that might be it in the fmtopo output :-(
15:47:41 this is what i get from topo
15:47:41 https://gist.github.com/Smithx10/b514b351aa78fc9089331e8021503655
15:48:58 So starting on line 567 is scsi-devices
15:49:55 But that doesn't even have serial numbers in the FMRI...
15:50:03 So, no idea...
15:50:28 ./shrug wonder if I can do this in super micro oob
15:51:04 Yeah, I don't know. rmustacc would probably be the best person to ask to better understand this stuff.
15:51:19 He's the one that showed me how to drill down to light up the light.
15:51:55 But fmtopo often defies my expectations. I really don't understand it well.
15:53:05 I gotta take off. Back in about an hour.
16:00:11 lol, this seems like it shouldn't be so hard rofl
16:05:59 Smithx10: Can you add -V to your fmtopo command?
16:08:23 rmustacc: https://us-east-storage.solutions.iqvia.com/bruce_dev/public/topo-v
16:10:01 So this is a platform Joyent had a map for. So diskinfo -P should have told you the slot.
16:10:05 Did it not?
16:10:36 rmustacc: https://gist.github.com/Smithx10/8adfff35cc52a1ec8088365be5f64c88
16:11:01 Curious, that's odd.
16:11:20 https://www.supermicro.com/en/products/system/4U/6049/SSG-6049P-E1CR36L.cfm
16:11:48 Right, this is a system we specifically designed all this support for back in the day.
16:13:13 Is ses not reporting anything on that system?
16:16:10 what is "ses"?
16:16:48 scsi enclosure services
16:17:35 How do I tell?
16:18:00 basically it's a device that lives on the SCSI bus/fabric that manages an enclosure.. so it's what handles things like indicator/fail lights, power for a disk bay, fans, ...
16:24:18 Do you have anything like a /dev/es/XXX? IIRC.
16:33:33 https://gist.github.com/Smithx10/3852714e1bd836cba7b6f335c50b8a17
17:31:51 You can ask sestopo for more info there. I don't know why fmtopo doesn't have the ses enumeration.
17:45:52 rmustacc: https://us-east-storage.solutions.iqvia.com/bruce_dev/public/sestop-ses0 https://us-east-storage.solutions.iqvia.com/bruce_dev/public/sestop-ses1
18:43:42 i've been told that the SES devices can be a bit loose with the spec
18:44:09 and that there's a lot of 'we'll just document how it works today vs. how it should work'
21:04:41 :(
21:05:15 gotta boot up linux to run lldp during our next expansion :(
21:16:30 you use aggrs, right?
22:08:03 Smithx10 : still that lldp problem?
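
[note] A rough sketch of the "inode count x 128 k" sanity check floated at 01:08:34, using only figures quoted in the log: roughly 918k entries under /system/contracts, the 128K figure read off the zdb output, and a zone reported at about 13G while consuming about 20G. Treating "G" as GiB and "918k" as 918,000 are assumptions, and the function and variable names are made up for illustration.

```python
# Back-of-the-envelope check: could the contract entries explain the missing space?
KIB = 1024
GIB = 1024 ** 3

contract_entries = 918_000   # "918k" from the log, rounded (assumption)
block_bytes = 128 * KIB      # the 128K figure quoted from the zdb output
zone_reported_gib = 13       # "using about 13G" (assumed to mean GiB)
zone_consumed_gib = 20       # "consuming about 20G" (assumed to mean GiB)


def worst_case_gib(entries: int, bytes_per_entry: int) -> float:
    """Space consumed if every entry pinned one full block, in GiB."""
    return entries * bytes_per_entry / GIB


estimate = worst_case_gib(contract_entries, block_bytes)
gap = zone_consumed_gib - zone_reported_gib

print(f"worst case: {estimate:.1f} GiB, unexplained gap: {gap} GiB")
```

Under these assumptions the worst case comes to about 112 GiB against a gap of about 7 GiB, so the numbers do not line up. That fits the point made at 01:01:09 that the ctfs-backed directory probably uses no zfs space, and the later finding that the space was sitting on the delete queue.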