07:55:14 knielsen: i can try it on 13.2 and 14.0-CURRENT later
08:59:14 graphics/p5-Image-ExifTool <3
11:41:35 reproduced on 14-current
11:41:55 (current as of a week or so ago)
11:44:40 also reproduced on 13.2-stable as of, um, about Jul 1
11:45:04 Ok, thanks for testing
11:45:20 can you file it as a bug?
11:45:33 Yes, will do. On bugs.freebsd.org?
11:45:42 yes
11:46:13 will do, thanks for the hints and help
11:46:15 _xor: /boot/loader.conf is processed before booting the kernel, rc.conf is processed after booting and only when going into multiuser mode
11:50:34 the more interesting question of course is why is it happening
11:51:11 specifically how are we in tmpfs_write without an exclusive lock on the vnode
11:58:39 seems to be agnostic to the state of debug.vn_io_fault_enable
12:00:11 and indeed, tmpfs doesn't seem to declare MNTK_NO_IOPF so vn_io_fault shouldn't be in use anyway
12:11:42 tmpfs doesn't enable MNTK_SHARED_WRITES so there should be an exclusive lock
12:27:15 lock type tmpfs: EXCL by thread 0xfffffe0083f673a0 (pid 814, a.out, tid 100136)
12:27:31 vkarlsen: awesome tool, isn't it? I use it to remove all image/video metadata before uploading anywhere.
12:32:20 aha! mystery solved.
12:32:44 the problem is not in tmpfs_write but rather in the read side
12:32:45 kenrap: Yeah, and it can output to a number of different formats <3
12:34:16 now the interesting question is why doesn't it happen for other filesystems
12:37:08 * kenrap needs to study from manpages more apparently
12:39:09 knielsen: if you haven't filed the bug yet, an important fact is that sysctl debug.vn_io_pgcache_read_enable=0 prevents it
12:39:53 knielsen: the pgcache_read code path does not lock the vnode, which is why the issue occurs
12:40:12 Ok, I'll include that in the report
12:40:25 ooh
12:40:35 and now I can make it happen on real filesystems too
12:40:40 ouch
12:40:44 so it's not just tmpfs
12:41:06 the problem is that the vn_io_fault code path is hiding the bug
12:41:41 if you disable vn_io_fault, and enable vn_io_pgcache_read_enable, then the bug shows up for ufs filesystems
12:41:47 Cool! Just another example of how it's often worth it to get to the bottom of random/sporadic test failures and not just ignore them...
12:42:29 the bug doesn't occur with vn_io_fault because that respects range locking on the data, rather than relying on the vnode lock
12:42:58 (i.e. the write operation registers a lock on the range of offsets it is writing to)
12:44:24 vn_io_pgcache_read_enable is trying to optimize the common case where file data is already in the vm object, but in the process it's sneaking through all the locks
12:45:24 so. workaround is sysctl debug.vn_io_pgcache_read_enable=0
12:46:13 that'll make all tmpfs reads go through the vnode lock and tmpfs_read
12:47:17 anyway, important to stress that even though this shows up mostly in tmpfs, it's a bug in the VFS layer generally
12:47:25 ack
12:47:42 don't worry about putting all the details in your report, I'll comment on it with my findings
12:51:45 next question to figure out is why the foffset_lock mechanism wasn't preventing this
12:52:37 ah, separate open files, obviously
12:54:03 and vn_rangelock is not invoked on this code path
12:54:20 ok, I think my understanding is complete
13:01:03 in fact it's possible that vn_io_pgcache_read_enable is broken beyond easy repair
13:01:57 ah, no, maybe not
13:02:37 it can only be reached if range locks aren't being used, so what it actually needs to do is to lock the vnode... but then again, the sole purpose for its existence seems to be to avoid locking the vnode
13:04:37 RhodiumToad: https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=272678
13:04:39 Title: 272678 – VFS: Incorrect data in read from concurrent write
13:05:15 ta, will write up my conclusions
16:06:45 how did you discover this knielsen? surely you weren't just writing to the same 367 bytes from two threads?
16:14:27 rtprio: Tests were failing in the MariaDB CI system. It's the binary log (replication log). Writer threads are appending events to the binlog and signalling dump threads that new data is available to send to connected slaves
16:15:38 (I managed to get a ktrace with some specific read/write pattern when the bug triggered, and the test program is just a minimal way to try to reproduce that particular situation as closely as possible)
16:22:48 when a new binlog is created, two writes happen close together from two different threads. That gives the opportunity for one writer thread to wake up the reader so it can run concurrently with the second writer thread
16:23:40 That's why it just so happens that this race/bug is mostly triggered in the testcases at this point of creating a new binlog
16:28:59 ooh, patch from kib, that was quick
16:30:01 looks promising. I'll try it out on my -current vm
16:40:00 looks like it works
16:40:26 * RhodiumToad tries it with a few million iterations
16:47:08 I wonder if a simple `tail -f` can trigger it? But may be unlikely to hit the exact timing
16:48:05 probably need multiple writers
16:48:49 tail -f uses a kevent to detect that the file has been extended, so with only one writer there'd usually be no race unless the writer is writing many small blocks in succession quickly
16:50:40 anyway, 10 minutes running the test prog, with kib's patch, has not produced any error
16:51:14 (no reason why it would, since the patch just extends range locking to this case)
17:10:49 that commit message needs MFCing
17:10:57 has anyone tested this with 12?
17:13:16 I have not.
17:14:48 * RhodiumToad updates his 12-stable tree
17:15:27 I always forget about 12…
17:15:35 it looks like the bug is not in 12, since the offending read_pgcache code path is not there.
17:22:12 Khelp module "ertt" can't unload until its refcount drops from 1 to 0. <-- huh. haven't seen that before
17:22:25 (just before poweroff)
17:37:34 and of course buildworld decides it has to rebuild everything just because i enabled ccache
17:40:06 to spite you
18:17:31 I have a box on 12.2 and I want to update it but freebsd-update says `This may be because upgrading from this platform (amd64) or release (13.1-RELEASE-p3) is unsupported by freebsd-update. Only platforms with Tier 1 support can be upgraded by freebsd-update. See https://www.freebsd.org/platforms/index.html for more info.`
18:17:32 Title: Platforms | The FreeBSD Project
18:17:41 is there any way to update it without wiping it?
18:19:34 freebsd-update seems to think it's on 13.1, not 12.2?
18:19:48 what do uname -U and uname -K say?
18:22:13 RhodiumToad, I did -r 13.1 to get it to 13.1
18:22:29 this is uname -a: `FreeBSD freebsd-s-2vcpu-4gb-nyc1-01 12.2-RELEASE-p4 FreeBSD 12.2-RELEASE-p4 GENERIC amd64`
18:22:44 you didn't reboot since updating to 13.1 ?
18:22:59 it won't update to 13.1
18:23:07 what do uname -U and uname -K say?
18:23:25 I get that error when I run 13.1-RELEASE-p3
18:23:29 what do uname -U and uname -K say?
18:23:40 both -U and -K give 1202000
18:24:23 I didn't update to 13.1, I get that message when I run `freebsd-update upgrade -r 13.1-RELEASE-p3`
18:24:51 ok. so 12.2 is well out of support
18:25:15 yes :) It's a little VM I've forgotten about for a while :(
18:25:21 can you update 12.2 to 12.4?
18:25:54 let me see
18:28:37 I think this is looking promising
18:28:55 Thank you
18:43:26 That did work. Thank you
18:45:27 and from 12.4 to 13.1 ?
18:49:38 haven't done that yet
18:50:20 it's starting out promising though
18:53:57 Best way: update X.Y to X.last, then X+1.last, then X+2.last, etc.
18:56:34 that's why I suggested 12.2 -> 12.4 :-)
19:17:49 Is there a ports equivalent to Debian's 'build-dep'?
19:18:03 what does it do?
19:18:14 Builds all dependencies for a package, but not the package itself
19:18:40 I mainly want it so I can install the deps for weechat, but install my own weechat from the git repos
19:19:05 do you want to build the dependencies, or install them as packages?
19:19:28 Eh, I'd rather build them, but packages would work just as well, I suppose
19:22:33 make depends in the port dir might do it
19:23:04 personally what I'd do is make my own port
19:24:19 interesting
19:24:27 I'll look into make depends, thanks!
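[editor's note] A minimal sketch of the `make depends` suggestion above, not something posted in the channel. It assumes the ports tree is checked out at /usr/ports and that weechat's port origin is irc/weechat. The `run` and `build_deps` helpers and the `DRY_RUN` switch are hypothetical names introduced here so the command can be previewed before anything is built.

```shell
#!/bin/sh
# Sketch: build and install a port's dependencies without installing the
# port itself, roughly what Debian's 'build-dep' does.
# Assumptions (not from the chat): ports tree location and port origin.
PORTSDIR="${PORTSDIR:-/usr/ports}"
ORIGIN="${ORIGIN:-irc/weechat}"

# Hypothetical helper: print the command when DRY_RUN=1 (the default),
# execute it otherwise.
run() {
    if [ "${DRY_RUN:-1}" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

# Run the ports framework's 'depends' target for the given origin.
# Installing the dependencies normally requires root.
build_deps() {
    run make -C "${PORTSDIR}/$1" depends
}

build_deps "$ORIGIN"
```

By default this only prints the command. Running it as root with `DRY_RUN=0` would invoke the port's `depends` target, which should leave weechat itself uninstalled so a copy built from git can be installed separately.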
20:28:25 I didn't think I had installed anything via ports, so I just went ahead and did the freebsd-update install and now I get `ld-elf.so.1: Shared object "libncurses.so.8" not found, required by "bash"`
20:28:41 whenever I try to ssh in, even if I try to run a command instead of just a shell
20:30:45 can you get in other than via ssh?
20:31:00 not sure? I can see
20:31:48 knielsen: interesting. good fine
20:31:50 *find
20:34:46 RhodiumToad, I have a console shell via the vps provider, but because I'm dumb I cannot find the root password in my password manager
20:35:02 and logging in as my user gives the same libncurses message
20:35:34 can you get in via single-user mode?
20:35:52 let me see
20:35:57 (if it asks for the password even in single user, there's a workaround if you can get the loader prompt)
20:37:49 reasons to keep a root account with a shell from base…
20:38:30 doesn't help if you forget the root password :-)
20:38:31 (i like (t)csh)
20:38:36 yes, I'm in with a root shell
20:38:47 pw youruser -s sh
20:38:51 er
20:38:56 pw usermod youruser -s sh
20:39:45 chsh -s also works, if you can't remember pw syntax
20:40:34 r/o root file system
20:40:36 give me a sec
20:40:44 mount -u /
20:41:13 (ideally, fsck -p / first)
20:41:39 (assuming this is ufs)
20:43:31 That worked, changed the shell, now doing a pkg update
20:44:02 I guess I thought pkg upgrade would have happened as part of the update. It says to recompile anything I compiled, and pkg is all prebuilt, I thought
20:44:30 (more trying to not come off as a complete and total moron than complaining)
20:46:44 sshd died, now it won't start, claiming host keys can't be found... they're there :/
20:47:38 Do they have the correct permissions and owner?
20:48:06 Appears root, 600
20:48:47 what's the exact error
20:49:31 No host key files found
20:50:02 That's with `service sshd start` and `/usr/sbin/sshd -dD`
20:50:37 did you change the config at all?
20:51:00 Started poudriere, left, VPNed back in, ssh was connection refused. vPro to the console.
20:51:04 Nope
20:52:06 ls -ld / /etc /etc/ssh /etc/ssh/*_key
20:52:09 poudriere did something.. it won't quit
20:52:44 what OS version?
20:52:53 13.2
20:55:11 It's something not obvious.
20:55:45 poudriere is hung trying to start an interactive jail
20:56:06 did you try turning it off and back on?
20:56:32 However, /, /etc, and key files are there, and file understands them
20:56:48 I think I'll just restart
20:56:57 I have oob so I can watch it
21:00:56 shrug. restarted just fine
21:01:28 anything interesting in /var/log/messages?
21:01:55 RhodiumToad, Thank you, yet again :) I'm updating my application now. The system seems to be working
21:02:39 Just some signal 15 for apcupsd.
21:03:22 signal 15 in auth.log for sshd too
21:03:34 That was before the restart
21:04:45 See if poudriere can finish...
22:08:17 ok well the interactive jail still hangs.. but it's killable via poudriere jail -k.. so whatever it was who knows.. now why can't poudriere jail -i accept input from stdin like it always could.
22:09:16 echo "cmd; ...." | poudriere testport -ij ... Anyone able to confirm that works in 13.2?