-
meena
knielsen: i can try it on 13.2 and 14.0-CURRENT later
-
vkarlsen
graphics/p5-Image-ExifTool <3
-
RhodiumToad
reproduced on 14-current
-
RhodiumToad
(current as of a week or so ago)
-
RhodiumToad
also reproduced on 13.2-stable as of, um, about Jul 1
-
knielsen
Ok, thanks for testing
-
RhodiumToad
can you file it as a bug?
-
knielsen
Yes, will do. On bugs.freebsd.org?
-
RhodiumToad
yes
-
knielsen
will do, thanks for the hints and help
-
RhodiumToad
_xor: /boot/loader.conf is processed before booting the kernel, rc.conf is processed after booting and only when going into multiuser mode
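To make the distinction concrete, here is a small sketch of the kind of line that belongs in each file. The two settings are illustrative picks, not something from the discussion. Both files use sh-style name="value" assignments, so the fragment is also valid shell.

```shell
# /boot/loader.conf: read by the loader before the kernel starts,
# so boot-time kernel modules and tunables go here.
zfs_load="YES"        # illustrative: load zfs.ko at boot

# /etc/rc.conf: read by rc(8) after the kernel is up, and only when
# the system goes multiuser; services are enabled here.
sshd_enable="YES"     # illustrative: start sshd during multiuser startup
```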
-
RhodiumToad
the more interesting question of course is why is it happening
-
RhodiumToad
specifically how are we in tmpfs_write without an exclusive lock on the vnode
-
RhodiumToad
seems to be agnostic to the state of debug.vn_io_fault_enable
-
RhodiumToad
and indeed, tmpfs doesn't seem to declare MNTK_NO_IOPF so vn_io_fault shouldn't be in use anyway
-
RhodiumToad
tmpfs doesn't enable MNTK_SHARED_WRITES so there should be an exclusive lock
-
RhodiumToad
lock type tmpfs: EXCL by thread 0xfffffe0083f673a0 (pid 814, a.out, tid 100136)
-
kenrap
vkarlsen: awesome tool, isn't it? I use it to remove all image/video metadata before uploading anywhere.
-
RhodiumToad
aha! mystery solved.
-
RhodiumToad
the problem is not in tmpfs_write but rather in the read side
-
vkarlsen
kenrap: Yeah, and it can output to a number of different formats <3
-
RhodiumToad
now the interesting question is why doesn't it happen for other filesystems
-
» kenrap needs to study from manpages more apparently
-
RhodiumToad
knielsen: if you haven't filed the bug yet, an important fact is that sysctl debug.vn_io_pgcache_read_enable=0 prevents it
-
RhodiumToad
knielsen: the pgcache_read code path does not lock the vnode, which is why the issue occurs
-
knielsen
Ok, I'll include that in the report
-
RhodiumToad
ooh
-
RhodiumToad
and now I can make it happen on real filesystems too
-
knielsen
ouch
-
RhodiumToad
so it's not just tmpfs
-
RhodiumToad
the problem is that the vn_io_fault code path is hiding the bug
-
RhodiumToad
if you disable vn_io_fault, and enable vn_io_pgcache_read_enable, then the bug shows up for ufs filesystems
-
knielsen
Cool! Just another example of how it's often worth it to get to the bottom of random/sporadic test failures and not just ignore them...
-
RhodiumToad
the bug doesn't occur with vn_io_fault because that respects range locking on the data, rather than relying on the vnode lock
-
RhodiumToad
(i.e. the write operation registers a lock on the range of offsets it is writing to)
-
RhodiumToad
vn_io_pgcache_read_enable is trying to optimize the common case where file data is already in the vm object, but in the process it's sneaking through all the locks
-
RhodiumToad
so. workaround is sysctl debug.vn_io_pgcache_read_enable=0
-
RhodiumToad
that'll make all tmpfs reads go through the vnode lock and tmpfs_read
-
RhodiumToad
anyway, important to stress that even though this shows up mostly in tmpfs, it's a bug in the VFS layer generally
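A minimal sketch of applying that workaround from a root shell. The sysctl name is the one quoted above; the `disable_pgcache_read` wrapper and the presence check are additions for illustration, so the snippet does nothing on a system that lacks the knob.

```shell
KNOB=debug.vn_io_pgcache_read_enable

disable_pgcache_read() {
    # Only touch the knob if this kernel actually has it.
    if sysctl -n "$KNOB" >/dev/null 2>&1; then
        sysctl "$KNOB=0"    # needs root; reads then go through the vnode lock
    else
        echo "$KNOB not present on this system, nothing to do"
    fi
}

disable_pgcache_read
# To keep the setting across reboots, add this line to /etc/sysctl.conf:
#   debug.vn_io_pgcache_read_enable=0
```

Setting the value back to 1 re-enables the optimized read path once a fixed kernel is installed.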
-
knielsen
ack
-
RhodiumToad
don't worry about putting all the details in your report, I'll comment on it with my findings
-
RhodiumToad
next question to figure out is why the foffset_lock mechanism wasn't preventing this
-
RhodiumToad
ah, separate open files, obviously
-
RhodiumToad
and vn_rangelock is not invoked on this code path
-
RhodiumToad
ok, I think my understanding is complete
-
RhodiumToad
in fact it's possible that vn_io_pgcache_read_enable is broken beyond easy repair
-
RhodiumToad
ah, no, maybe not
-
RhodiumToad
it can only be reached if range locks aren't being used, so what it actually needs to do is to lock the vnode... but then again, the sole purpose for its existence seems to be to avoid locking the vnode
-
knielsen
-
VimDiesel
Title: 272678 – VFS: Incorrect data in read from concurrent write
-
RhodiumToad
ta, will write up my conclusions
-
rtprio
how did you discover this knielsen? surely you weren't just writing to the same 367 bytes from two threads?
-
knielsen
rtprio: Tests were failing in the MariaDB CI system. It's the binary log (replication log). Writer threads are appending events to the binlog and signalling dump threads that new data is available to send to connected slaves
-
knielsen
(I managed to get a ktrace with some specific read/write pattern when the bug triggered, and the test program is just a minimal way to try to reproduce that particular situation as closely as possible)
-
knielsen
when a new binlog is created, two writes happen close together from two different threads. That gives the opportunity for one of the writer threads to wake up the reader so it can run concurrently with the second writer thread
-
knielsen
That's why it just so happens that this race/bug is mostly triggered in the testcases at this point of creating a new binlog
-
RhodiumToad
ooh, patch from kib, that was quick
-
RhodiumToad
looks promising. I'll try it out on my -current vm
-
RhodiumToad
looks like it works
-
» RhodiumToad tries it with a few million iterations
-
knielsen
I wonder if a simple `tail -f` can trigger it? But may be unlikely to hit the exact timing
-
RhodiumToad
probably need multiple writers
-
RhodiumToad
tail -f uses a kevent to detect that the file has been extended, so with only one writer there'd usually be no race unless the writer is writing many small blocks in succession quickly
-
RhodiumToad
anyway, 10 minutes running the test prog, with kib's patch, has not produced any error
-
RhodiumToad
(no reason why it would, since the patch just extends range locking to this case)
-
meena
that commit message needs MFCing
-
meena
has anyone tested this with 12?
-
RhodiumToad
I have not.
-
» RhodiumToad updates his 12-stable tree
-
meena
I always forget about 12…
-
RhodiumToad
it looks like the bug is not in 12, since the offending read_pgcache code path is not there.
-
RhodiumToad
Khelp module "ertt" can't unload until its refcount drops from 1 to 0. <-- huh. haven't seen that before
-
RhodiumToad
(just before poweroff)
-
RhodiumToad
and of course buildworld decides it has to rebuild everything just because i enabled ccache
-
mason
to spite you
-
gp5st
I have a box on 12.2 and I want to update it but freebsd-update says `This may be because upgrading from this platform (amd64) or release (13.1-RELEASE-p3) is unsupported by freebsd-update. Only platforms with Tier 1 support can be upgraded by freebsd-update. See freebsd.org/platforms/index.html for more info.`
-
VimDiesel
Title: Platforms | The FreeBSD Project
-
gp5st
is there any way to update it without wiping it?
-
RhodiumToad
freebsd-update seems to think it's on 13.1, not 12.2?
-
RhodiumToad
what do uname -U and uname -K say?
-
gp5st
RhodiumToad, I did -r 13.1 to get it to 13.1
-
gp5st
this is uname -a: `FreeBSD freebsd-s-2vcpu-4gb-nyc1-01 12.2-RELEASE-p4 FreeBSD 12.2-RELEASE-p4 GENERIC amd64`
-
RhodiumToad
you didn't reboot since updating to 13.1 ?
-
gp5st
it won't update to 13.1
-
RhodiumToad
what do uname -U and uname -K say?
-
gp5st
I get that error when I run 13.1-RELEASE-p3
-
RhodiumToad
what do uname -U and uname -K say?
-
gp5st
both -U and -K give 1202000
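Both flags print a numeric `__FreeBSD_version`: `-K` for the running kernel and `-U` for the installed userland. As a small sketch, the helper below (a hypothetical name, not a system tool) decodes such a value for FreeBSD 10 and later, where the leading digits are the major version and the next two are the minor version.

```shell
# Decode a __FreeBSD_version number (FreeBSD >= 10) into major.minor.
decode_osversion() {
    v=$1
    echo "$(( v / 100000 )).$(( (v / 1000) % 100 ))"
}

decode_osversion 1202000    # the value reported above; prints 12.2
# On a FreeBSD host: decode_osversion "$(uname -K)"
```

Here both values decode to 12.2, so kernel and userland are still on the old release and no upgrade has been installed yet.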
-
gp5st
I didn't update to 13.1, I get that message when I run `freebsd-update upgrade -r 13.1-RELEASE-p3`
-
RhodiumToad
ok. so 12.2 is well out of support
-
gp5st
yes :) It's a little VM I've forgotten about for a while :(
-
RhodiumToad
can you update 12.2 to 12.4?
-
gp5st
let me see
-
gp5st
I think this is looking promising
-
gp5st
Thank you
-
gp5st
That did work. Thank you
-
RhodiumToad
and from 12.4 to 13.1 ?
-
gp5st
haven't done that yet
-
gp5st
it's starting promising though
-
VVD
Best way: update X.Y to X.last, then to X+1.last, then to X+2.last, etc.
-
RhodiumToad
that's why I suggested 12.2 -> 12.4 :-)
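A rough sketch of that stepwise route for the box discussed above. The helper only prints the commands for one hop so they can be reviewed first. `print_upgrade_hop` is a made-up name, and the sequence is the commonly documented freebsd-update procedure rather than anything quoted in the chat. Note the order: packages are upgraded before the final `freebsd-update install`, since that last pass removes old base libraries.

```shell
# Print, but do not run, the command sequence for one upgrade hop.
print_upgrade_hop() {
    target=$1
    cat <<EOF
freebsd-update fetch install           # patch the current release first
freebsd-update upgrade -r $target
freebsd-update install                 # installs the new kernel
shutdown -r now
freebsd-update install                 # after reboot: installs the new userland
pkg-static upgrade -f                  # major-version hops: reinstall packages
freebsd-update install                 # if prompted: removes old shared libraries
EOF
}

print_upgrade_hop 12.4-RELEASE
print_upgrade_hop 13.1-RELEASE
```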
-
phlux
Is there a ports equivalent to Debian's 'build-dep'?
-
RhodiumToad
what does it do?
-
phlux
Builds all dependencies for a package, but not the package itself
-
phlux
I mainly want it so I can install the deps for weechat, but install my own weechat from the git repos
-
RhodiumToad
do you want to build the dependencies, or install them as packages?
-
phlux
Eh, I'd rather build them, but packages would work just as well, I suppose
-
RhodiumToad
make depends in the port dir might do it
-
RhodiumToad
personally what I'd do is make my own port
-
phlux
interesting
-
phlux
I'll look into make depends, thanks!
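A sketch of that suggestion, assuming a ports tree checked out at /usr/ports and that weechat lives at irc/weechat there. The `depends` target builds and installs the port's dependencies without building the port itself; `all-depends-list` only prints them.

```shell
PORT=/usr/ports/irc/weechat    # assumed location of the ports tree and port

if [ -d "$PORT" ]; then
    make -C "$PORT" all-depends-list   # show what would be pulled in
    make -C "$PORT" depends            # build and install the dependencies only
else
    echo "no port directory at $PORT" >&2
fi
```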
-
gp5st
I didn't think I had installed anything via ports, so I just went ahead and did the freebsd-update install and now I get `ld-elf.so.1: Shared object "libncurses.so.8" not found, required by "bash"`
-
gp5st
whenever I try to ssh in, even if I try to run a command instead of just a shell
-
RhodiumToad
can you get in other than via ssh?
-
gp5st
not sure? I can see
-
rtprio
knielsen: interesting. good fine
-
rtprio
*find
-
gp5st
RhodiumToad, I have a console shell via the vps provider, but because I'm dumb I cannot find the root password in my password manager
-
gp5st
and logging in as my user gives the same libncurses message
-
RhodiumToad
can you get in via single-user mode?
-
gp5st
let me see
-
RhodiumToad
(if it asks for the password even in single user, there's a workaround if you can get the loader prompt)
-
meena
reasons to keep a root account with a shell from base…
-
RhodiumToad
doesn't help if you forget the root password :-)
-
meena
(i like (t)csh)
-
gp5st
yes, I'm in with a root shell
-
RhodiumToad
pw youruser -s sh
-
RhodiumToad
er
-
RhodiumToad
pw usermod youruser -s sh
-
meena
chsh -s also works, if you can't remember pw syntax
-
gp5st
r/o root file system
-
gp5st
give me a sec
-
RhodiumToad
mount -u /
-
RhodiumToad
(ideally, fsck -p / first)
-
RhodiumToad
(assuming this is ufs)
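Put together, the single-user recovery above looks roughly like the sequence below. The helper just prints the commands for a given login name so nothing runs by accident. `print_shell_rescue` and the user name are placeholders, and /bin/sh stands in for the `sh` argument used in the chat.

```shell
# Print the single-user recovery steps for one account (UFS root assumed).
print_shell_rescue() {
    user=$1
    cat <<EOF
fsck -p /                      # check the root filesystem first
mount -u /                     # remount root read-write
pw usermod $user -s /bin/sh    # switch the login shell to one from base
EOF
}

print_shell_rescue youruser
```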
-
gp5st
That worked, changed the shell, now doing a pkg update
-
gp5st
I guess I thought pkg upgrade would have happened as part of the update. It said to recompile anything I had compiled, and I thought everything from pkg was prebuilt
-
gp5st
(more trying to not come off as a complete and total moron than complaining)
-
skered
sshd died, now it won't start, claiming host keys can't be found... they're there :/
-
gp5st
Do they have the correct permissions and owner?
-
skered
Appears root, 600
-
RhodiumToad
what's the exact error
-
skered
No host key files found
-
skered
That's with `service sshd start` and `/usr/sbin/sshd -dD`
-
RhodiumToad
did you change the config at all?
-
skered
Started poudriere, left, VPNed back in, ssh was connection refused. vpro to the console.
-
skered
Nope
-
RhodiumToad
ls -ld / /etc /etc/ssh /etc/ssh/*_key
-
skered
poudriere did something.. it won't quit
-
RhodiumToad
what OS version?
-
skered
13.2
-
skered
It's something not obvious.
-
skered
poudriere is hung trying to start an interactive jail
-
skered
did you try turning it off and back on?
-
skered
However, /, etc, and key files are there. and file understands them
-
skered
I think I'll just restart
-
skered
I have oob so I can watch it
-
skered
shrug. restarted just fine
-
RhodiumToad
anything interesting in /var/log/messages?
-
gp5st
RhodiumToad, Thank you, yet again :) I'm updating my application now. The system seems to be working
-
skered
Just some signal 15 for apcupsd.
-
skered
signal 15 in auth.log for sshd too
-
skered
That was before the restart
-
skered
See if poudriere can finish...
-
skered
ok well the interactive jail still hangs.. but it's killable via poudriere jail -k.. so whatever it was, who knows.. now why can't poudriere jail -i accept input from stdin like it always could.
-
skered
echo "cmd; ...." | poudriere testport -ij ... Anyone able to confirm that works in 13.2?