01:03:15 <tozhu> hello all, is there any plan to implement or migrate the AIO/DIO feature from open zfs into illumos zfs?  the AIO/DIO is very important feature for database application, it avoid double buffer operation, and PostgreSQL 18 has support AIO (the v16 support DIO already), and is it important for virtual machine/(VM) use case?
01:28:32 <jclulow> tozhu: If by AIO you mean "async I/O" I'm not sure that's related to DIO, by which I assume you mean the somewhat dubious "direct I/O" stuff they've done in OpenZFS?
01:29:43 <tozhu> jclulow, Yes, AIO is async I/O, and DIO is direct I/O, yes AIO and DIO is done in OpenZFS 2.3 release
01:30:29 <jclulow> Do you mean aiowrite(3C) and aioread(3C), as in https://illumos.org/man/3C/aioread ?
01:33:50 <tozhu> jclulow:  I’m not sure if it is aiowrite/aioread on ZFS on illumos,  here is the description for Postgres on FreeBSD, https://speakerdeck.com/macdice/aio-and-dio-for-postgresql-on-freebsd
01:36:52 <tozhu> and here  https://openzfs.org/wiki/OpenZFS_Developer_Summit_2021_talks#The_Addition_of_Direct_IO_to_ZFS_.28Brian_Atkinson.29
01:38:41 <tozhu> I think the Direct IO to ZFS operation is designed for NVMe driver, which will not use ARC, 
01:40:01 <tozhu> and there is the details, https://www.youtube.com/watch?v=cWI5_Kzlf3U
01:43:56 <tozhu> the latest OpenZFS release is 2.3.2, so not sure if any plan to backport / migration the Direct IO feature into illumos ZFS 
03:23:48 <jclulow> tozhu: Yes, the Direct I/O stuff is pretty dubious, and unrelated to the Async I/O stuff.
03:24:39 <jclulow> Unfortunately the way it's implemented, as I recall, the user space program could modify (by accident, or maliciously) the write buffer after it's been submitted to the kernel, and after it's been checksummed, but before it then gets written to the disk
03:25:09 <jclulow> This would make it essentially impossible to determine whether a checksum error on read was a result of a media problem or a user software bug
03:25:36 <jclulow> If you're retiring devices after a certain threshold of checksum errors, it could be a way to create a denial of service attack on a shared system
03:26:29 <jclulow> Separately, we have lio_listio(3C), see https://illumos.org/man/3C/lio_listio
03:26:37 <jclulow> and we have aioread(3C) and aiowrite(3C)
03:26:50 <jclulow> Those don't really seem like ZFS-level things so much as things for all file systems?
03:30:31 <jbk> though IIRC for filesystems, aio{read,write} really is just saving you the trouble of doing your own non-blocking I/O by using an internal thread pool in libc to do the blocking/waiting for you (as opposed to kaio).. though that's probably fine most of the time as well..
03:31:15 <jclulow> jbk: Historically this was a big problem for PostgreSQL given how ruthlessly single-threaded it has been
03:31:51 <jclulow> So, merely issuing a number of I/Os in advance via some aio interface is probably a huge win for, e.g., synchronous replication, or recovery from logs after an unclean shutdown
03:36:20 <jbk> yes, i seem to recall that for the longest time threads were 'too new and unproven' 
03:36:31 <jbk> (according to postgres)
03:36:50 <jbk> though perhaps 30+ years is long enough now :)
03:58:34 <tozhu> jclulow , jbk   so you advice aioread/aiowrite and lio_listio to do the same thing for Postgres which for FreeBSD ?
03:59:09 <tozhu> so it still had the double buffer? one buffer is in ARC
06:03:53 <Aedil> Hi illumos!
07:22:21 <tozhu> jclulow, with aio_read/ aio_write and lio_listio api, it can implement async I/O,   without Direct IO support, the operation must be in fs cache or ARC for ZFS, but for Database application, hope the IO can be write from DB buffer(such as shared_buffers) into disk directly, with out write fscache or ZFS ARC, so how there is Direct IO support in ZFS
07:28:27 <tsoome> tozhu currently the direct io support for zfs is only in linux + openzfs, and there too it depends on the availability of specific interfaces, which may or may not be present in specific kernel.  in some cases, the internal implementation may use async io instead.  The point is, Direct IO, in its historical meaning (to bypass buffer caches to avoid read ahead overhead) is one specific optimization technique, but it is not the only one and
07:28:27 <tsoome>  there are alternative ones. 
07:30:17 <tozhu> tsoome, for our use case, the database application, we hope avoid double buffer(avoid fs cache), so what’s the alternative is advices, thank you
07:32:16 <tsoome> well, alternatives are already mentioned - async io. but, why do you think the fs cache is the problem there? 
07:33:54 <tsoome> anyhow, *if* you think it is, just set up that DB on linux with openzfs and see if directio does fix your problem. if it does, then you have your answer.
09:12:56 <tozhu> tsoome: got it, thanks for the advice
10:50:48 <Aedil> https://www.linux-magazin.de/news/maxx-interactive-desktop-reanimiert-den-irix-desktop-von-sgi/
13:21:37 <jbk> yeah, see if it's actually a problem or not...
13:22:04 <jbk> the whole 'double cache' issue has been around for 25+ years, and there's been products around for just as long to avoid it
13:22:35 <jbk> yet I know of at least a few F500 companies that got along on some pretty busy (and critical) systems just fine with the whole dreaded double buffering
13:50:55 <tsoome> people tend to forget about why the feature was created and what was the hw back then. But also, they forget the zfs is not just filesystem with [huge] cache. For some reason, I havent heard anyone asking "how I can remove cache from my enterprise disk array".
13:51:46 <tsoome> but of course, there is this small difference about volatile versus non-volatile cache;)
13:53:24 <jbk> yeah, but really since zfs is always consistent on disk, the only big difference is a non-volatile cache won't need to be warmed up after a reboot
13:53:40 <jbk> well is supposed to be always consistent on disk at least :)
13:57:23 <jbk> we've not been able to recreate it, but we did have an incident where something like 50 out of 90 disks all locked up within a few minutes of each other due to a firmware bug that required some work to get zfs to import the pool afterwards
15:39:53 <sommerfeld> jbk: even ZFS has some baseline assumptions about how badly the storage can scramble your data.  If it told the host that a block was written and really synced to disk but after a firmware crash gave you an older version of the block contents I can see that causing some heartburn...
15:42:00 <jbk> yeah, i'm not sure of the details other than basically the disk would completely lock up and require a power cycle
15:42:08 <jbk> so maybe that messed up it's onboard cache or such
15:42:32 <sommerfeld> ssd or spinning rust?
15:43:20 <jbk> spinning rust as i recall
15:45:35 <jbk> thankfully it was limited to a specific model of drive and required doing something that you're not supposed to do, but a bug elsewhere was causing to happen
15:54:48 <sommerfeld> at least with zfs you have a very clear indication of whether or not the storage is behaving as promised...
16:27:30 <gitomat> [illumos-gate] 17416 Want more xlocale.h functions for C++ locale support -- Bill Sommerfeld <sommerfeld⊙ho>
20:26:59 <gitomat> [illumos-gate] 17393 gitignore could cover more -- Patrick Mooney <pmooney⊙pc>