IRC log for #gluster-dev, 2013-08-20

All times shown according to UTC.

Time Nick Message
00:23 hagarth joined #gluster-dev
00:55 asias joined #gluster-dev
01:21 awheeler joined #gluster-dev
01:31 mohankumar_ joined #gluster-dev
01:53 bala joined #gluster-dev
02:26 mohankumar_ joined #gluster-dev
02:56 shubhendu joined #gluster-dev
03:19 lalatenduM joined #gluster-dev
03:27 bharata joined #gluster-dev
03:38 awheeler joined #gluster-dev
03:40 lala_ joined #gluster-dev
03:45 hagarth joined #gluster-dev
03:52 itisravi joined #gluster-dev
03:56 ppai joined #gluster-dev
04:22 bulde joined #gluster-dev
04:35 aravindavk joined #gluster-dev
05:29 mohankumar joined #gluster-dev
05:34 kanagaraj joined #gluster-dev
05:38 lalatenduM joined #gluster-dev
05:38 hagarth joined #gluster-dev
05:40 lalatenduM joined #gluster-dev
05:45 ababu joined #gluster-dev
05:51 raghu joined #gluster-dev
05:55 ppai joined #gluster-dev
06:16 lalatenduM joined #gluster-dev
06:28 bala joined #gluster-dev
07:21 ababu joined #gluster-dev
07:40 badone joined #gluster-dev
07:44 vshankar joined #gluster-dev
08:10 bulde joined #gluster-dev
08:39 ppai joined #gluster-dev
08:58 ababu joined #gluster-dev
09:21 ppai joined #gluster-dev
09:23 bulde joined #gluster-dev
09:46 deepakcs joined #gluster-dev
09:47 ndarshan joined #gluster-dev
09:54 bulde joined #gluster-dev
10:34 kkeithley1 joined #gluster-dev
10:55 lpabon joined #gluster-dev
11:15 bala joined #gluster-dev
11:16 bala joined #gluster-dev
11:17 hagarth joined #gluster-dev
11:23 ndarshan joined #gluster-dev
11:49 lpabon joined #gluster-dev
11:50 bala joined #gluster-dev
12:26 mohankumar joined #gluster-dev
12:59 bulde1 joined #gluster-dev
14:06 wushudoin joined #gluster-dev
14:16 lpabon joined #gluster-dev
14:28 Technicool joined #gluster-dev
14:37 [o__o] joined #gluster-dev
14:38 [o__o] joined #gluster-dev
14:41 [o__o] joined #gluster-dev
15:12 awheeler joined #gluster-dev
15:12 awheeler joined #gluster-dev
15:42 hagarth joined #gluster-dev
16:09 bulde joined #gluster-dev
16:28 jbrooks joined #gluster-dev
16:36 awheeler joined #gluster-dev
17:12 bulde joined #gluster-dev
17:19 bulde joined #gluster-dev
17:23 bala joined #gluster-dev
18:10 lalatenduM joined #gluster-dev
19:09 awheeler joined #gluster-dev
20:43 a2_ bfoster, ping
20:45 foster a2_: pong
20:45 a2_ foster, hey
20:45 a2_ i had soemq uestions
20:45 a2_ *some questions
20:45 a2_ dylseixc :p
20:45 foster :)
20:46 a2_ anyways.. i had a question in xf
20:46 a2_ s
20:46 a2_ arghh.. hate thiskeyboard
20:46 a2_ *in xfs
20:46 foster haha, new keyboard?
20:46 a2_ yeah
20:46 hagarth1 joined #gluster-dev
20:46 a2_ so when an inode has an entry in the journal
20:46 a2_ i am assuming it has to be pinned in memory till the journal is committed and the on-disk changes are completed, no?
20:47 foster yeah, an item (i.e., inode) is locked and modified...
20:47 foster then pinned and transaction committed
20:47 foster then when written to the log, unpinned and available for writeout
20:48 a2_ oh, it is unpinned right after journal commit, rather than staying pinned till on-disk is modified?
20:48 foster by journal commit, you mean written to the on-disk journal?
20:48 a2_ yes.. written to on-disk journal, but not to the physical inode location
20:49 foster yeah, pinned basically means whether it can or cannot be written to the final destination on disk
20:49 foster so we don't want it written to the final location until it's written to the log
20:50 a2_ ah.. i see.. so here was my misunderstanding.. writeout to the disk does not "read the journal and perform the action"?
20:50 foster right
20:50 foster journal should only ever be read on unclean mount
20:50 foster after unclean shutdown I should say
20:50 foster (or repair perhaps)
20:51 foster iow, the normal runtime sequence is lock, modify, pin, write to the journal, unpin, writeback, free journal space
20:53 foster and there's an in memory list (ail - active item list) that defines what's in the journal (as opposed to reading the journal)
20:55 a2_ so.. lock, modify, pin, unlock, journal commit, unpin, writeback
20:55 foster yep, and "remove from journal"
20:55 a2_ so pin is before unlock?
20:56 foster yeah
20:56 foster the calling code will lock the item
20:56 foster and then add to a transaction, I believe the generic transaction code will pin and unlock when a transaction is committed
20:57 foster thus further modifications can be made to the item
20:57 foster but the details of relogging I haven't really explored much
20:59 a2_ how do we know whether a given inode needs a commit or not
20:59 foster journal commit?
21:00 a2_ yes
21:00 a2_ basically if the inode has uncommitted changes or not (i.e. clean or dirty)
21:00 foster so the calling code works through a transaction interface (similar to afr I suppose)
21:00 foster the caller adds items to a transaction
21:00 foster i.e., - xfs_trans_log_inode(mytrans, myinode);
21:01 foster then commits a transaction, at which point all the items go onto a committed items list
21:01 foster then at some point that guy is flushed to the on-disk log
21:01 foster and the items move to the AIL, and so on
21:03 a2_ hmm
21:04 a2_ so an item being in AIL does not mean it is "dirty" right..
21:04 foster clear as mud? :)
21:04 foster well in the general sense I think it does
21:04 foster i.e., modifying the item and attaching it to the transaction means its "dirty"
21:05 foster the AIL just means it's already written to the log and can now be written to the final location on disk
21:05 a2_ committed items list = entries which are already on ondisk journal?
21:05 foster nope, that's the ail ...
21:05 a2_ ah
21:06 a2_ so after a log flush, the bunch of entries are just "moved" from CIL to AIL?
21:06 foster create transaction -> attach item -> commit transaction -> items to CIL -> CIL flushes to disk, items to AIL
21:06 foster yeah
21:06 foster then writeback -> removed from AIL
21:06 a2_ so a commit transaction is really just committing it in memory
21:07 foster right, it's a collection of things that need to be atomic on disk
21:07 a2_ got it
21:07 a2_ so, here's what i was thinking..
21:07 a2_ in __writeback_single_inode() in fs-writeback.c
21:08 a2_ (if you have it open in front of you -)
21:08 foster just pulled it up
21:09 a2_ after the spin_unlock(), you can inspect @dirty and check if it has I_DIRTY_PAGES set in it
21:09 a2_ if it did not, that means we had reached a point in the control flow when the inode did not have any dirty pages for writeback
21:09 foster ok
21:10 a2_ but that does not mean the file data was made durable, because XFS still probably needs to flush the inode updates (extents, size etc.) to disk
21:11 a2_ so when a file has actually become "stable" (durable) is something which requires a combination of not having dirty pages in page cache + FS having committed metadata to disk
21:12 a2_ the goal i was exploring was, to add a new event to inotify, like IN_CLEANSED
21:12 a2_ which returns an event when the requested file moved from dirty->clean
21:13 foster interesting
21:13 a2_ and by clean, the same state as if fsync() was run on it (except, nobody issues the fsync and the file reaches that state in the natural flow of things)
21:13 foster right, it has to at least be in the log
21:13 foster (on disk)
21:14 a2_ right.. so i was wondering, if such a "cleanse" notification was requested on a file/dir, can XFS know unambiguously that at a certain point in the flow of things the inode became "clean" at a point in time and raise the event
21:15 a2_ and as long as the event was requested after finishing write() on the file, the notification would serve as a slow but efficient fsync()
21:15 a2_ (efficient = keeps overall filesystem throughput high)
21:16 a2_ another way could be, remember the latest lsn of the inode at the time of requesting the event, and when the last on-disk committed lsn becomes >= the requested inode lsn, raise the event
21:17 a2_ lsn is global in the filesystem right?
21:18 foster it's attached to the item in some manner but I'm not 100% sure
21:18 a2_ ok.. i assumed lsns were an ordered set (even linear)
21:18 foster I think it effectively maps an item to how much "data" needs to be flushed to the log to cover it
21:19 a2_ ok
21:19 foster I think it also relates to addressing the circular log
21:19 foster (i.e., is made up of a log block address and a "sequence/cycle" of the log)
21:20 a2_ lsn = ip->i_itemp->ili_last_lsn; error = _xfs_log_force_lsn(mp, lsn, XFS_LOG_SYNC, &log_flushed);
21:20 foster yeah, that looks like the high-level mapping
21:20 a2_ based on that i assumed every inode had some "position" in the global log of transactions, and fsync requested a flush of the log "at least till that position"
21:22 foster yeah
21:22 a2_ hmm, another question.. if i do a chmod() and that results in a transaction, make it to the CIL, then gets log flushed to disk, and is waiting in the AIL
21:22 foster I think in either case it's a matter of mapping that to a point in time to the original request
21:22 a2_ say the lsn became 100 because of this chmod()
21:23 a2_ then I do a chown(), results in a transaction , make it to CIL, bump the ili_last_lsn to 200
21:23 a2_ what happens to the inode which was in AIL waiting on the chmod() writeout? is it moved back into CIL?
21:25 a2_ or can an inode be in AIL and CIL at the same time?
21:25 foster I'm not totally sure about the relogging case
21:26 a2_ i'm imagining the functionality to work like this:
21:26 a2_ xfs_ilock(ip);
21:26 a2_ lsn = ip->i_temp->ili_last_lsn;
21:26 a2_ xfs_iunlock(ip);
21:27 a2_ _xfs_log_wait_lsn(mp, lsn, ...);
21:27 a2_ instead of forcing up to that lsn, just wait till the natural flow finishes up to that lsn
21:28 foster that's the high level thought I had with the async fsync thing, is that something we could even do with inotify?
21:28 foster (i.e., block a thread)
21:29 a2_ probably not.. the threading model would be the same as async fsync
21:29 a2_ (except i am guessing async fsync would initiate a push, while this inotify equivalent wouldn't push anything, but just passively wait for its event to occur, maybe even minutes later)
21:30 a2_ actually, maybe not
21:30 foster oh that's right, we were discussing a flush with the old idea
21:30 a2_ maybe async fsync is the right interface here.. not sure async fsync has to initiate a push
21:31 a2_ should it :-?
21:32 a2_ hmm, the async_fsync() would have to do the filemap_write_and_wait_range()
21:32 foster right
21:32 a2_ whereas inotify(IN_CLEANSE) was meant to be "purely passive"
21:32 a2_ kind of "only watch and let me know"
21:33 foster yeah, that might mean some layering magic though right?
21:33 foster i.e., if there's extending data in pagecache, the low level log might not even see the item yet
21:34 a2_ yeah, i don't think inotify(IN_CLEANSE) is even possible without changes in both VFS _and_ every filesystem
21:34 foster yeah
21:34 a2_ yeah.. it would have to be a combination of first waiting for page cache to become clean (in that point in flow in __writeback_single_inode()) and _then_ waiting for the corresponding filesystem to synchronize metadata to disk
21:35 a2_ not sure if it's a good idea given this complexity :p
21:35 foster and what happens if the inode is dirtied again between those two points? :P
21:35 foster hehe requires some thought about all the state possibilities anyways
21:36 a2_ once the page-cache is clean we have the lsn based "positioning" even if it gets re-dirtied
21:36 a2_ but yeah.. lsn is XFS specific.. other FSes might not have anything similar
21:36 foster ok, so you'd need the "page cache clean" event in the first place
21:37 a2_ but oh wait.. we could do this.. if inotify(IN_CLEANSE) was only a page-cache cleansing event and nothing more
21:37 a2_ and then if we can request the filesystem to flush its logs
21:37 a2_ we could wait in IN_CLEANSE event on a big bunch of files, and when they all raise the event request XFS for a log flush? is that even possible?
21:38 a2_ is that even clean :|
21:38 a2_ (clean == clean design, not "not dirty" :p)
21:39 foster it's all the synchronization and serialization that needs to happen that sounds hairy
21:39 a2_ yeah..
21:40 a2_ well, it looks like it is a really hard problem to get durability guarantee of a lot of small files :|
21:41 a2_ maybe a big-hammer sync() is better than dealing with options on every file
21:41 a2_ what is your hope/opinion about async fsync with bulk io_submit()?
21:42 foster well, my expectation is that even if doable, we might get that punted back at us by upstream ;) ...
21:42 foster s/might/likely
21:43 foster but this reminds me, did you see my mail a few weeks back about some afr performance tests I ran?
21:43 a2_ yes, i did.. you seemed to observe that the extra gluster-level fops were resulting in a lot of performance jitter, did i understand this right?
21:44 foster yeah, I don't have it all in my head atm, but my speculation was that at least the async fsync model wouldn't buy us much
21:45 a2_ btw, by "threads" i assumed you meant application threads?
21:46 foster right, parameter to that smallfile tool
21:47 a2_ i am assuming the "jitter" was because of the bursty nature of callbacks (wait in bulk, unwind in bulk)?
21:48 a2_ we were probably clogging the reply socket in the server with all the callbacks in a burst resulting in a temporary drop in perf
21:49 a2_ and the cycle would repeat every few seconds
21:49 a2_ and when we were actually doing the fsync, maybe the perf was too low to expose this jitter :-?
21:50 a2_ wild guess!
21:50 foster possibly, I'm looking for this old mail...
21:50 a2_ subject: afr -- fsync/fdatasync tests
21:51 foster oh right, the fdatasync() test
21:52 foster so jitter aside, the fdatasync() results weren't so great iirc
21:53 a2_ it was slightly better
21:54 foster yeah, 97 iops to 109
21:55 a2_ hmm
21:56 a2_ i'm working on a new proposal for AFR
21:56 a2_ basically decoupling consistency and durability
21:57 a2_ today we fsync() even when app asks for non-durable writes, only because we have only one consistency tracking (xattrs)
21:57 a2_ ideally, if one server never crashes and the other server crashes and comes back, to heal things in the right way nothing had to have been fsynced anywhere
21:58 a2_ but when both servers crash, we only need to bring them back into some arbitrary state, as long as they are consistent
21:59 a2_ with this model the algorithm becomes a bit different
22:00 foster oh, ok.
22:00 foster so consistency over durability, unless durability is requested
22:01 a2_ exactly
22:01 a2_ the goal is, we never do per-file fsync() and do a more batchy syncfs() in the new algorithm
22:02 a2_ a lot of operations can be performed in a much more batchy way
22:02 foster indeed
22:03 foster sounds interesting
22:03 a2_ yeah.. i'll send out the proposal on -devel@ hopefully in a couple of days
22:03 foster perhaps more clean than exporting consistency out of lower layers :)
22:33 tg2 joined #gluster-dev
23:28 an joined #gluster-dev
