
IRC log for #gluster-dev, 2015-12-29


All times shown according to UTC.

Time Nick Message
00:03 hgichon joined #gluster-dev
00:14 jwang_ joined #gluster-dev
00:56 EinstCrazy joined #gluster-dev
00:56 EinstCrazy joined #gluster-dev
01:16 zhangjn joined #gluster-dev
01:35 skoduri joined #gluster-dev
01:48 zhangjn joined #gluster-dev
03:21 kotreshhr joined #gluster-dev
03:28 ggarg joined #gluster-dev
03:34 zhangjn_ joined #gluster-dev
03:36 shubhendu joined #gluster-dev
03:36 nbalacha joined #gluster-dev
03:38 Nagaprasad joined #gluster-dev
03:40 nbalacha joined #gluster-dev
03:45 overclk joined #gluster-dev
03:50 atinm joined #gluster-dev
04:06 itisravi joined #gluster-dev
04:15 nishanth joined #gluster-dev
04:15 kanagaraj joined #gluster-dev
04:15 Manikandan joined #gluster-dev
04:17 Manikandan joined #gluster-dev
04:17 zhangjn joined #gluster-dev
04:17 kotreshhr joined #gluster-dev
04:26 nishanth joined #gluster-dev
04:27 nishanth joined #gluster-dev
04:28 nishanth joined #gluster-dev
04:30 nishanth joined #gluster-dev
04:32 ppai joined #gluster-dev
04:45 pppp joined #gluster-dev
04:45 hgowtham joined #gluster-dev
04:46 Humble joined #gluster-dev
04:47 kotreshhr joined #gluster-dev
04:50 sakshi joined #gluster-dev
04:54 ashiq joined #gluster-dev
05:04 ndarshan joined #gluster-dev
05:08 Apeksha joined #gluster-dev
05:10 itisravi joined #gluster-dev
05:12 zhangjn joined #gluster-dev
05:12 gem joined #gluster-dev
05:17 aravindavk joined #gluster-dev
05:17 zhangjn joined #gluster-dev
05:21 apandey joined #gluster-dev
05:28 skoduri joined #gluster-dev
05:38 Manikandan joined #gluster-dev
05:41 zhangjn joined #gluster-dev
05:43 ggarg joined #gluster-dev
05:46 Bhaskarakiran joined #gluster-dev
05:52 Manikandan joined #gluster-dev
05:53 Manikandan joined #gluster-dev
05:53 rastar joined #gluster-dev
05:54 vimal joined #gluster-dev
06:02 poornimag joined #gluster-dev
06:06 rtlaur joined #gluster-dev
06:06 vmallika joined #gluster-dev
06:12 vmallika joined #gluster-dev
06:15 asengupt joined #gluster-dev
06:21 rastar joined #gluster-dev
06:25 Saravana_ joined #gluster-dev
06:25 kanagaraj joined #gluster-dev
06:26 rafi joined #gluster-dev
06:32 sakshi joined #gluster-dev
06:42 sakshi joined #gluster-dev
06:54 rastar joined #gluster-dev
06:56 rastar joined #gluster-dev
06:59 asengupt joined #gluster-dev
07:00 spalai joined #gluster-dev
07:01 rastar joined #gluster-dev
07:02 rastar joined #gluster-dev
07:03 rastar joined #gluster-dev
07:04 spalai1 joined #gluster-dev
07:05 msvbhat joined #gluster-dev
07:15 pranithk joined #gluster-dev
07:17 Bhaskarakiran joined #gluster-dev
07:17 rastar joined #gluster-dev
07:23 nbalacha joined #gluster-dev
07:33 kotreshhr joined #gluster-dev
07:49 zhangjn joined #gluster-dev
08:07 nbalacha joined #gluster-dev
08:21 kshlm joined #gluster-dev
08:53 zhangjn joined #gluster-dev
09:04 kshlm joined #gluster-dev
09:06 rafi1 joined #gluster-dev
09:11 rafi joined #gluster-dev
09:17 pranithk xavih: regarding http://review.gluster.org/13039, I agree that the fix can be made in dht also. But every time some xlator above misbehaves, only ec volumes will give problems. There is also one train of thought that having a gfid should be good enough to perform operations (which is the reason loc has gfid/pargfid in spite of having loc->inode, loc->parent). We made sure to handle operations on inodes that are not linked yet carefully, it
09:17 pranithk xavih: The persistent complaint is that the same operation works fine on afr.
09:20 xavih pranithk: afr is simpler than ec. Erasure coding needs more logic, and this logic depends on more factors than what afr needs for any operation. If other xlators do not correctly set the data about the inodes, ec cannot do its work
09:20 xavih but this is not a problem of ec. It's working correctly for the information it has received
09:22 pranithk xavih: But ideally gfid should be good enough to perform the operation.
09:22 pranithk xavih: That at least is the point of introducing loc.gfid/pargfid even when loc->inode/parent exist
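
For reference, a simplified sketch of the loc_t structure under discussion (based on the libglusterfs headers of that era; the exact field set may differ slightly between releases):

    /* simplified sketch of loc_t as defined in libglusterfs */
    struct _loc {
            const char *path;     /* full path, when known */
            const char *name;     /* basename within the parent */
            inode_t    *inode;    /* inode of the entry (possibly not linked yet) */
            inode_t    *parent;   /* inode of the parent directory */
            uuid_t      gfid;     /* gfid of the entry itself */
            uuid_t      pargfid;  /* gfid of the parent directory */
    };
    typedef struct _loc loc_t;
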
09:22 xavih pranithk: it's not possible. EC needs information about the file that the gfid doesn't contain
09:23 xavih pranithk: where can EC get the file type?
09:23 pranithk xavih: But we just made it possible right?
09:24 xavih pranithk: no, the patch simply tries to get information from an invalid file
09:24 xavih pranithk: but that file could be a directory or anything else, not necessarily a regular file
09:25 xavih pranithk: I think it's a hack to solve a problem caused by other xlators
09:27 pranithk xavih: That is definitely one way to look at it. But it can also be seen as making ec as independent as possible. Let us forget about what other xlators are doing for now. How can we make ec allow operations with just gfid? What extra information should we gather to do it?
09:29 pranithk xavih: If you absolutely hate this approach tell me :-). I feel this is something ec can solve.
09:31 nbalacha joined #gluster-dev
09:32 pranithk xavih: Do you not at all want to go this route?
09:33 spalai joined #gluster-dev
09:34 xavih pranithk: ec has always had big problems managing loc structures because the information received was not always consistent. It tries to do its best to reconstruct a loc that is as complete as possible. This has eliminated most of the problems. If this approach is still not valid, I'll need a detailed description of each loc's fields and how it must be managed in the presence/absence of other fields
09:37 pranithk xavih: I agree :-(. It did cause a lot of problems. I also agree that we are mostly done except in this case. I will be happy to explain where the problem comes with loc's inodes.
09:39 xavih pranithk: note that ec checks the inode type *after* having locked the file. If after having locked the file, the inode contents are not trustable, I don't know how to manage it
09:40 xavih pranithk: and note that the lock can be acquired even if we only have a gfid
09:40 pranithk xavih: exactly!
09:40 pranithk xavih: afr has this problem because it has to know whether it is a file or a directory: based on the type it has to take inodelk/entrylk. But ec is good here!
09:41 xavih pranithk: after having the file lock, ec needs to know the file type
09:41 pranithk xavih: which we can gather from the bricks...
09:41 xavih pranithk: using an additional lookup?
09:44 pranithk xavih: We can get the inode->ia_type from locks itself. No need to do an extra lookup.... I am starting to not like this approach :-/
09:44 xavih pranithk: currently inodelk() doesn't return any information about the inode
09:45 pranithk xavih: I am asking raghavendra to come online...
09:45 pranithk xavih: we can add it... but I am not liking all this slowly :-)
09:45 pranithk xavih: raghavendra maintains dht along with shyam
09:45 raghug joined #gluster-dev
09:46 pranithk raghug: context is, why does dht send setattr/setxattr before the inode gets linked in dht?
09:47 pranithk raghug: ec needs inode to be linked before performing setattr/setxattr because based on the type of the file, it needs to decide which xattrs to inspect...
09:48 pranithk raghug: we added code to handle IA_INVAL. But it looks like a hack once xavi and I discussed it in detail.
09:48 raghug those setxattrs/setattrs are part of healing of directories during lookup. inode gets linked in fuse-bridge. However, the lookup is unwound to fuse only after heal
09:48 xavih raghug: before doing anything, ec locks the inode, so it seems reasonable to assume that after a successful lock, the inode information will be valid
09:49 raghug dht can link the inode before attempting heal.
09:49 raghug But since lookup is not unwound yet, it might cause problems in xlators b/w fuse and dht
09:49 msvbhat joined #gluster-dev
09:50 raghug as in, let's say if io-stats creates a context in the inode in lookup_cbk, that context won't be set up till lookup is unwound from dht
09:50 pranithk raghug: but dht can't link the inode right? because there could be more xlators above dht (tiering comes to mind) which may detect that for the same name, some other file type may exist on the other tier?
09:51 raghug yes.
09:51 raghug I was explaining the same thing
09:51 raghug if dht links inode, it might cause issues for translators b/w fuse and dht
09:51 pranithk raghug: We are kind of in dead lock here. Is this the reason why Amar introduced loc.gfid/loc.pargfid into the mix and we need to work even when we just know the gfid of the file/directory?
09:52 raghug pranithk: I don't know the history, but seems fair enough
09:53 raghug but again it might still be a problem
09:53 pranithk raghug: Dude, it is causing bad problems in ec. We need to know more about the inode before we do the operation. But until we link the inode we don't know the type. Should each xlator remember the type in that case?
09:53 xavih raghug: one question... the inode that dht passes to ec comes from a previous lookup or create call, right ?
09:53 pranithk xavih: yes
09:53 raghug xavih: yes,
09:54 xavih raghug: doesn't that inode contain good information?
09:54 xavih raghug: if so, the inode doesn't need to be linked, only passed inside loc...
09:54 raghug only after it's linked (associating an inode with information in the iatt, like gfid etc.), and that linking happens in fuse
09:54 pranithk xavih: until inode_link is called, we can't make any assumptions
09:55 pranithk xavih: it is the same inode used by the cluster xlators to send to all the subvolumes. Each subvolume may see different gfid/type for same (pargfid, name)
09:55 xavih pranithk, raghug: that's weird, because many translators might be working with an inode that's not really valid yet... conceptually it doesn't seem right
09:55 pranithk xavih: so we can't really modify the inode->ia_type
09:56 raghug pranithk: how does afr solve this? doesn't it also do healing?
09:56 spalai joined #gluster-dev
09:56 pranithk raghug: afrv1 used to have problems. Now afr's self-heal takes only gfid. Nothing more
09:56 raghug xavih: I agree. But I am not seeing a solution here
09:57 xavih pranithk: setting ia_type of an inode when multiple bricks return inconsistent information should be the task of cluster xlators, I think
09:57 raghug xavih: that can be done by failing lookup
09:58 raghug cluster translators can fail lookup, if the subvols don't match
09:58 xavih raghug: what is this useful for?
09:58 pranithk raghug: Can dht do inode_new() for fresh lookup and use that inode instead of the loc->inode passed to it, to perform lookup so that this inode is local only to that particular dht? It can modify it happily?
09:59 raghug it can. Solves the problem of invalid inode
09:59 raghug But, it doesn't solve the problem of a single gfid being represented by multiple inodes
09:59 msvbhat joined #gluster-dev
10:01 pranithk raghug: Ah! crazy.
10:01 raghug and we cannot have multiple inodes representing a single gfid. It can cause nasty problems. Though in practice, this shouldn't cause any issue as there are not many translators on the client which can be affected by this problem (like posix-locks on bricks)
10:01 pranithk raghug: The only way out I see is to remember ia_type in ctx in each of the xls
10:02 raghug but that's kind of a hacky assumption (my previous comment) :)
10:02 xavih pranithk: ia_type is the problem right now, but it could also be ia_size or any other field in another place
10:02 pranithk xavih: lets store ia_buf
10:02 pranithk xavih: yuck
10:02 pranithk xavih: :-D
10:03 pranithk xavih: crazy stuff... :-(
10:04 pranithk raghug: What is a way out? I am lost for ideas :-/
10:05 raghug pranithk: I think we can go with creating a new inode, which has all the relevant information filled (but not linked in inode table)
10:05 raghug not without any issues, but lesser among different evils
10:06 raghug and dht using that new inode for fops during healing
10:06 xavih pranithk, raghug: what about adding an st_buf field in loc?
10:06 pranithk raghug: I hate that solution too. ctx won't be available for other xls
10:06 raghug ah! how about using the same inode?
10:06 pranithk xavih: same problem as inode. Which subvolume's st_buf do you believe?
10:07 xavih raghug: but this requires creating a new inode, right? this was a problem...
10:07 raghug just copy all the information from ia_buf to inode
10:07 raghug xavih: no, we can use the same inode
10:07 raghug xavih: inode-link does two things
10:07 raghug 1. fill the inode with relevant information from iabuf
10:08 raghug 2. link the inode into inode table, so that inode_grep or inode_find can find that inode
10:08 raghug I am suggesting dht does step 1 without doing 2
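
In code terms, raghug's suggestion amounts to something like the following hypothetical helper (illustrative only, not an existing libglusterfs/dht function):

    /* Hypothetical helper: do step 1 of inode_link() -- copy the
     * identifying information from the iatt into the inode -- without
     * step 2, inserting it into the inode table (which would make it
     * visible to inode_find()/inode_grep() before lookup unwinds). */
    static void
    dht_inode_fill_from_iatt (inode_t *inode, struct iatt *stbuf)
    {
            inode->ia_type = stbuf->ia_type;
            gf_uuid_copy (inode->gfid, stbuf->ia_gfid);
    }
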
10:08 pranithk raghug: we can't use the same inode. This inode is already wound to different subvolumes. We shouldn't change it. If we create a new inode, it won't have the context that is built out of lookup. So it is again a chicken-and-egg problem
10:08 raghug no, it's not wound yet
10:08 pranithk raghug: imagine this.
10:09 raghug we initiate heal only when we get replies from all subvols
10:09 Bhaskarakiran joined #gluster-dev
10:09 raghug so there are no in-progress lookups when dht initiates heal
10:09 pranithk raghug: tiering winds lookup to both cold and hot dhts. In each dht, we have different types for same (pargfid, name). Do you see the race now?
10:09 raghug pranithk: ...
10:10 raghug pranithk: should it matter
10:10 raghug ?
10:10 pranithk raghug: didn't get your question
10:10 raghug how will that cause a problem?
10:11 raghug dht has not unwound the call to tier
10:11 raghug when lookup is unwound to tier, tier will know about the mismatch and it can take corrective action
10:11 pranithk raghug: but dht is changing the ia_type before unwind in both the hot/cold dhts
10:11 raghug pranithk: ah! got it
10:12 pranithk raghug: yeah it will cause such awesome bugs we will die
10:12 raghug :|
10:12 pranithk raghug: wait. Can we change dht self-heal to take gfid?
10:12 pranithk raghug: not liking it either.
10:13 pranithk raghug: thinking...
10:13 raghug it won't even work
10:13 zhangjn joined #gluster-dev
10:13 aravindavk joined #gluster-dev
10:13 raghug dht needs to pass a valid inode to its children during heal
10:13 raghug and dht cannot construct a valid inode till lookup is complete
10:14 raghug and lookup cannot complete till heal is done
10:14 raghug circular dependency :)
10:14 pranithk raghug: ergo deadlock
10:14 raghug yes
10:14 pranithk raghug: why should it complete self-heal before unwinding lookup?
10:14 xavih pranithk, raghug: what if inodelk() (and maybe others) return an st_buf argument in cbk?
10:15 raghug because fops will fail if directory doesn't have a valid layout
10:15 raghug to avoid that, we try to heal and then unwind
10:15 raghug xavih: stbuf is not the problem
10:16 xavih raghug: yes, it's because ec needs to decide things based on file type after having locked the inode
10:16 raghug dht already has iabuf when it starts heal. Or am I missing something
10:16 xavih raghug: ec doesn't receive the needed information from upper xlators, but it could get it from lower xlators
10:16 raghug oh.. will it be ok if dht winds down iabuf?
10:17 raghug say in xdata
10:17 raghug ugly, but might work
10:17 xavih raghug: this also seems a hack...
10:17 pranithk raghug: xavih's solution of getting ia_buf seems cleaner.
10:17 raghug so, you can check the inode for ia_type first. If it's invalid you can look into xdata
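
As a sketch, that fallback in ec might look like this (illustrative only; the xdata key name is invented for the sketch):

    /* prefer the type from the inode; fall back to a value dht would
     * wind down in xdata (key name invented for this sketch) */
    ia_type_t type = loc->inode ? loc->inode->ia_type : IA_INVAL;

    if (type == IA_INVAL && xdata != NULL) {
            int32_t t = 0;
            if (dict_get_int32 (xdata, "dht.heal.ia-type", &t) == 0)
                    type = (ia_type_t)t;
    }
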
10:17 pranithk xavih: But I don't like that much either. Circular dependency seems to be the fundamental problem
10:18 pranithk xavih: like raghug pointed out earlier
10:18 xavih having st_buf in inodelk_cbk could also avoid a lookup in some cases (self-heal)
10:18 raghug xavih: I don't have any issues with iabuf in inodelk_cbk, if it is sufficient for you
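
For clarity, the inodelk_cbk idea amounts to extending the callback with an iatt, roughly as below (a sketch of the proposal only; as noted above, the existing callback returns no inode information):

    /* proposed shape, NOT the current signature: add an iatt so ec can
     * learn the file type from the brick that granted the lock */
    typedef int32_t (*fop_inodelk_cbk_t) (call_frame_t *frame, void *cookie,
                                          xlator_t *this, int32_t op_ret,
                                          int32_t op_errno,
                                          struct iatt *stbuf, /* proposed */
                                          dict_t *xdata);
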
10:19 pranithk xavih: no doubt. But all of these are fixes for manifestations introduced by that circular dependency, aren't they?
10:19 xavih pranithk: yes, the circular dependency is not solved, but I don't see how to solve it with the current architecture. loc_t structure and inode management should be rethought...
10:19 raghug xavih: +1
10:19 pranithk xavih: my point exactly :-)
10:19 raghug pranithk: +1 :p
10:19 xavih pranithk: and this is a bigger change
10:20 raghug xavih: do you've a solution in mind?
10:20 raghug just curious :)
10:21 xavih raghug: not really, but I always hated loc_t... unfortunately I don't know all details about loc_t contents and inode management, so I can't propose a full solution
10:22 raghug if all you need is only type, we can introduce a type field in loc_t.
10:22 xavih raghug: maybe the first approach would be to use independent inodes for initial lookups/creates/...
10:22 pranithk raghug: I think the whole solution lies in splitting lookup/resolve correctly. At the moment lookup and resolve are the same, i.e. lookup.
10:22 pranithk raghug: all fops come with resolve_and_resume()
10:22 xavih raghug: and add support in clustered xlators to decide which inode is the good one and merge contexts if necessary
10:23 pranithk raghug: We should let lookup do just what it is supposed to do, i.e. given (pargfid, name) give its gfid.
10:23 pranithk raghug: But xlators can choose to mark, in lookup_cbk(), that resolution is not done.
10:23 pranithk raghug: In which case, resolve_and_resume() must call resolve(), which will do all these actions like self-heal etc., because the inode is already linked
10:24 xavih brb
10:25 sakshi pranithk, if we do not heal as a part of lookup, then further fops (dependent on that directory's layout) will have a problem, since healing of the layout (in case of anomalies) is not done
10:26 pranithk sakshi: we will not heal as part of lookup but resolve
10:26 pranithk sakshi: which is name-less lookup
10:27 pranithk sakshi: we can remember the fact that heal is needed in dht's inode ctx. Once the resolve fop comes and we detect it needs heal, perform setattr/setxattr etc. properly.
10:27 pranithk sakshi: Upper xlators will make sure to send resolve because each fop is called with resolve_and_resume()
10:28 pranithk raghug: what do you think?
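
Roughly, the lookup/resolve split proposed above would look like this (illustrative pseudocode; every dht_* name below is invented for the sketch, not an existing dht symbol):

    /* named lookup: only resolve (pargfid, name) -> gfid; if the layout
     * has anomalies, just remember that a heal is pending and unwind */
    int
    dht_lookup_cbk (...)
    {
            if (dht_layout_has_anomalies (layout))
                    dht_inode_ctx_set_heal_needed (this, inode);
            /* unwind immediately -- no healing here; fuse-bridge links
             * the inode as it always does */
            ...
    }

    /* resolve (nameless lookup): the inode is linked by now, so ia_type
     * is valid and the deferred setattr/setxattr heal is safe to send */
    int
    dht_resolve (xlator_t *this, inode_t *inode)
    {
            if (dht_inode_ctx_get_heal_needed (this, inode))
                    dht_dir_heal (this, inode);
            ...
    }
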
10:28 raghug pranithk: brb.. will get some coffee
10:28 raghug pranithk: thinking
10:28 pranithk raghug: xavih: Is my solution so complex that you guys both need a break ;-)
10:28 pranithk raghug: I think it's xavih's lunch time
10:29 pranithk raghug: I have a meeting now. So brb. But I think this solution will work.
10:29 hgowtham_ joined #gluster-dev
10:29 raghug hmm.. I'll think about it
10:34 xavih pranithk: sorry, I'm back
10:34 xavih pranithk: I think it's a good approach and it seems very clean to me
10:43 kshlm joined #gluster-dev
10:44 Humble joined #gluster-dev
11:04 sakshi joined #gluster-dev
11:07 zhangjn joined #gluster-dev
11:15 nbalacha joined #gluster-dev
11:20 atinm joined #gluster-dev
11:29 itisravi joined #gluster-dev
11:30 Manikandan REMINDER: Gluster Community Bug Triage meeting in #gluster-meeting at 12:00 UTC (~in 30 minutes)
11:30 pranithk joined #gluster-dev
11:35 zhangjn joined #gluster-dev
11:36 gem joined #gluster-dev
11:47 vimal joined #gluster-dev
11:54 Manikandan_ joined #gluster-dev
11:54 Manikandan_ joined #gluster-dev
11:55 Manikandan_ joined #gluster-dev
11:59 Manikandan joined #gluster-dev
12:01 Manikandan joined #gluster-dev
12:01 Manikandan joined #gluster-dev
12:06 Manikandan joined #gluster-dev
12:12 EinstCrazy joined #gluster-dev
12:13 spalai joined #gluster-dev
12:32 zhangjn joined #gluster-dev
12:33 zhangjn joined #gluster-dev
12:37 EinstCra_ joined #gluster-dev
12:45 kshlm joined #gluster-dev
12:47 zhangjn joined #gluster-dev
13:02 zhangjn joined #gluster-dev
13:03 zhangjn joined #gluster-dev
13:04 zhangjn joined #gluster-dev
13:24 spalai left #gluster-dev
13:29 zhangjn joined #gluster-dev
13:37 zhangjn joined #gluster-dev
13:40 zhangjn joined #gluster-dev
13:45 hgowtham__ joined #gluster-dev
13:48 kotreshhr left #gluster-dev
14:08 hgowtham__ joined #gluster-dev
14:22 gem joined #gluster-dev
14:49 atinm joined #gluster-dev
15:21 shubhendu joined #gluster-dev
15:30 aravindavk joined #gluster-dev
15:47 nishanth joined #gluster-dev
16:39 rafi joined #gluster-dev
16:46 shubhendu joined #gluster-dev
17:01 ggarg joined #gluster-dev
17:02 jobewan joined #gluster-dev
17:46 kanagaraj joined #gluster-dev
18:31 hagarth joined #gluster-dev
21:48 EinstCrazy joined #gluster-dev
