
IRC log for #gluster-dev, 2015-10-20


All times shown according to UTC.

Time Nick Message
00:04 zhangjn joined #gluster-dev
00:52 vimal joined #gluster-dev
01:02 zhangjn joined #gluster-dev
01:11 EinstCrazy joined #gluster-dev
01:28 rafi joined #gluster-dev
01:48 Humble joined #gluster-dev
02:55 gem joined #gluster-dev
03:14 hagarth joined #gluster-dev
03:21 overclk joined #gluster-dev
03:34 nbalacha joined #gluster-dev
03:37 shubhendu joined #gluster-dev
03:42 sakshi joined #gluster-dev
03:52 [o__o] joined #gluster-dev
03:53 mjrosenb ok, I'm able to spend this evening tracking down that pesky EIO
03:53 * mjrosenb waits for other people to join the channel
03:58 maveric_amitc_ joined #gluster-dev
04:02 itisravi joined #gluster-dev
04:08 overclk mjrosenb, how far did you reach?
04:12 mjrosenb not too far, I'm decently familiar with the posix code
04:12 mjrosenb but on the other end, there is the dht code, and a bunch of layers above that
04:12 mjrosenb and I'm not sure where the EIO is happening.
04:13 overclk mjrosenb, anything in the logs?
04:14 mjrosenb not in the logs that I could find.
04:14 mjrosenb Is there a particular log that I should check in?
04:17 overclk mjrosenb, brick logs and/or client log file (fuse, nfs, whatever-the-client-is).
04:20 ppai joined #gluster-dev
04:21 mjrosenb I assume things about a ping timer aren't relevant?
04:21 deepakcs joined #gluster-dev
04:23 mjrosenb I don't see anything interesting printed to the terminal on the brick, nor on the client (both were started with --debug)
04:23 overclk mjrosenb, not much w.r.t. EIO
04:23 mjrosenb /var/log/glusterfs/bricks/local.log doesn't have anything interesting that I didn't add myself.
04:24 mjrosenb and the last line in /var/log/glusterfs/data.log is from the last time I unmounted the gluster volume.
04:24 mjrosenb one print statement I added right before STACK_UNWIND_STRICT in posix.c prints the return value and errno of posix_lookup
04:25 mjrosenb the return value is 0, and the errno is 22.
04:25 mjrosenb is that a function that returns 0 on success, or 0 on failure?
04:27 jiffin joined #gluster-dev
04:27 overclk mjrosenb, if op_ret is 0, then it's a success (op_errno should not matter then)
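The convention overclk describes can be sketched in C as follows. This is a minimal illustration, not GlusterFS code: `effective_errno` and `describe_result` are hypothetical helpers, and treating any negative `op_ret` as failure is an assumption based on the 0 / -1 values mentioned in this log.

```c
#include <errno.h>
#include <stdio.h>

/* Hypothetical helper, not a GlusterFS function: op_errno is consulted
 * only when op_ret signals failure (assumed here to be any negative value). */
int effective_errno(int op_ret, int op_errno)
{
    if (op_ret < 0)
        return op_errno;   /* failure: op_errno carries the reason */
    return 0;              /* success: a stale op_errno is ignored */
}

/* Hypothetical debug print in the spirit of the one added to posix.c. */
void describe_result(const char *fop, int op_ret, int op_errno)
{
    int err = effective_errno(op_ret, op_errno);

    if (err == 0)
        printf("%s: success (op_ret=%d, op_errno=%d ignored)\n",
               fop, op_ret, op_errno);
    else
        printf("%s: failed with errno %d\n", fop, err);
}
```

Under this reading, the pair reported above (return value 0, errno 22) counts as a success, and the 22 is leftover noise rather than an EINVAL from the lookup.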
04:28 kshlm joined #gluster-dev
04:30 mjrosenb cool.
04:30 mjrosenb so the issue doesn't seem to be the return value of posix_lookup
04:31 mjrosenb this is of course, assuming that STACK_UNWIND_STRICT is more or less returning from an RPC?
04:33 overclk mjrosenb, an UNWIND invokes the callback of the previous xlator. When the callback is for the server xlator, that is when RPC comes into the picture.
04:34 mjrosenb sounds good.  I didn't do any real configuration on the bricks, so presumably the thing above the posix xlator is the server xlator?
04:36 overclk mjrosenb, not really, but you're close to being correct. There are xlators such as index, access-control, locks.
04:37 mjrosenb hrmm, the gfid of the file I'm trying to chmod is ffffffd0
04:37 mjrosenb that feels far too ordered.
04:39 overclk mjrosenb, that's what's in the gfid xattr?
04:39 mjrosenb yes.
04:40 mjrosenb for /local/incoming/.test/beta
04:40 mjrosenb which is the file that I tried to chmod earlier
04:40 overclk mjrosenb, can you dump (getxattr "-d" option) the xattrs for this file from the brick?
04:41 mjrosenb whoops
04:41 mjrosenb my bad.
04:41 mjrosenb getextattr was being silly
04:42 rafi1 joined #gluster-dev
04:47 mjrosenb I used the -x option to dump it in hex
04:47 mjrosenb and evidently, it dumped each byte sign extended to 32 bits
04:48 mjrosenb this wrapped all the way around on my terminal, so there was a second line containing only 'ffffffd0'
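The 'ffffffd0' artifact is the classic result of printing a byte through a signed type. A minimal sketch follows, assuming a platform with 32-bit `int`; the function names are made up for illustration and are not taken from getextattr.

```c
#include <stddef.h>
#include <stdio.h>

/* Buggy formatting: the byte passes through signed char, so values of
 * 0x80 and above become negative ints and print as 32-bit quantities. */
int format_byte_sign_extended(char *out, size_t outlen, unsigned char byte)
{
    signed char c = (signed char)byte;

    return snprintf(out, outlen, "%x", (unsigned int)(int)c);
}

/* Intended formatting: keep the byte unsigned so it prints as two digits. */
int format_byte_plain(char *out, size_t outlen, unsigned char byte)
{
    return snprintf(out, outlen, "%02x", (unsigned int)byte);
}
```

For the byte 0xd0 the first function writes "ffffffd0" and the second writes "d0", which matches the trailing line mjrosenb saw wrapped onto its own row.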
04:49 overclk mjrosenb, ok. use -ehex for that.
04:51 mjrosenb this is freebsd, the only relevant option in the man page is -x
04:56 overclk mjrosenb, maybe just try to dump the values of the xattr and pipe it to hexdump?
04:58 pranithk joined #gluster-dev
04:58 maveric_amitc_ joined #gluster-dev
05:00 aspandey joined #gluster-dev
05:01 poornimag joined #gluster-dev
05:04 mjrosenb yeah, like I said, it looks fine after I take that into account:
05:04 mjrosenb 0d09 4828 9fb6
05:04 mjrosenb 0000020 48e7 862a 1dbb ac12 9a97 0ad0
05:09 aravindavk joined #gluster-dev
05:10 ndarshan joined #gluster-dev
05:10 mjrosenb so right, I'm hoping to find the place where we actually tell the kernel that the operation should return EIO, so I can find out why we got there, and trace backwards from there
05:10 hagarth joined #gluster-dev
05:10 skoduri joined #gluster-dev
05:11 mjrosenb and hopefully, it won't be too complicated.
05:11 Humble joined #gluster-dev
05:16 mjrosenb I guess another big question that I have is should the dht xlator talk to the remote brick for anything other than the initial lookup, and later setattr for the chmod call?
05:17 poornimag joined #gluster-dev
05:20 asengupt joined #gluster-dev
05:20 overclk mjrosenb, I don't think I understand that clearly.
05:21 overclk mjrosenb, what do you mean by "should the dht xlator..." ? mind to elaborate more?
05:28 Bhaskarakiran joined #gluster-dev
05:29 vmallika joined #gluster-dev
05:31 hgowtham joined #gluster-dev
05:33 kanagaraj joined #gluster-dev
05:38 mjrosenb so, I believe I have verified that the calls to posix_lookup and posix_setattr are at least returning non-error codes, I want to make sure that no other calls are being made to posix_*, and the issue is stemming either from someplace else, or the actual data returned by one of those two.
05:43 kotreshhr joined #gluster-dev
05:46 Gaurav__ joined #gluster-dev
05:46 Ameet joined #gluster-dev
05:47 itisravi joined #gluster-dev
05:48 ashiq joined #gluster-dev
05:49 atalur joined #gluster-dev
05:49 raghu joined #gluster-dev
05:50 Manikandan joined #gluster-dev
05:50 mjrosenb overclk: does that make more sense?
05:50 overclk mjrosenb, ok, I see what you mean.
05:51 overclk mjrosenb, In that case, the logs should really have the failure message(s).
05:53 mjrosenb http://paste.pound-python.org/show/Mu8T5F43EcmnIh4DVSPm/
05:54 anekkunt joined #gluster-dev
05:55 mjrosenb that looks like the return value of 0 is in fact *not* no-error.
05:58 Bhaskarakiran_ joined #gluster-dev
06:02 overclk mjrosenb, it's just a debug log with retval and errno printed. it's still not an error.
06:05 mjrosenb but when I strace chmod, it most definitely gives EIO: fchmodat(AT_FDCWD, "beta", 0755)        = -1 EIO (Input/output error)
06:05 kdhananjay joined #gluster-dev
06:06 gem joined #gluster-dev
06:13 mjrosenb right, so this is why my next course of action is to find where the gluster client tells the kernel that it should EIO
06:17 kanagaraj joined #gluster-dev
06:18 jiffin1 joined #gluster-dev
06:20 kanagaraj_ joined #gluster-dev
06:26 anekkunt joined #gluster-dev
06:27 ggarg joined #gluster-dev
06:28 kanagaraj joined #gluster-dev
06:28 skoduri joined #gluster-dev
06:33 poornimag joined #gluster-dev
06:37 kanagaraj joined #gluster-dev
06:41 mjrosenb so, IIRC, gluster ignores libfuse, and opens /dev/fuse itself, and speaks the raw protocol?
06:53 kanagaraj joined #gluster-dev
06:55 mjrosenb ooh, another (hopefully simple) question: when I do a backtrace in gdb, will the callstack have things where A made an RPC to B on another machine, then B made an RPC back to the client, and called C?
06:59 poornimag joined #gluster-dev
07:00 raghu joined #gluster-dev
07:04 kanagaraj_ joined #gluster-dev
07:04 mjrosenb so like fuse_setattr_cbk, is this something that is being called because the remote setattr completed, or because the lookup prior to attempting a setattr completed?
07:05 lalatenduM joined #gluster-dev
07:07 jiffin1 joined #gluster-dev
07:10 kanagaraj__ joined #gluster-dev
07:23 poornimag joined #gluster-dev
07:37 anekkunt joined #gluster-dev
07:37 ggarg joined #gluster-dev
07:44 maveric_amitc_ joined #gluster-dev
07:45 kanagaraj_ joined #gluster-dev
07:46 kdhananjay joined #gluster-dev
07:48 kanagaraj joined #gluster-dev
07:49 kshlm joined #gluster-dev
07:49 overclk mjrosenb, yeh, gluster reads /dev/fuse and parses raw..
07:50 Humble joined #gluster-dev
07:52 kanagaraj joined #gluster-dev
07:52 overclk mjrosenb, the callback is invoked either way - success or error with op_ret 0 or -1 respectively.
07:54 maveric_amitc_ joined #gluster-dev
08:04 mjrosenb it can be called from multiple places?
08:04 mjrosenb also, these stacks are getting annoying.
08:05 rraja joined #gluster-dev
08:36 overclk mjrosenb, "places" here would typically be a STACK_UNWIND.
08:44 mjrosenb and the corresponding STACK_WIND specifies the callback to use?
08:45 overclk mjrosenb, correct.
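The wind/unwind relationship confirmed here can be modelled in a few lines of C. This is a deliberately simplified sketch: the real STACK_WIND and STACK_UNWIND are macros that work on GlusterFS call frames and xlator objects, and every name below (`struct frame`, `wind`, `unwind`, the `storage_*` functions, `record_cbk`) is hypothetical.

```c
#include <errno.h>

struct frame;
typedef void (*cbk_fn)(struct frame *parent, int op_ret, int op_errno);
typedef void (*fop_fn)(struct frame *frame);

struct frame {
    struct frame *parent;     /* frame of the layer that wound to us */
    cbk_fn        cbk;        /* callback chosen by that layer at wind time */
    int           last_ret;   /* filled in by record_cbk below */
    int           last_errno;
};

/* "Wind": remember which callback to run later, then call the next
 * layer's operation as an ordinary function call. */
void wind(struct frame *child, struct frame *parent, cbk_fn cbk, fop_fn next)
{
    child->parent = parent;
    child->cbk = cbk;
    next(child);
}

/* "Unwind": run the callback registered by the matching wind, handing
 * the result back to the parent's frame. */
void unwind(struct frame *frame, int op_ret, int op_errno)
{
    frame->cbk(frame->parent, op_ret, op_errno);
}

/* Two stand-ins for a bottom layer: one succeeds, one fails with EIO. */
void storage_setattr_ok(struct frame *frame)   { unwind(frame, 0, 0); }
void storage_setattr_fail(struct frame *frame) { unwind(frame, -1, EIO); }

/* Callback for the upper layer: it runs on success and on failure alike. */
void record_cbk(struct frame *parent, int op_ret, int op_errno)
{
    parent->last_ret = op_ret;
    parent->last_errno = op_errno;
}
```

In this model the callback shows up in a backtrace directly beneath the function that unwound, which is one reason the stacks in gdb grow so deep.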
08:45 spalai joined #gluster-dev
08:59 mjrosenb woah, it looks like chmod returns sometime when md-cache is finishing up.
09:08 mjrosenb I suspect it is when MDC_STACK_UNWIND (setattr, frame, op_ret, op_errno, prebuf, postbuf, xdata); gets 'called'
09:10 overclk mjrosenb, you mean the call didn't reach the server and md-cache unwound it?
09:15 mjrosenb I mean, I'm stopped in gdb on that line, and chmod still hasn't returned
09:15 mjrosenb when I hit 'n' to step over it, chmod fails with EIO
09:21 tdasilva joined #gluster-dev
09:22 mjrosenb if the function above me on the stack has STACK_WIND, then the current function was called like a normal function, not an rpc, right?
09:28 vmallika joined #gluster-dev
09:36 maveric_amitc_ joined #gluster-dev
09:38 skoduri joined #gluster-dev
09:39 overclk mjrosenb, yeh, you're correct about the function invocation part.
09:50 mjrosenb 158: res = writev (priv->fd, iov_out, count);
09:50 mjrosenb that call causes chmod to fail.
09:52 mjrosenb which is a reasonable place for it to go away
09:52 mjrosenb since it is communicating with the fuse device there
09:52 mjrosenb but, nothing around seems to give an indication that there is an error.
09:53 mjrosenb the kernel for the client is 3.9.6, while this is rather old, I feel like it should still work.
09:54 overclk mjrosenb, that's just writing to the fuse device and telling the kernel about the result. the actual error (if any) would have happened elsewhere..
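What that writev hands to the kernel starts with a small reply header, so inspecting it shows which result is actually being reported. The sketch below mirrors the layout of `struct fuse_out_header` from the Linux FUSE kernel ABI in a local struct so that it compiles without kernel headers; `build_reply` and `send_reply` are hypothetical helpers and are not how GlusterFS's fuse bridge is written.

```c
#include <stddef.h>
#include <stdint.h>
#include <sys/uio.h>

/* Local mirror of the FUSE reply header layout (an assumption of this
 * sketch; the authoritative definition lives in <linux/fuse.h>). */
struct out_header {
    uint32_t len;     /* total reply length, header included */
    int32_t  error;   /* 0 on success, otherwise a negated errno */
    uint64_t unique;  /* request id copied from the incoming request */
};

/* Hypothetical helper: fills iov[0] with the header and, on success,
 * iov[1] with the payload. Returns the iovec count to pass to writev(). */
int build_reply(struct iovec iov[2], struct out_header *hdr,
                uint64_t unique, int op_ret, int op_errno,
                void *payload, size_t payload_len)
{
    int count = 1;

    hdr->unique = unique;
    hdr->error = (op_ret < 0) ? -op_errno : 0;
    hdr->len = (uint32_t)sizeof(*hdr);
    iov[0].iov_base = hdr;
    iov[0].iov_len = sizeof(*hdr);

    /* In this sketch an error reply carries the header only. */
    if (hdr->error == 0 && payload != NULL && payload_len > 0) {
        iov[1].iov_base = payload;
        iov[1].iov_len = payload_len;
        hdr->len += (uint32_t)payload_len;
        count = 2;
    }
    return count;
}

/* Thin wrapper around writev(), in the spirit of the quoted line. */
ssize_t send_reply(int fd, struct iovec *iov, int count)
{
    return writev(fd, iov, count);
}
```

When stopped in gdb at the writev call, printing the first iovec as a header of this shape would show whether `error` is 0 or a negated errno, independent of what the calling process ends up seeing.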
09:57 mjrosenb a few stack frames up, in fuse_setattr_cbk, op_ret is 0; is that the return value that will be given to fuse, or is it an unrelated return value?
09:58 rjoseph joined #gluster-dev
09:59 overclk mjrosenb, so you're running freebsd servers and a linux client?
10:00 Manikandan joined #gluster-dev
10:00 overclk mjrosenb, looking at a stack frame it should be easy to tell where op_ret got a -1.
10:01 mjrosenb overclk: yes.
10:01 mjrosenb but it doesn't have a -1; it is 0.
10:03 overclk mjrosenb, just a pointer: check who is preparing "iov_out" and examine that routine "iff" op_ret is 0 all way up.
10:03 mjrosenb http://paste.pound-python.org/show/lRKHB3A7yFgcCqTPM92y/
10:03 mjrosenb this looks like the data is reasonable for not being an error.
10:04 overclk mjrosenb, looks ok to me too..
10:04 mjrosenb since I'm pretty sure this data just gets written into the device
10:05 mjrosenb maaaybe my kernel is messed up
10:06 mjrosenb well, I've been meaning to replace the failing disk, guess I can do that now.
10:06 mjrosenb I would be *so* sad if I haven't been using gluster for the past couple of weeks because my kernel oopsed like a month ago.
10:08 overclk mjrosenb, If you need more help in debugging drop a mail to the devel ML..
10:17 shyam joined #gluster-dev
10:19 poornimag joined #gluster-dev
10:19 raghu joined #gluster-dev
10:27 overclk joined #gluster-dev
10:31 kotreshhr1 joined #gluster-dev
10:35 pranithk joined #gluster-dev
10:46 overclk joined #gluster-dev
10:48 hagarth joined #gluster-dev
10:51 maveric_amitc_ joined #gluster-dev
11:03 Manikandan joined #gluster-dev
11:23 firemanxbr joined #gluster-dev
11:25 overclk joined #gluster-dev
11:26 kotreshhr joined #gluster-dev
11:47 kkeithley nixpanic: ping. are you really gone?
11:48 kkeithley ndevos: ^^^
12:00 kotreshhr joined #gluster-dev
12:10 hagarth pranithk: ping, around?
12:13 pranithk hagarth: yes
12:17 hagarth pranithk: I am thinking of implementing an iops interface for our xlators .. basically something that keeps track of reads, writes processed by the translator and dump that in statedump
12:17 vmallika joined #gluster-dev
12:17 hagarth pranithk: could be useful for debugging performance problems
12:18 pranithk hagarth: oh, so what will be the differences between io-stats and this one?
12:18 hagarth pranithk: io-stats is stationary as of now (at the endpoints or more correctly it just measures fops & latencies at the top of the graph)
12:19 pranithk hagarth: yes. Which also has the number of fops. So how will iops be different?
12:19 hagarth pranithk: if we want to figure out iops at various points in the graph, there is no good way to do this
12:19 pranithk hagarth: ah!
12:19 pranithk hagarth: so what you are saying is we are going to put this in between xlators? like trace xlator?
12:19 hagarth pranithk: subset of fops is what iops would be interested in
12:20 hagarth pranithk: no, xlators themselves would track this information and dump on demand.
12:20 pranithk hagarth: oh, so each xlator will have this info...
12:21 pranithk hagarth: got it
12:21 hagarth no need to load additional translators for this. I view performance as an essential attribute of each xlator. so we could create a new interface to handle performance statistics.
12:21 pranithk hagarth: what is the interface?
12:21 spalai left #gluster-dev
12:22 hagarth something like a .stats/.perf symbol exposed by each translator.. xlators wishing to participate can have this symbol.
12:23 hagarth pranithk: having this should increase visibility of potential bottlenecks within gluster stack
12:24 pranithk hagarth: There is no question about it :-)
12:25 pranithk hagarth: okay, the way I would have implemented is to put it in xl->iops. add code in stack-wind,stack-unwind(may-be) and have dump-ops kind of thing to print it in meta xlator/statedump
12:25 pranithk hagarth: I am not understanding the use of symbol :-(
12:25 hagarth I also need to read through the patches posted by the facebook folks about metrics. Interested to see where they are headed.
12:25 pranithk hagarth: ah! got it
12:25 pranithk hagarth: but why the symbol?
12:25 hagarth pranithk: "iops" is a symbol which gets set as xl->iops during init :)
12:26 EinstCrazy joined #gluster-dev
12:26 pranithk hagarth: got it. But why should it be exposed...
12:26 pranithk hagarth: we can have it as private member itself right?
12:27 hagarth pranithk: we could do that too. having a new symbol would help bundle related attributes & functions. a placeholder for future expansion too.
12:27 pranithk hagarth: I think some part you have in your head you didn't communicate :-). Or I conveniently ignored
12:28 hagarth pranithk: the joys of working remote :)
12:28 pranithk hagarth: hehe
12:28 kkeithley joined #gluster-dev
12:28 pranithk hagarth: still didn't get how symbol is helpful. Why don't you send a patch. May be I will understand it then
12:31 hagarth pranithk: what do you intend setting xl->iops to?
12:34 pranithk hagarth: are you saying xl->iops are going to be structure of functions?
12:34 pranithk hagarth: I was thinking xl->iops is going to be array :-)
12:34 pranithk hagarth: array of counters which we will be dumping :-)
12:36 kdhananjay joined #gluster-dev
12:43 hagarth pranithk: xl->perf could be a structure of functions with ability to dump iops, throughput, latency etc. and possibly more.
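One way to picture the two ideas in this exchange (pranithk's array of counters and hagarth's structure of functions) is sketched below. Nothing here exists in GlusterFS: `struct xl`, `struct xl_counters`, `struct xl_perf_ops`, `default_perf_ops`, and `xl_dump_perf` are all invented for illustration, and the dump format is arbitrary.

```c
#include <inttypes.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

/* Plain counters, along the lines pranithk had in mind. */
struct xl_counters {
    uint64_t reads;
    uint64_t writes;
};

/* A structure of functions, along the lines hagarth describes; more
 * entries (throughput, latency, ...) could be added over time. */
struct xl_perf_ops {
    void (*record_read)(struct xl_counters *c);
    void (*record_write)(struct xl_counters *c);
    int  (*dump)(const struct xl_counters *c, const char *name,
                 char *buf, size_t len);
};

/* Hypothetical translator object; perf is NULL when it opts out. */
struct xl {
    const char               *name;
    struct xl_counters        counters;
    const struct xl_perf_ops *perf;
};

static void default_record_read(struct xl_counters *c)  { c->reads++; }
static void default_record_write(struct xl_counters *c) { c->writes++; }

static int default_dump(const struct xl_counters *c, const char *name,
                        char *buf, size_t len)
{
    return snprintf(buf, len, "%s.reads=%" PRIu64 " %s.writes=%" PRIu64,
                    name, c->reads, name, c->writes);
}

const struct xl_perf_ops default_perf_ops = {
    default_record_read,
    default_record_write,
    default_dump,
};

/* Dumps statistics for translators that participate; returns 0 otherwise. */
int xl_dump_perf(const struct xl *x, char *buf, size_t len)
{
    if (x->perf == NULL || x->perf->dump == NULL)
        return 0;
    return x->perf->dump(&x->counters, x->name, buf, len);
}
```

Keeping the functions in a separate table means a translator can swap in its own dump routine without changing the counters, which is roughly the "placeholder for future expansion" argument made above.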
12:43 pranithk|afk hagarth: impeccable timing. Now I got it
12:45 shyam hagarth: Consider additionally introducing trace points (for systemtap or LTTNg) in the same thread
12:45 shyam The xlator dump being discussed is very useful, and needed
12:46 shyam If we could also register some trace points in the same effort it would serve well for additional/external scripts to dig into stacks and other details while tracing problems
12:47 hagarth shyam: possibly rename perf to monitor
12:47 hagarth and bundle all monitoring infrastructure in that.
12:47 overclk joined #gluster-dev
12:48 shyam hagarth: I assume this is built into the xlator, and not an xlator by itself, right?
12:48 aravindavk joined #gluster-dev
12:48 hagarth shyam: yes, that is the idea! xlators should spew out self-contained observations
12:48 shyam hagarth: good, agree
12:49 shyam It is always difficult (without further automated graph instrumentation) to introduce io-stats between xlators
12:49 shyam So something that the xlators themselves do would be better
12:49 shyam or built into the WIND/UNWIND as pranithk|afk suggested above
12:49 hagarth right and I somehow do not like the idea of adding more xlators into the graph for minimal benefits.
12:52 shyam hagarth: I would then strongly suggest that we consider/add trace points as a part of this as well (maybe in the WIND/UNWIND)  so that deeper inspection is possible at runtime (dtrace like for *BSD and LTTNg/systemtap/? for Linux)
12:52 hagarth shyam: that would be cool
12:52 shyam hagarth: yup, and if we find the right point and can abstract the xlator name to generate the trace point, almost free in terms of implementation ;)
12:53 hagarth shyam: yeah :D
13:00 shyam hagarth: pranithk|afk: Once you have  a feature page, please point me to it, do not want to lose the trace points conversation when doing this. It could end up as an orthogonal feature, in which case we can decide to fork, but just stating otherwise.
13:01 Bhaskarakiran joined #gluster-dev
13:05 hagarth shyam: sure, I plan to do this soon. there are a bunch of problems that can benefit from this.
13:07 shyam hagarth: Thanks, let me know how I can help.
13:07 hagarth shyam: will do
13:09 maveric_amitc_ joined #gluster-dev
13:19 mjrosenb joined #gluster-dev
13:20 mjrosenb bah.  I rebooted the machine, and chmod is still giving an EIO
13:20 mjrosenb does anyone else see something wrong with the buffer that was sent to gluster?
13:20 zhangjn joined #gluster-dev
13:21 zhangjn joined #gluster-dev
13:22 zhangjn joined #gluster-dev
13:23 zhangjn joined #gluster-dev
13:30 overclk joined #gluster-dev
13:32 * mjrosenb wonders if we're speaking the wrong version of the protocol or something
13:34 mjrosenb gah, the log of fuse data isn't deserialized at all :-(
13:40 dlambrig left #gluster-dev
13:40 64MAD1F06 joined #gluster-dev
13:48 overclk joined #gluster-dev
13:58 zhangjn joined #gluster-dev
13:59 zhangjn joined #gluster-dev
14:00 zhangjn joined #gluster-dev
14:01 overclk_ joined #gluster-dev
14:38 rafi joined #gluster-dev
14:42 rafi1 joined #gluster-dev
14:42 nbalacha joined #gluster-dev
14:45 rafi joined #gluster-dev
14:55 overclk joined #gluster-dev
15:03 ira joined #gluster-dev
15:03 cholcombe joined #gluster-dev
15:06 firemanxbr joined #gluster-dev
15:08 firemanxbr joined #gluster-dev
15:09 firemanxbr joined #gluster-dev
15:17 kotreshhr left #gluster-dev
15:17 jiffin joined #gluster-dev
15:49 pranithk joined #gluster-dev
16:03 overclk joined #gluster-dev
16:17 kshlm joined #gluster-dev
16:36 hagarth joined #gluster-dev
16:42 kshlm joined #gluster-dev
16:55 overclk joined #gluster-dev
17:06 shubhendu joined #gluster-dev
17:09 firemanxbr joined #gluster-dev
17:11 kshlm joined #gluster-dev
17:35 rafi joined #gluster-dev
17:36 pranithk left #gluster-dev
17:38 rafi joined #gluster-dev
17:45 rafi joined #gluster-dev
17:48 rafi joined #gluster-dev
17:52 rafi joined #gluster-dev
17:54 rafi joined #gluster-dev
18:12 badone joined #gluster-dev
19:12 jobewan joined #gluster-dev
20:09 jobewan joined #gluster-dev
