
IRC log for #gluster-dev, 2015-02-12


All times shown according to UTC.

Time Nick Message
02:40 badone__ joined #gluster-dev
02:48 ilbot3 joined #gluster-dev
02:48 Topic for #gluster-dev is now Gluster Development Channel - http://gluster.org | For general chat go to #gluster | Patches - http://review.gluster.org/ | Channel Logs - https://botbot.me/freenode/gluster-dev/ & http://irclog.perlgeek.de/gluster-dev/
03:47 kanagaraj joined #gluster-dev
03:59 itisravi joined #gluster-dev
04:01 hagarth joined #gluster-dev
04:05 shubhendu joined #gluster-dev
04:07 atinmu joined #gluster-dev
04:08 bala joined #gluster-dev
04:17 nishanth joined #gluster-dev
04:28 spandit joined #gluster-dev
04:33 gem joined #gluster-dev
04:34 prasanth_ joined #gluster-dev
04:36 anoopcs_ joined #gluster-dev
04:36 jiffin joined #gluster-dev
04:36 ndarshan joined #gluster-dev
04:39 jiffin1 joined #gluster-dev
04:40 rafi joined #gluster-dev
04:40 schandra joined #gluster-dev
04:49 kaushal_ joined #gluster-dev
04:49 nkhare joined #gluster-dev
04:52 deepakcs joined #gluster-dev
04:52 ppai joined #gluster-dev
05:00 schandra joined #gluster-dev
05:10 anoopcs_ joined #gluster-dev
05:20 Manikandan joined #gluster-dev
05:20 Manikandan_ joined #gluster-dev
05:25 kdhananjay joined #gluster-dev
05:25 overclk joined #gluster-dev
05:34 hagarth joined #gluster-dev
05:39 soumya joined #gluster-dev
05:42 nkhare joined #gluster-dev
06:00 rafi1 joined #gluster-dev
06:31 hagarth joined #gluster-dev
06:33 raghu` joined #gluster-dev
06:33 shubhendu joined #gluster-dev
06:51 atinmu joined #gluster-dev
07:08 itisravi joined #gluster-dev
07:10 rafi joined #gluster-dev
07:17 hagarth joined #gluster-dev
07:22 shubhendu joined #gluster-dev
07:23 atinmu joined #gluster-dev
07:30 nkhare joined #gluster-dev
07:41 anrao joined #gluster-dev
07:48 bala joined #gluster-dev
07:48 kanagaraj joined #gluster-dev
07:48 nishanth joined #gluster-dev
07:48 spandit joined #gluster-dev
07:48 gem joined #gluster-dev
07:48 prasanth_ joined #gluster-dev
07:48 ndarshan joined #gluster-dev
07:48 jiffin1 joined #gluster-dev
07:48 kshlm joined #gluster-dev
07:48 deepakcs joined #gluster-dev
07:48 ppai joined #gluster-dev
07:48 schandra joined #gluster-dev
07:48 anoopcs joined #gluster-dev
07:48 Manikandan joined #gluster-dev
07:48 kdhananjay joined #gluster-dev
07:48 overclk joined #gluster-dev
07:48 soumya joined #gluster-dev
07:49 raghu` joined #gluster-dev
07:49 rafi joined #gluster-dev
07:49 hagarth joined #gluster-dev
07:49 shubhendu joined #gluster-dev
07:49 atinmu joined #gluster-dev
07:49 nkhare joined #gluster-dev
07:49 bala joined #gluster-dev
08:05 itisravi joined #gluster-dev
08:30 shubhendu joined #gluster-dev
08:40 bala joined #gluster-dev
08:48 jiffin joined #gluster-dev
08:48 schandra joined #gluster-dev
08:55 nishanth joined #gluster-dev
09:11 ws2k3 joined #gluster-dev
09:40 nkhare joined #gluster-dev
09:43 anrao joined #gluster-dev
09:50 bala joined #gluster-dev
10:01 overclk joined #gluster-dev
10:07 anrao joined #gluster-dev
10:24 schandra joined #gluster-dev
10:50 nkhare joined #gluster-dev
10:51 badone__ joined #gluster-dev
11:11 bala joined #gluster-dev
11:14 overclk joined #gluster-dev
11:41 kkeithley1 joined #gluster-dev
11:52 ira joined #gluster-dev
11:54 nishanth joined #gluster-dev
12:04 anoopcs_ joined #gluster-dev
12:12 pranithk joined #gluster-dev
12:12 shubhendu joined #gluster-dev
12:12 kshlm joined #gluster-dev
12:16 bala joined #gluster-dev
12:26 kanagaraj joined #gluster-dev
12:44 pranithk left #gluster-dev
12:53 anoopcs_ joined #gluster-dev
13:01 bala joined #gluster-dev
13:02 ndarshan joined #gluster-dev
13:34 anoopcs_ joined #gluster-dev
13:34 anoopcs__ joined #gluster-dev
13:41 nishanth joined #gluster-dev
13:56 prasanth_ joined #gluster-dev
14:00 shyam joined #gluster-dev
14:18 hagarth joined #gluster-dev
14:18 prasanth_ joined #gluster-dev
14:19 ndarshan joined #gluster-dev
14:46 xavih ndevos: ping. I have more information about the nfs.t problem. It seems to be a bigger problem, not related to ec
14:47 ndevos xavih: hi there!
14:47 xavih ndevos: I've just sent an email
14:47 xavih ndevos: it also happens with a replicated volume
14:47 ndevos xavih: oh, okay, I'll check that in a bit
14:48 ndevos xavih: do you think it has been introduced with the epoll changes?
14:49 xavih ndevos: I don't think so, but it might have made the problem worse
14:49 ndevos xavih: ah, right, I bet it can make some race conditions more obvious
14:49 xavih ndevos: I think it's a saturation problem
14:50 xavih ndevos: I've seen more than 1400 requests coming from NFS that are being processed simultaneously
14:50 ndevos oh, wow, that's quite a lot
14:50 xavih ndevos: it seems that NFS doesn't wait for answers before sending more requests
14:51 xavih ndevos: gluster gets busy (I'm using not very powerful machines) and NFS times out (messages seen in /var/log/messages)
14:51 ndevos yeah, I do not think it has to wait, depending on how the clients send the procedures
14:51 xavih ndevos: but if it doesn't wait, it makes it very difficult to handle such a large number of requests...
14:52 xavih ndevos: especially if the servers are busy or aren't very powerful
14:52 ndevos xavih: there is an rpc throttling mechanism, maybe we need to tune that for the epoll changes
14:53 xavih ndevos: I've tried disabling it to allow maximum throughput with the bricks (if I understood correctly what the option does), but the problem still happens
14:54 xavih ndevos: I can try with a smaller value. Does this option influence the NFS connection?
14:56 ndevos xavih: the throttling in nfs influences nfs-client <-> nfs-server; the lower the value, the fewer RPC requests the NFS-server keeps open
14:56 xavih ndevos: I'll try lowering the value...
14:56 ndevos at least, I think that's how it is meant to work, I never met the guy who wrote the code
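(For reference, the two throttle knobs discussed above are per-volume options set with the standard gluster CLI, e.g. "gluster volume set <VOLNAME> nfs.outstanding-rpc-limit <N>" for the NFS-client-facing limit and "gluster volume set <VOLNAME> server.outstanding-rpc-limit <N>" for the brick-facing limit. The volume name and values here are placeholders; the defaults quoted further down in this log are 16 and 64 respectively.)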
15:00 _Bryan_ joined #gluster-dev
15:02 shyam joined #gluster-dev
15:03 xavih ndevos: using outstanding-rpc-limit=1 with a replica 2 seems to work (however it seems that sometimes the writes stop for a second or two and then continue)
15:03 ndevos xavih: hmm, good to know!
15:04 xavih ndevos: now I'm doing the same with a replica 3, and it's having a lot of stops, some of them lasting many seconds. It will probably fail
15:04 ndevos :-/
15:06 ndevos xavih: are you testing with nfs.outstanding-rpc-limit or rpc.outstanding-rpc-limit?
15:06 shubhendu joined #gluster-dev
15:07 ndevos ah, rpc.outstanding-rpc-limit might well be server.outstanding-rpc-limit
15:07 xavih ndevos: some simple numbers: NFS is sending (at the beginning) about 330 requests per second. This means that unless gluster can serve each request within 3 ms (quite unlikely), it won't be able to sustain the throughput
15:07 xavih ndevos: I'm using server.outstanding-rpc-limit
15:08 ndevos xavih: right, so that is the nfs-server <-> brick connection
15:08 ndevos xavih: the nfs.outstanding-rpc-limit would limit the client in sending its procedures
15:08 xavih ndevos: no, these are requests received by the nfs server
15:09 xavih ndevos: I can try with nfs.outstanding-rpc-limit
15:09 ndevos by default, nfs.outstanding-rpc-limit is set to 16, and server.outstanding-rpc-limit to 64
15:09 xavih ndevos: It has just failed with a replica 3
15:10 xavih ndevos: I'll repeat with the other option
15:10 ndevos hmm, maybe I'm misremembering things... it's been a while since I had to look at that piece of code
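(As a sanity check on the 330 requests/second figure xavih quotes above: handled one at a time, that leaves roughly 1000 ms / 330 ≈ 3 ms of processing budget per request, which is where the 3 ms estimate comes from.)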
15:17 jobewan joined #gluster-dev
15:31 xavih ndevos: I can't make it work with any combination of options
15:31 ndevos xavih: thats on a replica 3?
15:32 xavih ndevos: yes
15:33 ndevos xavih: I guess you should disable the test case for now, I will not be able to look into it before next week
15:33 ndevos :-/
15:33 xavih ndevos: ok, I'll send a patch
15:34 shyam xavih: Without the MT epoll patch does this happen in your setups?
15:34 shyam I know you state that this can happen either way, but does setting the event-threads to 1 also reproduce the issue?
15:34 ndevos xavih: please open a bug against the nfs component and file the patch against that, a followup should then revert your patch and include a fix
15:34 soumya joined #gluster-dev
15:35 xavih shyam: I tried once with a version before MT epoll and it worked, however with event-threads set to 1 it also failed
15:36 xavih shyam: however I haven't made many tests with these combinations, so it might have been good (or bad) luck
15:36 xavih ndevos: I'll do
15:51 dlambrig joined #gluster-dev
15:52 shyam xavih: just FYI, with MT epoll set to 1 thread, the behavior on the socket is still edge triggered, so it would attempt to read as much as it can from the socket before rearming the socket for events (just stating). This should not alter the write behavior though (i.e. sending protocol requests, which is where the problem starts, I believe).
15:54 xavih shyam: could this reading of all available data allow processing more requests than the theoretical limit configured with outstanding-rpc-limit?
15:56 shyam The outstanding rpc limit should limit the # of requests awaiting a response, so further RPC requests should not be sent. So, in this case the reader, or receiver of requests, should read as many as were written, should receive no more data from the socket, and go back to waiting on it. So I am not sure this should happen
15:56 * shyam checking where and how the outstanding-rpc-limit is implemented
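(To make the edge-triggered read behavior shyam describes above concrete, here is a minimal, generic sketch of an EPOLLET read handler. This is plain epoll, not the actual glusterfs socket code; the function name, buffer handling and one-shot re-arm are illustrative assumptions. The point is that a single readiness event obliges the handler to drain the socket until read() returns EAGAIN, so many queued RPC requests can be pulled in at once.)

    #include <errno.h>
    #include <sys/epoll.h>
    #include <unistd.h>

    /* Illustrative edge-triggered handler: drain the fd, then re-arm it. */
    static void handle_readable(int epfd, int fd)
    {
        char buf[4096];

        for (;;) {
            ssize_t n = read(fd, buf, sizeof(buf));
            if (n > 0) {
                /* hand the bytes to the RPC layer; one drain of the socket
                 * may decode many requests */
                continue;
            }
            if (n == -1 && (errno == EAGAIN || errno == EWOULDBLOCK))
                break;    /* socket fully drained, wait for the next edge */
            break;        /* n == 0 or a real error; error handling elided */
        }

        /* re-arm the fd so the next readiness edge is delivered */
        struct epoll_event ev = { .events = EPOLLIN | EPOLLET | EPOLLONESHOT,
                                  .data.fd = fd };
        epoll_ctl(epfd, EPOLL_CTL_MOD, fd, &ev);
    }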
15:59 xavih shyam, ndevos: it seems that the only option that alleviates the problem is server.outstanding-rpc-limit. nfs.outstanding-rpc-limit doesn't seem to change anything, at least not in a visible way
15:59 shyam xavih: Which one are you referring to, nfs.outstanding-rpc-limit or server.outstanding-rpc-limit?
15:59 xavih shyam: I'm testing both
15:59 shyam k
16:00 xavih shyam: the one that minimizes the problem seems to be server.outstanding-rpc-limit
16:01 shyam xavih: ok, thanks
16:01 * shyam checking the code
16:10 xavih shyam: have you been able to reproduce it ?
16:10 shyam xavih: Hmmm... the throttle at the server end is ultimately implemented by turning off the polling for IN events (fn: socket_throttle), so in case a thread is already reading data from the socket, then with the edge-triggered method it will not stop till there is no data left in the socket, which means we are not reading event (or RPC) by event, and hence can overflow the throttle.
16:10 shyam xavih: nope
16:12 shyam xavih: but I think the read-until-no-data-present behavior of the edge-triggered epoll may break the throttle. In the previous case we processed this RPC by RPC, so we could let the network pipe accumulate the packets, and so the write end (i.e. client/protocol) would get an error on write and wait before writing more requests, etc.
16:12 shyam xavih: With the current approach, this throttle can kick in only when the client sort of takes a breather and we remove the socket's poll-in event
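(A rough sketch of the throttling idea shyam is describing; the real code is around socket_throttle() in the glusterfs socket transport, but the struct, field names and function below are invented purely for illustration. The throttle works by dropping EPOLLIN from the interest set once too many requests are outstanding.)

    #include <sys/epoll.h>

    /* Hypothetical connection state; not the glusterfs data structures. */
    struct conn {
        int fd;
        int epfd;
        int outstanding;   /* requests received but not yet replied to */
        int rpc_limit;     /* e.g. server.outstanding-rpc-limit, default 64 */
    };

    static void maybe_throttle(struct conn *c)
    {
        if (c->outstanding < c->rpc_limit)
            return;

        /* Stop polling for incoming data: no new requests are read until
         * enough replies go out and EPOLLIN is re-enabled. */
        struct epoll_event ev = { .events = 0, .data.ptr = c };
        epoll_ctl(c->epfd, EPOLL_CTL_MOD, c->fd, &ev);
    }

(With level-triggered, one-request-per-event processing, a check like this runs between requests and TCP back-pressure eventually slows the sender; with a single edge-triggered drain it is only reached after the whole backlog has been read, which matches the 1400+ simultaneous requests xavih reported earlier.)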
16:13 xavih shyam: interesting
16:13 xavih shyam: I can try to check if this is causing the problem
16:13 shyam xavih: I guess this is where the problem starts, so this is introduced by MT epoll; in older cases the throttling would have taken care of this not happening (I think)
16:13 shyam xavih: how do you propose doing that?
16:14 shyam xavih: check out code from before MT epoll and run some tests, etc.?
16:14 xavih shyam: I'm not sure yet. Knowing this I'll look at the code and try to enforce the rpc limit in some way, just to see if this is the root cause
16:14 shyam xavih: The default is 64 for server rpc limit
16:15 shyam xavih: If it is reproducible, then you could go back in time on the code and check if we still accumulate 1500-odd requests when running the tests; I believe that should be far lower
16:15 xavih shyam: ok, I'll try it
16:15 shyam Then we know the accumulation starts with MT epoll, and that is most probably because the throttle is broken
16:16 shyam xavih: Sorry, I am not able to reproduce this (maybe a faster machine, etc.), thanks
16:16 xavih shyam: np :)
16:17 shyam xavih: ok let me know how that goes and will pick up the thread from there.
16:19 xavih shyam: ok
16:27 kshlm joined #gluster-dev
16:29 glusterbot joined #gluster-dev
16:36 kshlm joined #gluster-dev
16:39 hagarth joined #gluster-dev
16:48 xavih shyam: with a version just before the MT epoll patch, the maximum number of ongoing requests has been 22
17:01 xavih shyam: with the MT epoll patch applied, the number of requests grows to more than 1300
17:06 tdasilva joined #gluster-dev
17:17 xavih shyam: there's a bug opened for this: https://bugzilla.redhat.com/show_bug.cgi?id=1192114 (at that time it seemed an NFS problem)
17:17 glusterbot Bug 1192114: unspecified, unspecified, ---, bugs, NEW , NFS I/O error when copying a large amount of data
17:20 gem joined #gluster-dev
18:25 shyam xavih: Thank you, I think I may be able to monitor the requests as well, and this should provide a good test case basis (when the fix is provided)
18:25 shyam xavih: How do you monitor these ongoing requests?
18:36 jiffin joined #gluster-dev
19:20 ndevos JustinClift: I would *really* appreciate it if you can jump on the "Marcelo Barbosa presentation" email to -infra and explain the plan/status of the Gerrit update - Marcelo should be able to help with that
19:39 badone__ joined #gluster-dev
20:39 dlambrig left #gluster-dev
21:34 badone__ joined #gluster-dev
