
IRC log for #gluster-dev, 2016-06-09


All times shown according to UTC.

Time Nick Message
00:38 [o__o] joined #gluster-dev
01:52 [o__o] joined #gluster-dev
02:04 shyam joined #gluster-dev
02:06 dlambrig joined #gluster-dev
02:26 hagarth joined #gluster-dev
02:32 penguinRaider joined #gluster-dev
03:30 raghug joined #gluster-dev
03:35 nishanth joined #gluster-dev
03:42 josferna joined #gluster-dev
03:44 itisravi joined #gluster-dev
03:55 shubhendu joined #gluster-dev
04:03 atinm joined #gluster-dev
04:04 pkalever joined #gluster-dev
04:23 nbalacha joined #gluster-dev
04:35 ppai joined #gluster-dev
04:35 prasanth joined #gluster-dev
04:57 nbalacha joined #gluster-dev
05:07 gem joined #gluster-dev
05:08 kshlm joined #gluster-dev
05:15 skoduri joined #gluster-dev
05:18 kotreshhr joined #gluster-dev
05:22 Apeksha joined #gluster-dev
05:31 Chr1st1an joined #gluster-dev
05:31 jiffin joined #gluster-dev
05:36 hgowtham joined #gluster-dev
05:37 ppai joined #gluster-dev
05:37 mchangir joined #gluster-dev
05:41 ndarshan joined #gluster-dev
05:42 Manikandan joined #gluster-dev
05:47 pranithk1 joined #gluster-dev
05:49 aspandey joined #gluster-dev
05:53 raghug joined #gluster-dev
05:54 ashiq joined #gluster-dev
06:03 kdhananjay joined #gluster-dev
06:07 spalai joined #gluster-dev
06:13 kdhananjay nigelb: Could you revoke the -1 vote that you've given on patch http://review.gluster.org/#/c/14658/ ?
06:20 pranithk1 joined #gluster-dev
06:20 atalur joined #gluster-dev
06:23 msvbhat_ joined #gluster-dev
06:23 overclk joined #gluster-dev
06:26 itisravi joined #gluster-dev
06:26 pkalever joined #gluster-dev
06:30 gem joined #gluster-dev
06:40 rafi joined #gluster-dev
06:41 msvbhat_ joined #gluster-dev
06:46 rastar joined #gluster-dev
06:52 rraja joined #gluster-dev
06:55 pur__ joined #gluster-dev
07:05 spalai left #gluster-dev
07:06 penguinRaider joined #gluster-dev
07:10 spalai joined #gluster-dev
07:12 hchiramm joined #gluster-dev
07:23 kshlm misc, Good morning!
07:24 kshlm I'm investigating why gerrit is getting random votes from jenkins. I sent a mail to the infra list about this.
07:25 kshlm The gerrit sshd_logs show all connections as coming from 127.0.0.1
07:25 misc kshlm: hi. I am on pto until monday :)
07:25 kshlm Ah, okay.
07:25 misc but I can answer questions
07:26 misc just that I am about to leave and likely away from network
07:26 misc (not afk, because i always carry my laptop out of paranoia)
07:26 kshlm Any idea why the addresses are 127.0.0.1?
07:26 misc on gerrit ?
07:26 kshlm Yup.
07:26 kshlm The gerrit sshd_logs
07:26 kshlm It started mid-February, so I guess after the migration from iWeb.
07:26 misc /etc/xinetd.d/ssh_gerrit
07:27 misc we use a port redirect in xinetd
07:28 misc before, it was some iptables rules
07:28 kshlm Okay.
07:28 misc and why, I have no idea; i think maybe gerrit refused to listen on port 22, or it was not running as root
07:28 misc so you can get the ip in /var/log/messages
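
A side note on the xinetd redirect misc describes: with a "redirect" entry, xinetd accepts the incoming TCP connection itself and opens a new connection to the target service, so gerrit's sshd_log only ever sees 127.0.0.1 as the peer; the real client IP shows up only in xinetd's own syslog output, i.e. /var/log/messages. A minimal sketch of what /etc/xinetd.d/ssh_gerrit could look like follows; the service name, the listen port 22 and gerrit's backend SSH port 29418 are assumptions, not copied from the real file:

    service gerrit_ssh
    {
        # accept connections on the public SSH port and forward them to the
        # gerrit sshd, assumed here to listen on its default port 29418
        disable         = no
        type            = UNLISTED
        socket_type     = stream
        protocol        = tcp
        wait            = no
        user            = root
        port            = 22
        redirect        = 127.0.0.1 29418
    }

Whether the client IP actually lands in /var/log/messages depends on xinetd's log_on_success/log_on_failure settings including HOST, which is typically the distribution default on EL systems.
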
07:28 kshlm Let me check
07:29 kshlm I remember gerrit listening on port 22 for ssh before.
07:30 kshlm I do remember seeing it in gerrit.config a long time back.
07:30 kshlm Don't know when that was changed.
07:30 misc neither do I, I just replaced the iptables stuff from /etc/rc.local :)
07:31 misc (because in the future, it would be easier to manage with ansible/salt)
07:31 kshlm Thanks for pointing to /var/log/messages
07:31 kshlm misc++
07:31 glusterbot kshlm: misc's karma is now 28
07:31 kshlm Also, is it possible that the old jenkins server came back online all of a sudden?
07:32 kshlm I'm just wondering if that could be the case.
07:33 pranithk1 joined #gluster-dev
07:34 Saravanakmr joined #gluster-dev
07:34 misc kshlm: we killed it with fire
07:34 misc then spread the ashes over the ocean
07:35 misc then dried the ocean to burn it again and spread the ashes over another ocean
07:35 misc so nope
07:35 kshlm Yeah. That was a stupid theory.
07:35 misc mhh
07:35 misc wait
07:36 misc seems the virt server is still up, should have been stopped
07:36 misc uh
07:36 misc kshlm: ok, the server was rebooted 4 days ago
07:36 misc and so indeed, the jenkins came back from the dead
07:37 kshlm huh.
07:37 misc [root@engg ~]# uptime  07:36:28 up 4 days, 21:51,  1 user,  load average: 0.00, 0.00, 0.00
07:37 misc shell.gluster.com also came back from the dead...
07:37 misc I just killed the VM
07:38 misc so good catch
07:38 misc kshlm++
07:38 glusterbot misc: kshlm's karma is now 89
07:38 kshlm Cool!
07:38 kshlm Not a stupid theory at all.
07:38 misc well, I guess you did see the ip of the old server ?
07:38 kshlm Not yet.
07:39 kshlm I was just guessing.
07:39 misc then you were guessing right :)
07:39 kshlm The votes were linking to jobs that were about 2 months old.
07:39 kshlm I'll verify the logs now.
07:39 misc ok so time for me to leave and take my train (if I can get there), text me if needing anything
07:39 kshlm And that could well be the reason we've been having build failures as well.
07:40 kshlm Sure, see ya later.
07:40 kshlm I'm off for lunch as well.
08:28 kdhananjay joined #gluster-dev
08:29 nbalacha joined #gluster-dev
08:52 gem joined #gluster-dev
09:04 pranithk1 xavih: Let me know when we can talk about that bug
09:04 xavih pranithk1: hi. I could talk now
09:05 pranithk1 xavih: cool
09:06 pranithk1 xavih: I am asking the Perf guy who found this bug to join the discussion too. Give me a minute
09:06 xavih pranithk1: ok
09:06 Apeksha joined #gluster-dev
09:07 kdhananjay joined #gluster-dev
09:08 kramdoss_ joined #gluster-dev
09:09 ambarish joined #gluster-dev
09:10 pranithk1 xavih: ambarish knows you. He tests gluster for performance-related regressions etc. across replication/distributed-replication/ec/plain distribute
09:10 ambarish xavih, Hi!
09:10 xavih ambarish: Hi :)
09:11 pranithk1 ambarish: Could you explain the parallel workload? I remember there was a set of dds + untars in parallel
09:11 ambarish pranithk1, xavih yup..i had a couple of dds as well as the tarball untar from 4 clients and 6 subdirs
09:12 ambarish pranithk1, xavih if it helps, i can try a lesser load
09:12 pranithk1 ambarish: no no, we want to know the exact commands at this point... That would be helpful
09:12 pranithk1 xavih: ambarish did all this on nfs
09:13 pranithk1 xavih: when ambarish tried this work load on fuse mounts, things looked fine
09:14 msvbhat__ joined #gluster-dev
09:15 xavih ambarish: are you sure you tested with patch http://review.gluster.org/14174 ? in the bug reports it says that version used was 3.7.9-8, but the patch has not been released in any version yet. It was added after 3.7.11 was released...
09:15 pranithk1 xavih: It is our internal version. We merged that patch very recently. Yes this version includes that patch
09:15 xavih pranithk1: ah, ok
09:16 nishanth joined #gluster-dev
09:16 pranithk1 xavih: Hmm... maybe he is afk-ish. Let me resume where I left off and he can add more details
09:16 ambarish pranithk1, xavih I am dd'ing 440KB, 2GB and 20GB files, kernel untar from 2 subdirs and recursive ls from multiple clients
09:17 pranithk1 xavih: he is back :-)
09:17 ambarish pranithk1, xavih ack sowie!
09:17 shubhendu joined #gluster-dev
09:17 pranithk1 xavih: ambarish had 4 nfs clients mounted from 4 different servers.
09:18 pranithk1 xavih: While all this was going on, he does add-brick and rebalance and it leads to both assertion failures and I/O errors on the nfs mount
09:18 pranithk1 xavih: sosreport-gqas011.sbu.lab.eng.bos.redhat.com-20160606153131/var/log/glusterfs/nfs.log:[2016-06-06 18:56:15.764195] W [MSGID: 112199] [nfs3-helpers.c:3419:nfs3_log_common_res] 0-nfs-nfsv3: /stress19 => (XID: f64c491d, GETATTR: NFS: 2(No such file or directory), POSIX: 14(Bad address)) [Invalid argument]
09:19 ambarish pranithk1, xavih one more thing..the same test under the same workload passes on Dist Rep and pure dist vols
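
To make the reproducer concrete, here is a rough sketch of the kind of workload described above. Volume, brick, mount and file names are invented; the exact commands ambarish used are not in the log:

    # on each of the 4 NFS clients, against the gluster NFS (v3) export
    mount -t nfs -o vers=3 server1:/dispvol /mnt/dispvol
    cd /mnt/dispvol/subdir1
    dd if=/dev/zero of=small.file bs=1K count=440   &   # ~440KB file
    dd if=/dev/zero of=med.file   bs=1M count=2048  &   # ~2GB file
    dd if=/dev/zero of=big.file   bs=1M count=20480 &   # ~20GB file
    tar xf /tmp/linux-kernel.tar.xz &                   # kernel untar
    while true; do ls -lR . > /dev/null; done &         # recursive ls

    # on a server, while the above is running: grow the 4+2 disperse volume
    # and rebalance (as pranithk notes below, add-brick restarts the gluster
    # NFS server, which is what cuts the in-flight writes)
    gluster volume add-brick dispvol server{1..6}:/bricks/dispvol/brick2
    gluster volume rebalance dispvol start
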
09:20 xavih pranithk1: I didn't see that error. I thought the problem was the assertion...
09:20 pranithk1 xavih: I need to find a way of attaching the logs. We have an internal server where the logs get uploaded. I will find a way to get the logs uploaded to the bz
09:21 xavih pranithk1, ambarish: I did some similar tests to discover the assertion failed bug, now it seems to work. The only difference is the add-brick operation...
09:21 xavih pranithk1, ambarish: I'll try to repeat with this workload
09:22 ambarish xavih yep i saw that the earlier bug was raised by ya, the assertion failures were in the brick logs... The one i hit is specific to rebal.. and the errors are in the rebalance log
09:23 pranithk1 xavih: that was the log we got for the failure. Initially I thought Bad address is EBADFD(Later realized it is EFAULT) and even the logs on posix gave so many logs with EBADFD(This is actual EBADFD).  This is why I sent http://review.gluster.org/14669, after this patch it neither gives assertion failures nor this particular bug. We get the ec bug we know with nfs where if the write wind happens for 3 of the 6 bricks and nfs server dies the
09:23 pranithk1 xavih: I saw 2 assertion failures per file in each of the nfs server logs
09:24 xavih pranithk1, ambarish: I did find a problem with nfs getting hung with a getattr operation. Maybe they are related...
09:24 pranithk1 xavih: With fuse there is no problem with any of these tests. All this happens with nfs alone.
09:25 xavih pranithk1: the assertion still appears when rebalance is executed?
09:25 pranithk1 xavih: After the fix above, no
09:25 pranithk1 xavih: Frankly I don't know what the connection is
09:26 pranithk1 xavih: That patch was based on a wrong assumption; although it does fix a specific problem, I am yet to find out why it fixes the earlier problem
09:27 xavih pranithk1: what do you mean when you say that this happens when a write is sent to 3 out of 6 ?
09:27 pranithk1 xavih: All the assertion failures are in setattr
09:28 xavih pranithk1: nfs (ganesha ?) dies or it doesn't send more requests for some reason ?
09:28 pranithk1 xavih: ah!, we discussed this theoretical possibility a long time back. Say we have a 4+2 ec volume and dd is performed on the nfs-client. When add-brick is done the nfs-server is restarted. When this server is restarted, the ec transaction could have already sent 3 of the writes, which increase the size of the fragment files
09:29 pranithk1 xavih: When the nfs-server comes back up, the nfs-client sends the write again, but this time the stat won't match on 3 of the bricks. That leads to an I/O error
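
To put numbers on the scenario pranithk1 describes, here is a hypothetical, self-contained illustration (not gluster code) of why a write that lands on only 3 of the 6 bricks of a 4+2 disperse volume turns into EIO: each brick holds a fragment of roughly size/4, and ec needs at least 4 bricks (the data count) agreeing on the fragment size/version to serve the file. The sizes below are invented:

    /* cc -o ec_quorum ec_quorum.c && ./ec_quorum */
    #include <stdio.h>

    int main(void)
    {
        const int data = 4;   /* 4+2 disperse volume: 4 data + 2 redundancy */
        /* a 512 KiB write grows each fragment by 512/4 = 128 KiB, but it
           reached only 3 of the 6 bricks before the NFS server restarted */
        int frag_kib[6] = {640, 640, 640, 512, 512, 512};

        /* find the largest group of bricks that agree on the fragment size */
        int best = 0;
        for (int i = 0; i < 6; i++) {
            int agree = 0;
            for (int j = 0; j < 6; j++)
                if (frag_kib[j] == frag_kib[i])
                    agree++;
            if (agree > best)
                best = agree;
        }

        if (best < data)
            printf("only %d bricks agree on the fragment size (< %d): "
                   "ec cannot pick a good version -> EIO on the mount\n",
                   best, data);
        return 0;
    }
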
09:29 pranithk1 xavih: rings a bell
09:29 pranithk1 xavih: ?
09:30 xavih pranithk1: yes, we talked about that, and we agreed that it's difficult to solve if nfs is killed instead of gracefully stopped...
09:31 pranithk1 xavih: yeah :-).
09:31 xavih pranithk1: without any way to do some transactional operation and no way to stop nfs gracefully, I don't see how to solve it...
09:31 pranithk1 xavih: But this issue is different from that. There was no I/O error because of ec write failures
09:31 xavih pranithk1: anyway this shouldn't cause any assertion failure
09:31 pranithk1 xavih: I think we can fall back to the earlier known good size. At least we could provide some such option...
09:31 xavih pranithk1: ah, ok
09:32 xavih pranithk1: add-brick is processed when no I/O is ongoing ?
09:32 pranithk1 xavih: no no, dd was going on when he did this
09:32 xavih pranithk1: this can cause this problem, can't it?
09:33 pranithk1 xavih: yeah. But the bug ambarish found doesn't have any logs to indicate it. Only after the fix I gave in posix does he see failures because of that size mismatch
09:34 pranithk1 xavih: There are no assertion failures either this time with the fix. I am yet to find out why. I am going away for some personal work, so I may not be available till Monday. Not sure when I will get back to you...
09:34 xavih pranithk1: ok, don't worry. I'll try to investigate
09:35 xavih pranithk1: one question: with your fix (I'll try to determine why your fix has fixed the other problem) it seems the only remaining problem is writes "cut" in the middle when add-brick is executed, right?
09:35 pranithk1 xavih: Yes, that is what we found
09:36 xavih pranithk1: and we still agree that this is hard to solve now ?
09:36 pranithk1 xavih: Thanks! If you want something to be tested or need some logs, feel free to reach out to ambarish. He is hitting it every time he runs the workload. If you want to give a patch with some debug logs like we did with bhaskar a long time back, that should be fine too
09:36 xavih pranithk1: or do you want to find a solution ?
09:36 pranithk1 ambarish: ^^ okay right?
09:36 ambarish xavih, pranithk1 yup sure..HTH :)
09:36 pranithk1 xavih: We can solve the "cut" in the middle later
09:36 misc kshlm: so, even if a wrong vote were given on a patch, the tests would either work fine (so the vote didn't change much) or would fail later for subsequent patches, no?
09:37 pranithk1 xavih: I think the first problem to solve is the assertion failures along with the I/O error on the mount..
09:37 xavih pranithk1: sorry. I'm lost. Didn't you say that your patch has solved them ?
09:38 kshlm misc, The tests are fine, no problems with either the tests or slaves
09:38 kshlm misc, they should run fine for subsequent changes.
09:39 kshlm misc, I'm trying to get proper verification that it was zombie-jenkins leading to build failures on the slaves,
09:39 atalur joined #gluster-dev
09:39 misc kshlm: cause I can see why jenkins voting -1 is bad, but I guess that zombie-jenkins voting +1 is the same as regular jenkins ?
09:39 kshlm misc, but apart from ssh logins from zombie-jenkins I can't find any other logs.
09:40 misc kshlm: i think your hypothesis is right, it would likely do that
09:40 kshlm misc, that would be the same. But since we can't access the build logs, I'd rather have the jobs run again by new-jenkins.
09:40 ambarish xavih, pranithk1 LMK if i can help in any way
09:42 pranithk1 xavih: yes, it did, I just don't know what the RCA is. We need to find the RCA for why this error occurs
09:42 xavih ambarish: thanks, I'll let you know if I find something :)
09:42 xavih pranithk1: ok, I'll investigate :)
09:42 ambarish xavih, :)
09:42 pranithk1 xavih: thanks xavi
09:43 pranithk1 xavih: I will be afk for 20-30 minutes.
09:51 misc nigelb: FYI, I just found that backups.cloud was not in ansible (yet), so your keys were likely not present. I have added it now and it should be ok, and it also added firewall+selinux
09:52 misc (it should also appear in munin soon, I hope)
09:54 misc and shit, it locked me out
09:54 pranithk|afk xavih: one more thing. Remember the mkdir issue we talked about with raghug?
09:54 josferna joined #gluster-dev
09:55 pranithk xavih: In ec we need to make some code changes to make sure things work fine. ec_dir_write_cbk() doesn't ref xdata if op_ret < 0. aspandey will be sending out a patch
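
For readers outside the ec code: the problem pranithk describes is a missing reference count. Below is a toy, self-contained model of the pattern; this is not the gluster source, dict_t and the callback are simplified stand-ins, and the presumed fix is simply to take the ref regardless of op_ret:

    /* cc -o xdata_ref xdata_ref.c && ./xdata_ref */
    #include <stdio.h>
    #include <stdlib.h>

    typedef struct { int refcount; } dict_t;

    static dict_t *dict_ref(dict_t *d)   { d->refcount++; return d; }
    static void    dict_unref(dict_t *d) { if (--d->refcount == 0) free(d); }

    typedef struct { dict_t *xdata; } cbk_t;

    /* the questionable pattern: keep xdata, but only ref it on success */
    static void dir_write_cbk(cbk_t *cbk, int op_ret, dict_t *xdata)
    {
        if (op_ret >= 0 && xdata != NULL)
            cbk->xdata = dict_ref(xdata);
        else
            cbk->xdata = xdata;          /* borrowed pointer, no ref taken */
    }

    int main(void)
    {
        dict_t *xdata = malloc(sizeof(*xdata));
        xdata->refcount = 1;             /* caller's reference */

        cbk_t cbk = {0};
        dir_write_cbk(&cbk, -1, xdata);  /* failed op: no ref taken */

        dict_unref(xdata);               /* caller drops its ref: dict is freed */
        /* cbk.xdata still points at the freed dict; any later use of it is a
         * use-after-free. Taking the ref in the callback even when op_ret < 0
         * removes the problem. */
        printf("callback kept an unreferenced pointer to freed xdata\n");
        return 0;
    }
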
09:57 msvbhat_ joined #gluster-dev
10:01 misc nigelb: so no emergency, that was just selinux being turned on, and the server being selinux-less for too long
10:02 msvbhat__ joined #gluster-dev
10:03 pkalever joined #gluster-dev
10:12 Apeksha_ joined #gluster-dev
10:13 ndarshan joined #gluster-dev
10:16 xavih pranithk|afk: a long time ago we discussed the xdata ref issue. You were saying that it wasn't needed when op_ret < 0 ;-)
10:27 ppai joined #gluster-dev
10:28 pranithk xavih: Yep, it was me. Same bug existed in afr too :-)
10:32 raghug joined #gluster-dev
10:39 pkalever joined #gluster-dev
10:41 skoduri joined #gluster-dev
10:43 atinm joined #gluster-dev
10:44 mchangir joined #gluster-dev
10:46 xavih ambarish: the NFS was Ganesha or gluster's own NFS ?
10:46 ambarish xavih, it was gluster nfs
10:46 xavih ambarish: thanks
10:46 ambarish xavih, np
10:47 pkalever left #gluster-dev
10:49 pranithk1 joined #gluster-dev
11:02 ndarshan joined #gluster-dev
11:06 spalai left #gluster-dev
11:14 hchiramm ppai++ thanks !! Awesome
11:14 glusterbot hchiramm: ppai's karma is now 13
11:14 hchiramm http://libgfapi-python.readthedocs.io/en/latest/
11:17 nishanth joined #gluster-dev
11:19 ira joined #gluster-dev
11:20 ndarshan joined #gluster-dev
11:38 shubhendu_ joined #gluster-dev
11:40 atinm joined #gluster-dev
11:50 msvbhat_ joined #gluster-dev
12:14 msvbhat_ joined #gluster-dev
12:14 spalai joined #gluster-dev
12:15 spalai left #gluster-dev
12:23 msvbhat_ joined #gluster-dev
12:31 rraja hchiramm: in the link, you need to edit s/volume.unmount()/volume.umount()/ ?
12:34 ambarish yep
12:34 ambarish ack..wrong window
12:35 dlambrig joined #gluster-dev
12:38 ppai joined #gluster-dev
12:42 ppai rraja, thanks for pointing it out
12:43 hchiramm rraja++ , yep
12:43 glusterbot hchiramm: rraja's karma is now 1
12:44 ppai hchiramm, rraja have fixed it
12:50 luizcpg joined #gluster-dev
12:57 atinm xavih, can we get a review on http://review.gluster.org/14679 ?
12:58 xavih atinm: sure :)
13:00 shubhendu_ joined #gluster-dev
13:03 xavih atinm: done
13:14 pkalever joined #gluster-dev
13:35 shyam joined #gluster-dev
13:41 Manikandan joined #gluster-dev
13:46 kramdoss_ joined #gluster-dev
13:51 nbalacha joined #gluster-dev
14:08 pranithk1 joined #gluster-dev
14:15 pkalever joined #gluster-dev
14:17 pkalever left #gluster-dev
14:24 kotreshhr joined #gluster-dev
14:36 skoduri joined #gluster-dev
14:37 hagarth joined #gluster-dev
14:43 ank joined #gluster-dev
14:52 Jules- joined #gluster-dev
14:56 kotreshhr joined #gluster-dev
15:07 wushudoin joined #gluster-dev
15:16 atinm joined #gluster-dev
15:16 penguinRaider joined #gluster-dev
15:27 overclk joined #gluster-dev
15:28 anoopcs shyam, Can you please take a look at http://review.gluster.org/#/c/11177/14?
15:32 aspandey joined #gluster-dev
15:47 shyam anoopcs: Will do...
15:47 shyam anoopcs: expect some score by my EOD
15:47 anoopcs shyam, Thanks.. :-)
16:04 aspandey joined #gluster-dev
16:18 shubhendu_ joined #gluster-dev
16:24 kramdoss_ joined #gluster-dev
16:38 penguinRaider joined #gluster-dev
16:38 dlambrig joined #gluster-dev
16:43 kramdoss_ joined #gluster-dev
16:58 ambarish joined #gluster-dev
17:01 josferna joined #gluster-dev
17:12 luizcpg joined #gluster-dev
17:17 misc atinm: so, for the netbsd stuff, did you speak to someone who fixed it, or am I missing something?
17:24 atinm misc, there are a couple of patches http://review.gluster.org/#/c/14653/ & http://review.gluster.org/#/c/14665/3 on which I have been awaiting the netbsd vote for a few hours now
17:25 atinm misc, and I don't see that any of the slaves have picked them up from the queue
17:25 atinm misc, so that gave me the impression that all of them are offline
17:36 misc atinm: well, there is a free netbsd builder, so that's puzzling
17:37 hagarth joined #gluster-dev
17:37 misc mhh, so is it the regression tests that are triggered, or the smoke tests?
17:41 misc atinm: ok, so that requires someone who knows more than me about jenkins / gerrit
17:41 misc it does look correct on the interface
17:45 dlambrig joined #gluster-dev
17:47 atinm joined #gluster-dev
17:49 atinm misc, I am at -1 level on this, I've no clue how it works!
17:49 atinm misc, and that's why I come up with some stupid questions sometimes :-\
17:50 misc atinm: that's ok
17:51 misc I know that's crunch time due to release
17:51 misc (and well, something is maybe broken somewhere)
17:52 misc I do not really understand why jenkins schedules stuff when the disk is full and it knows it, but well..
18:08 hagarth joined #gluster-dev
18:19 penguinRaider joined #gluster-dev
18:19 overclk joined #gluster-dev
18:56 nigelb joined #gluster-dev
18:57 overclk joined #gluster-dev
20:05 hagarth joined #gluster-dev
21:51 dlambrig joined #gluster-dev
22:07 dlambrig joined #gluster-dev
22:24 dlambrig joined #gluster-dev
23:08 dlambrig joined #gluster-dev
23:27 dlambrig joined #gluster-dev
23:47 pranithk1 joined #gluster-dev
