
IRC log for #gluster, 2014-01-13


All times shown according to UTC.

Time Nick Message
00:58 yinyin joined #gluster
01:01 B21956 joined #gluster
01:06 guntha__ joined #gluster
01:10 moshe joined #gluster
01:11 hagarth joined #gluster
01:12 TrDS left #gluster
01:18 moshe I'm having trouble getting my servers to talk to each other over ipv6, is doing this actually supported?
01:18 yosafbridge joined #gluster
01:32 zerick joined #gluster
01:35 zerick joined #gluster
01:37 mattappe_ joined #gluster
01:37 zerick joined #gluster
01:38 harish joined #gluster
01:50 yinyin joined #gluster
01:51 MrNaviPacho joined #gluster
01:54 kevein joined #gluster
01:58 diegows joined #gluster
02:03 mohankumar joined #gluster
02:06 psyl0n joined #gluster
02:10 mohankumar joined #gluster
02:21 theron joined #gluster
02:23 kshlm joined #gluster
02:26 brosner joined #gluster
02:32 badone joined #gluster
02:37 bala joined #gluster
02:41 kanagaraj joined #gluster
02:54 mattappe_ joined #gluster
02:57 bala joined #gluster
03:05 mattappe_ joined #gluster
03:10 psyl0n joined #gluster
03:11 bharata-rao joined #gluster
03:16 theron joined #gluster
03:21 psyl0n joined #gluster
03:26 aravindavk joined #gluster
03:37 shubhendu joined #gluster
03:50 itisravi joined #gluster
03:55 ababu joined #gluster
04:01 hagarth @fileabug
04:01 glusterbot hagarth: Please file a bug at http://goo.gl/UUuCq
04:01 hagarth thanks glusterbot!
04:06 raghu joined #gluster
04:08 RameshN joined #gluster
04:11 shylesh joined #gluster
04:18 yinyin joined #gluster
04:27 kdhananjay joined #gluster
04:28 purpleidea JoeJulian: as requested, how to reproduce that bug you mentioned you saw. If you have additional comments, I've opened: https://bugzilla.redhat.com/show_bug.cgi?id=1051992
04:28 glusterbot Bug 1051992: unspecified, unspecified, ---, vraman, NEW , Peer stuck on "accepted peer request"
04:33 ppai joined #gluster
04:35 pk joined #gluster
04:41 MiteshShah joined #gluster
04:51 glusterbot New news from newglusterbugs: [Bug 1051993] Force argument is ambiguous <https://bugzilla.redhat.com/show_bug.cgi?id=1051993> || [Bug 1051992] Peer stuck on "accepted peer request" <https://bugzilla.redhat.com/show_bug.cgi?id=1051992>
04:54 ndarshan joined #gluster
05:00 purpleidea glusterbot: thanks!
05:00 glusterbot purpleidea: I do not know about 'thanks!', but I do know about these similar topics: 'thanks'
05:00 purpleidea glusterbot: thanks
05:00 glusterbot purpleidea: you're welcome
05:03 pk left #gluster
05:04 shyam joined #gluster
05:13 dusmant joined #gluster
05:13 MiteshShah joined #gluster
05:17 prasanth joined #gluster
05:22 wgao joined #gluster
05:23 wgao joined #gluster
05:25 wgao joined #gluster
05:31 jkroon joined #gluster
05:31 nshaikh joined #gluster
05:36 pk joined #gluster
05:37 vpshastry joined #gluster
05:48 lalatenduM joined #gluster
05:51 psharma joined #gluster
05:53 satheesh1 joined #gluster
05:57 benjamin__ joined #gluster
06:06 hagarth joined #gluster
06:09 Philambdo joined #gluster
06:15 meghanam joined #gluster
06:15 meghanam_ joined #gluster
06:17 anands joined #gluster
06:21 glusterbot New news from newglusterbugs: [Bug 1037511] Operation not permitted occurred during setattr of <https://bugzilla.redhat.com/show_bug.cgi?id=1037511>
06:43 DV joined #gluster
06:50 rastar joined #gluster
06:55 hagarth joined #gluster
07:00 vimal joined #gluster
07:08 tor joined #gluster
07:13 davinder joined #gluster
07:19 CheRi joined #gluster
07:23 RameshN joined #gluster
07:27 jtux joined #gluster
07:34 complexmind joined #gluster
07:38 rjoseph joined #gluster
07:40 solid_liq joined #gluster
07:40 solid_liq joined #gluster
07:46 overclk joined #gluster
07:50 RobertLaptop joined #gluster
07:52 glusterbot New news from newglusterbugs: [Bug 1047378] Request for XML output ignored when stdin is not a tty <https://bugzilla.redhat.com/show_bug.cgi?id=1047378>
07:53 hagarth joined #gluster
08:04 ctria joined #gluster
08:07 RameshN joined #gluster
08:15 Maxence joined #gluster
08:16 hybrid5121 joined #gluster
08:31 keytab joined #gluster
08:35 RameshN joined #gluster
08:38 rjoseph joined #gluster
08:41 cfeller joined #gluster
08:41 cfeller joined #gluster
08:41 overclk joined #gluster
08:42 benjamin__ joined #gluster
08:50 raghu joined #gluster
08:56 shubhendu joined #gluster
08:59 eseyman joined #gluster
09:00 andreask joined #gluster
09:02 kanagaraj joined #gluster
09:11 Philambdo joined #gluster
09:11 harish joined #gluster
09:19 KORG|2 joined #gluster
09:37 kanagaraj joined #gluster
09:41 blook joined #gluster
09:42 harish joined #gluster
09:46 pk blook: hi
09:46 blook pk: hi, how are you doing? :)
09:46 mgebbe_ joined #gluster
09:47 pk blook: sorry I did not respond to your mail, thought IRC chat was better....
09:47 mgebbe_ joined #gluster
09:47 mgebbe_ joined #gluster
09:48 blook pk: no problem, i think we did some good steps investigating the misbehavior
09:48 Philambdo joined #gluster
09:49 pk blook: yep
09:49 pk blook: so the thing with index xlator is this
09:49 pk blook: It keeps track of files under modification
09:49 pk blook: All the files' gfids are added as hard links to a base file. That is what you see in that directory as xattrop-<uuid>
09:51 pk blook: If that file is not present any more on the replica pair, it will be deleted eventually... but according to your mail that is not the case.... Are self-heal-daemons crawling fine?
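
For readers following along, a minimal sketch of poking at that index on a brick; the brick path and gfid below are made-up examples, not taken from this conversation:

    # Each entry under the index is a hard link named after a gfid;
    # the base file is the one named xattrop-<uuid>.
    ls /bricks/myvol/brick1/.glusterfs/indices/xattrop/
    # The hard-link count on the base file roughly tracks how many gfids
    # are currently indexed (hard links share one inode):
    stat -c '%h links, inode %i' /bricks/myvol/brick1/.glusterfs/indices/xattrop/xattrop-*
    # Resolving an indexed gfid back to its .glusterfs path on the brick:
    GFID=01234567-89ab-cdef-0123-456789abcdef   # hypothetical gfid
    ls -l /bricks/myvol/brick1/.glusterfs/${GFID:0:2}/${GFID:2:2}/$GFID
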
09:51 blook pk: i think so, how can i prove this?
09:52 RameshN joined #gluster
09:52 pk blook: thinking...
09:52 pk blook: do gluster volume heal <volname>
09:53 blook pk: its crawling :)
09:53 blook pk: im strafing glustershd process, and after the command it starts crawling
09:53 blook pk: stracing
09:54 pk blook: stracing glustershd won't help I guess because the fops are sent over network, bricks are the ones which execute the actual syscalls
09:56 pk blook: check if that removes stale gfids from indices/xattrop directory
09:58 blook pk: no it does not, the number of files is increasing, but those files are the dht-link files
09:59 pk blook: hmm...
09:59 blook pk: the files in the xattrop dir, which are not existent on the brick itself, aren't removed
10:00 pk blook: that is strange :-|
10:00 pk blook: It takes a while for the crawl to complete....
10:01 blook pk: yeah, i can compare the number of files in the xattrop dir with the count from before the weekend
10:01 blook pk: so the files are still increasing
10:02 blook pk: but like i said, it looks to me as these are just the dht-link files with "permission denied" errors
10:02 pk blook: Hmm.... Every 10 minutes this crawl is performed in self-heal-daemon. We did not worsen anything by executing gluster volume heal <volname>....
10:02 blook pk: those files which are not existing on the bricks aren't touched
10:02 hagarth blook: do you get EPERM only for the dht link files?
10:02 blook pk: also it looks like they are ignored in the gluster volume heal info output command
10:03 pk blook: interesting....
10:03 blook pk: true, i ran the command already
10:03 blook pk: i have to check that, but so far it pretty looks like it
10:04 pk blook: answer hagarth's question...
10:04 blook pk: every probe so far has been a dht link file with a PERM error
10:05 Alex Hm, quick one. What's the behaviour of noatime relative to gluster? I've read that it's ignored (https://bugzilla.redhat.com/show_bug.cgi?id=825569) - but how about if the bricks are mounted noatime? "For those Volumes where you want noatime and nodiratime, why not set those mount options on the Bricks?"
10:05 glusterbot Bug 825569: high, medium, ---, kaushal, ASSIGNED , [enhancement]: Add support for noatime, nodiratime
10:06 Alex suggests that it has some sort of effect
10:07 blook hagarth, pk, yes the PERM errors are just dht link files
10:07 hagarth Alex: you could just mount with noatime, nodiratime on the servers.
10:07 Alex hagarth: on the clients themselves?
10:09 hagarth Alex: if the servers/bricks are mounted with noatime, nodiratime setting it on the client has no bearing
10:09 Alex hagarth: OK, that makes sense. I think what I actually meant was 'if I mount bricks with noatime, will gluster be upset', and it sounds like it should be okay.
10:09 blook hagarth, pk, i grepped the brick's log file for "operation not permitted" and did a getfattr lookup on all files listed in the log, all are linkto (dht) files
10:09 Alex (Sorry, pre-coffee, so everything's a bit hard today)
10:10 TonySplitBrain joined #gluster
10:11 hagarth Alex: yes, gluster doesn't get upset and yes, pre-coffee is always hard too :)
10:11 Alex Thank you :)
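
A hedged example of what such a brick mount could look like in /etc/fstab; the device, mount point, and filesystem type are placeholders:

    # Brick filesystem mounted with noatime/nodiratime on the server;
    # atime updates are simply skipped on the brick, clients are unaffected.
    /dev/sdb1  /bricks/myvol1  xfs  defaults,noatime,nodiratime  0 0
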
10:11 DV joined #gluster
10:12 hagarth blook: have you by any chance tried to reproduce this behavior on the 3.5qa releases?
10:12 pk blook: It would help if you could come up with a test case to re-create this behavior. Do you think that is possible?
10:13 blook hagarth, no, this is already a production system :(
10:13 blook pk, yes, but i don't know how to get deterministic test case
10:14 hagarth blook: ok :(
10:14 overclk joined #gluster
10:14 pk blook: How probable is the test case, how many attempts would give the issue?
10:14 hagarth blook: I have my suspicion on this patch - http://review.gluster.org/4890
10:14 glusterbot Title: Gerrit Code Review (at review.gluster.org)
10:15 blook pk, i just know there are files in the xattrop dir which say they should be self-healed, but they don't exist on the brick pair that's flagged; they exist on some other brick pairs
10:15 hagarth it is not present in 3.4.x but is present in master/release-3.5 .. so was wondering if you tried with 3.5.
10:16 pk hmm...
10:16 hagarth blook: hmm, there seem to be 2 problems - 1. o/p of volume heal info listing certain files which are not present in the bricks 2. EPERM issue seen in the logs (for setattr fop).
10:16 hagarth blook: is that right?
10:17 pk hagarth: He said, it doesn't list those files...
10:17 pk hagarth: But those files are not cleaned up from xattrop...
10:17 blook hagarth, yes there are two problems
10:17 hagarth pk: it is just that xattrop directory contains those stale files?
10:17 blook hagarth, yes what pk says is correct
10:18 pk blook: Did you ever do rebalance on your setup?
10:18 blook pk: no
10:19 blook pk, hagarth, imho perhaps its a related issue to problem 1 https://bugzilla.redhat.com/show_bug.cgi?format=multiple&id=1037511
10:19 glusterbot Title: Full Text Bug Listing (at bugzilla.redhat.com)
10:19 hagarth blook: hmm, then the patch I pointed to might not be relevant
10:20 blook pk, hagarth, the dht link file issue with the owner/user permissions is deterministic and reproducible
10:21 blook pk hagarth  the other problem is that i have files in xattrop dir which are not present on the brick itself, just on another brick pair
10:21 blook pk, hagarth so i guess perhaps its related to the other problem
10:21 hagarth blook: do you have a test case for the dht link files problem?
10:22 blook hagarth, i can set up one, yes
10:22 pk blook: I think we should put some effort into coming up with test case....
10:23 hagarth blook: that would be useful
10:23 hagarth blook: does the gfid of the files in xattrop provide any clue?
10:24 raghu joined #gluster
10:25 blook hagarth, no not really, i just resolve it into the .glusterfs gfid structure and try to stat it, and it's not present; it's present on a different brick pair, but what is noticeable is that it has the same owner permissions, with a UID/GID that is not present on the gluster servers, just on the nfs client
10:25 RameshN joined #gluster
10:26 pk blook: Please come up with the test case. It would help in our debugging.
10:26 blook pk: im on my way :)
10:27 hagarth blook: UID/GID not being present on gluster servers shouldn't cause problems (though it is a good idea to have uniform identities everywhere)
10:27 blook hagarth, it causes problems on dht link files as far as i know
10:28 hagarth blook: ok, will await your test case
10:28 blook pk, hagarth thank you, i will be back with the test case
10:29 shubhendu joined #gluster
10:29 pk blook: Lets also update the bug with our testcase so that we can track the whole issue better
10:31 shubhendu joined #gluster
10:34 calum_ joined #gluster
10:35 keytab joined #gluster
10:37 rjoseph joined #gluster
10:57 complexmind joined #gluster
10:59 klaxa|work joined #gluster
11:03 hybrid512 joined #gluster
11:04 klaxa|work hi, we're experiencing quite some performance problems with gluster 3.3.2
11:04 klaxa|work maybe we set it up incorrectly
11:04 klaxa|work https://gist.github.com/anonymous/1cc60bdde98c7f934085
11:05 glusterbot Title: glusterfs paste (at gist.github.com)
11:05 klaxa|work the logs are being filled with those messages
11:05 klaxa|work for almost two weeks now
11:06 klaxa|work during a write of 100 mb with dd at ~10 mb/s from /dev/urandom on the mounted filesystem the load was as high as 65 for the last minute
11:06 pk klaxa|work: seems to be connection problem
11:06 klaxa|work i can write files to the mount though
11:07 klaxa|work wait a sec...
11:07 pk klaxa|work: If it is a replication volume, writes will be successful even when one brick is down
11:07 klaxa|work it seems the wrong brick is mounted on one of the two servers
11:07 klaxa|work i'll check for the local bricks
11:07 pk klaxa|work: see!
11:08 klaxa|work if i write a file on machine A to the mount, it appears in the brick of machine B
11:08 klaxa|work so i guess that much works
11:08 rwheeler joined #gluster
11:09 pk klaxa|work: yes
11:09 klaxa|work timestamps on the files of the bricks are correct too
11:09 pk klaxa|work: Connection problem is from glusterd to some process....
11:10 klaxa|work ah
11:10 KORG|2 joined #gluster
11:10 klaxa|work how does that occur?
11:10 klaxa|work and what can i do to prevent that from happening?
11:11 pk klaxa|work: thinking....
11:11 klaxa|work if you need more information, i'll try to provide them
11:11 KORG|2 joined #gluster
11:13 pk klaxa|work: I would check gluster volume status/gluster peer status to see if all the connections are fine...
11:13 ricky-ti1 joined #gluster
11:14 Zylon joined #gluster
11:15 klaxa|work pk: they seem to be fine, see: https://gist.github.com/anonymous/6a0388eb6d77ab227aa3
11:15 glusterbot Title: gist:6a0388eb6d77ab227aa3 (at gist.github.com)
11:15 klaxa|work also, after searching a bit, it seems that nfs.disable causes that
11:16 pk klaxa|work: I was just going to ask you... what happened to nfs server....
11:16 klaxa|work https://bugzilla.redhat.com/show_bug.cgi?id=847821 this seems related - and unfixed
11:16 glusterbot Bug 847821: low, medium, ---, rabhat, ASSIGNED , After disabling NFS the message "0-transport: disconnecting now" keeps appearing in the logs
11:16 pk left #gluster
11:16 pk joined #gluster
11:18 pk klaxa|work: Yep, it should be fixed....
11:19 klaxa|work i doubt this is really bound to performance though?
11:19 pk klaxa|work: Nope it shouldn't
11:19 klaxa|work too bad, i thought i was onto something :/
11:19 pk klaxa|work: What is happening?
11:23 klaxa|work file writes create huge load
11:24 bala joined #gluster
11:24 pk klaxa|work: What do you mean by load?
11:24 klaxa|work system load
11:24 pk klaxa|work: high cpu? or something else
11:24 klaxa|work high cpu wait
11:24 harish joined #gluster
11:25 pk klaxa|work: Why don't you enable profiling and see what fops are giving this delay....
11:29 edward2 joined #gluster
11:30 pk klaxa|work: You know how to enable profile?
11:32 klaxa|work no i don't, can you point me to the documentation for that?
11:34 klaxa|work also how much overhead does that add? is it safe to run it in a production environment?
11:35 klaxa|work do the docs for 3.2 also apply for 3.3.2?
11:35 pk klaxa|work: It is pretty lightweight
11:35 pk yep
11:35 klaxa|work i guess i'll test it in our test environment instead of asking questions :)
11:36 pk for profile it does
11:36 pk klaxa|work: yep...
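
For reference, the profiling workflow pk is pointing at is roughly the following; the volume name is the one klaxa|work uses a moment later:

    gluster volume profile storage start   # begin collecting per-fop statistics
    gluster volume profile storage info    # per-brick latency and call counts for each fop
    gluster volume profile storage stop    # stop collecting when done
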
11:37 klaxa|work >Starting volume profile on storage has been unsuccessful
11:37 klaxa|work oh well...
11:38 klaxa|work >[2014-01-13 11:36:53.827015] I [glusterd-rpc-ops.c:548:glusterd3_1_cluster_lock_cbk] 0-glusterd: Received RJT from uuid: 00000000-0000-0000-0000-000000000000
11:38 klaxa|work this doesn't look right :S
11:38 DV joined #gluster
11:39 klaxa|work btw, thanks for your help so far pk
11:44 diegows joined #gluster
11:47 kshlm joined #gluster
12:10 mattappe_ joined #gluster
12:12 kkeithley joined #gluster
12:13 itisravi joined #gluster
12:16 complexmind joined #gluster
12:19 vpshastry1 joined #gluster
12:31 mattappe_ joined #gluster
12:44 bfoster joined #gluster
12:53 hagarth joined #gluster
12:59 theron joined #gluster
12:59 complexmind joined #gluster
13:00 benjamin__ joined #gluster
13:03 andreask joined #gluster
13:03 ctria joined #gluster
13:05 ricky-ticky1 joined #gluster
13:09 Krikke joined #gluster
13:10 Krikke hey, I can't get glusterfs to mount on boot. I'm using ubuntu 12.04 and glusterfs 3.2.5
13:11 Krikke mount -a after boot mounts it normally
13:14 stickyboy joined #gluster
13:17 mattappe_ joined #gluster
13:17 kkeithley did you use the _netdev option on your gluster volume in /etc/fstab?
13:17 Krikke yes
13:23 blook joined #gluster
13:31 mattappe_ joined #gluster
13:42 primechuck joined #gluster
13:43 sroy_ joined #gluster
13:44 vpshastry joined #gluster
13:44 vpshastry left #gluster
13:49 tryggvil joined #gluster
13:53 Krikke I added the noauto option and mount the filesystem manually in /etc/rc.local seems to fix the problem
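
A sketch of the workaround Krikke describes, with placeholder server, volume, and mount-point names:

    # /etc/fstab: keep the entry, but skip it during the boot-time mount pass
    server1:/myvol  /mnt/myvol  glusterfs  defaults,_netdev,noauto  0 0

    # /etc/rc.local: mount it once networking and glusterd are up
    mount /mnt/myvol
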
13:58 sroy_ joined #gluster
13:59 ira joined #gluster
14:04 ricky-ti1 joined #gluster
14:11 japuzzo joined #gluster
14:12 micu joined #gluster
14:14 theron_ joined #gluster
14:15 complexmind joined #gluster
14:17 dusmant joined #gluster
14:18 ctria joined #gluster
14:24 ababu joined #gluster
14:25 rjoseph joined #gluster
14:34 dbruhn joined #gluster
14:41 rwheeler joined #gluster
14:41 chirino joined #gluster
14:49 blook hagarth, are you there?
14:49 jobewan joined #gluster
14:49 hagarth blook: yes
14:50 blook hagarth, great, so figured out how to reproduce the misbehavior
14:50 blook hagarth, creating a directory with an unknown uid/gid on the nfs mount point will set the afr counter on that file/directory to non-zero
14:51 blook hagarth, when i delete the file it is removed, but its entry still remains in the xattrop directory
14:51 blook hagarth, and at the moment it doesn't seem to be purged at any time
14:51 hagarth blook: deletion is happening from the mount right?
14:51 blook hagarth, yes
14:53 hagarth blook: interesting, looks like deletion from the index is not happening
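
A sketch of the reproduction blook outlines, assuming the volume is NFS-mounted at /mnt/nfs; the path and numeric uid/gid are illustrative:

    # On an NFS client of the volume, create a directory owned by a
    # uid/gid that does not exist on the gluster servers:
    mkdir /mnt/nfs/testdir
    chown 12345:12345 /mnt/nfs/testdir
    # ...then delete it again:
    rm -rf /mnt/nfs/testdir
    # Per the report above, the gfid entry for it can linger under
    # <brick>/.glusterfs/indices/xattrop/ on the bricks afterwards.
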
14:54 jag3773 joined #gluster
14:59 plarsen joined #gluster
14:59 theron joined #gluster
15:04 blook hagarth, would you say deleting files in the xattrop dir which are not present on the bricks is a good idea?
15:05 blook hagarth, im a little bit afraid of getting this directory flooded, causing some performance issues
15:06 jclift joined #gluster
15:08 vpshastry joined #gluster
15:08 vpshastry left #gluster
15:08 ccha4 hello, I have replica 2 on 1 server, I have these log message 0-management: readv failed (No data available)
15:09 ccha4 ok, fixed, there was an bad added peer server
15:11 hagarth blook: we would need to be doubly sure before deleting files in the xattrop directory
15:17 blook hagarth, now it's not reproducible anymore :/
15:18 blook hagarth, im doing exactly the same.......wtf
15:18 blook hagarth, are you one of the gluster devs?
15:20 ndevos blook: I dont think 'dev' states it correctly, it is more DEV+
15:24 reqr joined #gluster
15:25 blook ndevos, what does this mean? :)
15:26 theron joined #gluster
15:28 reqr Hi all! If i have split-brain, but i use my volume in samba, that's why i can't identify problem file, how i can find problem file? Can this help me : [2014-01-13 15:19:32.099707] W [client-rpc-fops.c:471:client3_3_open_cbk] 3-smb-client-2: remote operation failed: No such file or directory. Path: <gfid:221df442-190e-4fcf-a163-096ec44158ef> (00000000-0000-0000-0000-000000000000) ?
15:28 ndevos blook: it means that he has read/reviewed/written/designed more of gluster than (I'd guess) 99% of other contributors
15:28 ndevos s/other/all other/
15:28 glusterbot What ndevos meant to say was: blook: it means that he has read/reviewed/written/designed more of gluster than (I'd guess) 99% of all other contributors
15:28 blook ndevos, guess i have the right guy for it :)
15:29 ndevos blook: yes, and together with pk who is also a senior dev, you're in good hands :)
15:30 blook ndevos, great :) appreciate it ;)
15:30 cyberbootje joined #gluster
15:31 ndevos blook: but its late for those guys now, so they might not be back before tomorrow
15:31 spstarr_work joined #gluster
15:31 bugs_ joined #gluster
15:31 spstarr_work any folks around?
15:31 ccha4 hum what does this mean State: Peer Rejected (Connected) ?
15:31 spstarr_work Im having issues with GLuster and Ubuntu 12.04
15:32 lalatenduM reqr, try "gluster volume heal <volumeName> info heal-failed"
15:32 ccha4 I just added a new server in a cluster
15:33 spstarr_work Transport endpoint is not connected
15:33 spstarr_work is the Ubuntu build of gluster broken? :(
15:33 lalatenduM ccha4, that means peer probe failed, you shouldn't be using it
15:33 lalatenduM as a peer
15:33 spstarr_work glusterfs 3.2.5 built on Jan 31 2012 07:39:59
15:33 ccha4 hum I can on this peer gluster volume info
15:33 ndevos ccha4: sounds like a problem in the op-version, that is saved in /var/lib/glusterd/glusterd.info - check if the op-version is the same on all peers
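
A quick way to do the check ndevos suggests, assuming the stock package paths:

    # Run on every peer; the values should match across the pool
    grep operating-version /var/lib/glusterd/glusterd.info
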
15:34 lalatenduM spstarr_work, "transport endpoint not connected " error comes when the brick on which file is present is down
15:34 ndevos spstarr_work: 3.2 is pretty old, you should probably look into using the ,,(ppa)
15:34 glusterbot spstarr_work: The official glusterfs packages for Ubuntu are available here: 3.3 stable: http://goo.gl/7ZTNY -- 3.4 stable: http://goo.gl/u33hy -- 3.5 QA: http://goo.gl/Odj95k
15:34 spstarr_work lalatenduM: both are up i can reach both, peer status shows both
15:35 vimal joined #gluster
15:35 lalatenduM spstarr_work, check the output of "gluster v status" for the volume
15:35 spstarr_work gluster 'v' status isnt valid in 3.2?
15:35 spstarr_work gluster volume info shows it
15:36 ccha4 operating-version=2 for all server
15:36 spstarr_work Brick1: 192.168.0.85:/mnt/data Brick2: 192.168.1.101:/mnt/data
15:36 spstarr_work i know it sees each brick
15:36 lalatenduM spstarr_work, you should check that no glusterfsd process is down
15:36 lalatenduM in all storage nodes
15:36 ccha4 [2014-01-13 15:28:19.108537] I [glusterd-rpc-ops.c:345:__glusterd_friend_add_cbk] 0-glusterd: Received RJT from uuid: fd91672c-c28c-42cd-a3ef-38326c2de3d1, host: 10.234.51.150, port: 0
15:36 ccha4 port 0 ?
15:36 spstarr_work glusterfsd is running on both
15:37 vimal joined #gluster
15:38 lalatenduM spstarr_work, did you check "gluster v status" output?
15:38 lalatenduM ccha4, what is the gluster version u r using?
15:39 ccha4 3.4.2
15:39 spstarr_work lalatenduM: gluster v status unrecognized word: v (position 0)
15:39 spstarr_work gluster volume status ?
15:39 spstarr_work no such option
15:39 lalatenduM spstarr_work, yes
15:39 spstarr_work gluster volume status unrecognized word: status (position 1)
15:40 lalatenduM spstarr_work, I think it's time for you to update to a newer version :)
15:40 B21956 joined #gluster
15:40 lalatenduM ccha4, what distribution u r using?
15:41 spstarr_work goiung to 3.4 then
15:41 ccha4 lalatenduM: ubuntu
15:41 ccha4 isn't it weird about port 0 ?
15:41 lalatenduM ccha4, hmm, I haven't used Ubuntu+gluster,
15:41 lalatenduM ccha4, yeah port 0 looks odd
15:42 lalatenduM ccha4, I would suggest you to send a mail on gluster-users on your issue, i am sure somebody will respond
15:43 lalatenduM ccha4, do you have iptables rules on these machines by chance
15:44 ccha4 hum I will check firewall rules
15:45 andreask joined #gluster
15:48 kaptk2 joined #gluster
15:49 _Bryan_ joined #gluster
15:52 cyberbootje joined #gluster
15:54 psyl0n joined #gluster
15:54 Technicool joined #gluster
16:01 jbrooks joined #gluster
16:07 tryggvil joined #gluster
16:07 lpabon joined #gluster
16:10 zerick joined #gluster
16:12 sroy_ joined #gluster
16:12 jag3773 joined #gluster
16:12 aixsyd joined #gluster
16:13 aixsyd 0-gv0-replicate-0: no active sinks for performing self-heal on file
16:13 aixsyd wat.
16:15 daMaestro joined #gluster
16:21 complexmind joined #gluster
16:24 aixsyd hey guys - having a hell of a time with gluster 3.4.2-1 - every time i create a cluster, nothing writes to it. heres my glustershd log file. any idea whats happening here?
16:25 aixsyd http://fpaste.org/68004/30302138/
16:25 glusterbot Title: #68004 Fedora Project Pastebin (at fpaste.org)
16:25 aixsyd identical issue on two sets of clusters
16:29 aixsyd dbruhn: JoeJulian: any thoughts?
16:30 aixsyd it'd be one thing if it was one set of clusters, but now 4 servers are doing it. all of them are on version 3.4.2-1 as well
16:30 theron joined #gluster
16:31 aixsyd I'm following the quick start guide to the letter - and its not working one bit. I think I should downgrade to 3.4.1 at this point
16:31 KORG|2 joined #gluster
16:35 dbruhn aixsyd, what log is this out of
16:35 dbruhn and are you sure you are on 3.4?
16:35 dbruhn I see a reference to 3.3 in that log.
16:35 aixsyd dbruhn: /var/log/glustershd.log
16:35 aixsyd and yes, triple positive.
16:36 aixsyd apt-get install --reinstall confirms its grabbing 3.4.2-1
16:36 aixsyd why its referencing 3.3 in that log is beyond me
16:38 dbruhn what are you seeing in the mnt log?
16:38 aixsyd one sec
16:38 aixsyd On the client?
16:39 kanagaraj joined #gluster
16:39 aixsyd i dont have a mnt log anywhere
16:39 dbruhn That should be on the client
16:39 aixsyd in /var/log/glusterfs/ ?
16:39 dbruhn yep
16:40 dbruhn should be mnt-<volname>.log
16:40 aixsyd sec
16:41 aixsyd http://fpaste.org/68009/13896312/
16:41 glusterbot Title: #68009 Fedora Project Pastebin (at fpaste.org)
16:42 eclectic joined #gluster
16:42 dbruhn everything is mounting without issue?
16:42 dbruhn and then you when you write, nothing writes?
16:43 aixsyd correct
16:43 aixsyd and every file i write shows up in volume heal info
16:43 aixsyd big long list
16:43 dbruhn anything in the brick logs?
16:43 aixsyd on the disk, every file shows up as "512" in size, not 512kb, or mb, just "512"
16:44 aixsyd one sec, everythings rebooting
16:44 dbruhn kk
16:44 aixsyd im totally downgrading to 3.4.1 though. i didnt have any issues like this with 3.4.1
16:45 dbruhn It would be good for the community if we could get an idea of what's going on, and get a bug report filed
16:45 complexmind joined #gluster
16:46 KaZeR__ joined #gluster
16:48 aixsyd hows this, ill downgrade one cluster, keep the other
16:48 TonySplitBrain joined #gluster
16:50 dbruhn Obviously gotta do what you gotta do.
16:50 aixsyd oh ffs - now ive downgraded to 3.4.1, and gluster hangs on volume create
16:51 dbruhn did you reboot after downgrading?
16:52 aixsyd yes
16:52 dbruhn or at least restart all of the services?
16:52 aixsyd rebooted
16:54 complexmind joined #gluster
16:55 aixsyd dbruhn: just to be sure i have something straight - when doing a peer probe - if i have an IB link between two servers, i should be using the IB link addresses, right?
16:55 dbruhn you are running IP over IB right?
16:56 dbruhn if that's the case, yes
16:56 dbruhn aixsyd, you're clients aren't on the IB subnet are they?
16:57 aixsyd dbruhn: theyre not
16:58 dbruhn That might be why you are having issues, are you resolving via IP address or hostname?
17:01 aixsyd IP
17:01 dbruhn In your earlier tests were you using host names?
17:01 aixsyd always ip
17:02 complexmind joined #gluster
17:02 dbruhn I'm honestly not really sure what is returned when the client connects to the system, but it does return the connection information for all of the servers in the cluster, and the client connects to all of them on it's own, gluster has very little server to server communication when it comes to client connections.
17:03 dbruhn are your host names resolvable from the client network side?
17:03 aixsyd well at this point, im not even worried about the client - i cant even create a volume.
17:03 aixsyd something's seriously messed up
17:04 dbruhn What error are you getting when trying to create a volume?
17:04 dbruhn and what are you seeing from "gluster peer status"
17:05 LoudNoises joined #gluster
17:09 davinder joined #gluster
17:10 complexmind joined #gluster
17:15 bala joined #gluster
17:16 complexmind joined #gluster
17:23 bala1 joined #gluster
17:24 aixsyd joined #gluster
17:25 aixsyd dbruhn: i'm starting over from scratch.
17:25 aixsyd for the 12th time
17:33 andreask joined #gluster
17:33 Mo_ joined #gluster
17:38 zaitcev joined #gluster
17:41 sroy_ joined #gluster
17:41 johnmark thinking about what to get for a demo machine
17:41 JMWbot johnmark: @3 purpleidea reminded you to: thank purpleidea for an awesome JMWbot (please report any bugs) [2502039 sec(s) ago]
17:41 JMWbot johnmark: @5 purpleidea reminded you to: remind purpleidea to implement a @harass action for JMWbot  [2430803 sec(s) ago]
17:41 JMWbot johnmark: @6 purpleidea reminded you to: get semiosis article updated from irc.gnu.org to freenode [2335333 sec(s) ago]
17:41 JMWbot johnmark: @8 purpleidea reminded you to: git.gluster.org does not have a valid https certificate [406640 sec(s) ago]
17:41 JMWbot johnmark: Use: JMWbot: @done <id> to set task as done.
17:41 johnmark - http://www.amazon.com/Intel-Next-Computing-Black-BOXDCCP847DYE/dp/B00B7I8HZ4/ref=lh_ni_t?ie=UTF8&psc=1&smid=ATVPDKIKX0DER
17:42 johnmark Intel NUCs look pretty cool
17:42 johnmark string 4 or so together, and instant cluster
17:42 johnmark I hope...
17:42 johnmark JMWbot: hmm, you appear to be a bit buggy ;)
17:42 JMWbot johnmark: Sorry, I can't help you with that!
17:42 johnmark purpleidea: ^^^
17:43 johnmark not all of those are current :)
17:43 plarsen joined #gluster
17:45 SFLimey joined #gluster
17:57 dbruhn johnmark, you get my email last week?
17:58 glusterbot New news from newglusterbugs: [Bug 977505] [RFE] remove brick message could be more clear <https://bugzilla.redhat.com/show_bug.cgi?id=977505>
17:59 phase5 joined #gluster
18:00 dbruhn aixsyd, you probably could have deleted the files @ /var/lib/glusterd
18:00 phase5 left #gluster
18:00 dbruhn little less heavy handed
18:00 aixsyd v.v
18:03 badone joined #gluster
18:13 SpeeR joined #gluster
18:16 rwheeler joined #gluster
18:18 rotbeard joined #gluster
18:25 badone joined #gluster
18:29 jkroon joined #gluster
18:31 bennyturns joined #gluster
18:32 TrDS joined #gluster
18:35 radez joined #gluster
18:36 radez selinux is preventing me from running mysql with glusterfs as storage backing
18:37 radez is it not recommended to do this?
18:40 s2r2 joined #gluster
18:41 dbruhn selinux is a pain with gluster in general, is there any reason you need it?
18:47 ilbot3 joined #gluster
18:47 topic for #gluster is now Gluster Community - http://gluster.org | Patches - http://review.gluster.org/ | Developers go to #gluster-dev | Channel Logs - https://botbot.me/freenode/gluster/ & http://irclog.perlgeek.de/gluster/
18:51 radez dbruhn: I have a cluster setup and I need some shared storage to do an active / passive failover for my database
18:51 radez meaning I have a glusterfs cluster
18:51 dbruhn Understood, can you disable selinux?
18:52 radez I'm trying to keep it on...
18:52 dbruhn Ahh ok
18:52 _pol joined #gluster
18:52 radez I'll end up investigating other options before I turn it off
18:53 dbruhn http://www.gluster.org/category/selinux/
18:53 dbruhn that link might help
18:53 dbruhn I honestly don't use selinux with my stuff, and I know a lot of guys don't
18:54 glusterbot New news from resolvedglusterbugs: [Bug 976750] Disabling NFS causes E level errors in nfs.log. <https://bugzilla.redhat.com/show_bug.cgi?id=976750>
18:55 andreask joined #gluster
18:57 radez thx for the info dbruhn
18:57 johnmark radez: heya!
18:57 radez hey johnmark
18:57 johnmark good to see you in these parts
18:58 johnmark radez: hrm, when it comes to mysql backed by glusterfs, JoeJulian is your guy
18:58 radez johnmark: thx man, pushing out gluster on TryStack
18:58 johnmark he's been running those two together for a while
18:58 johnmark radez: nice!
18:58 johnmark well, eventually nice
18:58 radez heh
18:58 johnmark ;)
18:59 radez thx for the contact, I'll see if I can catch JoeJulian sometime, do you know what TZ he's in?
19:00 spstarr_work joined #gluster
19:00 johnmark he's PST
19:00 spstarr_work um
19:00 johnmark radez: he's usually around
19:00 spstarr_work why can't I do this?
19:00 spstarr_work "gluster volume create datastore replica 2 transport tcp 192.168.0.85:/var/www 192.168.1.101:/var/www force" returns: "volume create: datastore: failed: /var/www or a prefix of it is already part of a volume"
19:00 glusterbot spstarr_work: To clear that error, follow the instructions at http://joejulian.name/blog/glusterfs-path-or-a-prefix-of-it-is-already-part-of-a-volume/ or see this bug https://bugzilla.redhat.com/show_bug.cgi?id=877522
19:00 spstarr_work lol it doesnt clear it
19:01 Liquid-- joined #gluster
19:01 radez johnmark: cool, maybe he'll ping me in a bit
19:01 spstarr_work setfattr -x trusted.glusterfs.volume-id /var/www   ?
19:01 johnmark spstarr_work: not sure. never had that error
19:01 johnmark did you have a previous version ever installed there?
19:02 dbruhn spstarr is there a .glusterfs directory in the directory you are trying to use as a brick?
19:02 spstarr_work ok it removed it but
19:02 spstarr_work now volume create: datastore: failed: Host 192.168.1.101 is not in 'Peer in Cluster' state
19:02 spstarr_work yet it is in the peer
19:02 spstarr_work then running it again
19:02 dbruhn what does "gluster peer status" show
19:02 spstarr_work volume create: datastore: failed: /var/www or a prefix of it is already part of a volume
19:02 glusterbot spstarr_work: To clear that error, follow the instructions at http://joejulian.name/blog/glusterfs-path-or-a-prefix-of-it-is-already-part-of-a-volume/ or see this bug https://bugzilla.redhat.com/show_bug.cgi?id=877522
19:03 spstarr_work State: Accepted peer request (Connected)
19:03 spstarr_work for that IP
19:03 sroy_ joined #gluster
19:03 spstarr_work i see no .glusterfs file
19:03 spstarr_work not on either node
19:04 dbruhn sounds like you might still have the volume setup in your config files maybe
19:04 spstarr_work but no volumes?
19:04 spstarr_work No volumes present
19:04 dbruhn gluster volume info
19:04 johnmark spstarr_work: what version are you using?
19:04 spstarr_work gluster volume info No volumes present
19:04 johnmark spstarr_work: and check out the bug report - https://bugzilla.redhat.com/show_bug.cgi?id=877522
19:04 spstarr_work glusterfs 3.4.2 built on Jan 11 2014 03:16:11
19:04 glusterbot Bug 877522: medium, unspecified, ---, jdarcy, CLOSED CURRENTRELEASE, Bogus "X is already part of a volume" errors
19:05 johnmark oh. hrm. ok
19:05 dbruhn what is in /var/lib/glusterd/vols
19:05 spstarr_work /var/lib/glusterd/vols = empty
19:05 spstarr_work ii  glusterfs-server                 3.4.2-ubuntu1~precise4            clustered file-system (server package)
19:07 johnmark spstarr_work: did you have any part of that path set up as a gluster brick? say /var/X or /var/www/X ?
19:07 spstarr_work just:
19:07 dbruhn interesting bug, did what johnmark posted look like it could have anything to do with it
19:07 johnmark dbruhn: I hope not, because that was fixed for 3.4.x :)
19:07 spstarr_work gluster volume create datastore replica 2 transport tcp 192.168.0.85:/var/www 192.168.1.101:/var/www force
19:08 B21956 joined #gluster
19:08 spstarr_work thats all im trying to do i removed 3.2 before
19:08 spstarr_work REMOVED everything in /var/lib/gluster so nothing remained then installed 3.4.2
19:08 spstarr_work and removed everything in /etc/gluster* prior
19:08 spstarr_work so it was clean slate
19:09 spstarr_work fyi:  /dev/xvdf2      20641404   45028  19547852   1% /var/www
19:09 spstarr_work /dev/xvdf2 on /var/www type ext4 (rw)
19:09 spstarr_work Amazon
19:10 johnmark so you had 3.2 running in /var/www ?
19:11 spstarr_work no it didnt work i never got to that point
19:11 spstarr_work i had connection issues even trying
19:11 johnmark oh ok
19:11 spstarr_work anything in debug?
19:11 semiosis spstarr_work: did you follow the instructions & run the setfattr commands from that path or a prefix of it page?
19:11 glusterbot semiosis: To clear that error, follow the instructions at http://joejulian.name/blog/glusterfs-path-or-a-prefix-of-it-is-already-part-of-a-volume/ or see this bug https://bugzilla.redhat.com/show_bug.cgi?id=877522
19:12 semiosis on the dir & its parents, /var/www, /var, /?
19:12 spstarr_work semiosis: prefix being /var/www  FROM within var ?
19:12 spstarr_work oh you mean all of it...
19:12 spstarr_work sec
19:12 johnmark yeah
19:12 johnmark it's the path prefix that (I think) is giving you trouble
19:12 spstarr_work only one of those is set
19:12 spstarr_work root@www:/# setfattr -x trusted.glusterfs.volume-id /var/www root@www:/# setfattr -x trusted.glusterfs.volume-id /var/www setfattr: /var/www: No such attribute
19:13 johnmark and /var ?
19:13 johnmark and /
19:13 spstarr_work oh and / ? lemme reset '/'
19:13 B219561 joined #gluster
19:13 johnmark and /var
19:13 johnmark and then restart glusterd
19:14 johnmark on another note, this type of error really shouldn't happen
19:14 spstarr_work done on both nodes and restarted
19:14 spstarr_work trying...
19:14 johnmark ok
19:14 spstarr_work ooh
19:14 spstarr_work "gluster volume create datastore replica 2 transport tcp 192.168.0.85:/var/www 192.168.1.101:/var/www force" returns: "volume create: datastore: success: please start the volume to access data"
19:14 spstarr_work !!!!
19:14 spstarr_work :D
19:14 johnmark woohoo
19:14 johnmark :)
19:15 spstarr_work i suspect we should add that step do that URL
19:15 spstarr_work RESTART glusterfsd
19:15 spstarr_work ....
19:15 semiosis "For the directory (or any parent directories) that was formerly part of a volume, simply:"
19:15 semiosis that apparently is not clear enough
19:15 spstarr_work volume start: datastore: success
19:16 spstarr_work now the fun part... if fstab will get this remounting properly
19:16 semiosis also has "Finally, restart glusterd to ensure it's not "remembering" the old bricks."
19:16 spstarr_work ah yes
19:16 spstarr_work 'Finally, restart glusterd to ensure it's not "remembering" the old bricks.'
19:16 spstarr_work i didnt see that...
19:16 spstarr_work ....
19:16 semiosis maybe we need to wrap it in a <BLINK> tag :)
19:17 spstarr_work lol :)
19:17 semiosis s/we need/joe needs/
19:17 glusterbot What semiosis meant to say was: maybe joe needs to wrap it in a <BLINK> tag :)
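
Pulling together the fix that worked here (it follows the article glusterbot links above; the brick path is the one from this session, and the service name differs per distro):

    # On each server, clear the gluster xattrs on the former brick and
    # every parent directory (errors about missing attrs are harmless):
    for d in /var/www /var /; do
        setfattr -x trusted.glusterfs.volume-id "$d" 2>/dev/null
        setfattr -x trusted.gfid "$d" 2>/dev/null
    done
    # Remove a stale .glusterfs directory only if one exists (there was
    # none in this session):
    rm -rf /var/www/.glusterfs
    # Finally, restart glusterd so it forgets the old brick:
    service glusterfs-server restart    # Ubuntu/Debian package name; "glusterd" on RPM-based distros
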
19:19 spstarr_work now to see if i can use fstab with  /etc/glusterfs/mystuff.vol
19:19 spstarr_work and not get a connection issue...
19:19 semiosis spstarr_work: what distro?
19:19 spstarr_work Ubuntu 12.04
19:19 r0b joined #gluster
19:19 semiosis using the PPA package of 3.4.2?
19:20 spstarr_work ya
19:20 semiosis why are you mount a vol file?
19:20 semiosis instead of server:volname?
19:20 spstarr_work as per this:
19:20 spstarr_work http://www.jamescoyle.net/how-to/439-mount-a-glusterfs-volume
19:20 spstarr_work More redundant mount
19:21 spstarr_work though they list fstab as : gfs1.jamescoyle.net:/datastore /mnt/datastore glusterfs defaults,_netdev,backupvolfile-server=gfs2.jamescoyle.net 0 0   only
19:21 spstarr_work so that may be just all I do need to do basically
19:21 * spstarr_work tries
19:21 semiosis hmmm, idk about that article
19:22 semiosis we usually recommend using ,,(rrdns) for the mount server to achieve high availability mounting
19:22 glusterbot You can use rrdns to allow failover for mounting your volume. See Joe's tutorial: http://goo.gl/ktI6p
19:22 spstarr_work well there's only two nodes
19:22 semiosis fine
19:23 semiosis in any case, working with volfiles is strongly discouraged
19:23 psyl0n joined #gluster
19:23 spstarr_work i see
19:23 spstarr_work but if you dont have DNS? :)
19:24 semiosis set up dns
19:24 spstarr_work no internal DNS for these two VM instances
19:24 spstarr_work unless i can use a hosts file
19:24 spstarr_work but GNU libc wont do round robin in /etc/hosts
19:25 semiosis did you say you're in ec2?
19:25 spstarr_work yeah but not using route 53..
19:25 semiosis well route53 is pretty nice, you should consider it
19:25 spstarr_work I will ask my PHB :)
19:25 semiosis imo the "right way" to do this is to set up dedicated CNAMEs for your instance public-hostnames
19:26 semiosis iirc route53 can even do round robin on CNAME, which isn't a standard dns feature
19:26 spstarr_work if i can get route 53 then yes I can do DNS and all that jazz ;)
19:26 spstarr_work mmhm now this is going to be messy
19:26 spstarr_work ERROR: /var/www is in use as a brick of a gluster volume
19:27 semiosis yep
19:27 spstarr_work since /var/www IS a mount point making the brick the same mount point clobbers... but its empty
19:27 semiosis cant do that
19:27 spstarr_work ok so then i need to mount /var/www to say /mnt/www and then /var/www as the brick mount?
19:27 spstarr_work i can do that
19:27 semiosis i think you have it backwards
19:28 semiosis bricks should be something separate, like /bricks/myvol1
19:28 semiosis then client mount would go on /var/www
19:28 spstarr_work but against /bricks/myvol1
19:28 semiosis remember all access goes through the client mount
19:28 spstarr_work i see...
19:29 * spstarr_work recreates volume
19:29 semiosis also note, since your servers are also clients, i strongly recommend using replica 3 & enabling quorum
19:29 semiosis this prevents a possible split brain scenario
19:29 dbruhn +1
19:30 spstarr_work so a 3rd VM for just that use
19:30 semiosis with replica 2, if the machines get disconnected from each other, and the same file is written on both, then that file will be split brained when the machines reconnect
19:30 semiosis they would all be identical
19:30 semiosis all three the same
19:30 dbruhn semiosis, question on quorum, I have a replica 2 system with 12 servers, do I still need replica 3 for that to work
19:31 semiosis with replica 2 clients will turn read-only when one of the servers goes down
19:31 semiosis afaik
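
A minimal sketch of the layout semiosis is suggesting, assuming three servers; hostnames, brick paths, and the quorum option choice are illustrative, so check the 3.4 docs before settling on them:

    # Three bricks, one per availability zone, 3-way replication:
    gluster volume create datastore replica 3 \
        gluster1:/bricks/datastore gluster2:/bricks/datastore gluster3:/bricks/datastore
    # Client-side quorum: with "auto", writes are only allowed while a
    # majority of the replica set is reachable.
    gluster volume set datastore cluster.quorum-type auto
    gluster volume start datastore
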
19:32 spstarr_work asking PHB now
19:38 spstarr_work ok if I do round robin DNS
19:39 spstarr_work does Gluster try another host if its down?
19:39 spstarr_work so when it connects to glusterfs-pool.localnet  and host B is down does it then retry with another ?
19:39 semiosis ,,(rrdns)
19:39 glusterbot You can use rrdns to allow failover for mounting your volume. See Joe's tutorial: http://goo.gl/ktI6p
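
A made-up sketch of the rrdns approach from Joe's tutorial; the zone name, addresses, and volume are placeholders:

    ; zone file fragment: one name, two A records, answers rotate
    gluster.example.lan.   IN  A  10.0.0.1
    gluster.example.lan.   IN  A  10.0.0.2

    # /etc/fstab on the client: mount by the round-robin name, so the
    # initial mount can still succeed if one server is down
    gluster.example.lan:/datastore  /var/www  glusterfs  defaults,_netdev  0 0
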
19:40 spstarr_work I cant use Route53... so I have to use PowerDNS or bind if it can do RRDNS
19:40 spstarr_work ok so it WILL fall over with RRDNS
19:40 semiosis try it :)
19:40 spstarr_work and try another one in that pool name
19:40 spstarr_work setting up now
19:42 semiosis you're using VPC?  so you have static IPs in ec2?
19:44 spstarr_work internally yes
19:44 spstarr_work its all private IPs
19:49 semiosis thats nice.  i still suggest using ,,(hostnames) for the individual servers
19:49 glusterbot Hostnames can be used instead of IPs for server (peer) addresses. To update an existing peer's address from IP to hostname, just probe it by name from any other peer. When creating a new pool, probe all other servers by name from the first, then probe the first by name from just one of the others.
19:50 semiosis gluster1.domain.foo for example
19:50 semiosis in case those internal IPs need to change one day
19:50 spstarr_work one problem
19:50 semiosis although it's not as critical as when in ec2 classic
19:50 spstarr_work need 4 then
19:50 spstarr_work 1 node in each availability zone
19:51 spstarr_work since i have it split amongst two
19:51 spstarr_work so i need 4
19:51 semiosis idk what you mean
19:52 cfeller joined #gluster
19:52 spstarr_work us-east-1a  glusterfs-cluster.local box #1,  us-east-1c glusterfs-cluster.local box #2    
19:52 spstarr_work for #3 to form quorum i need 2  
19:52 spstarr_work one in us-east-1a and one in us-east-1c
19:52 spstarr_work so if I lost us-east-1a I have failover to -1c
19:53 semiosis no it would be 3-way replication, so you could lose any one of them & still have quorum with the remaining two
19:53 semiosis use three availability zones
19:53 semiosis one server in each
19:54 semiosis replica 3 in your volume create command, and three bricks
19:55 spstarr_work except for DNS....
19:56 spstarr_work if the DNS server dies ...
19:56 spstarr_work route53 may solve this since its not per zone dependent
19:56 TonySplitBrain joined #gluster
19:56 semiosis route53 is pretty nice
20:00 spstarr_work indeed
20:00 spstarr_work PHB needs me to draw diagram.. doing that *sigh*
20:00 spstarr_work since this is blowing up complexity but i expected it would :)
20:04 sroy_ joined #gluster
20:15 aixsyd dbruhn: I'm stuck again. Same errors. No active sinks
20:16 aixsyd ive followed everything to the letter D:
20:17 radez JoeJulian: ping, I have a gluster + mysql issue I could use your help on if you have a sec to entertain it
20:20 spstarr_work home time :) thanks
20:20 spstarr_work chat tomorrow
20:21 mattappe_ joined #gluster
20:23 aixsyd dbruhn: looks like the SHD isnt running/wont start
20:24 aixsyd http://fpaste.org/68067/64467313/
20:24 glusterbot Title: #68067 Fedora Project Pastebin (at fpaste.org)
20:25 aixsyd This is the client mnt log: http://fpaste.org/68069/96447431/
20:25 glusterbot Title: #68069 Fedora Project Pastebin (at fpaste.org)
20:27 aixsyd yep - i just cannot write anything to the cluster.
20:28 theron_ joined #gluster
20:29 aixsyd I'm about ready to scrap this whole project
20:33 aixsyd http://fpaste.org/68073/89645169/  <-- cluster member shd log
20:33 glusterbot Title: #68073 Fedora Project Pastebin (at fpaste.org)
20:33 aixsyd if anyones around to point me in a direction, thatd really help
20:43 aixsyd so every time i try to write to the cluster, the heal info just adds each new file to the list to be healed, no disk IO, and mounting fails
20:43 aixsyd http://fpaste.org/68078/38964581/
20:43 glusterbot Title: #68078 Fedora Project Pastebin (at fpaste.org)
20:44 aixsyd this is a brand new/fresh install. if i cant figure this out, managements gonna scrap the whole thing. i'm very much panicking right now :'(
20:45 sroy_ joined #gluster
20:50 dbruhn aixsyd, give me a few min and I will be able to work with you some on it
20:50 dbruhn sorry been a busy day
20:53 dbruhn aixsyd, back
20:54 dbruhn first, "gluster peer status", "gluster volume status", "gluster volume info"
20:56 aixsyd one sec. i think i *may* have found a large issue
20:56 dbruhn ok
20:57 aixsyd my switch decided to completely revert back to factory configs
20:57 aixsyd so... no LACP.
20:57 dbruhn fun!
20:57 dbruhn lol
20:58 aixsyd now look at this, and tell me how well this is gonna work without any link aggregation
20:58 aixsyd https://i.imgur.com/49QEPuI.jpg
20:58 aixsyd dont laugh at my test setup
20:58 aixsyd XD
20:59 dbruhn Well that mess looks like a test hardware pile
20:59 aixsyd exactly
20:59 aixsyd hey look at that! my lag in ssh terminals is gone
21:00 Liquid-- joined #gluster
21:01 jclift left #gluster
21:01 dbruhn nice
21:01 aixsyd despite that big issue - still cannot write to cluster
21:01 aixsyd so, ill throw your three commands
21:02 sroy_ joined #gluster
21:02 aixsyd http://fpaste.org/68081/46945138/
21:02 glusterbot Title: #68081 Fedora Project Pastebin (at fpaste.org)
21:03 dbruhn OK, were you able to create a volume? or is that still not working?
21:03 aixsyd oh! look at that. its working now.
21:03 * aixsyd is puzzled.
21:03 dbruhn The gluster command line is pretty sensitive to things not being there
21:03 dbruhn on my 12 server system commands will time out and do all sorts of wonky things
21:04 dbruhn if one of the servers is not available
21:04 aixsyd hm
21:04 aixsyd i'm still getting some strangeness under heal info
21:04 dbruhn also, check to make sure from each one of your peers you get a good peer status, i've had it be a little weird if for some reason the UUID's get messed up
21:05 aixsyd http://fpaste.org/68082/64710613/
21:05 glusterbot Title: #68082 Fedora Project Pastebin (at fpaste.org)
21:06 mattappe_ joined #gluster
21:08 dbruhn "gluster volume heal gv0 info split-brain"
21:08 mattapperson joined #gluster
21:08 aixsyd i killed the volume, and remade it.
21:09 aixsyd i'm not up for screwing around. XD
21:09 dbruhn kk
21:09 aixsyd that volume was made when things were all screwed up
21:09 aixsyd I think the LACP thing was the issue.
21:09 aixsyd Id never encountered those sisues before
21:09 aixsyd *issues
21:09 dbruhn sounds plausible
21:09 aixsyd imagine 4 NIC's all claiming the same MAC
21:09 aixsyd x2 server
21:10 aixsyd actually, x5 servers. XD
21:11 dbruhn Sounds like a mess
21:11 semiosis aixsyd: i thought you were using infiniband
21:11 semiosis rdma?
21:12 dbruhn Ip over IB
21:12 dbruhn but yeah, that's not IB gear
21:12 semiosis right
21:12 aixsyd semiosis: IB for clusternode to node, but ethernet to the house network
21:12 semiosis that looks like good ol' ethernet
21:12 aixsyd 4x1gbe
21:13 semiosis aixsyd: using fuse or nfs clients?
21:13 dbruhn aixsyd, most of your communication happens from the client to both nodes.
21:13 semiosis ^^^ when using fuse clients
21:13 dbruhn Very little server to server communication happens.
21:13 dbruhn he is using fuse
21:13 semiosis so in that case it doesnt make much sense to have IB between the servers
21:14 dbruhn +1 semiosis
21:14 aixsyd semiosis: heals
21:14 semiosis idk much about IB but never heard of someone mixing IB & ethernet in the same cluster
21:14 aixsyd we've had downtime before due to raid heals - and in a worst-case scenario (like a raid failure on a node) that rebuild needs to happen PDQ
21:15 aixsyd IB > LACP GBE
21:16 dbruhn aixsyd, on your servers, do your hostnames resolve to the ethernet or the IB IP address?
21:16 semiosis aixsyd: can you please pastie the -fuse.vol file for your volume?  should be something like /var/lib/glusterd/vols/volname/volname-fuse.vol iirc
21:16 aixsyd dbruhn: the nodes resolve to each other's IB IPs
21:16 dbruhn semiosis is going down the same path I am, I would be interested in seeing that too.
21:17 aixsyd semiosis:
21:17 aixsyd http://fpaste.org/68095/38964783/
21:17 glusterbot Title: #68095 Fedora Project Pastebin (at fpaste.org)
21:17 aixsyd that it?
21:17 jag3773 joined #gluster
21:18 dbruhn shouldn't there be two bricks listed in that file?
21:19 aixsyd theres two files
21:19 aixsyd one for each server
21:19 aixsyd this is for node #1
21:19 semiosis that looks like the brick volfile
21:20 semiosis not the client volfile
21:20 aixsyd OH, on the client
21:20 aixsyd one sec
21:20 dbruhn gv0-fuse.vol should be the one
21:21 dbruhn One of these days I am going to have a cluster in testing that isn't built with it being a client so I can tell which logs and files are unique to each
21:21 aixsyd http://fpaste.org/68097/89648093/
21:21 glusterbot Title: #68097 Fedora Project Pastebin (at fpaste.org)
21:21 semiosis aixsyd: its a file on the server, both servers actually, which is sent to the client when the client requests a mount
21:21 aixsyd semiosis: got it. see above paste
21:22 semiosis yep
21:22 semiosis ok no surprises there
21:23 dbruhn aixsyd, DNS resolution is working for all of the clients on the subnet for client facing connection?
21:24 aixsyd dbruhn: yepper
21:25 semiosis 172.99.1.x is IPoIB & 10.x is ethernet, right?
21:26 aixsyd_ joined #gluster
21:26 semiosis client log file looks OK, cant see any reason why you wouldnt be able to write
21:26 swang joined #gluster
21:26 aixsyd semiosis: I got it.
21:26 semiosis ??
21:27 aixsyd it was totally non-gluster related
21:27 aixsyd my stupid switch decided to drop its config
21:27 aixsyd no LACP D:
21:27 dbruhn he had a problem with his switch not trunking
21:28 dbruhn aixsyd, do you have to save the config to your switch to make it through a reboot? Those dell switches are just rebadged cisco's aren't they?
21:28 semiosis [16:01] <aixsyd> despite that big issue - still cannot write to cluster
21:28 semiosis [16:09] <aixsyd> I think the LACP thing was the issue.
21:28 semiosis right, i missed that second line
21:28 aixsyd :P
21:28 aixsyd dbruhn: i sure do. I thought I did - but thankfully i had it backed up on a tftp server
21:29 dbruhn aixsyd, the reason for all the questions is most people don't have a unique server to server network apart from the client side network when using the fuse client.
21:30 dbruhn because all of the replication happens between the client and the servers
21:30 * aixsyd is a pioneer
21:30 aixsyd dbruhn: afaik, self-heals would go through IB, no?
21:31 aixsyd in the peer-probe process, i have them talking via IB
21:31 aixsyd clients connect via the eth IP's
21:31 semiosis it's all by hostname resolution
21:32 semiosis looks like you had that set up right though... 172. for IB & 10. for ethernet
21:32 semiosis from your logs
21:32 aixsyd :)
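
One common way to get the split aixsyd describes (peers talking over IPoIB, clients over ethernet) is split-horizon name resolution; a made-up /etc/hosts sketch, not taken from this setup:

    # /etc/hosts on the gluster servers: peers resolve each other over IPoIB
    172.99.1.1   node1.lan
    172.99.1.2   node2.lan

    # /etc/hosts on the clients: the same names resolve to the ethernet side
    10.0.0.1     node1.lan
    10.0.0.2     node2.lan
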
21:32 aixsyd i get self-heals going at like 120-130MB/s, where the 10./eth links cannot go anywhere near that fast
21:33 aixsyd i'm hitting my hard drives limits at that point
21:33 semiosis sweet
21:33 aixsyd to wit - IB is insanely overkill
21:33 aixsyd but better than 1GBE and cheaper than 10GBE
21:34 dbruhn I need to get my IB in a better situation, I have an issue that IB interface won't bring up the IP interface on boot, really annoying
21:35 aixsyd what OS?
21:35 tryggvil joined #gluster
21:36 dbruhn redhat 6.5
21:36 aixsyd ah - dont know enough :(
21:36 * aixsyd is a debian fool
21:37 dbruhn i've figured out it's because my subnet manager is being slow to respond so the interface isn't seen as being fully up on boot
21:37 aixsyd heres a good one for ya'll - iperf from node 1 to node 2 is 6gbps, iperf from node 2 to node 1 is 3gbps
21:37 dbruhn semiosis is the resident ubuntu guy, and I think his builds are the debian builds too
21:38 semiosis aixsyd: wrong hostname resolution on node 1 resolves node 2 to localhost?
21:39 aixsyd hm
21:39 aixsyd DUDE
21:39 semiosis no way
21:39 aixsyd i love you
21:39 semiosis <3
21:39 aixsyd i'm also a moron - have you guessed?
21:39 aixsyd my domain is .lan
21:39 aixsyd and /etc/hosts was .com
21:39 aixsyd @_@
21:40 semiosis there's yer problem right there
21:40 aixsyd nahhhhhhhhhhhhh, that should work out fine >.>
21:41 aixsyd hmm... so, doing this all on another set of nodes, the volume create command is just hanging
21:43 aixsyd dbruhn: oh, to add a layer of complexity on top of everything else i'm doing, i'm using bonded IB nics :P
21:45 dbruhn i remember
21:45 dbruhn lol
21:46 aixsyd my boss was getting nervous about all these issues - he supplied me with this switch. o.O
21:47 aixsyd hmm.. still getting a hangup on volume create
21:48 dbruhn make sure you can resolve everything appropriately across your network, that's most commonly the issue
21:49 aixsyd seems like i can..
21:50 aixsyd peer status is good..
21:50 diegows joined #gluster
21:51 aixsyd not sure wat do here.
21:51 dbruhn I usually just remove gluster from the picture at this point, and start with the network
21:52 aixsyd wat more than ping?
21:52 psyl0n joined #gluster
21:52 dbruhn ping, or ssh, and make sure iptables is allowing communication
21:53 dbruhn you could always telnet on the appropriate port from server to server as well
21:53 aixsyd all three are okay
21:53 dbruhn app armor is disabled?
21:54 aixsyd doesnt coeme with debian
21:54 dbruhn Assuming this is a mirrored set of hardware from your lab environment?
21:54 aixsyd yepper
21:55 aixsyd just purged glusterfs and rebooting
21:55 dbruhn LACP is working right....
21:55 dbruhn lol
21:56 aixsyd LOL, yes it is.
21:56 aixsyd OMG NO ITS NOT
21:56 aixsyd j/k ;)
21:56 dbruhn lol
21:57 dbruhn Did you clear the /var/lib/glusterd files with your purge?
21:57 aixsyd probably not? lemme see when it comes back
21:57 robo joined #gluster
21:58 mattappe_ joined #gluster
21:58 aixsyd sure did
22:01 aixsyd dbruhn: still hangs
22:02 dbruhn gluster peer status?
22:03 aixsyd connected/connected
22:04 dbruhn logs showing anything when you are trying to create the volume?
22:04 aixsyd which log would I look at?
22:04 aixsyd not the shd.log
22:05 _pol joined #gluster
22:06 dbruhn maybe the cli log
22:07 aixsyd http://fpaste.org/68112/96508461/
22:07 glusterbot Title: #68112 Fedora Project Pastebin (at fpaste.org)
22:07 semiosis the glusterd log, etc-glusterfs-glusterd.log
22:08 aixsyd http://fpaste.org/68113/50910138/
22:08 glusterbot Title: #68113 Fedora Project Pastebin (at fpaste.org)
22:09 aixsyd oh shit - i think i have glusterfs-common 3.4.2 installed, but server 3.4.1-2
22:09 aixsyd that "force" part is new in 3.4.2
22:09 semiosis several lines are truncated in that log
22:10 aixsyd yeah, but this is for sure the issue
22:10 aixsyd 0-management: Failed to get 'force' flag
22:10 semiosis or maybe this is... [2014-01-13 22:00:24.231429] E [store.c:394:gf_store_handle_retrieve] 0-: Unable to retrieve store handle /var/lib/glusterd/glusterd.info, err$
22:10 aixsyd 3.4.2 makes you put force at the end if you wanna use a mount point as a brick. 3.4.2 wants a folder inside a mount point
22:10 mattappe_ joined #gluster
22:10 semiosis ok interesting
22:11 aixsyd force allows you to override that. but seeing as my server is 3.4.1, it doesnt like that syntax
22:11 aixsyd so ive got a common and server version mismatch - any idea how to downgrade glusterfs-common to 3.4.1?
22:11 semiosis well, no
22:11 dbruhn semiosis, do you think that was the fix joejulian mentioned a while back to your suggestion of putting your brick directories a level deeper than the mount point?
22:11 semiosis launchpad doesnt save old versions :(
22:12 semiosis dbruhn: probably, have to ask him
22:12 aixsyd i got it
22:12 aixsyd apt-get install glusterfs-common=3.4.1-2
22:13 dbruhn in that case aixsyd, you might want to put your brick directories a level deeper than the mount itself, if for file system is ever not mounted it will save you some headache
22:13 semiosis oh you're on debian, not ubuntu, right
22:13 dbruhn s/for/your
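
The layout dbruhn is suggesting, sketched with a placeholder device and paths; it also sidesteps the 3.4.2 behaviour aixsyd hit, where a bare mount point needs "force":

    # Mount the filesystem, then create a subdirectory to use as the brick:
    mount /dev/sdb1 /data/gluster
    mkdir -p /data/gluster/brick1
    # If the filesystem ever fails to mount, the brick directory is simply
    # missing and gluster won't silently write into the empty mount point.
    gluster volume create gv0 replica 2 \
        node1.lan:/data/gluster/brick1 node2.lan:/data/gluster/brick1
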
22:13 aixsyd http://www.gluster.org/community/documentation/index.php/QuickStart   <-- that should be edited to reflect these new changes
22:13 glusterbot Title: QuickStart - GlusterDocumentation (at www.gluster.org)
22:13 semiosis aixsyd:  please make the necessary changes to that wiki doc
22:14 aixsyd willdo
22:14 semiosis tyvm
22:14 aixsyd <3
22:14 psyl0n joined #gluster
22:15 aixsyd that fixed it - no hangup
22:17 aixsyd sweet. I will make the wiki changes when i get home
22:17 aixsyd heading out for now - thanks a ton ya'll!
22:17 dbruhn yw
23:06 mattappe_ joined #gluster
23:12 mattappe_ joined #gluster
23:12 diegows joined #gluster
23:13 m0zes joined #gluster
23:26 psyl0n joined #gluster
23:27 psyl0n joined #gluster
23:28 complexmind joined #gluster
23:52 KORG joined #gluster
