IRC log for #gluster, 2013-01-16


All times shown according to UTC.

Time Nick Message
00:20 copec so if I get a server.c:685:server_rpc_notify... disconnecting connection from _client_
00:20 copec did the client rpc send the disconnect?
00:21 ctria joined #gluster
00:21 copec I guess let me reframe that question:  Is that it logging that _it_ is disconnecting from the client, or that it was notified that the client disconnected from rpc?
00:23 raven-np joined #gluster
00:37 JoeJulian dbruhn: Normally a rebalance will not move a file from a less-full brick to a more-full one. Force overrides that behavior. Not sure if it does anything else besides that though.
00:37 JoeJulian copec: The client closed the tcp connection.
00:50 badone joined #gluster
00:53 stopbit joined #gluster
01:06 JoeJulian file a bug
01:06 glusterbot http://goo.gl/UUuCq
01:09 copec JoeJulian, thanks.
01:30 nik__ joined #gluster
01:59 jbrooks joined #gluster
02:03 bala1 joined #gluster
02:11 zwu joined #gluster
02:13 raven-np joined #gluster
02:13 raven-np1 joined #gluster
02:46 efries joined #gluster
02:54 copec JoeJulian, so while verbose copying a directory hierarchy into the glusterfs fuse mount and watching the network traffic, the copy will stall on a relatively small file (I'm copying jpg's into it) and the network traffic essentially stops at the same time
02:54 copec and 42 seconds later it times out
02:54 copec across all six bricks on three systems
02:56 copec I'm learning a lot :)  so I'm not frustrated yet
02:58 copec no errors on the server bricks other than what I said before
03:13 JoeJulian copec: That's the ,,(ping-timeout). Might want to watch wireshark and see if the client's getting the TCP FIN. For some reason it sounds like the server's shutting down the network before it kills glusterfsd.
03:13 glusterbot copec: The reason for the long (42 second) ping-timeout is because re-establishing fd's and locks can be a very expensive operation. Allowing a longer time to reestablish connections is logical, unless you have servers that frequently die.
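
For reference, a minimal sketch of the two checks suggested above, assuming a placeholder volume name and interface (glusterfs 3.3 syntax; brick ports start at 24009):

    # the 42-second default can be tuned per volume
    gluster volume set myvol network.ping-timeout 42
    # on the client, watch for a FIN/RST arriving from a brick port
    tcpdump -ni eth0 'tcp port 24009 and (tcp[tcpflags] & (tcp-fin|tcp-rst) != 0)'
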
03:13 copec I just thought it was because 42 was the answer
03:14 JoeJulian It is because of that. That's the reason they chose that for the ping-timeout default.
03:15 copec It seems like _something_ is going wrong with it, so it is a legitimate timeout
03:15 copec something between the slowaris rpc and the gnu rpc
03:15 JoeJulian And it would be if your server goes hard down, or your network fails.
03:16 copec the test I ran I left pings going from the client to all the servers, and then ran a dstat on the client while copying the files
03:16 copec all of a sudden the network traffic flatlines, the file copy stalls, etc...
03:17 JoeJulian Check your init sequence on shutdown. Something's ending the network before it should.
03:21 copec It is the default Ubuntu upstart sequence...nothing is calling an init
03:24 JoeJulian Oh, I thought you were running something rpm based. In that case, I'll need to figure out how it shuts down to help you any further, unless semiosis is around and could answer.
03:25 semiosis i actually just dropped in
03:26 semiosis what os is your glusterfs server running?
03:26 semiosis copec: ^^
03:28 semiosis and whats the problem?
03:28 copec Solaris 11.1
03:28 copec I built 3.3.0, here is what I did:  https://docs.google.com/document/d/1ZIpcL8FbE-xD1auAx1aMSr07J1bZNiqovrSX-ZQu9CI/edit
03:28 glusterbot <http://goo.gl/OXGnZ> (at docs.google.com)
03:29 copec I create a filesystem with two bricks per server, three servers
03:29 copec I built the exact same 3.3.0 on ubuntu 12.04.1, use the fuse mount
03:29 copec start copying files
03:30 semiosis ok gotcha
03:30 copec an arbitrary time into copying files it stalls, and at the same time network traffic dies down from ~50MB/sec to nothing
03:30 semiosis a brick process is dying
03:30 copec 42 seconds later client times out from all of the bricks
03:30 copec let me paste up the logs
03:30 semiosis client log would say which one... or you can run 'gluster volume status <name>' on a server
03:30 semiosis and it will tell you which bricks are up, which are down
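
For example (volume name is a placeholder):

    gluster volume status myvol          # Online column shows Y/N per brick
    gluster volume status myvol detail   # adds disk and inode details per brick
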
03:30 copec It shows all of them up
03:31 semiosis ok then check client log
03:31 semiosis client is losing network connection to one of the bricks
03:31 shylesh joined #gluster
03:32 koodough joined #gluster
03:32 copec It loses it to all of them, simultaneously
03:32 copec pastebin alternative bot me
03:32 semiosis @paste
03:32 glusterbot semiosis: For RPM based distros you can yum install fpaste, for debian and ubuntu it's dpaste. Then you can easily pipe command output to [fd] paste and it'll give you an url.
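
For example, piping a client log straight to fpaste prints a shareable URL (log path is an example):

    tail -n 200 /var/log/glusterfs/mnt-glusterfs.log | fpaste
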
03:33 semiosis idk about that dpaste
03:34 semiosis copec: if client loses connection to all bricks simultaneously then i'd say your client has network issues
03:34 semiosis check dmesg maybe nic issues
03:34 copec I looked at dmesg, I'm ssh'd into it, and I have pings going from it to all the servers
03:34 semiosis or cable or switch issues
03:34 copec and it is all solid
03:35 copec well, it's not erroring
03:35 copec heh
03:35 semiosis well i guess pastie.org that client log file showing the disconnects
03:35 semiosis and a few dozen lines above them too
03:35 semiosis for context
03:38 copec I might have to ping you in a bit, have a couple of screaming kids :P
03:38 overclk joined #gluster
03:40 semiosis i might be asleep when you get back
03:41 semiosis but go take care of your kids, we can pick up tmrw
03:48 sgowda joined #gluster
04:01 sripathi joined #gluster
04:13 jbrooks joined #gluster
04:16 m0zes joined #gluster
04:18 greylurk joined #gluster
04:22 __Bryan__ joined #gluster
04:23 _msgq_ joined #gluster
04:23 mmakarczyk_ joined #gluster
05:01 sripathi1 joined #gluster
05:01 hagarth joined #gluster
05:12 bulde joined #gluster
05:22 raghu joined #gluster
05:32 sripathi joined #gluster
05:40 rastar joined #gluster
05:44 vpshastry joined #gluster
05:49 yosafbridge joined #gluster
05:51 sashko joined #gluster
05:51 sripathi joined #gluster
06:01 lala joined #gluster
06:03 rastar1 joined #gluster
06:15 Ramereth joined #gluster
06:21 lala joined #gluster
06:21 hagarth joined #gluster
06:25 m0zes joined #gluster
06:32 venkat joined #gluster
06:35 bala joined #gluster
06:36 ngoswami joined #gluster
06:46 vimal joined #gluster
06:49 rgustafs joined #gluster
06:58 cyr_ joined #gluster
07:07 glusterbot New news from newglusterbugs: [Bug 878663] mount_ip and remote_cluster in fs.conf are redundant <http://goo.gl/p6MZ7>
07:11 jtux joined #gluster
07:13 rastar joined #gluster
07:22 sripathi joined #gluster
07:26 Nevan joined #gluster
07:28 sripathi joined #gluster
07:45 sripathi joined #gluster
07:59 ramkrsna joined #gluster
07:59 15SAAUNF3 joined #gluster
08:00 ctria joined #gluster
08:01 ctria joined #gluster
08:07 ruissalo joined #gluster
08:08 hagarth joined #gluster
08:21 ekuric joined #gluster
08:27 gbrand_ joined #gluster
08:28 gbrand_ joined #gluster
08:32 jtux joined #gluster
08:37 glusterbot New news from newglusterbugs: [Bug 895831] auth.allow limited to 1024 chars in 3.2.5 and perhaps later versions, can you increase to something much bigger or allow unlimited length or see bug 861932 <http://goo.gl/2H6wW>
08:45 andreask joined #gluster
09:00 badone joined #gluster
09:02 bauruine joined #gluster
09:20 guigui1 joined #gluster
09:21 ekuric joined #gluster
09:23 jtux joined #gluster
09:23 H__ Any Csync2 users here (as alternative to geo-replication) ?
09:24 kshlm joined #gluster
09:33 DaveS joined #gluster
09:36 sripathi joined #gluster
09:37 Azrael808 joined #gluster
09:43 x4rlos semiosis: I updated the geo-replication bug from yesterday. Though a symlink sounds dirty :-)
09:47 andreask joined #gluster
09:51 bzf130_mm joined #gluster
09:55 ctrianta joined #gluster
10:05 rastar joined #gluster
10:06 bulde joined #gluster
10:07 sripathi joined #gluster
10:09 tryggvil joined #gluster
10:21 dobber joined #gluster
10:24 guigui1 joined #gluster
10:25 guigui1 joined #gluster
10:25 sgowda joined #gluster
10:33 sripathi1 joined #gluster
10:33 rastar joined #gluster
10:34 guigui1 joined #gluster
10:42 tryggvil joined #gluster
10:43 tryggvil_ joined #gluster
10:51 raven-np joined #gluster
10:53 hagarth joined #gluster
11:11 ctria joined #gluster
11:11 sripathi joined #gluster
11:21 rastar joined #gluster
11:34 venkat joined #gluster
11:36 guigui1 joined #gluster
11:50 andreask joined #gluster
11:53 venkat joined #gluster
11:55 andrei_ joined #gluster
11:55 andrei_ i was wondering how do I sync the data between two glusterfs servers that are set in replicated mode
11:55 andrei_ i've got a file on one server that is different from the second file server
11:55 andrei_ and I can't seems to get it in sync
11:55 andrei_ i tried the heal option
11:55 andrei_ but that didn't work
11:55 andrei_ as a result, two of my glusterfs clients see the same file with different sizes
11:56 andrei_ even worse, accessing this file gives me md5sum: /mnt/glusterfs/date: Input/output error
12:01 x4rlos andrei_: You been editing on the server directly rather than over the gluster mount?
12:01 hateya joined #gluster
12:02 andrei_ x4rlos: yes, the file on this occasion has been manually changed on the server. I know you are supposed to only change it via the client side mount
12:02 andrei_ do you know if there is a way to force the sync/heal so that it updates the file on both servers?
12:03 twx_ simplest solution sounds like copying the correct file off the server
12:03 twx_ and then moving it back, onto the glusterfs mount point
12:03 twx_ of course removing all traces of it on the gluster volume inbetween those two steps
12:07 ndevos andrei_: you are looking for a ,,(targetted self heal)
12:07 glusterbot andrei_: I do not know about 'targetted self heal', but I do know about these similar topics: 'targeted self heal'
12:07 ndevos @targeted self heal | ~andrei_
12:07 ndevos targeted self heal | ~andrei_
12:07 twx_ targeted self heal
12:07 ctria joined #gluster
12:07 twx_ @targeted self heal
12:07 glusterbot twx_: http://goo.gl/E3b2r
12:08 ndevos ah, I remember
12:08 ndevos ~targeted self heal | andrei_
12:08 glusterbot andrei_: http://goo.gl/E3b2r
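
The linked procedure boils down to forcing a lookup() on every file through a client mount; a rough sketch with placeholder server, volume and mount-point names (on 3.3 the self-heal daemon can also be asked to do a full sweep):

    mount -t glusterfs server1:/myvol /mnt/gluster-fix
    find /mnt/gluster-fix -noleaf -print0 | xargs -0 stat > /dev/null
    # 3.3+ alternative:
    gluster volume heal myvol full
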
12:08 ctria joined #gluster
12:09 andrei_ thanks guys!
12:09 andrei_ i will check it out
12:10 andrei_ another question, if I shutdown one of the file servers (let's say to upgrade the kernel or other security updates) would glusterfs start an automatic heal upon the server reboot?
12:10 andrei_ or will I manually need to run it every time i restart one of the file servers?
12:14 cyr_ joined #gluster
12:17 sripathi joined #gluster
12:18 andrei_ x4rlos: the targeted self heal didn't work unfortunately
12:18 andrei_ i've mounted  the glusterfs volume on the good server as suggested
12:19 andrei_ and did the find /gluster/ -printf "/mnt/gluster-fix/%P\0" | xargs -0 stat > /dev/null command as suggested
12:20 andrei_ the "bad" file server still has the old version of the file
12:20 x4rlos andrei_: This a test server? Move the unwanted version of the file to <filename>.bad and see if it then resyncs the wanted version from the master.
12:20 x4rlos (do this on the affected server directly)
12:21 andrei_ yes, that's the test server
12:21 x4rlos Then try as i suggested :-)
12:21 andrei_ will do
12:23 dobber joined #gluster
12:26 x4rlos and cross yer fingers :-)
12:27 andrei_ x4rlos: the suggestion made things even more strange. very strange indeed
12:27 andrei_ i've moved the old version of the file from the "bad" server to date.bad
12:27 andrei_ and ran the stat command on the client side
12:29 andrei_ I ended up with two files on the good server (named date and date.bad, but the date.bad file is a copy of the good date file and not the bad one). The second server which had the bad file still has only the one file I renamed, date.bad
12:32 andrei_ http://pastebin.com/m6u9vXaH
12:32 glusterbot Please use http://fpaste.org or http://dpaste.org . pb has too many ads. Say @paste in channel for info about paste utils.
12:35 manik joined #gluster
12:43 ramkrsna joined #gluster
12:43 ramkrsna joined #gluster
12:50 dustint joined #gluster
12:54 x4rlos Sorry, at work, and keep getting distracted.
12:56 x4rlos hmm. Interesting. And if you now delete the file.bad on the correct server - what happens? (do this over the mount, and make a copy of the file.good if you need to keep it).
12:56 plarsen joined #gluster
13:02 tryggvil joined #gluster
13:09 bulde joined #gluster
13:15 ctria joined #gluster
13:16 toruonu joined #gluster
13:17 toruonu I seem to have an NFS locking issue again. I had it a while ago (ca 1 month ago) and from backtracking my e-mails it seems restarting nfslock on the client helped back then, but right now I get this in strace:
13:17 toruonu 8615  fcntl(4, F_SETLK, {type=F_RDLCK, whence=SEEK_SET, start=0, len=1}) = -1 EACCES (Permission denied)
13:17 toruonu restarting nfslock service doesn't seem to help
13:17 toruonu the gluster volume is mounted over nfs and then bind_mounted into an OpenVZ container
13:18 toruonu the hardnode NFS locking service was restarted as there is none on the container that sees it as local filesystem
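
Two hedged checks for this kind of NLM failure, with placeholder server and volume names: confirm the lock manager is registered with rpcbind on the server, and as a workaround mount with client-side locking disabled:

    rpcinfo -p glusterserver | egrep 'nlockmgr|status'      # 3.3 registers NLM on 38468
    mount -t nfs -o vers=3,nolock glusterserver:/myvol /mnt/nfs
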
13:20 andrei_ I'll give it a go
13:20 andrei_ thanks for helping by the way
13:21 x4rlos no probs. Would be good to hear what worked :-)
13:21 toruonu even stopping the container, unmounting, remounting, starting the container doesn't seem to help
13:23 andrei_ x4rlos: okay, i've managed to fix it now.
13:24 andrei_ however, not entirely the way that you've mentioned
13:24 andrei_ but similar
13:24 andrei_ i had a case where two connected clients were seeing different files
13:24 andrei_ one client saw the good version of the file
13:24 andrei_ and the other client saw the bad version
13:25 andrei_ so i've deleted the bad version of the file from the client that saw the bad version
13:25 andrei_ and ran stat on that file on the good server.
13:25 puebele joined #gluster
13:25 andrei_ following that the good file got copied over to the second server
13:25 andrei_ and both clients saw the same picture
13:27 x4rlos ah, cool. When you say you ran 'stat' on the file, what do you mean exactly?
13:27 balunasj joined #gluster
13:32 edward1 joined #gluster
13:32 jack_^ joined #gluster
13:38 hateya joined #gluster
13:38 vpshastry left #gluster
13:49 hateya joined #gluster
13:52 hateya joined #gluster
13:53 puebele joined #gluster
13:54 dobber joined #gluster
13:56 puebele joined #gluster
14:02 Gugge x4rlos: stat is a unix command
14:04 dbriggs54 I am looking for some help. I have a server (centos 6.3) where the gluster service will not restart; it crashes, leaving the pid and lock file in place. Message log follows
14:04 dbriggs54 Jan 16 08:03:24 linyv2 abrt[15064]: Saved core dump of pid 15020 (/usr/sbin/glusterfsd) to /var/spool/abrt/ccpp-2013-01-16-08:03:24-15020 (400097$
14:04 dbriggs54 Jan 16 08:03:24 linyv2 abrtd: Directory 'ccpp-2013-01-16-08:03:24-15020' creation detected
14:04 dbriggs54 Jan 16 08:03:24 linyv2 abrtd: Unrecognized variable 'Deletion' in '/etc/abrt/abrt-action-save-package-data.conf'
14:04 dbriggs54 Jan 16 08:03:39 linyv2 abrtd: Sending an email...
14:04 dbriggs54 Jan 16 08:03:39 linyv2 abrtd: Email was sent to: root@localhost
14:04 dbriggs54 Jan 16 08:03:39 linyv2 abrtd: Duplicate: UUID
14:04 dbriggs54 Jan 16 08:03:39 linyv2 abrtd: DUP_OF_DIR: /var/spool/abrt/ccpp-2013-01-15-11:20:52-8082
14:04 dbriggs54 Jan 16 08:03:40 linyv2 abrtd: Problem directory is a duplicate of /var/spool/abrt/ccpp-2013-01-15-11:20:52-8082
14:04 dbriggs54 Jan 16 08:03:40 linyv2 abrtd: Deleting problem directory ccpp-2013-01-16-08:03:24-15020 (dup of ccpp-2013-01-15-11:20:52-8082)
14:04 dbriggs54 was kicked by glusterbot: message flood detected
14:05 toruonu someone needs to learn fpaste :)
14:07 x4rlos Gugge: Yeah - but doing this will cause gluster to update?
14:08 Gugge yes
14:08 x4rlos how are they related?
14:08 dbriggs54 joined #gluster
14:08 Gugge stat calls initiates a selfheal
14:08 dbriggs54 test
14:09 x4rlos really?! i did _not_know that.
14:09 dbriggs54 I am down to reinstalling the server on sunday if i can not get it to work
14:10 lala joined #gluster
14:10 Gugge http://community.gluster.org/a/howto-targeted-self-heal-repairing-less-than-the-whole-volume/ :)
14:10 glusterbot <http://goo.gl/E3b2r> (at community.gluster.org)
14:13 kkeithley dbriggs54: version? use fpaste and paste the /var/log/glusterfs logs and a gdb backtrace from the core file
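
A sketch of pulling that backtrace out of the surviving abrt directory named in the messages above (glusterfs-debuginfo is needed for readable symbols):

    debuginfo-install glusterfs    # from yum-utils
    gdb /usr/sbin/glusterfsd /var/spool/abrt/ccpp-2013-01-15-11:20:52-8082/coredump \
        -ex 'set pagination off' -ex 'thread apply all bt full' -ex quit | fpaste
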
14:16 x4rlos Gugge: Well, knock me over with a stick.
14:17 dbriggs54 gluster 3.2.7
14:17 dbriggs54 centos 6.3
14:17 kkeithley x4rlos: yes, really. Actually any "locate" (locate is a gluster internal function) will trigger self heal. /usr/bin/stat is the simplest command you can run to invoke a locate and trigger a self heal
14:17 bzf130_mm joined #gluster
14:22 mmakarczyk joined #gluster
14:22 x4rlos Yeah, i just read the docs. Forgive my lack of knowledge here, but I figured stat isn't a binary associated with gluster - unless gluster replaces it? How does gluster know that stat has been triggered? or is this a byproduct somehow?
14:22 dbriggs54 kkeithley - can't send a file, do not have permission
14:22 dbriggs54 sorry
14:22 dbriggs54 i am new at this
14:25 tryggvil joined #gluster
14:26 jrossi left #gluster
14:27 ndevos x4rlos: the stat binary executes the stat() syscall, that syscall goes through FUSE (or NFS) to a glusterfs process, that process contains a handler for all filesystem syscalls and does a LOOKUP when stat() is called
14:29 aliguori joined #gluster
14:29 ndevos oh, well, and the LOOKUP is probably the "locate" kkeithley mentioned, but I have not checked the sources for that, so I might be a little off
14:29 ndevos but you get the general idea, maybe
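
You can watch this from the client side; for instance (mount path is an example):

    # the ordinary coreutils stat makes a stat()/lstat() syscall, which the
    # glusterfs FUSE client serves by issuing a LOOKUP to the bricks
    strace -e trace=stat,lstat stat /mnt/glusterfs/some/file
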
14:30 theron_ joined #gluster
14:31 johnmark theron_: heya
14:33 theron joined #gluster
14:45 kkeithley yes, I meant lookup, not locate
14:50 dustint joined #gluster
14:57 badone joined #gluster
15:02 rwheeler joined #gluster
15:05 ekuric1 joined #gluster
15:08 dustint_ joined #gluster
15:10 dustint joined #gluster
15:16 manik joined #gluster
15:17 obryan joined #gluster
15:17 bugs_ joined #gluster
15:18 obryan left #gluster
15:21 hagarth joined #gluster
15:23 bdperkin_gone joined #gluster
15:23 bdperkin joined #gluster
15:25 tryggvil joined #gluster
15:25 stopbit joined #gluster
15:27 tryggvil joined #gluster
15:27 wushudoin joined #gluster
15:31 x4rlos ndevos: Sorry, distracted at work again.
15:31 jbrooks joined #gluster
15:31 x4rlos ndevos: Great explination there - not i gets it :-)
15:32 x4rlos s/not/now
15:32 ndevos ah, good :)
15:41 x4rlos haha
15:41 hagarth joined #gluster
15:45 Azrael808 joined #gluster
15:54 neofob left #gluster
15:54 raven-np joined #gluster
15:59 rastar joined #gluster
16:00 ekuric joined #gluster
16:03 tryggvil joined #gluster
16:04 chouchins joined #gluster
16:06 ekuric joined #gluster
16:11 daMaestro joined #gluster
16:17 tryggvil joined #gluster
16:17 theron joined #gluster
16:19 JoeJulian Gugge, x4rlos: Actually not just stat but anything that triggers a lookup(). stat was just used because it added the least number of additional operations to that trigger.
16:24 chouchins joined #gluster
16:24 neofob joined #gluster
16:28 rwheeler joined #gluster
16:30 raghu joined #gluster
16:42 primusinterpares joined #gluster
16:47 theron left #gluster
16:48 theron joined #gluster
16:48 x4rlos JoeJulian: I was looking at the wiki link Gugge: sent. thats interesting. :-)
17:01 Azrael808 joined #gluster
17:06 puebele joined #gluster
17:24 jbrooks joined #gluster
17:28 nueces joined #gluster
17:32 rastar joined #gluster
17:35 Mo___ joined #gluster
17:47 JoeJulian jdarcy: "one would expect Ceph to dominate, what with that kernel client to reduce latency and all" I love the subtle dig. :D
17:49 raven-np joined #gluster
17:52 portante joined #gluster
17:55 johnmark :)
17:57 puebele joined #gluster
17:59 gbrand__ joined #gluster
18:03 _msgq_ joined #gluster
18:03 semiosis JoeJulian: link?
18:04 msgq joined #gluster
18:04 semiosis ah nm, http://www.gluster.org/2013/01/glusterfs-vs-ceph/
18:04 glusterbot <http://goo.gl/2aVXw> (at www.gluster.org)
18:08 cicero nice.
18:20 jag3773 joined #gluster
18:20 JoeJulian cicero: The dig was against Stephan who, on the gluster-users mailing list, argues without merit that the only way that GlusterFS will ever "succeed" is if it's merged into the kernel.
18:21 JoeJulian Based solely on his opinion, of course. He has no basis in fact, just his feeling.
18:21 jdarcy Subtle?  Ha!  Not me.
18:21 JoeJulian You didn't call him out by name and say neener neener, so yeah. Subtle.
18:21 jag3773 hello, I'm having a heck of a time trying to add a 3rd peer to an existing 2 node cluster. I keep getting the Peer Rejected message, and the logs note that the checksums of the volume differ.  The third node is a fresh install, so there shouldn't be any stale configs laying around.  I'm coming up short on where to look as the internet and the gluster docs don't seem to address this issue in depth
18:22 JoeJulian jag3773: what version?
18:22 jag3773 3.3.1, on centos 6.3
18:22 jdarcy jag3773: Volume checksums?  Did you copy something from one of the older nodes?
18:22 jag3773 nope
18:23 jag3773 fresh install
18:23 jdarcy What's the message *exactly* so I can search for it in the code?
18:24 jag3773 the closest I've got was following some notes on http://irclog.perlgeek.de/gluster/2012-11-26
18:24 jag3773 one moment
18:24 glusterbot Title: IRC log for #gluster, 2012-11-26 (at irclog.perlgeek.de)
18:24 jag3773 [2013-01-15 11:05:04.561625] E [glusterd-utils.c:1926:glusterd_compare_friend_volume] 0-: Cksums of volume foo-data differ. local cksum = -2082430821, remote cksum = -2082146057
18:25 _br_ joined #gluster
18:33 bauruine joined #gluster
18:35 JoeJulian jag3773: Try deleting /var/lib/glusterd/vols on the new server and restarting glusterd.
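
A slightly more cautious sketch of that suggestion, run only on the rejected (new) peer:

    service glusterd stop
    mv /var/lib/glusterd/vols /var/lib/glusterd/vols.bak   # or rm -rf, as suggested
    service glusterd start
    gluster peer status
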
18:35 jag3773 I did that, as noted in that link I posted, I got the same thing
18:35 ajna joined #gluster
18:35 jag3773 i moved everything out of /var/lib/glusterd/, except the info file, and I got the same thing
18:36 jag3773 I should say that doing that procedure does not change the Peer Rejected status
18:39 jag3773 okay, so i tried just deleting that folder, rm -rf /var/lib/glusterd/vols/; service glusterd restart, and now we have a change
18:40 y4m4 joined #gluster
18:40 jag3773 node 1 says that the new node is in the gluster, the new node says accepted peer request, but node 2 still only lists node 1 as a part of the gluster
18:41 jag3773 do i need to probe the new node from node 2 as well?
18:43 jag3773 thoughts JoeJulian or jdarcy ?
18:44 JoeJulian Shouldn't have to but worth a try.
18:44 JoeJulian Actually... just restart glusterd on server2
18:45 jag3773 can i do a reload?  This is a live system so i don't want to cause any hiccups
18:45 JoeJulian Yes. glusterd is just the management process.
18:45 dberry joined #gluster
18:45 dberry joined #gluster
18:46 jag3773 no change after a reload
18:47 dberry I recently upgraded one of my client boxes and it upgraded the gluster-client to 3.2.5 and now it cannot connect the 3.0.2 server
18:47 jag3773 trying to probe it from node 2 yields: 192.168.2.99 is already part of another cluster
18:47 dberry I get [fuse-bridge.c:419:fuse_attr_cbk] 0-glusterfs-fuse: 13610: LOOKUP() / => -1 (Transport endpoint is not connected)
18:48 dberry Can a 3.2.5 client work with a 3.0.2 server?
18:48 JoeJulian jag3773: I wonder if there's a difference between the volume info in server1 vs server2 and server3 synced with one of them but failed the checksum with the other...
18:48 JoeJulian dberry: Nope
18:48 JoeJulian dberry: X.Y versions must match between clients and servers.
18:49 jag3773 node 2 says: [glusterd-handler.c:2245:glusterd_probe_begin] 0-glusterd: Unable to find peerinfo for host: 192.168.2.99 (24007)
18:49 dberry thanks
18:49 JoeJulian dberry: And for X.Y.Z, you should upgrade the servers to Z before the clients.
18:50 JoeJulian dberry: Lastly, if you're upgrading, why are you upgrading to an old version? In the 3.2 series there's 2 more bugfix releases.
18:51 JoeJulian ~repos | dberry
18:51 glusterbot dberry: See @yum, @ppa or @git repo
18:51 dberry it was a machine upgrade with the standard ubuntu repo for 12.04
18:51 jag3773 JoeJulian, i restarted glusterd on the new node and now we are back in a peer rejected state
18:52 JoeJulian dberry: you should consider the ,,(ppa) for the latest bug fixes.
18:52 glusterbot dberry: The official glusterfs 3.3 packages for Ubuntu are available here: http://goo.gl/7ZTNY
18:53 JoeJulian jag3773: Same Cksum error?
18:54 jag3773 yes, except it's the new node that threw the cksum error
18:56 JoeJulian heh... "volume%d.ckusm"
19:01 wdilly joined #gluster
19:01 JoeJulian hagarth: ^
19:02 ajna I am having an issue that I am not sure if it is a real issue that is preventing me from making volumes work, but seems to be an issue to me now. Do I explain?
19:02 wdilly so, ive installed gluster a few times now, and today installing from the rpm it complained about a dependency "systemd-units" is this new
19:02 jag3773 what os wdilly ?
19:03 wdilly centos 6.3
19:03 jag3773 what repo?
19:03 wdilly fresh out of the box install...
19:03 jag3773 use this: http://repos.fedorapeople.org/repos/kkeithle/glusterfs/epel-glusterfs.repo
19:03 glusterbot <http://goo.gl/Yuv7R> (at repos.fedorapeople.org)
19:04 jag3773 that will give you version 3.3.1, I'm guessing the RPM you are using is from Fedora, which uses systemd, but centos does not
19:04 kkeithley RPMs in my repo have systemd-units too
19:04 JoeJulian But that's for fedora, right?
19:04 JoeJulian Not epel?
19:04 kkeithley er, right, RHEL/CentOS 6.x doesn't use systemd yet
19:05 jag3773 correct, centos does not use systemd
19:05 JoeJulian I suspect he grabbed the fedora .repo
19:05 jag3773 linked to wrong repo
19:05 jag3773 baseurl=http://download.gluster.org/pub/gluster/glusterfs/3.3/3.3.1/EPEL.repo/epel-$releasever/$basearch/
19:05 glusterbot <http://goo.gl/gIzd6> (at download.gluster.org)
19:05 jag3773 that's what I'm using
19:05 msgq joined #gluster
19:05 jag3773 wdilly, kkeithley, http://download.gluster.org/pub/gluster/glusterfs/LATEST/CentOS/glusterfs-epel.repo
19:05 glusterbot <http://goo.gl/aFqkd> (at download.gluster.org)
19:06 kkeithley yeah, even that should not have systemd-units
19:06 semiosis hey kkeithley's back!
19:06 kkeithley did I leave?
19:06 kkeithley ;-)
19:06 wdilly using this .rpm glusterfs-3.3.1-1.el7.x86_64.rpm
19:07 jag3773 you need el6 wdilly
19:07 kkeithley el7
19:07 jag3773 el7 isn't released ?
19:07 kkeithley No
19:07 JoeJulian ajna: Best to just explain. My mind-reading skills have not yet been caffeinated today.
19:07 semiosis kkeithley: Bug 895656
19:07 kkeithley you have an CentOS 7 box?
19:07 glusterbot Bug http://goo.gl/ZNs3J unspecified, unspecified, ---, csaba, NEW , geo-replication problem (debian) [resource:194:logerr] Popen: ssh> bash: /usr/local/libexec/glusterfs/gsyncd: No such file or directory
19:07 ajna JoeJulian: Funny, I see myself saying the same to my boss very often.
19:07 semiosis kkeithley: i suspect we will need to modify our debian & ubuntu packages to create a symlink otherwise looks like geo-rep is broken in our packages :(
19:07 wdilly gotcha, i dl'd it from: http://download.gluster.org/pub/gluster/glusterfs/3.3/3.3.1/CentOS/epel-7/x86_64
19:08 wdilly so just switcht hat up for epel-6
19:08 wdilly ?
19:08 semiosis kkeithley: x4rlos was going to test to verify that solution works
19:08 semiosis x4rlos: ping... any progress on that?
19:08 jag3773 just use this wdilly: http://download.gluster.org/pub/gluster/glusterfs/3.3/3.3.1/CentOS/glusterfs-epel.repo
19:08 glusterbot <http://goo.gl/Y70il> (at download.gluster.org)
19:08 jag3773 that will autodetect your release version wdilly
19:08 kkeithley 895696:      Send out RHEL-3 ELS 1-Year EOL notice  ???
19:08 kkeithley oops
19:09 wdilly awesome, thanks jag3773
19:09 wdilly will give it a go and let you know how it goes
19:09 jag3773 yep, you might want to do `yum clean metadata` and then yum install... wdilly
19:09 jag3773 after modifying that repo file
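
Putting the steps above together for CentOS 6 (package names as shipped in that repo):

    cd /etc/yum.repos.d
    wget http://download.gluster.org/pub/gluster/glusterfs/3.3/3.3.1/CentOS/glusterfs-epel.repo
    yum clean metadata
    yum install glusterfs glusterfs-server glusterfs-fuse
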
19:10 jag3773 JoeJulian, I'm guessing I'll just have to setup a test environment and see if i can duplicate my Peer Rejected issue
19:10 semiosis kkeithley: https://bugzilla.redhat.com/show_bug.cgi?id=895656
19:10 glusterbot <http://goo.gl/ZNs3J> (at bugzilla.redhat.com)
19:10 glusterbot Bug 895656: unspecified, unspecified, ---, csaba, NEW , geo-replication problem (debian) [resource:194:logerr] Popen: ssh> bash: /usr/local/libexec/glusterfs/gsyncd: No such file or directory
19:10 JoeJulian jag3773: Check /var/lib/glusterd/vols/$vol/cksum on all your servers. If they differ between server1 and server2, that'll be the problem.
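
For example, using the volume name from the error above:

    cat /var/lib/glusterd/vols/foo-data/cksum   # run on every peer; the values must match
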
19:11 semiosis x4rlos: i see your latest comment confirming symlink solves the problem.  thanks for verifying that! :)
19:11 JoeJulian jag3773: Were the first two servers formerly 3.3.0?
19:11 jag3773 I believe so JoeJulian, let me check the yumm log
19:12 jag3773 formally 3.2 JoeJulian
19:12 jag3773 s/formally/formerly/
19:12 semiosis kkeithley: i'll work out the solution for the deb/ubu packages to make the symlink, it should be real easy, then provide the fix to you.  i believe it will be adding a new one-line file, debian/glusterfs-server.links and repackaging... but will confirm
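
A hedged sketch of what that one-line debian/glusterfs-server.links file might look like (dh_link format: link target first, symlink path second; the installed gsyncd path shown is an assumption and depends on the package's libexecdir):

    usr/lib/glusterfs/glusterfs/gsyncd usr/local/libexec/glusterfs/gsyncd
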
19:12 glusterbot jag3773: Error: I couldn't find a message matching that criteria in my history of 1000 messages.
19:12 * semiosis lunch
19:12 * JoeJulian raises an eyebrow at glusterbot...
19:12 kkeithley ugh, you mean I have to try to remember what I did?
19:13 JoeJulian semiosis: There's a config file... let me check...
19:14 jag3773 JoeJulian, cksum is same on both of the existing nodes
19:14 semiosis kkeithley: as long as you still have the gpg key you used to sign the packages i can help jog your memory on the build process
19:14 semiosis bbiab
19:15 JoeJulian just rsync the damned vols directory to the new one, restart glusterd and see if that solves at least that problem. I'll see if I can duplicate that too.
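
i.e. something along these lines, with a placeholder hostname for one of the good peers:

    rsync -av server1:/var/lib/glusterd/vols/ /var/lib/glusterd/vols/
    service glusterd restart
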
19:16 jag3773 okay, thanks JoeJulian, it may be a few days, but i'll see what I can come up with in a test environment
19:16 JoeJulian semiosis, kkeithley: the path is in /var/lib/glusterd/geo-replication/gsyncd.conf as remote_gsyncd.
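
So an alternative to the symlink is editing that template; a sketch using the RPM install path (the Debian/Ubuntu path would differ):

    # in /var/lib/glusterd/geo-replication/gsyncd.conf
    remote_gsyncd = /usr/libexec/glusterfs/gsyncd
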
19:16 ajna My issue: I have two peers, n0 and n1; both have no restrictions (firewall) between them and belong to the same private network. I can `peer probe` one from the other and create a volume correctly, either on a single peer or replicated on both. The problems start when I `start` the volume. I keep seeing “reading from socket failed. Error (Transport endpoint is not connected)” in the log of both nodes, apparently randomly showing either of the two peers' IPs upon any act
19:17 nueces joined #gluster
19:17 semiosis JoeJulian: good to know, will look at that as an alternative & do whichever makes the package more maintainable
19:18 JoeJulian semiosis: Here's what's in the RPM /var/lib/glusterd/geo-replication
19:18 JoeJulian gah
19:18 JoeJulian http://fpaste.org/ZaCh/
19:19 ajna and, mounting doesn’t work either.
19:20 ajna after a `mount.glusterfs peer:brick dir` a df would display: df: `/mnt/tmp': Transport endpoint is not connected
19:25 JoeJulian ajna: The first thing's probably normal. Check the client log to see what's happening there though.
19:25 ajna JoeJulian, which log specifically?
19:26 JoeJulian Assuming the mount point /mnt/tmp it's /var/log/glusterfs/mnt-tmp.log
19:28 ajna E [socket.c:1685:socket_connect_finish] 0-glusterfs: connection to  failed (Connection timed out)
19:28 ajna E [glusterfsd-mgmt.c:740:mgmt_rpc_notify] 0-glusterfsd-mgmt: failed to connect with remote-host: Transport endpoint is not connected
19:28 ajna W [glusterfsd.c:727:cleanup_and_exit] (-->/usr/lib/libgfrpc.so.0(rpc_transport_notify+0x3d) [0xeff7cd] (-->/usr/lib/libgfrpc.so.0(rpc_clnt_notify+0xc4) [0xf05704] (-->/usr/sbin/glusterfs() [0x804f590]))) 0-: received signum (1), shutting down
19:28 ndevos ajna: are you mounting on a client that contacts the storage servers over a different network/IP than the peer-probe was done?
19:29 ajna nop, one client is a peer
19:29 JoeJulian ~pasteinfo | ajna
19:29 glusterbot ajna: Please paste the output of "gluster volume info" to http://fpaste.org or http://dpaste.org then paste the link that's generated here.
19:30 sashko joined #gluster
19:30 ajna Looks simpler than pastebin, thx.
19:31 sashko joined #gluster
19:32 JoeJulian Also has the advantage of having a command line utility if you're really lazy like me.
19:32 ctria joined #gluster
19:33 ndevos you're not the only one ;)
19:33 chirino joined #gluster
19:34 ajna it exists for vim! aw my life feels better now.
19:35 sashko hey guys, this is not gluster related but I figured the great minds here might be able to shed some light on an interesting problem i'm having: got a bunch of supermicro servers and seagate and hitachi sas drives. The seagates are awfully slow to write, but normal to read, and the hitachis are good in terms of write and read performance. Has anyone had a similar issue? I also tried older seagate sas drives with same result,
19:35 sashko doesn't seem related to a certain firmware or production date
19:35 * JoeJulian isn't at all happy with Hitachi right now.
19:36 sashko damn, why?
19:36 ekuric joined #gluster
19:36 JoeJulian We're sending back about 1/3 of our Hitachi HTS547575A9E384 and they keep replacing them with just as defective drives.
19:37 sashko that sucks
19:37 sashko i guess everyone is having issues
19:37 sashko there's only few left who make 15k rpm drives
19:37 ajna Well JoeJulian, ndevos, I mount on one of the two nodes, any, and the log always outputs: http://dpaste.org/rXh1s/
19:37 glusterbot Title: dpaste.de: Snippet #216829 (at dpaste.org)
19:38 sashko seagate and hitachi, the rest is shifting to 10k
19:38 * semiosis didnt trust them when they were made by ibm, doesnt trust them now that they're made by hitachi either
19:38 elyograg the 4tb drives we got are WD raid edition SAS.
19:39 andreask joined #gluster
19:42 ajna Maybe a `missing 'option transport-type'. defaulting to "socket"` has to do with it but I am really lost.
19:43 ndevos ajna: no, socket should be fine, unless you're using RDAM
19:43 ndevos *RDMA
19:43 ndevos ajna: are you mounting as root, or as an unpriviledged user?
19:43 ajna ndevos: everything as root
19:44 ndevos ajna: the connection is terminated pretty early, you may find some details in the /var/log/glusterfs/etc-gluster...log
19:44 daMaestro kkeithley, totally okay with you owning the namespace ;-)
19:45 ajna ndevos: as I mount or at server startup?
19:46 ndevos ajna: just after mounting. the glusterd service should log something on an incoming mount request
19:47 tryggvil joined #gluster
19:49 ajna well, absolutely nothing is output if I mount from a machine that is not a node, but...
19:50 gbrand_ joined #gluster
19:51 ajna ndevos: if I try to mount from n0 (node0) for example, I get this output http://dpaste.org/UUJdo/ (tried to mount three times (10.11.23.111 is the ip of n0))
19:51 glusterbot Title: dpaste.de: Snippet #216831 (at dpaste.org)
19:51 jag3773 while I'm on, I thought I'd ask to see if anyone has had success using Gluster 3.3.1 as a filesystem for qm/KVM virtual machines, I'm testing this setup on Proxmox 2.2 (debian 64, 6.06), but without too much success
19:53 ndevos ajna: hmm, that does not help a lot :-/ maybe you need to enable more verbose logging -> start glusterd with --log-level=DEBUG
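
For instance (the service name varies by distro; glusterd logs to etc-glusterfs-glusterd.vol.log):

    service glusterd stop            # or glusterfs-server on Debian/Ubuntu
    glusterd --log-level=DEBUG
    tail -f /var/log/glusterfs/etc-glusterfs-glusterd.vol.log
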
19:55 sashko semiosis: so you hating on hitachi too? :)
19:56 semiosis had too many ibm/hitachi hdds fail to use them anymore
19:56 semiosis though always ide, never sas/scsi
19:57 ajna ndevos: now this is the output of trying to mount three times, in the same node0, with a debugging log-level: http://dpaste.org/0KKwa/
19:57 glusterbot Title: dpaste.de: Snippet #216833 (at dpaste.org)
19:59 ndevos ajna: looks like it can not find the volume... are you sure there is no typo in the volume name?
20:03 ajna ndevos... well, damn it, thanks for enlightening me
20:03 ajna I followed a tutorial that told me I had to use the route to the brick, not the volume name
20:03 ajna now it works lol
20:04 ndevos :)
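
For the record, the distinction that bit here: a native mount takes server:/volume-name, not the path to a brick (names below are placeholders):

    mount -t glusterfs n0:/myvol /mnt/tmp             # correct: volume name
    # mount -t glusterfs n0:/export/brick1 /mnt/tmp   # wrong: brick path
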
20:04 ajna well, it works, though I still see “reading from socket failed. Error (Transport endpoint is not connected)” for almost every action I do.
20:06 JoeJulian ajna: What's the link to the broken tutorial?
20:08 ajna Will have to find it when I am back home
20:08 semiosis JoeJulian: wouldnt be surprised if its howtoforge
20:08 semiosis though tbh could as easily be the gluster.org quickstart
20:09 semiosis ajna: either of those names ring a bell?
20:22 JoeJulian howtoforge's experts are almost as bad as the linkedin "linux experts" group.
20:23 JoeJulian I've seen better advice given to elderly couples from the sales reps at Best Buy.
20:25 ajna uhm, semiosis, I am pretty sure it was someone’s blog.
20:26 ajna I am not surprised that you are not surprised. :P
20:27 copec JoeJulian, semiosis, I was using 9k mtu with my Solaris 11.1 Bricks and Ubuntu clients, I checked and rechecked the mtu size on everything, but after using a 1.5k mtu it is working fine
20:27 johnmark JoeJulian: seriously. I syndicate their gluster-specific content, although I've thought about not doing that anymore
20:27 semiosis copec: great find!
20:28 johnmark semiosis: are you finding many errors in the quickstart?
20:28 copec Ubuntu bricks and clients work fine on this network.  I think it is some compound bug that I would like to figure out exactly where it is at
20:28 semiosis johnmark: +1 to not doing it anymore
20:28 johnmark semiosis: ok
20:28 johnmark copec: wow. that's an interesting find.
20:28 ajna JoeJulian, semiosis: https://raymii.org/s/tutorials​/Gluster-webroot-cluster.html
20:28 glusterbot <http://goo.gl/xc5fS> (at raymii.org)
20:28 semiosis johnmark: just this... http://www.gluster.org/community/documentation/index.php/Talk:QuickStart
20:28 glusterbot <http://goo.gl/lKYmt> (at www.gluster.org)
20:28 johnmark ajna: oh, I've seen that one. I think it's rather old
20:28 johnmark semiosis: ah, ok
20:29 semiosis johnmark: no mention of iptables & selinux, that tripped up at least one person who brought it up
20:29 copec It's amazing how much slower 1.5k mtu is
20:29 copec It's like half the speed
20:29 semiosis wow
20:30 copec I'm going to try some totally different hardware and see what I find out
20:31 johnmark semiosis: I bet. Those two things trip up countless people on many things
20:33 RicardoSSP joined #gluster
20:34 kkeithley daMaestro: okay, good
20:35 kkeithley semiosis: yes, I still have my pgp key. It hasn't wandered off AFAIK.
20:36 gbrand_ joined #gluster
20:43 plarsen joined #gluster
20:45 edward1 joined #gluster
20:46 rwheeler joined #gluster
20:47 edward1 joined #gluster
20:49 dberry going from 3.0.2 to 3.3, do I need to run  the upgrade option on the volume files?
20:50 edward2 joined #gluster
20:51 kkeithley JoeJulian, semiosis: WRT bz 895656. Can I ask why the fix isn't to patch xlators/mgmt/glusterd/src/glusterd.c, i.e. s%/usr/local/libexec/glusterfs/%GSYNCD_PREFIX%.
20:52 kkeithley IOW I have a tough time believing this is somehow not a problem on RHEL/Fedora/CentOS.
20:52 semiosis bug 895656
20:52 glusterbot Bug http://goo.gl/ZNs3J unspecified, unspecified, ---, csaba, NEW , geo-replication problem (debian) [resource:194:logerr] Popen: ssh> bash: /usr/local/libexec/glusterfs/gsyncd: No such file or directory
20:54 kkeithley Yes?
20:54 kkeithley No?
20:54 kkeithley Maybe?
20:55 JoeJulian 1 sec
20:55 JoeJulian is bugzilla only freakishly slow today for me?
20:55 kkeithley strings - /usr/lib64/glusterfs/3.3.1/xlator/mgmt/glusterd.so.0.0.0
20:55 kkeithley ...
20:55 kkeithley /usr/local/libexec/glusterfs/gsyncd
20:56 kkeithley ...
20:56 JoeJulian bg 764623
20:56 JoeJulian bug 764623
20:56 glusterbot Bug http://goo.gl/M60T6 medium, medium, ---, csaba, CLOSED NOTABUG, Avoid hardcoding libexecdir
20:59 JoeJulian Odd that shows as reported by hagarth.. I could have sworn it was my bug.
20:59 semiosis old bug 2947
20:59 glusterbot Bug http://goo.gl/zhqMt high, high, ---, jrb, CLOSED WORKSFORME, gnomecc segfault
20:59 JoeJulian one word
20:59 JoeJulian oldbug 2947
20:59 glusterbot Bug http://goo.gl/iRHLY low, medium, ---, amarts, CLOSED DUPLICATE, do not tamper with libexecdir in rpm spec
20:59 semiosis JoeJulian: thx
21:01 andreask joined #gluster
21:01 kkeithley Then is there some other invocation of /usr/local/libexec/glusterfs/gsyncd somehow in ubuntu/debian? Because I sure don't find it in the 3.3.1 source.
21:04 chirino joined #gluster
21:08 semiosis https://github.com/gluster/glusterfs/blob/release-3.3/xlators/mgmt/glusterd/src/glusterd.c#L465
21:09 glusterbot <http://goo.gl/ERQTd> (at github.com)
21:09 semiosis found it
21:10 kkeithley old bz 2947 refers to a convenience symlink but neither the 3.3.1 glusterfs.spec.in nor the git/HEAD glusterfs.spec.in have such a symlink (i.e. /usr/local/libexec/glusterfs/gsyncd -> usr/libexec/...)
21:10 ndevos ew!
21:13 semiosis kkeithley: i'm seeing it in 3.2.x specfiles
21:13 kkeithley and I don't understand why xlator/mgmt/glusterd/src/glusterd.c first invokes gsyncd twice: first as  GSYNCD_PRE/gsyncd and then as /usr/local/libexec/glusterfs/gsyncd
21:13 semiosis but not seeing it in any of my debian packages going back a long time :(
21:14 kkeithley s/first//
21:14 glusterbot What kkeithley meant to say was: and I don't understand why xlator/mgmt/glusterd/src/glusterd.c  invokes gsyncd twice: first as  GSYNCD_PRE/gsyncd and then as /usr/local/libexec/glusterfs/gsyncd
21:19 wdilly is this still relevant/comprehensive for which ports need to be opened up via iptables? http://gluster.org/community/documentation/index.php/Gluster_3.1:_Installing_GlusterFS_on_Red_Hat_Package_Manager_(RPM)_Distributions
21:19 glusterbot <http://goo.gl/hn5V0> (at gluster.org)
21:19 semiosis ~ports | wdilly
21:19 glusterbot wdilly: glusterd's management port is 24007/tcp and 24008/tcp if you use rdma. Bricks (glusterfsd) use 24009 & up. (Deleted volumes do not reset this counter.) Additionally it will listen on 38465-38467/tcp for nfs, also 38468 for NLM since 3.3.0. NFS also depends on rpcbind/portmap on port 111.
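
A hedged iptables sketch matching that list for a 3.3 server (widen the brick range to cover however many bricks the host carries):

    iptables -A INPUT -p tcp --dport 24007:24008 -j ACCEPT   # glusterd (and rdma)
    iptables -A INPUT -p tcp --dport 24009:24020 -j ACCEPT   # bricks, 24009 and up
    iptables -A INPUT -p tcp --dport 38465:38468 -j ACCEPT   # gluster NFS + NLM
    iptables -A INPUT -p tcp --dport 111 -j ACCEPT           # rpcbind/portmap
    iptables -A INPUT -p udp --dport 111 -j ACCEPT
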
21:21 kkeithley /usr/local/libexec is not in the Fedora/Koji 3.2.7 glusterfs.spec either
21:22 ndevos good, /usr/local is prohibited for rpms :)
21:23 semiosis https://github.com/gluster/glusterfs/blob/release-3.2/glusterfs.spec.in#L149
21:23 glusterbot <http://goo.gl/E8cjp> (at github.com)
21:25 kkeithley ndevos: right
21:25 kkeithley s/right/correct
21:25 kkeithley s/right/correct/
21:25 glusterbot What kkeithley meant to say was: ndevos: correct
21:25 semiosis affirmative
21:25 ndevos roger?
21:25 kkeithley 10-4
21:28 kkeithley We should revisit the old bugs and decide whether xlator/mgmt/glusterd/src/glusterd.c should still have that hard-coded /usr/local/libexec/glusterfs/gsyncd. Yes?
21:29 ndevos drop it, use GSYNCD_PRE/gsyncd
21:34 y4m4 joined #gluster
21:35 elyograg I know I asked this question once before, but I can't remember the answer.  If you want to provide gluster nfs access, do you need a bunch of memory on the nfs server, or just the brick servers?  What I would do is set up a gluster peer with no bricks to serve NFS/Samba.
21:37 elyograg two, actually -- IP failover.
21:38 glusterbot New news from resolvedglusterbugs: [Bug 764679] do not tamper with libexecdir in rpm spec <http://goo.gl/iRHLY> || [Bug 764623] Avoid hardcoding libexecdir <http://goo.gl/M60T6>
21:38 ajna JoeJulian, ndevos, must go home now. Thanks a lot for your assistance.
21:39 ndevos cya, ajna!
21:42 ajna left #gluster
21:46 plarsen joined #gluster
21:53 JoeJulian elyograg: glusterfs servers /are/ nfs servers. So they're going to try to allocate the client cache.
21:55 elyograg JoeJulian: ok, so if the machines providing NFS are not the machines with the bricks, then I need loads of memory on both types?
21:55 elyograg I've got 24GB on each brick server and I am trying to size the network access servers.
22:06 rhys joined #gluster
22:07 rhys i'm getting problems in gluster 3.3.1 with gluster NFS access hanging. 2 bricks, replicated. I use ucarp between them so my mount point is a shared HA IP address.
22:10 rhys gluster volume proxstore top returns "volume top unsuccessful"
22:12 rhys worse, this happened a bit ago, nothing seems to be in the logs
22:16 JoeJulian gluster volume status maybe?
22:16 rhys the main server i get "operation failed"
22:17 JoeJulian Go ahead and try another one. There is no "main server" so another one might tell you something useful.
22:17 rhys the other shows a NFS server and Self-Heal daemon on localhost and the other machine
22:18 rhys and now gluster volume status works.
22:19 JoeJulian If I were to guess, I'd guess that the privileged ports were all used up when you tried.
22:19 rhys ....like.. 0-1024 were used up? no.
22:19 JoeJulian How can you tell that they weren't?
22:20 rhys because I've been running a lot of things, netstat -tlpnu being one of them
22:20 rhys and this machine doesn't do anything but serve gluster
22:20 JoeJulian What about without the -l ?
22:21 rhys now i have logs back. nfs.log shows DNS resolution errors like mad.
22:21 JoeJulian You could easily have a bunch in FIN_WAIT
22:21 rhys hrm. [name.c:243:af_inet_client_get_remote_sockaddr] 0-proxstore-client-0: DNS resolution failed on host glusterfs2
22:23 JoeJulian Which would cause it to try again and again. That would probably cause it to query glusterd for the volume info over and over each time grabbing a privileged port and each time leaving them in FIN_WAIT for the timeout period.
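
A quick way to check for that, counting connections in each TCP state (FIN_WAIT included):

    netstat -tan | awk 'NR>2 {print $6}' | sort | uniq -c
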
22:23 JoeJulian I just encountered that two days ago which is why I thought of it.
22:23 rhys innnteresting. so gluster2 refers to gluster1 by its IP address, gluster 1 refers to gluster2 via DNS name.
22:24 rhys but i have both in /etc/hosts to stop any DNS problems
22:24 rhys W [rpcsvc.c:523:rpcsvc_handle_rpc_call] 0-rpcsvc: failed to queue error reply
22:28 al joined #gluster
22:30 greylurk joined #gluster
22:32 al joined #gluster
22:33 raven-np joined #gluster
22:41 JoeJulian ~hostnames | rhys
22:41 glusterbot rhys: Hostnames can be used instead of IPs for server (peer) addresses. To update an existing peer's address from IP to hostname, just probe it by name from any other peer. When creating a new pool, probe all other servers by name from the first, then probe the first by name from just one of the others.
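
In command form, assuming the two peers are glusterfs1 and glusterfs2:

    # from glusterfs1:
    gluster peer probe glusterfs2
    # from glusterfs2, so the first host is also recorded by name:
    gluster peer probe glusterfs1
    gluster peer status
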
22:43 JoeJulian elyograg: I have 16gig on my servers and 60 bricks. I limit the cache size to make them all fit. You should be able to do something similar.
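
The cap is a per-volume option, e.g. (placeholder volume name, 3.3 syntax):

    gluster volume set myvol performance.cache-size 256MB
    gluster volume set myvol nfs.disable on    # optional, if gluster's NFS server isn't needed
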
22:43 rhys JoeJulian, damn. i wish you were about 5 minutes earlier with that. i'm in the process of recreating the volume.
22:43 rhys i'll be done shortly, but that would have been easier. :P
22:52 nueces joined #gluster
22:55 Maxzrz left #gluster
22:56 elyograg JoeJulian: thanks for that info.  it doesn't really answer my question, though.  if i understand the thing you said first, memory beyond minimum will help for serving NFS even when there are no bricks on that server.  If memory isn't critical, then I'd do something like 4GB, otherwise I'd probably match the 24GB that's in the brick servers.
22:58 JoeJulian Oh, I thought your concern was the servers. If you have a server that has no bricks, I wouldn't expect the nfs server to use any more memory than a client...
22:58 JoeJulian That said, let me check something...
23:00 elyograg my primary reason for using different servers for NFS is that I don't want problems with those servers to affect bricks.  A secondary reason is Samba - the version of Samba that's on CentOS is ancient, so I'd put Fedora on those servers, but I don't want to run Fedora on the bricks.
23:01 JoeJulian Ok, just checked. The nfs vol file doesn't include any caching so it /should/ be pretty slim.
23:01 elyograg ok.  the boss will be glad to hear that. :)
23:02 elyograg so up the CPU a couple of steps, don't worry too much about RAM.
23:02 erik_ joined #gluster
23:02 JoeJulian Which makes me wonder if there's a memory leak 'cause I don't remember that being the case (I disable nfs on my volumes so I wouldn't normally notice).
23:05 al joined #gluster
23:10 JoeJulian Nah, it looks okay in 3.3.1
23:12 al joined #gluster
23:12 JoeJulian Looks like about 50 per volume for me. I assume that's somewhat related to which translators and how many bricks. These are 12 brick replica 3 volumes.
23:12 JoeJulian 50Mb
23:35 rhys JoeJulian, so its happening again, and i'm definitely not out of ports. volume status goes between failed and working
23:37 JoeJulian rhys: gluster volume status still says ""operation failed"?
23:38 rhys JoeJulian, it goes between operation failed, cannot get list of volume names, and working
23:38 JoeJulian check the cli.log and etc-glusterfs-glusterd.vol.log for clues.
23:39 rhys cli.log http://dpaste.org/fA6T8/
23:39 glusterbot Title: dpaste.de: Snippet #216854 (at dpaste.org)
23:40 rhys the other log file. http://dpaste.org/5FQWQ/
23:40 glusterbot Title: dpaste.de: Snippet #216855 (at dpaste.org)
23:43 rhys nothing wrong at the network layer that i see.
23:47 JoeJulian You're not nfs mounting from localhost, right?
23:47 rhys nfs mounting from localhost?
23:47 rhys I'm using UCARP as the mount point to provide HA
23:47 JoeJulian Yeah, I didn't think so, but just making sure.
23:48 rhys so peer1 is .10, peer2 is .11, and whichever one is "master" is .20. And then the mount point is .20:/volume
23:48 rhys found this though, seems related? http://gluster.org/pipermail/gluster-users/2012-October/034563.html CARP breaking things
23:48 glusterbot <http://goo.gl/kd2Vv> (at gluster.org)
23:49 rhys is there another way to provide a single mount point (host/ip) via NFS? if i can remove CARP for debugging purposes it might make finding this easier.
23:49 JoeJulian Might be worth at least testing, though you're not reporting crashes or coredumps.
23:50 rhys glusterfs-server is definitely running on both
23:50 rhys not crashing
23:50 rhys another point, everything works just perfectly until I start writing a lot of files.
23:51 rhys on this NFS mount are qcow images for virtual machines. This error started because I was writing 100 MB of very small files to one of the drives.
23:51 JoeJulian Maybe try a deadline scheduler?
23:51 rhys already did that.
23:52 rhys saw the recommendation on the LSI 9261 raid controllers I have.
23:52 JoeJulian Which kernel version?
23:52 elyograg does ucarp only require two machines for perfect operation? to make everything work right, pacemaker wants three - one is a standby node.
23:52 rhys 2.6.32-17
23:52 JoeJulian @thp
23:52 glusterbot JoeJulian: There's an issue with khugepaged and it's interaction with userspace filesystems. Try echo never> /sys/kernel/mm/redhat_transparent_hugepage/enabled . See http://goo.gl/WUNSx for more information.
23:52 rhys elyograg, ucarp only needs 2. ucarp is essentially VRRP.
23:53 JoeJulian It's been a while since I've looked at this so I'm refreshing my memory.
23:53 JoeJulian oldbug 3232
23:53 glusterbot An error has occurred and has been logged. Check the logs for more informations.
23:53 elyograg rhys: thanks. that would simplify my rollout greatly and make it easier to get hardware purchases past the CFO. ;)
23:53 JoeJulian pfft... bugzilla seems to be down.
23:54 rhys elyograg, do you run debian?
23:55 JoeJulian ding ding!
23:55 JoeJulian rhys: Try that. Your kernel does seem to fall in the right timeframe.
23:56 rhys elyograg, http://dpaste.org/7Fv8X/
23:56 glusterbot Title: dpaste.de: Snippet #216859 (at dpaste.org)
23:56 rhys JoeJulian, that echo?
23:56 JoeJulian yes
23:56 elyograg rhys: using centos 6.3 for bricks, will be using fedora 18 for network access.  I'm familiar with debian, I use it for my home stuff.
23:57 rhys elyograg, i only say because they added ucarp support to the debian interfaces file. makes it really easy
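
A hedged sketch of those interfaces(5) stanzas, using example addresses in the spirit of the .10/.11/.20 layout above and the ucarp-* options the Debian ucarp package's ifupdown hooks understand:

    auto eth0
    iface eth0 inet static
        address 192.168.1.10        # .11 on the second peer
        netmask 255.255.255.0
        ucarp-vid 1
        ucarp-vip 192.168.1.20
        ucarp-password s3cret
    iface eth0:ucarp inet static
        address 192.168.1.20
        netmask 255.255.255.0
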
23:57 JoeJulian Oh, wait... is this debian?
23:57 rhys JoeJulian, proxmox specifically, but yeah. debian 6
23:57 elyograg i really really like /etc/network/interfaces in debian.  redhat's ifcfg-* stuff feels very clunky.
23:59 JoeJulian Hmm, Looks like we decided /sys/kernel/mm/transparent_hugepage/enabled should be madvise on debian.
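
That is, on Debian the knob lives at a different path than the RHEL one quoted earlier:

    echo madvise > /sys/kernel/mm/transparent_hugepage/enabled   # or 'never'
    cat /sys/kernel/mm/transparent_hugepage/enabled              # active choice shown in [brackets]
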
23:59 raven-np joined #gluster
