
IRC log for #gluster, 2014-02-26


All times shown according to UTC.

Time Nick Message
00:11 cjanbanan joined #gluster
00:21 badone_ joined #gluster
00:27 vpshastry joined #gluster
00:33 tokik joined #gluster
00:34 elyograg if anyone is still following my problem, I have brick processes on my servers that are showing constant CPU usage even though nothing is happening on the volume.  Nothing in the brick logs.  strace output from one of these processes for a few seconds: https://www.dropbox.com/s/1wp44ez0t871jbe/brick-process-strace.txt
00:34 glusterbot Title: Dropbox - brick-process-strace.txt (at www.dropbox.com)
00:35 elyograg one of the servers also has the self-heal daemon process showing CPU usage.  system load is zero at this point on both servers, even with these processes that have between 60 and 85% CPU usage.  At least it's not 399% right now.
00:39 elyograg just now, I killed the self heal daemon process.  CPU usage dropped to zero on all gluster processes instantly.
00:41 elyograg did I already mention that I cannot get 'gluster volume heal mdfs2 info' to return?  After I run this command, I cannot get any other gluster command to work either, says there's already a command running.  I have to kill all gluster processes on the machine before anything else works.
00:45 vpshastry joined #gluster
00:48 jag3773 joined #gluster
00:49 calum_ joined #gluster
00:51 maksim_ joined #gluster
00:52 cjanbanan joined #gluster
01:02 rfortier1 joined #gluster
01:12 JoeJulian elyograg: "all gluster processes"? or just all glusterd processes?
01:14 * JoeJulian does his happy dance. His python script is directly downloading vol files from the servers via rpc call. :D
01:25 elyograg i stop glusterd and glusterfsd using the 'service' command and then 'killall glusterfs'
01:25 elyograg then just 'service glusterd restart'
01:26 JoeJulian And all of those steps are required, or is that just what you always do?
01:26 elyograg it's what I always do to ensure it's a clean start.  I think that restarting glusterd didn't work, but now I don't remember for sure.
01:27 JoeJulian ok
01:31 harish joined #gluster
01:43 kshlm joined #gluster
01:50 cjanbanan joined #gluster
01:57 shyam joined #gluster
02:03 cjanbanan joined #gluster
02:12 * johnmark_ checks in
02:12 johnmark_ did somebody ring?
02:17 Alex /20
02:27 gdubreui joined #gluster
02:31 ColPanik is there a good way to print out the running config of a gluster node beyond the 'gluster volume status' and 'gluster volume info'?  I'm looking for something that would print out a list of properties and values for things like "diagnostics.brick-log-level" and "cluster.quorum-count" that can be set in the CLI or vol files
02:32 harish joined #gluster
02:33 badone_ joined #gluster
02:33 rwheeler joined #gluster
02:36 cjanbanan joined #gluster
02:43 jporterfield joined #gluster
02:47 jiqiren i have 2 files that are "split-brain"
02:47 jiqiren having difficulty fixing this
02:48 jiqiren is there a url / blog i can read?
02:50 Alex What have you followed so far? (he says, just reading http://joejulian.name/blog/fixing-split-brain-with-glusterfs-33/ and http://geekshell.org/~pivo/glusterfs-split-brain.html)
02:50 glusterbot Title: Fixing split-brain with GlusterFS 3.3 (at joejulian.name)
02:52 jiqiren @alex, i'm looking at the joejulian blog now, seems pretty complicated to just blast 1 of the files
02:53 jiqiren Alex: the geekshell.org one looks like what i want
02:54 aquagreen joined #gluster
02:58 jag3773 joined #gluster
03:00 jiqiren Alex: that worked
03:00 jiqiren thanks
03:03 cjanbanan joined #gluster
03:15 Alex \o/. I had a painful issue last week with mtime being out of sync between two files, still not sure how I got to that.
03:15 rfortier joined #gluster
03:17 overclk joined #gluster
03:32 seapasulli joined #gluster
03:33 cjanbanan joined #gluster
03:34 satheesh joined #gluster
03:39 shubhendu joined #gluster
03:40 kdhananjay joined #gluster
03:41 itisravi joined #gluster
03:56 shylesh joined #gluster
04:01 JoeJulian jiqiren: by the way, that's great when you have a couple hundred files, but when you have millions, that find command will take forever. The ,,(extended attribute) method will go much faster.
04:01 glusterbot jiqiren: I do not know about 'extended attribute', but I do know about these similar topics: 'extended attributes'
04:01 JoeJulian @meh
04:01 glusterbot JoeJulian: I'm not happy about it either
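A rough sketch of the extended-attributes method JoeJulian is referring to (not his exact procedure; brick path and file names below are placeholders). The trusted.afr.* pending counters on each brick's copy show which side is dirty; the bad copy and its .glusterfs gfid hard link are removed, then a heal is triggered:
    # inspect the AFR changelog attributes on each brick's copy of the file
    getfattr -m . -d -e hex /bricks/brick1/<path-to-file>
    # on the brick judged bad: note trusted.gfid, remove the copy and its gfid hard link
    rm /bricks/brick1/<path-to-file>
    rm /bricks/brick1/.glusterfs/<first-2-hex-chars>/<next-2-hex-chars>/<full-gfid>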
04:03 cjanbanan joined #gluster
04:05 gmcwhistler joined #gluster
04:08 Matthaeus joined #gluster
04:09 kanagaraj joined #gluster
04:09 hagarth joined #gluster
04:11 bennyturns joined #gluster
04:14 sahina joined #gluster
04:18 davinder joined #gluster
04:24 ColPanik can anyone tell me what "Number of Bricks: 0 x 4 = 2" means in a "gluster vol info" result?
02:28 ColPanik possibly related to this?  https://bugzilla.redhat.com/show_bug.cgi?id=1002556
04:28 glusterbot Bug 1002556: high, unspecified, ---, kparthas, POST , running add-brick then remove-brick, then restarting gluster leads to broken volume brick counts
04:32 shyam joined #gluster
04:36 SuperYeti joined #gluster
04:36 SuperYeti Hello
04:36 glusterbot SuperYeti: Despite the fact that friendly greetings are nice, please ask your question. Carefully identify your problem in such a way that when a volunteer has a few minutes, they can offer you a potential solution. These are volunteers, so be patient. Answers may come in a few minutes, or may take hours. If you're still in the channel, someone will eventually offer an answer.
04:39 SuperYeti I've just setup Gluster 3.2.5 on a pair of ubuntu servers in AWS. I'm having issues however, that when I reboot the fstab entry is not mounting. I'm seeing the following in the logs: http://pastebin.com/B6ZX1ket I found 1 suggestion about an article that shows how to correctly mount at boot time with Ubuntu that is supposed to be on the gluster forum, but when I try to view the link, it gives me a 404. Any suggestions?
04:39 glusterbot Please use http://fpaste.org or http://paste.ubuntu.com/ . pb has too many ads. Say @paste in channel for info about paste utils.
04:40 SuperYeti Sorry GlusterBot :) http://paste.ubuntu.com/6998028/
04:40 glusterbot Title: Ubuntu Pastebin (at paste.ubuntu.com)
04:42 aquagreen joined #gluster
04:42 prasanth joined #gluster
04:44 ndarshan joined #gluster
04:44 cjanbanan joined #gluster
04:46 bala joined #gluster
04:50 shubhendu joined #gluster
04:52 raghu joined #gluster
04:54 kanagaraj joined #gluster
04:55 sahina joined #gluster
04:55 saurabh joined #gluster
04:56 davinder joined #gluster
04:59 JoeJulian SuperYeti: Any particular reason you're using that old buggy version?
04:59 SuperYeti umm
04:59 rjoseph joined #gluster
04:59 JoeJulian @latest
04:59 glusterbot JoeJulian: The latest version is available at http://download.gluster.org/pub/gluster/glusterfs/LATEST/ . There is a .repo file for yum or see @ppa for ubuntu.
04:59 SuperYeti its the one in the repo's for Ubuntu 12.04?
04:59 JoeJulian @ppa
04:59 glusterbot JoeJulian: The official glusterfs packages for Ubuntu are available here: 3.4 stable: http://goo.gl/u33hy -- 3.5 QA: http://goo.gl/Odj95k -- introducing QEMU with GlusterFS 3.4 support: http://goo.gl/7I8WN4
05:00 SuperYeti sigh
05:00 SuperYeti lol
05:00 SuperYeti Do I have to change anything else, or can I just upgrade?
05:00 SuperYeti using the ppa
05:00 JoeJulian You can just upgrade.
05:00 SuperYeti great
05:00 SuperYeti I'll give it a shot
05:00 SuperYeti thank yuou
05:00 SuperYeti you*
05:00 JoeJulian You're welcome.
05:01 shyam joined #gluster
05:01 SuperYeti ok so stop all daemons and clients
05:01 SuperYeti then run the install?
05:01 SuperYeti upgrade?
05:01 JoeJulian yes
05:06 vimal joined #gluster
05:06 gdubreui joined #gluster
05:06 SuperYeti is it normal to have to do a apt-get dist-upgrade to upgrade to the package in the ppa?
05:06 tokik joined #gluster
05:06 ndarshan joined #gluster
05:06 JoeJulian I've only used a ppa once in 25 years. I'm not the expert there...
05:07 SuperYeti heh, np
05:07 jporterfield joined #gluster
05:07 gdubreui joined #gluster
05:07 JoeJulian "Back when I was a boy, we had to...."
05:07 ColPanik @SuperYeti, probably goes w/o saying, but don't forget to do the apt-get update after adding the PPA
05:07 SuperYeti yes I did
05:07 SuperYeti then did upgrade
05:08 SuperYeti and it said it was held back
05:08 SuperYeti did dist-upgrade and it upgraded
05:08 SuperYeti hmm, seems it lost all config on upgrade :(
05:09 * SuperYeti goes to reconfigure
05:09 JoeJulian wait
05:09 SuperYeti waiting
05:09 JoeJulian mv /etc/glusterfsd /var/lib/glusterfsd
05:09 SuperYeti on both nodes?
05:09 JoeJulian I forgot you ubuntu people do things the hard way... ;)
05:10 JoeJulian yes, all servers.
05:10 SuperYeti mv: cannot stat `/etc/glusterfsd': No such file or directory
05:10 JoeJulian gah, and I make it harder
05:10 JoeJulian glusterd not glusterfsd
05:10 SuperYeti lol ok
05:11 SuperYeti then restart?
05:11 JoeJulian yes
05:11 SuperYeti or is restarting gluster-server service sufficient?
05:12 JoeJulian service should be fine
05:12 SuperYeti ok
05:12 SuperYeti hmm
05:12 SuperYeti peer status and volume status sill showing nothing
05:13 SuperYeti "gluster peer status" and "gluster volume status"
05:13 SuperYeti "no peers present" and "no volumes present"
05:13 vpshastry joined #gluster
05:14 * JoeJulian grumbles
05:15 SuperYeti there is also an /etc/glusterfs
05:15 SuperYeti if it matters
05:15 JoeJulian Just to be sure I didn't lead you completely astray, you mv'd /etc/glusterd to /var/lib/glusterd
05:15 JoeJulian And yes, /etc/glusterfs stays
05:15 SuperYeti yes i did
05:15 SuperYeti and ok, confirmed
05:16 kdhananjay joined #gluster
05:16 JoeJulian I bet you have a /var/lib/glusterd/glusterd
05:16 SuperYeti yep lol
05:16 JoeJulian And I would stop gluster-server before you fix that.
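Pieced together, the fix being walked through here looks roughly like this on each Ubuntu server (a sketch, not an official upgrade script; the service name comes from the PPA packages discussed above):
    sudo service glusterfs-server stop
    # the 3.2-era packages kept state in /etc/glusterd; 3.4 expects it under /var/lib/glusterd
    sudo mv /etc/glusterd /var/lib/glusterd
    # if /var/lib/glusterd already existed, the mv nests the old state as /var/lib/glusterd/glusterd;
    # merge its contents (peers/, vols/, glusterd.info) up one level before starting
    sudo service glusterfs-server start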
05:17 cp0k_ joined #gluster
05:21 SuperYeti w00t
05:21 SuperYeti ok that worked :)
05:22 rastar joined #gluster
05:22 SuperYeti now, can I still use fohlderfs01:/sites /mnt/sites glusterfs defaults,_netdev,backupvolfile-server=fohlderfs02 0 0 to mount? or is there a better preferred method?
05:22 JoeJulian Should be good.
05:22 jporterfield joined #gluster
05:23 SuperYeti hmm
05:23 SuperYeti gluster volume status
05:23 SuperYeti shows different ports than I originally read about and specified in my firewall
05:23 SuperYeti what ports do i have to open for the client to connect to the gluster cluster?
05:25 lalatenduM joined #gluster
05:26 SuperYeti ya it's definitely not mounting when i rebooted the client heh
05:27 SuperYeti http://www.jamescoyle.net/how-to/457-glusterfs-firewall-rules are those accurate?
05:29 JoeJulian @ports
05:29 glusterbot JoeJulian: glusterd's management port is 24007/tcp and 24008/tcp if you use rdma. Bricks (glusterfsd) use 24009 & up for <3.4 and 49152 & up for 3.4. (Deleted volumes do not reset this counter.) Additionally it will listen on 38465-38467/tcp for nfs, also 38468 for NLM since 3.3.0. NFS also depends on rpcbind/portmap on port 111 and 2049 since 3.4.
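As a sketch of what that factoid means in iptables terms for a 3.4 server (TCP only, no rdma; the brick port range size below is arbitrary, one port is used per brick counting up from 49152):
    iptables -A INPUT -p tcp --dport 24007 -j ACCEPT                     # glusterd management
    iptables -A INPUT -p tcp --dport 49152:49251 -j ACCEPT               # brick daemons (glusterfsd)
    iptables -A INPUT -p tcp --dport 38465:38468 -j ACCEPT               # gluster NFS + NLM
    iptables -A INPUT -p tcp -m multiport --dports 111,2049 -j ACCEPT    # portmapper and NFS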
05:32 SuperYeti thanks
05:34 aravindavk joined #gluster
05:36 SuperYeti well it mounts
05:36 SuperYeti lets see if it mounts on boot :)
05:36 JoeJulian If not, I blame upstart
05:37 SuperYeti hmm
05:37 SuperYeti it mounted
05:37 SuperYeti but not right at boot
05:37 SuperYeti i logged in
05:37 SuperYeti ran mount
05:37 SuperYeti nothing
05:37 SuperYeti checked log, ran mount, then i saw it
05:37 SuperYeti as a result apache failed to start
05:37 JoeJulian Are you logging in from console?
05:38 SuperYeti ssh
05:38 SuperYeti this is all in EC2
05:38 JoeJulian mmmkay...
05:38 SuperYeti but like I said, it did mount, but after a while
05:39 JoeJulian Can you be around during GMT-5 business hours?
05:39 SuperYeti yes
05:39 rfortier1 joined #gluster
05:39 SuperYeti oh wait, this is weird, apache seems to have started
05:39 SuperYeti let me reboot again to verify
05:39 SuperYeti i may have logged into ssh really early in the boot cycle or something
05:39 JoeJulian semiosis is our resident Ubuntu/EC2 expert. He would know a lot more about any issues you might have there.
05:39 SuperYeti ok cool
05:40 SuperYeti so that would be like ~7 hrs from now?
05:40 SuperYeti aka EST?
05:41 JoeJulian right
05:43 SuperYeti ok cool
05:43 JoeJulian but you may be right. upstart waits for specific things to happen before starting the next. It's entirely possible you could ssh in before it was done.
05:43 SuperYeti thanks very much, I appreciate all the help
05:43 SuperYeti ya i tried again, and waited a bit longer to ssh in
05:43 SuperYeti and everything seems to be running
05:43 SuperYeti have run into some other issue now, but that's our app/deploy not the cluster lol
05:44 JoeJulian :)
05:51 vpshastry joined #gluster
05:54 cjanbanan joined #gluster
05:58 johnmark huzzah :)
05:59 benjamin_____ joined #gluster
06:02 rjoseph1 joined #gluster
06:03 harish joined #gluster
06:06 SuperYeti think i fixed the other problem
06:06 lanning joined #gluster
06:06 SuperYeti damn developers not following instructions for autoconfig on deploy
06:06 SuperYeti lol
06:12 cjanbanan joined #gluster
06:14 elyograg new info for my troubles.  after I killed the self heal daemon on the second brick server for the second volume, I discovered that the heal info for that volume would return, but say that the self-heal daemon on that server was not running.  after I got gluster fully killed and restarted, the self-heal daemon was back doing its thing, and the heal info command now won't return.  what info do I need to provide to figure out what's gone wrong and how
06:16 JoeJulian elyograg: Sorry, which version again?
06:16 Philambdo joined #gluster
06:16 elyograg 3.4.2.  upgraded monday evening from 3.3.1, which was exhibiting the same problems.
06:17 elyograg well, except for the heal info command not returning.
06:22 elyograg my frustration level with gluster is high at the moment.  I had really hoped that upgrading would fix all our problems.  I know it will fix the really scary bugs with rebalancing, that was verified in testing.
06:24 elyograg i've now got xymon sending anything in the main logfiles that includes " E " ... seeing a lot of strange stuff on each server when I do the gluster stop/restart.
06:24 elyograg lots of: W [socket.c:514:__socket_rwv] 0-mdfs2-client-15: readv failed (No data available)
06:25 elyograg if warnings are all it gets, it won't alarm on them, but they're included when there are also errors.
06:25 shyam joined #gluster
06:25 elyograg lots of: E [glusterd-store.c:1858:glusterd_store_retrieve_volume] 0-: Unknown key: brick-12
06:25 JoeJulian Ok, let's assume your problem is all in self-heal. If we wipe the self-heal queue, are you able to run a heal-full to ensure the heals all happen?
06:26 elyograg tell me how to do that.  i'll try it.
06:26 JoeJulian The unknown key thing... I don't have any idea why it happens, but it happens to everybody.
06:27 JoeJulian elyograg: remove from each brick everything under /mnt/brick1/.glusterfs/indices/xattrop
06:27 JoeJulian elyograg: That's where it stores the pending self-heals
06:28 JoeJulian elyograg: You're not firewalled, right?
06:28 JoeJulian The ports changed for the bricks.
06:28 JoeJulian @ports
06:28 glusterbot JoeJulian: glusterd's management port is 24007/tcp and 24008/tcp if you use rdma. Bricks (glusterfsd) use 24009 & up for <3.4 and 49152 & up for 3.4. (Deleted volumes do not reset this counter.) Additionally it will listen on 38465-38467/tcp for nfs, also 38468 for NLM since 3.3.0. NFS also depends on rpcbind/portmap on port 111 and 2049 since 3.4.
06:28 elyograg just verified that.  firewall disabled on all ports.
06:29 elyograg ooh, TONS of files in at least one of the xattrop directories.
06:31 elyograg 421073 to be precise.  not decreasing after a lot of seconds.
06:31 elyograg should I kill the self-heal daemon before clearing all that?
06:31 JoeJulian I get the impression that it's still loading them all which is why it's unresponsive.
06:32 JoeJulian Could probably verify with strace.
06:32 elyograg that would make sense.  strace just shows a lot of things that I can't even decipher.
06:32 elyograg https://dl.dropboxusercontent.com/u/97770508/brick-process-strace.txt
06:33 elyograg that is from a brick trace, but visually the shd trace looks the same.
06:33 JoeJulian yeah
06:33 JoeJulian strace -e trace=file would limit the trace to file related function calls, which would probably show if it's walking that tree.
06:33 cjanbanan joined #gluster
06:34 JoeJulian but that's neither here nor there. You /can/ delete it, but that deletes the catalog of pending self-heals. The only way to trigger all the heals that are needed if you delete that is with a heal...full.
06:34 elyograg if i restrict it with that -e (tried before) i get no strace output.
06:35 elyograg my browser's freaking out.  fun.
06:36 JoeJulian That totally looks like a race condition to me. I wish avati were around to ask for info. He seems to be able to just "think" race conditions.
06:36 elyograg that's what I was thinking too.  ok, so I'll kill the daemon, wipe the xattrop files, then do a heal full.
06:37 elyograg before the heal full, I'll go ahead and kill/start gluster again.
06:37 JoeJulian +1
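elyograg's plan, written out as a rough sketch (brick path and volume name are this cluster's own and act as placeholders; wiping the index throws away the pending-heal queue, which is why the full heal afterwards is required):
    # on each brick server, stop gluster the way elyograg described earlier
    service glusterd stop; service glusterfsd stop; killall glusterfs
    # wipe the pending self-heal index on every brick
    rm -f /mnt/brick1/.glusterfs/indices/xattrop/*
    # restart, then trigger a full crawl from any one server
    service glusterd start
    gluster volume heal mdfs2 full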
06:39 elyograg just noticed that someone is copying data onto the volume.  have to figure that out and see if I can get them to stop.
06:42 JoeJulian Gah! Too tired to finish writing this blog post. I'm rambling.
06:42 elyograg I want to go to sleep myself.  thank you for not doing so before now. :)
06:43 elyograg oh, good.  the 'guilty party' is awake and can stop it.
06:48 shyam joined #gluster
06:48 aravindavk joined #gluster
06:50 prasanth joined #gluster
06:56 elyograg ok, after wiping, heal info returns quickly, says no files.  heal full initiated.
06:57 JoeJulian The quick one-liner version of that blog article that I'll post tomorrow: split-brain heals are easier with https://github.com/joejulian/glusterfs-splitbrain
06:57 glusterbot Title: joejulian/glusterfs-splitbrain · GitHub (at github.com)
06:57 elyograg millions of files here, this could take a while.
06:58 vpshastry1 joined #gluster
06:59 Philambdo joined #gluster
07:01 cjanbanan joined #gluster
07:03 meghanam joined #gluster
07:03 meghanam_ joined #gluster
07:04 JoeJulian johnmark: The quick one-liner version of that blog article that I'll post tomorrow: split-brain heals are easier with https://github.com/joejulian/glusterfs-splitbrain
07:04 glusterbot Title: joejulian/glusterfs-splitbrain · GitHub (at github.com)
07:04 elyograg can a full heal proceed without the self-heal daemon running, or does it work through that daemon?  I think I'm having that race condition again.
07:05 JoeJulian It works through the daemon.
07:05 JoeJulian Is there at least something in the logs now?
07:06 JoeJulian You can safely kill the daemon and do the good old "find /$client_mount" to trigger the heal.
07:06 JoeJulian At least that'll get us both to sleep.
07:07 elyograg in the self-heal daemon log, I see it connecting, and then a bunch of "another crawl is in progress" messages.  nothing ongoing.
07:07 JoeJulian You can even "volume set $vol cluster.self-heal-daemon off" if you need to.
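The crawl JoeJulian means is the classic client-side trigger: stat every file through a FUSE mount so each one gets healed on access. Roughly, with the mount point as a placeholder:
    find /mnt/mdfs2 -noleaf -print0 | xargs -0 stat > /dev/null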
07:09 JoeJulian Alright... I need to go eat some pie, watch some episode of something with my wife and remind her that she's more important than this silly computer and then get some sleep.
07:09 elyograg i don't see any "-e trace=file" output on any of the processes that are active.
07:10 JoeJulian Oh!
07:10 JoeJulian "strace -f -e trace=file -p ..."
07:10 JoeJulian It's probably running in a fork.
07:11 elyograg still no output.
07:11 JoeJulian What's the problem again?
07:11 JoeJulian cpu out of control, wasn't it?
07:12 elyograg yes.  very high load avg.
07:12 JoeJulian But is it actually impeding anything?
07:13 JoeJulian The self-heals are run on a lower priority queue and shouldn't take anything away from the normal client operations.
07:16 haomaiwa_ joined #gluster
07:17 elyograg when I ran into the problem, it would grind operation to a halt.
07:18 elyograg ok, I disabled the self-heal daemon, which stopped the heal.  I re-enabled it, then started the full heal again.  it now seems to be working fine.  I see file opens/etc happening.
07:18 aravindavk joined #gluster
07:18 elyograg We'll see what happens in ten minutes (or less) when the daemon wakes up again.
07:20 elyograg hopefully everything is good, but if it isn't, a race condition seems likely.
07:25 cjanbanan joined #gluster
07:27 maksim_ joined #gluster
07:37 ekuric joined #gluster
07:37 jporterfield joined #gluster
07:39 merrittk joined #gluster
07:42 rgustafs joined #gluster
07:44 ndarshan joined #gluster
07:50 kanagaraj joined #gluster
07:50 aravindavk joined #gluster
07:51 rjoseph joined #gluster
07:56 ctria joined #gluster
08:00 cjanbanan joined #gluster
08:04 prasanth joined #gluster
08:07 eseyman joined #gluster
08:12 kdhananjay joined #gluster
08:14 sahina joined #gluster
08:14 aquagreen joined #gluster
08:16 cjanbanan joined #gluster
08:24 keytab joined #gluster
08:27 shubhendu joined #gluster
08:37 prasanth joined #gluster
08:39 fsimonce joined #gluster
08:42 kanagaraj_ joined #gluster
08:48 bala joined #gluster
08:48 aquagreen1 joined #gluster
08:48 micu joined #gluster
08:50 andreask joined #gluster
08:51 rjoseph joined #gluster
08:51 sahina joined #gluster
08:53 shylesh joined #gluster
08:53 shubhendu joined #gluster
08:53 ndarshan joined #gluster
08:54 rgustafs joined #gluster
09:05 liquidat joined #gluster
09:05 jtux joined #gluster
09:13 hagarth joined #gluster
09:35 rjoseph joined #gluster
09:38 ndarshan joined #gluster
09:42 Norky joined #gluster
09:54 vpshastry joined #gluster
09:55 prasanth joined #gluster
09:55 bala joined #gluster
09:56 stickyboy joined #gluster
09:58 prasanth_ joined #gluster
10:02 harish joined #gluster
10:05 shyam joined #gluster
10:06 shubhendu joined #gluster
10:06 sahina joined #gluster
10:09 itisravi joined #gluster
10:20 hagarth joined #gluster
10:27 Norky joined #gluster
10:28 johnmark @chanstats
10:28 * johnmark kicks glusterbot
10:28 hagarth @stats
10:28 glusterbot hagarth: I have 4 registered users with 1 registered hostmasks; 1 owner and 1 admin.
10:28 hagarth @channelstats
10:28 glusterbot hagarth: On #gluster there have been 262429 messages, containing 10703512 characters, 1773067 words, 6781 smileys, and 917 frowns; 1411 of those messages were ACTIONs. There have been 109442 joins, 3229 parts, 106356 quits, 24 kicks, 197 mode changes, and 7 topic changes. There are currently 189 users and the channel has peaked at 239 users.
10:28 johnmark ah, that's the one :)
10:28 hagarth yeah :)
10:28 johnmark heh
10:32 prasanth joined #gluster
10:39 jmarley joined #gluster
10:41 kanagaraj joined #gluster
10:43 rjoseph joined #gluster
10:48 harish joined #gluster
10:55 Slash joined #gluster
10:57 wica joined #gluster
10:58 masterzen joined #gluster
11:04 fidevo joined #gluster
11:06 RameshN joined #gluster
11:15 askb joined #gluster
11:39 bala joined #gluster
11:44 rjoseph joined #gluster
12:02 ppai joined #gluster
12:03 an_ joined #gluster
12:05 bazzer using georeplication to write to a distributed gluster mount point seems half as slow as when I write directly to a brick (which I did as a test) - is this common? or are there any tuning pointers I can turn to?
12:11 andreask joined #gluster
12:21 sahina joined #gluster
12:22 kanagaraj joined #gluster
12:24 edward2 joined #gluster
12:37 stickyboy I am having weird issues with .Xauthority with a home which lives on a gluster mount...
12:37 diegows joined #gluster
12:38 stickyboy I'm still trying to figure out a pattern for this issue, but grasping at straws.
12:41 social kkeithley: do you have a minute as for questions on nfs vs fuse?
12:41 kkeithley ask away
12:43 kkeithley social: ^^^
12:43 social kkeithley: we have migrated some stuff off gluster to netapp nfs (don't ask me why it's business stuff) and we see superincreased iowait from nfs
12:43 edward2 joined #gluster
12:44 social does iowait even show up on gluster if it's under pressure? Thus I think we have a trillion iops and I would guess gluster fuse is capable of handling those better but I would love to know
12:45 kkeithley you switched the client side from gluster native to netapp nfs?
12:46 social We had upload volume which now is on netapp and mounted as nfs4 and compared to gluster fuse mount the iowait skyrocked
12:46 bala joined #gluster
12:46 social we had full gluster setup of 4 nodes delivering the bricks and clients used fuse mount before
12:46 kkeithley oh,
12:46 social I wonder if it's that gluster didn't even report iowait
12:46 kkeithley okay
12:47 social or it's that it really handles ton of iops better
12:47 edward2 joined #gluster
12:48 kkeithley Not sure. Off hand I don't know how iowait is instrumented by anything, let alone in gluster.
12:48 kkeithley hang on a sec, let me look at some things
12:49 ndevos social: in case you're using NFSv4, and the workload handles the same file from multiple servers, try turning off NFSv4 delegations
12:50 ndevos delegations are locks held by the nfs-client, the server can recall them when another client needs to access that same file - the recalling can take time and contributes to delays
12:53 social well I'd be more interested in finding out why gluster performed better on massive iops, it can also be as simple as that we had more cpu power on the bricks than the whole netapp has
12:55 ndevos gluster does not have (something like) delegations (yet), so it could be one of the contributing factors - NFS-server tuning might improve things
12:55 glusterbot New news from newglusterbugs: [Bug 1065659] Not able to reset network compression options from command line <https://bugzilla.redhat.com/show_bug.cgi?id=1065659>
12:58 sahina joined #gluster
13:01 benjamin_____ joined #gluster
13:02 kkeithley I don't believe we have any support for collecting system activity, e.g. sar, sadc, etc, built in. I certainly don't remember it. `gluster volume profile` stats are all we have AFAIK.
13:03 wica joined #gluster
13:05 kkeithley purpleidea: ping
13:06 kkeithley @later tell purpleidea ping
13:06 glusterbot kkeithley: The operation succeeded.
13:13 gmcwhistler joined #gluster
13:19 rastar joined #gluster
13:23 calum_ joined #gluster
13:39 vimal joined #gluster
13:43 jag3773 joined #gluster
14:01 swat30 joined #gluster
14:04 plarsen joined #gluster
14:06 japuzzo joined #gluster
14:07 bennyturns joined #gluster
14:07 sroy joined #gluster
14:11 jag3773 joined #gluster
14:15 plarsen joined #gluster
14:19 B21956 joined #gluster
14:23 psyl0n joined #gluster
14:23 jmarley joined #gluster
14:24 gmcwhistler joined #gluster
14:25 dewey joined #gluster
14:30 Norky joined #gluster
14:30 BC0001 joined #gluster
14:32 jag3773 joined #gluster
14:32 theron joined #gluster
14:32 BC0001 Is the host specified in the mount irrelevant after the first connection?  It appears as though it would be a SPOF but assume it's not.  Does it make any sense to make that host a dns failover host?
14:33 sahina joined #gluster
14:33 seapasulli joined #gluster
14:37 BC0001 "The server specified in the mount command is only used to fetch the gluster configuration
14:37 BC0001 volfile describing the volume name."  The trusty manual...
14:39 eseyman joined #gluster
14:39 kkeithley only used to fetch the configuration the first time you mount. After that the clients know about all the servers.
14:41 lman4821 joined #gluster
14:42 ikk joined #gluster
14:43 dbruhn joined #gluster
14:44 kkeithley That's (obviously) for native/fuse mounts. I believe people use rrdns for mounting gNFS vols.
14:46 lman4822 joined #gluster
14:51 BC0001 Yeah, I am using native.  Thanks
14:51 benjamin_____ joined #gluster
14:54 y4m4 joined #gluster
14:55 rwheeler joined #gluster
15:02 jobewan joined #gluster
15:09 jag3773 joined #gluster
15:09 haomaiwang joined #gluster
15:12 rpowell joined #gluster
15:13 daMaestro joined #gluster
15:13 partner hmm does enabling the io-debug translator (io-stats-dump) add much load (of any sort) for the server? i'd just like to get some more numbers out so that i could graph the usage a bit more, found one munin plugin that at least wants to do some setfattr
15:14 partner or should i perhaps just grep the numbers from say "gluster volume status all clients" ?
15:17 tokik joined #gluster
15:17 partner hmph, funny way of asking "where should i get some values for graphing purposes" :) just got stuck on some ancient plugin and was thinking it all wrong way
15:19 ndevos kkeithley, BC0001: there is also the backup-volfile-servers mount option, it accepts a list of hostnames separated by :
15:20 kkeithley ndevos: oh, good thing to know
15:21 kkeithley but BC0001 left 30 min ago. :-(
15:23 ndevos @later tell BC0001 For the mount SPOF, there is also the backup-volfile-servers mount option, it accepts a list of hostnames separated by :
15:23 glusterbot ndevos: The operation succeeded.
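For reference, that option looks something like this (hostnames are placeholders; older mount scripts may only understand the singular backupvolfile-server spelling seen in SuperYeti's fstab line earlier):
    mount -t glusterfs -o backup-volfile-servers=server2:server3 server1:/myvol /mnt/myvol
    # fstab equivalent
    server1:/myvol  /mnt/myvol  glusterfs  defaults,_netdev,backup-volfile-servers=server2:server3  0 0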
15:26 wushudoin joined #gluster
15:30 kaptk2 joined #gluster
15:31 elyograg I'm getting worried by the amount of memory used on a gluster system doing a full heal.  and I can't see what's using all the memory.  https://www.dropbox.com/s/y97gw0jj4qergw1/screenshot-gluster-top-sort-memory.png
15:32 glusterbot Title: Dropbox - screenshot-gluster-top-sort-memory.png (at www.dropbox.com)
15:32 elyograg if this continues, the linux OOM killer is going to kill the daemon.
15:33 elyograg the situation is similar on the other server with bricks for this volume, but that self heal daemon is a lot smaller.
15:34 elyograg nearly all the RAM used but not accounted for in the process list.  The cached and buffers values are next to nothing, as you can see.
15:35 andreask tried "slabtop"?
15:35 elyograg swap is also steadily (but very slowly) increasing.
15:36 elyograg never seen that before.  how do I read it?
15:36 dbruhn elyograg, what version of gluster?
15:36 psyl0n joined #gluster
15:36 psyl0n joined #gluster
15:36 elyograg 4.3.2, CentOS 6.
15:36 elyograg 6.5 to be precise.
15:36 dbruhn you mean 3.3.2?
15:36 elyograg upgraded from 3.3.1 and an earlier centos on monday evening.
15:37 elyograg these systems have been 6.5 all along, though.
15:38 dbruhn Can you stop the heal and see if it clears the memory?
15:40 elyograg I need the heal to finish.  On JoeJulian's advice, I cleared the queue (the xattrop directories) last night (lots of files in there, causing many problems) and this is the full heal afterwards.  It's been running for around 8 hours already.  millions of files in the volume.
15:41 partner 3.3.2?
15:41 elyograg 3.4.2
15:41 elyograg oh, typo up there.  sorry.
15:42 partner ah ok, just interested as there is some memory leak on 3.3.2 for at least rebalancing, i can't move files around for much longer than a day or two before the OOM comes and kills
15:42 elyograg sorted the slabtop by cache size.  the largest one is xfs_inode, at 7480612K
15:43 elyograg partner: We ran into that problem, as well as the problem where a not-responding brick causes the rebalance to go rogue and start trashing the filesystem.  one of the reasons I've upgraded.
15:43 sprachgenerator joined #gluster
15:44 elyograg no idea what made the brick not respond for more than 42 seconds, but somehow it happened.
15:44 hagarth joined #gluster
15:44 partner i'm planning on upgrading too once we get the client side done
15:51 lanning_ joined #gluster
15:55 elyograg any ideas on my memory usage?  I need to get on a train to work really soon, have to leave home to do that.
15:55 elyograg I'
15:55 elyograg I'll be able to read any backlog after I get to work.
15:58 elyograg strace shows that the self heal daemon is dealing with whats in .glusterfs now, last night was dealing with the actual data files on the bricks.
16:01 giannello joined #gluster
16:02 ndevos joined #gluster
16:02 ndevos joined #gluster
16:05 bugs_ joined #gluster
16:05 primechuck semiosis: Do you know of an issue of DIRECT_IO not working on ubuntu 12.04 in fuse?
16:13 ira_ joined #gluster
16:14 RayS joined #gluster
16:17 lmickh joined #gluster
16:20 jag3773 joined #gluster
16:20 elyograg should I be worried by a lot of this in the strace for bricks?    ELOOP (Too many levels of symbolic links)
16:21 zaitcev joined #gluster
16:22 hybrid512 joined #gluster
16:22 davinder joined #gluster
16:23 zerick joined #gluster
16:27 rwheeler_ joined #gluster
16:27 glusterbot New news from newglusterbugs: [Bug 1066778] Make AFR changelog attributes persistent and independent of brick position <https://bugzilla.redhat.com/show_bug.cgi?id=1066778> || [Bug 1066997] geo-rep package should be dependent on libxml2-devel package <https://bugzilla.redhat.com/show_bug.cgi?id=1066997>
16:32 wica joined #gluster
16:41 jbrooks left #gluster
16:44 seapasulli_ joined #gluster
16:49 psyl0n joined #gluster
16:50 jag3773 joined #gluster
16:54 jbrooks joined #gluster
16:57 T0aD joined #gluster
17:00 T0aD joined #gluster
17:01 vpshastry joined #gluster
17:02 T0aD joined #gluster
17:04 semiosis SuperYeti: i'm here now.  too much in the backlog for me to catch up.  if you want to fill me in on your present situation i'll try to help
17:04 T0aD joined #gluster
17:05 semiosis primechuck: not really
17:05 andreask left #gluster
17:06 JoeJulian bazzer: Bricks are only for glusterfs' use, not for writing directly.
17:06 semiosis primechuck: but istr something about fuse direct-io in the saucy kernel
17:06 semiosis primechuck: if you can run a test on saucy or trusty that might be helpful
17:06 semiosis specifically kernel 3.11 or newer
17:06 semiosis which saucy has
17:07 T0aD joined #gluster
17:07 JoeJulian sticky_afk: best I can offer with the information provided is that I don't have any issue with .Xauthority with my native glusterfs mounted home directories.
17:08 REdOG joined #gluster
17:09 semiosis JoeJulian: i totally trolled myself with gluster last night... i replicated the test setup from my laptop in a vagrant box, then i mounted the vagrant box volume and got the local/laptop one.  freakin' name resolution!
17:09 SuperYeti hey semiosis I think i got it figured out with JoeJulian's help
17:09 T0aD joined #gluster
17:09 semiosis SuperYeti: ok great
17:10 primechuck Yeah, it looks like there was an issue with fuse direct IO before kernel 3.4.  Jumping to the saucy kernel and seeing if that works.
17:10 semiosis cool
17:10 JoeJulian hehe
17:10 dusmant joined #gluster
17:11 JoeJulian elyograg: echo 3 >/proc/sys/vm/drop_caches
17:12 primechuck Migrating things from RHEL base to Ubuntu base and am having trouble wrapping my head around moving to newer version rather than having the backport :)
17:12 semiosis @kernel
17:12 glusterbot semiosis: I do not know about 'kernel', but I do know about these similar topics: 'kernel tuning'
17:12 semiosis ,,(kernel tuning)
17:12 glusterbot http://community.gluster.org/a/linux-kernel-tuning-for-glusterfs/
17:12 JoeJulian @later tell andreask slabtop's awesome! Thanks for introducing me to that.
17:12 glusterbot JoeJulian: The operation succeeded.
17:12 semiosis @forget kernel tuning
17:12 glusterbot semiosis: The operation succeeded.
17:12 semiosis @learn kernel tuning as https://www.gluster.org/community/documentation/index.php/Linux_Kernel_Tuning
17:12 glusterbot semiosis: The operation succeeded.
17:12 T0aD joined #gluster
17:13 JoeJulian @forget herring tuning
17:13 glusterbot JoeJulian: Error: There is no such factoid.
17:14 JoeJulian semiosis: : The quick one-liner version of that blog article that I'll post later today: split-brain heals are easier with https://github.com/joejulian/glusterfs-splitbrain
17:14 glusterbot Title: joejulian/glusterfs-splitbrain · GitHub (at github.com)
17:14 semiosis thought that kernel tuning page might help elyograg re: memory
17:14 JoeJulian Yeah, bummer that johnmark didn't get that all backed up... :P
17:15 T0aD joined #gluster
17:15 JoeJulian dammit... it's hard to throw him under the bus when I never know what timezone he's in. I prefer to do that when he's awake and responding.
17:16 semiosis got a vacation autoreply from him the other day, might be afk
17:17 kkeithley I believe he's in BLR. 9:30 PM (http://xkcd.com/1335/)
17:17 glusterbot Title: xkcd: Now (at xkcd.com)
17:17 semiosis wow JoeJulian this splitbrain thing looks sweet!
17:17 T0aD joined #gluster
17:18 JoeJulian I thought that one up the last time I encountered some. It's been months since I've had the time to try to build it.
17:18 JoeJulian Last time I just built the vol files by hand.
17:18 Gilbs joined #gluster
17:19 semiosis kkeithley: oh!
17:21 ndevos kkeithley: nice xkcd, but it doesnt come as a poster :-/
17:21 kkeithley when you can get a poster that updates I'll be a ready buyer. eInk maybe?
17:21 kkeithley updates like the xkcd web site
17:22 Gilbs What is the syntax to force stop geo-replication?   I'm trying  "geo-replicaion my-volume root@x.x.x.x::geo-volume force stop"  All I'm getting is a geo-replication command failed.    (centos 6.5  //  gluster 3.4.2)
17:22 semiosis kkeithley: just being able to manually rotate the outer ring (with the times) would be enough for me
17:22 ndevos hey, cool, didn't notice that, but I'd need to have the GEO in it too somehow
17:24 ndevos well, maybe not if 'now' is always on top, hmm
17:24 semiosis kkeithley: could make a clock using these graphics, just paste the outer ring to the hour hand
17:24 Matthaeus joined #gluster
17:24 semiosis or something like that
17:25 kkeithley http://www.snorgtees.com/you-look-funny-doing-that
17:25 glusterbot Title: You Look Funny Doing That With Your Head T-Shirt | SnorgTees (at www.snorgtees.com)
17:25 kkeithley nothing's ever perfect.
17:25 ndevos or, rather 'noon'
17:25 ndevos semiosis: yes, I'd like that, and have Europe on the top
17:27 semiosis cant find such a thing on amazon.  i'm surprised no one has invented this yet
17:32 Mo__ joined #gluster
17:37 nightwalk joined #gluster
17:41 ngoswami joined #gluster
17:45 rwheeler_ joined #gluster
17:51 psyl0n joined #gluster
17:57 cp0k Is there any harm in running Gluster 3.4.1 client with my storage nodes being 3.4.2?
17:58 calum_ joined #gluster
17:58 cp0k my clients have the glusterd init scripts displaying status "crashed" for some reason
17:58 semiosis cp0k: best practice is to use same exact version on all clients & servers.  upgrading advice is to upgrade servers *before* clients
17:58 vpshastry joined #gluster
17:59 semiosis cp0k: probably not going to be a problem, but you should try it on a test cluster before prod
17:59 aquagreen joined #gluster
18:05 aquagreen joined #gluster
18:08 cp0k semiosis: thanks, thats what I thought
18:08 cp0k I think this is part of the reason why glusterd is showing up as crashed on my clients:
18:09 cp0k [2014-02-26 18:04:55.290152] E [rpc-transport.c:253:rpc_transport_load] 0-rpc-transport: /usr/lib64/glusterfs/3.4.2/rpc-transport/rdma.so: cannot open shared object file: No such file or directory
18:09 elyograg clearing the VM cache freed up 20GB of memory.  But why isn't it showing it in "cached" ?
18:09 semiosis pretty sure that's normal
18:09 cp0k despite the crashed status, I am still able to mount gluster via fuse
18:10 semiosis cp0k: pretty sure that's normal
18:10 cp0k thanks
18:11 bazzer Sure JoeJulian - i just did that as a test to compare geo-replication performance to xfs partition vs gluster.fuse mount
18:12 bazzer its was 50%ish quicker.  I have recreated the destination volume afterwards
18:15 seapasulli joined #gluster
18:22 elyograg i do not like it when the linux kernel lies about memory usage.  clearing the vm cache frees up 20GB of memory that is not accounted for in any of the values showing in top or free.  how can I actually see where that memory is being used?
18:23 semiosis lies, damn lies, and linux virtual memory subsys
18:23 elyograg windows lies even more, and solaris is arcane as hell.  overall I like the linux situation best. :)
18:23 semiosis hah
18:24 semiosis elyograg: vmstat?
18:24 elyograg same info.
18:25 semiosis "clearing the vm cache frees up 20GB of memory that is not accounted for in any of the values showing in top or free" -- if it doesnt show up in any of the values, what makes you think 20G was freed?
18:25 semiosis how would you ever come up with that number?
18:29 elyograg used memory went from 24GB to 4GB.  None of the other numbers on top/free changed.
18:29 elyograg i'm used to all the memory being used, but if it's cache memory, it usually shows up as "cached"
18:30 elyograg that's hovered between 11MB and 16MB the whole time.
18:31 ikk joined #gluster
18:32 elyograg screenshot i shared earlier shows the before picture.  https://www.dropbox.com/s/y97gw0jj4qergw1/screenshot-gluster-top-sort-memory.png
18:32 glusterbot Title: Dropbox - screenshot-gluster-top-sort-memory.png (at www.dropbox.com)
18:33 elyograg small correction to the 'no other number changed' -- used and free changed.  none of the others.
18:34 elyograg it's slowly creeping back to where it was before, but cached is still small.
18:35 semiosis weird
18:36 elyograg i'm glad to know that I am not having a true low memory situation, but I'd like to see where the memory's going.
18:39 semiosis elyograg: is this a vm or a lxc?
18:40 elyograg not sure what lxc is.  it's a bare-metal Dell server.
18:40 semiosis lxc  - linux containers
18:43 elyograg the self-heal daemon log (doing a full heal) has this fairly often: [pid 42010] lstat("/bricks/d01v01/mdfs2/.glusterfs/43/bc/43bc39f1-3a85-4339-83e8-4080a4ffee37/10088578", 0x7f1dd3ffe790) = -1 ELOOP (Too many levels of symbolic links)
18:43 elyograg oh, wait.  not the log.
18:43 elyograg strace on the bricks.
18:46 JoeJulian elyograg: My suggestion came from your slabtop information, "largest one is xfs_inode, at 7480612K" which would be the filesystem cache.
18:46 elyograg the misinformation might be fixed in newer kernel versions, i guess.  CentOS is still running 2.6.x.
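A quick way to confirm the 'missing' memory is kernel slab cache (xfs_inode here) rather than a leak, per the exchange above:
    slabtop -o -s c | head -n 15                  # one-shot dump, sorted by cache size
    free -m
    sync && echo 3 > /proc/sys/vm/drop_caches     # drop page cache plus dentries/inodes
    free -m                                       # 'used' should fall even though 'cached' barely moves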
18:50 sputnik13 joined #gluster
18:52 elyograg Fedora requires upgrades too often, and anything else makes it difficult to run Dell's admin software and in-OS firmware upgrades.
18:55 redbeard joined #gluster
18:55 elyograg oh, i did have another question.  Is it safe to kill a rebalance in progress with a TERM signal?  I would expect it to be safe, that the signal will be handled and the current item finished before exit, but I want to make sure.
18:55 partner sounds like you upgrade your firmware on weekly basis :o
18:56 elyograg no, but I don't want to have to take the OS down and boot a firmware CD/DVD.  Having it in the OS means I can do the upgrade and then reboot once.
18:56 semiosis is rebalance ever safe?  (this Q from a guy who installs a whole new OS every 6 months instead of upgrading)
18:56 elyograg there is that, semiosis.
19:01 tdasilva joined #gluster
19:01 elyograg using kill -9 i would expect to be potentially unsafe, but I'm hoping a standard kill will do the right thing.  the reason i ask is that if we start a rebalance but need to add new storage before it's done, we'd like to kill it, add new stuff, and start it again.
19:02 semiosis isn't there a rebalance pause/stop command?
19:02 elyograg the help shows that there is a stop command.
19:02 elyograg don't know why i didn't think of that.
19:03 nage joined #gluster
19:04 elyograg now to ask the obvious followup.  does a rebalance stop *work*? :)
19:04 elyograg I will be testing, of course.
19:04 semiosis then please let us know
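The commands in question, for anyone following along (volume name is a placeholder; whether stop leaves everything in a sane state is exactly what elyograg intends to test):
    gluster volume rebalance myvol start
    gluster volume rebalance myvol status
    gluster volume rebalance myvol stop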
19:04 aquagreen joined #gluster
19:09 an_ joined #gluster
19:15 seapasulli joined #gluster
19:16 vstokesjr joined #gluster
19:17 vstokesjr Question, can Gluster geo-rep be circular?
19:18 kkeithley vstokesjr: no, not yet
19:19 vstokesjr Ok, so it's just one-way.
19:20 vstokesjr Is it planned for 3.5?
19:22 jag3773 joined #gluster
19:22 kkeithley Nope. See  http://www.gluster.org/community/documentation/index.php/Planning35
19:22 glusterbot Title: Planning35 - GlusterDocumentation (at www.gluster.org)
19:23 kkeithley Too many pots, not enough cooks
19:23 vstokesjr LOL. I understand that.
19:24 vstokesjr However, if the volumes are different I could then have Server1-Vol-A ---geo-rep--> Server2-Vol-A and then Server2-Vol-B ---geo-rep--> Server1-Vol-B  ???
19:25 kkeithley I suspect that'll work
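A sketch of what vstokesjr describes: two independent one-way sessions over different volumes, run in opposite directions. The ::-style slave syntax is assumed from the 3.4-era CLI seen elsewhere in this log, so treat the exact form as an assumption:
    # on Server1: replicate Vol-A out to Server2
    gluster volume geo-replication Vol-A Server2::Vol-A start
    # on Server2: replicate Vol-B back to Server1
    gluster volume geo-replication Vol-B Server1::Vol-B start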
19:26 JoeJulian Can someone please approve my forge project?
19:26 vstokesjr OK, thank you. That helps.
19:26 JoeJulian Is that something you can do kkeithley?
19:30 P0w3r3d joined #gluster
19:32 vpshastry left #gluster
19:39 an__ joined #gluster
19:39 dbruhn_ joined #gluster
19:41 semiosis iirc jclift can approve forge
19:41 wushudoin| joined #gluster
19:42 Philambdo1 joined #gluster
19:42 hagarth1 joined #gluster
19:44 jmarley__ joined #gluster
19:45 davinder2 joined #gluster
19:46 ndevos_ joined #gluster
19:46 ndevos_ joined #gluster
19:48 fidevo joined #gluster
19:48 bugs_ joined #gluster
19:52 [o__o] joined #gluster
19:53 mojorison joined #gluster
19:53 ron-slc joined #gluster
19:53 hchiramm_ joined #gluster
19:54 mtanner_ joined #gluster
19:54 daMaestro joined #gluster
19:54 japuzzo joined #gluster
19:58 redbeard joined #gluster
19:58 tjikkun_work joined #gluster
19:58 jiffe98 joined #gluster
19:58 theron joined #gluster
19:58 T0aD joined #gluster
19:58 psyl0n joined #gluster
19:59 sputnik13net joined #gluster
20:00 lmickh_ joined #gluster
20:01 kaptk2_ joined #gluster
20:01 swat30 joined #gluster
20:01 divbell_ joined #gluster
20:01 T0aD joined #gluster
20:01 bennyturns joined #gluster
20:02 harish_ joined #gluster
20:04 shapemaker joined #gluster
20:05 saltsa joined #gluster
20:05 theron joined #gluster
20:05 redbeard joined #gluster
20:05 tjikkun_work joined #gluster
20:05 jiffe98 joined #gluster
20:05 lanning joined #gluster
20:07 sulky joined #gluster
20:09 badone_ joined #gluster
20:10 jporterfield_ joined #gluster
20:11 elyograg_ joined #gluster
20:20 sulky joined #gluster
20:21 Norky joined #gluster
20:28 cjh973 semiosis: you around?
20:31 cjanbanan joined #gluster
20:40 kkeithley JoeJulian: did someone else already approve it?
20:54 cp0k hey gang, I am back with my peer probing issue...after spending hours of making sure /var/lib/glusterd/peers is all correct and consistent...the problem unfortunately still remains :(
20:54 sputnik13net trying to mount a gluster volume and it's not doing anything...  nothing in the logs
20:54 sputnik13net any clues?
20:54 cp0k when I probe a new peer, gluster says the operation is successful, but on the newly probed peer all I see is the peer info from which I probed from
20:55 cp0k and the peer from which I probed from I see "State: Peer Rejected (Connected)"
20:56 cp0k only real error message I am seeing that suggests trouble is " E [glusterd-utils.c:4255:glusterd_brick_start] 0-management: Could not find peer on which brick 10.0.144.211:/gluster/4 resides"
20:56 cp0k which suggests that my volume data may be out of sync somewhere?
20:57 badone_ joined #gluster
21:01 kkeithley @later tell johnmark: I get emails about proposed projects on forge. The link takes me to a page where I can reply, but I don't see anywhere that I can approve the project.
21:01 glusterbot kkeithley: The operation succeeded.
21:01 elyograg cp0k: is the firewall enabled on one of them?
21:02 cp0k elyograg: this is not a firewall issue
21:02 cp0k elyograg: I am only firewalling SSH
21:06 cjanbanan joined #gluster
21:08 semiosis cjh973: dont ask to ask, just ask
21:09 semiosis cp0k: see ,,(peer rejected)
21:09 glusterbot cp0k: I do not know about 'peer rejected', but I do know about these similar topics: 'peer-rejected'
21:09 semiosis cp0k: see ,,(peer-rejected)
21:09 glusterbot cp0k: http://www.gluster.org/community/documentation/index.php/Resolving_Peer_Rejected
21:09 semiosis @alias peer rejected as peer-rejected
21:09 glusterbot semiosis: (alias [<channel>] <oldkey> <newkey> [<number>]) -- Adds a new key <newkey> for factoid associated with <oldkey>. <number> is only necessary if there's more than one factoid associated with <oldkey>. The same action can be accomplished by using the 'learn' function with a new key but an existing (verbatim) factoid content.
21:10 semiosis @alias peer-rejected 'peer rejected'
21:10 glusterbot semiosis: (alias [<channel>] <oldkey> <newkey> [<number>]) -- Adds a new key <newkey> for factoid associated with <oldkey>. <number> is only necessary if there's more than one factoid associated with <oldkey>. The same action can be accomplished by using the 'learn' function with a new key but an existing (verbatim) factoid content.
21:10 semiosis ,,(meh)
21:10 glusterbot I'm not happy about it either
21:10 cp0k semiosis: thanks
21:10 semiosis yw
21:13 Gilbs left #gluster
21:18 _dist joined #gluster
21:19 cp0k so step 4 is confusing: http://www.gluster.org/community/documentation/index.php/Resolving_Peer_Rejected
21:19 cp0k how can I probe the healthy peer if its already part of gluster
21:20 cp0k or do they want me to detach it first, and then probe it to the new one?
21:21 glusterbot Title: Resolving Peer Rejected - GlusterDocumentation (at www.gluster.org)
21:21 junaid joined #gluster
21:22 rwheeler joined #gluster
21:22 crazifyngers joined #gluster
21:23 andreask joined #gluster
21:24 semiosis cp0k: just probe it
21:25 semiosis ignore the "already a peer" message
21:25 lmickh joined #gluster
21:27 saltsa_ joined #gluster
21:27 junaid_ joined #gluster
21:29 glusterbot New news from newglusterbugs: [Bug 1019874] Glusterfs crash when remove brick while data are still written <https://bugzilla.redhat.com/show_bug.cgi?id=1019874>
21:30 _dist joined #gluster
21:34 Norky joined #gluster
21:35 lanning joined #gluster
21:35 wica Hi, I need to replace a broken disk, i have replica 2 setup. is it possible to use rsync from the "mirror" brick?
21:36 semiosis wica: why dont you just let self-heal sync the data?
21:37 wica because we experience a large load on the volume.
21:37 wica while healing.
21:39 wica we want to rsync, like a sort of preloading.
21:40 wica its only 2.4T
21:40 wica Hi btw
21:41 cp0k semiosis: It is not working, just keeps saying "peer probe: failed: 10.0.144.214 is already part of another cluster"
21:41 cp0k semiosis: I repeated the instructions about 5 times from A-Z
21:41 cp0k (running Gluster 3.4.2 all around)
21:42 Gilbs joined #gluster
21:42 semiosis ah, already part of *another* cluster -- thats different!
21:42 cp0k seems the only way I am able to get it to work is to manipulate /var/lib/glusterd by hand
21:42 semiosis cp0k: that means you need to fix the uuid in /var/lib/glusterd/glusterd.info -- make it the same as what the other peer thinks its UUID is
21:43 cp0k well yea, how can I probe one of the "good peers" if its already part of gluster?
21:43 semiosis idk exactly what your issue is, but here's how it works...
21:44 semiosis if a glusterd is peered with one or more other glusterd then it will not accept probes, except from existing peers
21:44 cp0k my issue is I am trying to probe a new peer to add new bricks, when I probe the new peer I do not see the full peer list on the new peer
21:44 cp0k right
21:45 cjanbanan joined #gluster
21:45 wica semiosis: will rsync work? or are the extended attrs different on the bricks?
21:45 cp0k semiosis: what do you mean by "fix the UUID in glustered.info to what the other peer thinks its UUID is"
21:45 cp0k this is a brand new peer I am trying to get into gluster
21:46 semiosis [15:55] <cp0k> and the peer from which I probed from I see "State: Peer Rejected (Connected)"
21:46 Gilbs Does anyone know which files I need to manually edit to remove all entries for geo-replicaion?  I though to edit gsync.conf but it comes back after a restart of glusterd.
21:46 semiosis you should fix the peer rejected problem before trying to expand your cluster
21:47 elyograg cp0k: 'gluster peer status' will not list itself.  only the other peers.
21:47 semiosis cp0k: if your cluster is in good health (all peers reporting Peer In Cluster (connected) for all other peers) then adding a new peer is as simple as probing it from an existing peer
21:48 cp0k Im with ya guys, I understand how it works, but the trouble I am having is weird...let me paste you the output so you understand more where I am coming from :)
21:49 P0w3r3d joined #gluster
21:53 cjh973 semiosis: when i get a virtual mount in the gluster api it gives me back a pointer.  do i have to free that pointer when i'm done with it?
21:54 cp0k http://fpaste.org/80720/93451643/
21:54 glusterbot Title: #80720 Fedora Project Pastebin (at fpaste.org)
21:54 semiosis cjh973: not sure, is that what glfs_fini is for?
21:55 cjh973 oh is it?  haha
21:55 cjh973 lemme look
21:56 semiosis cp0k: on the new peer, i would wipe out all of /var/lib/glusterd EXCEPT for glusterd.info, and make sure that file has uuid c034e332-9001-4498-9576-bda1834d44fc in it, then probe both ways between the new peer and an existing peer, and restart glusterd on both a few times
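As a sketch, that advice for the new peer amounts to something like this (the UUID is the one semiosis quotes from the paste; 'service glusterd' may be 'service glusterfs-server' on Ubuntu):
    service glusterd stop
    cd /var/lib/glusterd
    ls | grep -v glusterd.info | xargs rm -rf     # wipe everything except glusterd.info
    grep -i uuid glusterd.info                    # should read UUID=c034e332-9001-4498-9576-bda1834d44fc
    service glusterd start
    # then probe both ways between the new peer and an existing peer, e.g. from each side:
    gluster peer probe <other-side-ip>
    # and restart glusterd on both a couple of times if the state doesn't settle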
21:57 semiosis cjh973: are you really writing an s3 api in c/c++???
21:57 cjh973 i'm trying :)
21:57 cjh973 it's a fun little project
21:57 semiosis when i think of languages to write an http api, those are way down on the list
21:58 semiosis maybe jruby would be a little easier!
21:58 cjh973 true but i need a project to learn c/c++ though
21:58 cjh973 lol jruby
21:58 semiosis not kidding, jruby is nice
21:58 cjh973 i've never used it
21:59 semiosis its ruby on the jvm, which means you can write code in ruby & call my glusterfs-java-filesystem java library
21:59 semiosis people seem to like ruby a lot for writing web service apis
21:59 cjh973 that's neat
21:59 cjh973 didn't know ruby could run in the jvm
21:59 cjh973 yeah ruby is popular for that kinda stuff
22:00 semiosis well if your goal is learning c/c++, then by all means have fun!
22:00 semiosis if your goal was to ship a web service api, then i'd say you're nuts doing it in c++ :D
22:01 semiosis brb, meeting of the coffee committee
22:01 cp0k alright, now I was able to get it into "State: Accepted peer request (Connected)" state
22:02 cp0k semiosis: very important comittee hah
22:02 cjh973 semiosis: yeah my goal is the former :)
22:03 cjh973 i'm not that crazy :D
22:07 sputnik13 gluster is not mounting...  log shows "readv failed Connection reset by peer"
22:07 semiosis cp0k: might try restarting the glusterds again.  not sure why but that seems to help things along
22:07 sputnik13 which is weird because I can mount from the gluster nodes themselves, just not outside
22:07 semiosis sputnik13: connection reset usually means iptables is rejecting
22:07 semiosis sputnik13: or possibly name resolution is sending the request to the wrong host
22:07 sputnik13 I have no iptables running
22:08 semiosis sputnik13: or, less likely, ip address conflict
22:08 sputnik13 I'm also using the same DNS server across all of them
22:08 sputnik13 I can ping fine :(
22:08 sputnik13 I had them all on a different subnet earlier today and everything worked fine, but when I moved everything to a different subnet is when this started happening
22:09 sputnik13 I detached all of the gluster peers and reprobed them with the new IP and hostnames
22:09 sputnik13 then recreated the volume
22:09 sputnik13 also restarted the glusterfs-server on all of them
22:12 sputnik13 does the client hostname matter for the client connection?
22:12 Gilbs Might try stopping and restarting the volume, I had that issue before.
22:12 sputnik13 I had completely deleted the volume and created a new one
22:12 sputnik13 but I will try that as well
22:16 qdk joined #gluster
22:18 sputnik13 I'm doing a tcpdump and I see packets going back and forth
22:19 Gilbs Mounting on the clients is not working right?
22:19 sputnik13 correct
22:19 sputnik13 I can mount on the gluster peers though
22:19 Gilbs gotcha, that was my next question.
22:19 Gilbs ubuntu?
22:19 sputnik13 yup, 12.04.4
22:19 sputnik13 latest gluster from ppa
22:19 sputnik13 3.4.3 I think
22:20 Gilbs and everything was working/mounting prior to the subnet change?
22:20 semiosis i havent made a 3.4.3 yet, i need to
22:20 sputnik13 yes
22:20 Gilbs can you mount via ip vs dns name?
22:20 semiosis sputnik13: make sure there's a glusterfsd process for each brick on the servers
22:21 _dist left #gluster
22:21 semiosis sputnik13: if any are missing, check /var/log/glusterfs/bricks for logs
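The checks being asked for, roughly (volume name taken from sputnik13's mount command; log locations per the packages discussed here):
    gluster volume status bcf_gluster                 # every brick should show a port, a PID and Online 'Y'
    ps ax | grep glusterfsd                           # one glusterfsd process per brick on each server
    tail -n 50 /var/log/glusterfs/bricks/*.log        # brick logs on the servers
    tail -n 50 /var/log/glusterfs/<mount-point>.log   # client log, named after the mount point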
22:21 sputnik13 semiosis: there is a process on each server, I have a single brick
22:21 semiosis well we've been over everything
22:21 semiosis start pasting some logs on pastie.org, begin with the client & the bricks
22:22 sputnik13 Gilbs: IP also does not work
22:22 Gilbs hmmmm
22:22 semiosis time for 20 questions is over
22:22 semiosis pastie logs
22:22 semiosis (imho)
22:22 Gilbs was /etc/hosts edited with the old ips by chance?
22:23 Gilbs nm that won't change via ip
22:23 sputnik13 Gilbs: no, I'm using a DNS server, no static host entries
22:23 sputnik13 semiosis: glad to, but the thing that puzzles me is that no logs are being updated on the server side
22:24 semiosis sputnik13: what you're saying doesnt add up
22:24 semiosis find the logs
22:24 sputnik13 I know
22:24 sputnik13 the logs are in /var/log/gluster no?
22:24 semiosis if the processes are running, they're making logs
22:24 semiosis use lsof if you have to
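The lsof approach, as a quick sketch for locating exactly which log files the running daemons hold open:

    # every .log file each brick process is writing to
    for pid in $(pidof glusterfsd); do
        lsof -p "$pid" | grep '\.log'
    done

    # and the management daemon itself
    lsof -p "$(pidof glusterd)" | grep '\.log'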
22:24 keulator joined #gluster
22:24 sputnik13 they're open, not saying there's no logs
22:24 Gilbs /var/log/glusterfs
22:24 sputnik13 I'm saying they're not being updated on the server side
22:24 semiosis disk full?
22:25 sputnik13 nope
22:25 semiosis if you stopped & started the volume logs had to be updated
22:25 cjanbanan joined #gluster
22:26 sputnik13 sorry, I should have been more precise, what I mean is that between when I try the mount and when I force cancel it there are no log updates
22:27 sputnik13 iptables on the client should not matter no?
22:27 semiosis depends on the rules
22:27 Gilbs I don't think ubuntu runs iptables by default
22:27 semiosis Gilbs: if things were default, there wouldnt be any problem :)
22:28 sputnik13 ok, I flushed the iptables on the client
22:28 sputnik13 server has none
22:28 sputnik13 still no dice
22:28 Gilbs ah server, was confused there.
22:28 sputnik13 weeeeeird
22:29 sputnik13 the other weird thing?  I can telnet to the port that's trying to connect
22:29 sputnik13 24007
22:29 * sputnik13 is completely confused
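A quick connectivity sketch for the ports involved, assuming the 3.4 port layout (glusterd on 24007, brick processes from 49152 upward; older releases used 24009 and up):

    # management port -- the telnet test above, netcat-style
    nc -zv 10.2.16.111 24007

    # first brick port under 3.4's allocation scheme
    nc -zv 10.2.16.111 49152

    # confirm nothing is silently rejecting traffic on either end
    iptables -L -n -v | grep -Ei 'reject|drop'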
22:30 semiosis sputnik13: what is the exact mount command?
22:30 sputnik13 gluster does not use any kernel modules correct?
22:30 semiosis correct
22:30 sputnik13 semiosis: mount -t glusterfs 10.2.16.111:/bcf_gluster gluster
22:30 sputnik13 bcf_gluster is my volume, gluster is the folder I'm mounting to
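For a hang like this it can help to mount with verbose client logging; a sketch, assuming the mount.glusterfs helper in this release accepts the log-level/log-file options (worth confirming against man mount.glusterfs) and using a hypothetical mount point /mnt/gluster:

    # hypothetical debug mount: send DEBUG-level client logs to a known file
    mount -t glusterfs \
        -o log-level=DEBUG,log-file=/var/log/glusterfs/bcf_gluster-debug.log \
        10.2.16.111:/bcf_gluster /mnt/gluster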
22:30 semiosis thx
22:31 semiosis please put on pastie.org the full content of /var/log/glusterd/gluster.log
22:31 semiosis and give the link here
22:31 MacWinner joined #gluster
22:31 semiosis also please ,,(pasteinfo)
22:31 glusterbot Please paste the output of "gluster volume info" to http://fpaste.org or http://dpaste.org then paste the link that's generated here.
22:31 sputnik13 from the client yes?
22:31 semiosis yes
22:32 semiosis s/glusterd/glusterfs/
22:32 glusterbot What semiosis meant to say was: please put on pastie.org the full content of /var/log/glusterfs/gluster.log
22:32 semiosis ^^
22:33 sputnik13 oh boi...  the full log?
22:33 sputnik13 how about just the lines from when I start to mount
22:33 sputnik13 http://pastie.org/8794735
22:33 glusterbot Title: #8794735 - Pastie (at pastie.org)
22:33 sputnik13 a couple occurrences
22:34 sputnik13 the block of log entries starting at 2014-02-26 21:18:15.445551 seems the most "complete"
22:34 sputnik13 that's an instance where I let it time out
22:35 sputnik13 I noticed that when it works it works immediately so I've been killing it after a few seconds
22:37 semiosis [2014-02-26 21:33:56.933100] E [glusterfsd-mgmt.c:1674:mgmt_getspec_cbk] 0-mgmt: failed to fetch volume file (key:/bcf_gluster)
22:37 semiosis try without the / in your mount command... mount -t glusterfs 10.2.16.111:bcf_gluster gluster
22:37 sputnik13 that's the volume no?
22:37 sputnik13 ok
22:38 sputnik13 no dice, still hung
22:38 semiosis truncate the log first, so you can pastie the whole thing after the next try
22:38 sputnik13 ok
22:38 semiosis also waiting for that ,,(pasteinfo)
22:38 glusterbot Please paste the output of "gluster volume info" to http://fpaste.org or http://dpaste.org then paste the link that's generated here.
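The truncate-and-retry step, sketched with the client log name mentioned above (the real filename is normally derived from the mount point, so adjust accordingly):

    # empty the client log, retry the mount, then paste the short log that results
    : > /var/log/glusterfs/gluster.log
    mount -t glusterfs 10.2.16.111:bcf_gluster gluster
    cat /var/log/glusterfs/gluster.log

    # and, on one of the servers, the volume definition glusterbot keeps asking for
    gluster volume info bcf_gluster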
22:39 sputnik13 I'll let it time out
22:39 sputnik13 so...  the mount command is what I was using earlier today when it worked
22:40 sputnik13 except for the hostname/ip which is now different
22:41 cjanbanan joined #gluster
22:41 semiosis [2014-02-26 21:16:45.211158] I [glusterfsd.c:1910:main] 0-/usr/sbin/glusterfs: Started running /usr/sbin/glusterfs version 3.4.2 (/usr/sbin/glusterfs --volfile-id=/bcf_gluster --volfile-server=gluster2.bcf.prod gluster)
22:41 semiosis your log says gluster2.bcf.prod but your (claimed) mount command has an ip, 10.2.16.111. what gives?
22:42 sputnik13 it's an 8 node gluster
22:42 sputnik13 I tried against gluster2 once, which is 112
22:42 dbruhn joined #gluster
22:43 semiosis also you said you got a connection reset message... i dont see that
22:43 semiosis these look like timeouts
22:43 sputnik13 oh wait
22:43 sputnik13 idea
22:43 nage joined #gluster
22:43 sputnik13 and I bet it's the problem...
22:43 * sputnik13 grumbles
22:44 semiosis well, what is it?
22:44 sputnik13 mtu
22:44 sputnik13 have to check
22:44 semiosis wow
22:44 sputnik13 one sec
22:44 sputnik13 nope, scratch that
22:44 semiosis hahahaa
22:44 sputnik13 wait, doh
22:45 sputnik13 yeah mtu
22:45 * sputnik13 grumbles and pounds head on wall
22:46 sputnik13 sorry to waste your collective time
22:47 cp0k semiosis: this is as far as I was able to get it to go... http://fpaste.org/80735/39345482/
22:47 glusterbot Title: #80735 Fedora Project Pastebin (at fpaste.org)
22:47 cp0k at least now all the peers see my newly probed peer (10.0.144.115)
22:47 semiosis sputnik13: well at least that was a new one for me.  i thought i'd seen it all when it comes to mount problems
22:48 cp0k but it's still all in the State: Peer Rejected (Connected) state :/
22:50 sputnik13 I've seen similar weirdness with ssh, you start losing packets that go beyond 1500 bytes
22:51 sputnik13 but not every tcp packet is > 1500 bytes especially when you're establishing a connection
22:51 sputnik13 I'm pretty sure it's an mtu problem, just have to see why jumbo frames aren't working...  I have 9216 set on all switch ports
22:52 sputnik13 in fact it absolutely is the problem, I just set the mtu locally on each gluster server to 1500 and the mount went through perfect
22:52 * sputnik13 pounds head on wall
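A sketch of how to confirm that diagnosis; the 8972-byte payload assumes a 9000-byte interface MTU (payload + 8 bytes ICMP header + 20 bytes IP header), and eth0 is just a placeholder:

    # does a full-size frame cross the path without fragmentation?
    ping -M do -s 8972 -c 3 10.2.16.111     # -M do sets the don't-fragment bit (Linux iputils)

    # what the interface is actually configured for
    ip link show eth0 | grep -o 'mtu [0-9]*'

    # the workaround used here: drop the servers back to the standard MTU
    ip link set dev eth0 mtu 1500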
22:54 mtanner joined #gluster
23:00 Matthaeus1 joined #gluster
23:00 jiqiren i noticed after going from gluster 3.3.0 to 3.4.2, when i run gluster volume heal myvol info, I almost always see something
23:01 jiqiren for example, running that command with watch (just the volume heal info bit), i'll see nothing, then something, then nothing, etc. over and over
23:02 jiqiren is that normal?
23:03 harish_ joined #gluster
23:04 cjanbanan joined #gluster
23:05 lmickh joined #gluster
23:07 jiqiren fwiw this is centos 6.5 with latest updates
23:08 JoeJulian fairly. It's probably transitional states as files are in the process of being written.
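One way to tell transient entries from stuck ones, sketched with jiqiren's volume name:

    # repeated snapshots: entries that appear and then vanish are usually files
    # that were mid-write when the self-heal crawl looked at them
    watch -d -n 5 'gluster volume heal myvol info'

    # entries that persist across runs deserve a closer look
    gluster volume heal myvol info split-brain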
23:09 Gilbs Yeah, as soon as I get geo-replication working JoeJulian shows up  :)
23:11 aquagreen joined #gluster
23:12 partner i'm glad to see there's lots of success on this channel; i've been keeping a monologue #elsewhere trying to fight issues that add up to heavy gluster server bandwidth usage..
23:13 jiqiren also when i run info split-brain i see a number of entries, but the file seems ok now. are those just "past split-brains" that are being shown?
23:13 dbruhn left #gluster
23:14 jiqiren my output: http://pastie.org/private/87y0ll70a5lztbovfeb92g
23:14 glusterbot Title: Private Paste - Pastie (at pastie.org)
23:15 JoeJulian Gilbs: Perfect!
23:17 JoeJulian jiqiren: The last split-brain issue was logged 21.6 hours ago.
23:18 jiqiren ok, so it is just showing past problems. thanks
23:18 JoeJulian You're welcome.
23:20 cp0k any further advice for me on my trouble probing the new peer? this is as far as I was able to get it on the new peer: http://fpaste.org/80735/39345482/
23:20 glusterbot Title: #80735 Fedora Project Pastebin (at fpaste.org)
23:20 Gilbs JoeJulian:  While you're here though, what files do I need to manually edit to remove instances of geo-replication if I cannot stop a session?  I noticed after editing and even fully deleting gsync.conf it still comes back to life after restarting glusterd.
23:20 ekis_isa joined #gluster
23:21 JoeJulian Gilbs: Not entirely sure. Something in /var/lib/glusterd/vols/$volume though.
23:22 keulator joined #gluster
23:22 Gilbs Ok, i'll keep digging in my lab.
23:24 cp0k JoeJulian: hey, I am having issues adding a new peer....it just says that the peer is rejected. I followed the instructions @ http://www.gluster.org/community/documentation/index.php/Resolving_Peer_Rejected and was able to get it to this point: http://fpaste.org/80735/39345482/
23:24 glusterbot Title: Resolving Peer Rejected - GlusterDocumentation (at www.gluster.org)
23:24 cp0k any idea?
23:24 cp0k s/idea/ideas/g
23:24 glusterbot cp0k: Error: u's/idea/ideas/g any idea?' is not a valid regular expression.
23:24 cp0k s/idea/ideas/g
23:25 JoeJulian I hate this issue. It feels so Microsoft to suggest the same stupid thing every time, restarting all glusterd.
23:26 JoeJulian Maybe figuring out the whole peering process will be my next deep-dive.
23:27 JoeJulian cp0k: Anything in the glusterd logs?
23:29 Gilbs cp0k:  was this coming from an upgrade?
23:33 cp0k Gilbs: yes
23:33 cp0k Gilbs: after upgrading to 3.4.2 I need to add new bricks
23:34 cp0k JoeJulian: there was one error msg in the logs that stood out, let me see if I can find it
23:34 cjanbanan joined #gluster
23:35 cp0k " E [glusterd-utils.c:4255:glusterd_brick_start] 0-management: Could not find peer on which brick 10.0.144.211:/gluster/4 resides"
23:35 cp0k " E [glusterd-utils.c:4255:glusterd_brick_start] 0-management: Could not find peer on which brick 10.0.144.211:/gluster/4 resides"
23:36 JoeJulian Ok, that would do it.
23:37 cp0k what does this mean? it lost the location of one of my bricks in the volume info?
23:37 cp0k how would I get this back?
23:37 JoeJulian I would probably just cheat. rsync peers from two different servers and delete the one for this server.
23:37 cp0k JoeJulian: yes, I have done this in the past and got success
23:38 cp0k JoeJulian: the only thing was, the rest of the peers would not know about the newly added peer
23:38 cp0k JoeJulian: Unless I go onto every peer and manually add it to /var/lib/glusterd/peers
23:38 JoeJulian After doing that cheat, probe this server from one other.
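A rough sketch of that cheat, with hypothetical hostnames goodserver1/goodserver2/brokenserver; two source servers are needed because each node's peers/ directory omits an entry for itself, and the grep key assumes the hostname1= line found in 3.4-era peer files (adjust it if peers are recorded by IP address, as they appear to be here):

    # on the broken node: stop glusterd and rebuild its peer definitions
    service glusterd stop                 # or glusterfs-server on Ubuntu
    rsync -av goodserver1:/var/lib/glusterd/peers/ /var/lib/glusterd/peers/
    rsync -av goodserver2:/var/lib/glusterd/peers/ /var/lib/glusterd/peers/

    # the combined copy now contains a file describing this very host -- remove it
    # (each file is named after a peer UUID and records that peer's address)
    grep -l "hostname1=$(hostname)" /var/lib/glusterd/peers/* | xargs -r rm

    service glusterd start

    # finally, re-probe this node from one healthy server
    gluster peer probe brokenserver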
23:39 cp0k I guess thats an idea, I will try that :)
23:39 cp0k thanks
23:39 cp0k but what about the whole thing about it not being able to find which brick /gluster/4 is on? whats that all about?
23:41 JoeJulian Somehow it has volume data but no peer connections. I'm guessing that the broken volume data is breaking the peer connections. The broken peer connection is causing the broken volume status. Catch 22.
23:41 cp0k that makes sense, the fun dont stop haha
23:42 cp0k going to take a rest and crunch this data and then get back to it on a fresh mind
23:42 cp0k as always, thanks for all your help guys
23:50 jiqiren cp0k: that 1 peer could just have the wrong iptables rules or something
23:50 Gilbs left #gluster
23:54 gdubreui joined #gluster
23:57 nueces joined #gluster
