
IRC log for #gluster, 2014-02-24


All times shown according to UTC.

Time Nick Message
00:02 cjanbanan joined #gluster
00:17 jporterfield joined #gluster
00:21 mtanner joined #gluster
00:31 tokik joined #gluster
00:33 gdubreui joined #gluster
00:38 tokik joined #gluster
00:44 bala joined #gluster
00:49 eryc_ joined #gluster
00:50 nightwalk joined #gluster
00:50 jporterfield_ joined #gluster
00:50 psyl0n joined #gluster
00:50 bala joined #gluster
00:50 khushildep joined #gluster
00:50 rfortier joined #gluster
00:50 cfeller joined #gluster
00:50 plarsen joined #gluster
00:50 glusterbot joined #gluster
00:50 harish joined #gluster
00:50 qdk joined #gluster
00:50 hagarth joined #gluster
00:50 16WAATW8V joined #gluster
00:50 morse joined #gluster
00:50 msvbhat joined #gluster
00:50 cyberbootje joined #gluster
00:56 psyl0n joined #gluster
00:58 rfortier1 joined #gluster
01:00 rfortier joined #gluster
01:05 cp0k joined #gluster
01:07 cp0k Hey guys, anyone around who has dealt with a split brain in the past?
01:09 cp0k I dont suppose JoeJulian is around on the weekends?
01:11 qdk joined #gluster
01:11 cjanbanan joined #gluster
01:12 Frankl joined #gluster
01:21 Frankl JoeJulian: hagarth: I have tried to increase cluster.background-self-heal-count & least-io-threads; it didn't help much with my issue: proactive self-healing DoSes the volume and clients' access is very slow.
01:22 Frankl JoeJulian: hagarth: could I disable proactive self-heal by setting the option self-heal-daemon off? my gluster is 3.3.2
01:29 harish joined #gluster
01:31 jporterfield joined #gluster
01:51 JoeJulian Frankl: You could. I can't imagine you'd gain anything from that. How are you determining that proactive self-healing is causing the problem? I find that to be extremely unlikely.
02:03 kevein joined #gluster
02:05 Frankl JoeJulian: enlarging self-heal-count helped a little, the access speed of a file is somewhat faster but still not acceptable
02:07 Frankl JoeJulian: If I take the node offline again, access speed is fine, but when the node is back online the access speed is intolerable and the node's IO is very high. Using strace and pstack, I think the very high IO activity is caused by self-heal rather than anything else
02:12 jporterfield joined #gluster
02:12 JoeJulian Imagine two bricks. A and B. A has been offline for some time. B has 1000 1GB files that have been touched since A went offline. You have a background self-heal queue of 10. You re-add A. Your client accesses approximately 100 files per second. 10 of them are serviced immediately while self-heal begins in the background. The 11th file seems to hang. That's because the background queue is full and now it must be healed in real time. 1Gig ethernet,
02:12 JoeJulian that file takes 16 seconds to heal. The client will block on that file during that time.
02:13 JoeJulian So at 100 files per second, 1589 files are stacked up and waiting in that 16 seconds.
02:13 JoeJulian Assuming each access is unique.
02:15 JoeJulian That, btw, has nothing at all to do with proactive self-heal.
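
The arithmetic behind that example, as a rough sketch using the numbers above (all of them illustrative, not measurements from Frankl's cluster):

    # one ~1GB file healing over 1Gig ethernet takes ~16s (JoeJulian's figure above)
    heal_seconds=16
    access_rate=100        # client file accesses per second
    background_queue=10    # cluster.background-self-heal-count
    # while the 11th access heals in the foreground, this many further
    # accesses pile up behind it:
    echo $(( access_rate * heal_seconds - background_queue - 1 ))   # -> 1589
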
02:17 Frankl the node has been offline for about 10 days, and there are 60,000 files per day, so we will have 600,000 files; each file is about 100MB.
02:18 JoeJulian We're only immediately concerned with existing files accessed through the client mount.
02:19 cjanbanan joined #gluster
02:19 Frankl Joejulian: cluster.background-self-heal-count: 512 which I have set
02:20 JoeJulian new files will be created on both bricks simultaneously. Files that were added while the brick was offline and are not being accessed by a client are irrelevant and will be healed over time at lowest priority by the self-heal daemon.
02:22 JoeJulian I'm explaining all this so you can decide if that's what's happening in your situation.
02:22 JoeJulian If it's not, then obviously it must be something else.
02:24 Frankl JoeJulian: thanks, I will dig into
02:24 JoeJulian Hmm, just thought of one other possibility. Does your application do directory lookups, and are there hundreds or thousands of files per directory?
02:26 JoeJulian Frankl: The more I think about this, the more unhappy I am with it. I will file a bug report with my thoughts...
02:26 glusterbot https://bugzilla.redhat.com/enter_bug.cgi?product=GlusterFS
02:27 Frankl yes, you got it. There are many directories here. the pattern is temp/f/fb/fff5edfdffe3f41e27b7883dd58dacc19fd1ca06/fff5edfdffe3f41e27b7883dd58dacc19fd1ca06.pfv, as you see, there are 4096 directories (f/fb is 3 hex chars) and each dir will have many files, each file has a dir with the same name
02:28 rfortier joined #gluster
02:28 JoeJulian if your application does a lookup() for every file in that directory (often they do that by calling stat() for every dirent) that will trigger self-heal for all of them and you'll hang when you exceed that background self-heal count.
02:28 rfortier1 joined #gluster
02:29 JoeJulian One possibility that I've thought of. Use iptables to block your live clients from the brick. This will prevent the client from connecting but the daemon will still be able to heal.
02:30 JoeJulian You could expedite the heal by allowing a non-production client to access the volume (and the replaced brick) and running "find" on the client mount.
02:31 JoeJulian Sure, new files and changes will be immediately dirty, but as you bring the brick closer to consistency, you could eventually drop the iptables block and you should be able to ... and I'm changing my mind again...
02:31 harish joined #gluster
02:32 JoeJulian How about this...
02:33 JoeJulian Check the process list for glusterfs. You should have a process that looks roughly like "/usr/sbin/glusterfs --volfile-id=centos --volfile-server=glusterfs /mnt/gluster/centos"
02:34 JoeJulian unmount your client and mount it instead by executing that command with the additional option, "--xlator-option=afr.data-self-heal=off"
02:35 JoeJulian When you bring your old brick back online, the client should not attempt to do the data self-heal, instead leaving that up to the self-heal daemon.
02:35 Frankl yes, iptables might help. I could try that hack to block the clients.
02:36 Frankl Since I can't touch the clients, I may not be able to add any options on the client side
02:37 JoeJulian If you cannot touch the clients then I guess the iptables route may be the only possibility...
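
A minimal sketch of that iptables route (the client subnet and brick port are placeholder assumptions; 3.3 bricks usually listen on 24009 and up), run on the server holding the re-added brick:

    # block the production clients from the brick port; the other gluster
    # servers are untouched, so the self-heal daemon can still reach it
    iptables -I INPUT -s 192.0.2.0/24 -p tcp --dport 24009 -j DROP
    # drop the rule once "gluster volume heal <vol> info" comes back clean
    iptables -D INPUT -s 192.0.2.0/24 -p tcp --dport 24009 -j DROP
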
02:37 JoeJulian Oh... I have another idea.
02:38 JoeJulian You could set cluster.data-self-heal off through "gluster volume set"
02:39 JoeJulian Then you could mount the volume manually somewhere else through the glusterfs command instead, setting "--xlator-option=afr.data-self-heal=on" and doing the find on that temporary mount.
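
Put together, the approach being described might look roughly like this (volume, server and mount point names are placeholders):

    # stop clients from doing data self-heal volume-wide
    gluster volume set myvol cluster.data-self-heal off
    # mount a temporary, non-production client with data self-heal turned
    # back on for this mount only, then crawl it to trigger heals
    glusterfs --volfile-id=myvol --volfile-server=server1 \
        --xlator-option=afr.data-self-heal=on /mnt/heal-tmp
    find /mnt/heal-tmp >/dev/null
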
02:41 Frankl this would be more reasonable. I could try this
02:44 haomaiwa_ joined #gluster
02:48 JoeJulian Frankl: You might be interested in adding yourself to the cc list for bug 1069041
02:49 glusterbot Bug https://bugzilla.redhat.com:443/show_bug.cgi?id=1069041 unspecified, unspecified, ---, pkarampu, NEW , Provide a method for preventing the client from being blocked when the background-self-heal queue is full
02:50 Frankl yes. this issue is blocking us from bringing back the convalescent node. Thanks JoeJulian
02:59 haomaiw__ joined #gluster
03:01 nightwalk joined #gluster
03:01 cjanbanan joined #gluster
03:03 glusterbot New news from newglusterbugs: [Bug 1069041] Provide a method for preventing the client from being blocked when the background-self-heal queue is full <https://bugzilla.redhat.com/show_bug.cgi?id=1069041>
03:06 bharata-rao joined #gluster
03:08 cp0k joined #gluster
03:08 cp0k Hello, anyone home?
03:08 haomai___ joined #gluster
03:15 hagarth joined #gluster
03:17 hagarth Frankl: have you figured out what glusterfsd processes are busy with?
03:20 JoeJulian cp0k: hey
03:20 cp0k JoeJulian: Hey, wow, you are here
03:20 JoeJulian hagarth: What do you think of my hack?
03:21 JoeJulian cp0k: No, I'm just a caffeine hallucination.
03:21 cp0k JoeJulian: I am seeing the number of total split-brain entries sky rocket on my volume :(
03:21 cp0k heh
03:21 JoeJulian cp0k: you win!
03:22 cp0k JoeJulian: the majority of the files listed are gfid's, not actual filenames
03:22 hagarth JoeJulian: I think I missed your hack
03:23 cp0k At this point I am running dangerously low on disk space and need to add additional nodes to the cluster by tomorrow at the latest. The data that is split-brain I can rm, if that means I can get the system all cleaned up prior to the brick adds
03:23 JoeJulian gluster volume set $vol cluster.data-self-heal off
03:23 JoeJulian Then you could mount the volume manually somewhere else through the glusterfs command instead, setting "--xlator-option=afr.data-self-heal=on" and doing the find on that temporary mount
03:23 cp0k so I understand I need to wipe out the hardlink in .glusterfs, as well as rm the file on the system? or will deleting the hard link from .glusterfs tree also wipe the file?
03:24 JoeJulian deleting the hardlink will not delete the other directory entry, no.
03:24 cp0k JoeJulian: do you recommend I stop the self heal now?
03:24 JoeJulian The inode will remain in use.
03:24 cp0k JoeJulian: it seems there is so much crap there that Gluster is just tripping up over itself more and more
03:24 JoeJulian cp0k: Nope. If nothing else, the self heal is telling you where you have issues.
03:25 hchiramm_ joined #gluster
03:25 JoeJulian I assume you know how you got into this situation?
03:25 cp0k Not so much :(
03:26 JoeJulian cp0k: Or are you suggesting you punt and wipe one of the bricks?
03:26 cp0k this system has been runing for a year or so, no connectivity issues between the storage nodes or anything like that
03:26 hagarth JoeJulian: yes, I have used a combination of the two in extreme cases.
03:26 cp0k JoeJulian: I really would hate to do something like that, especially the way things look right now -
03:26 cp0k http://fpaste.org/79711/13932123/
03:27 glusterbot Title: #79711 Fedora Project Pastebin (at fpaste.org)
03:27 JoeJulian hagarth: If you do set cluster.data-self-heal off, does that affect the self-heal daemon?
03:27 hagarth JoeJulian: no, only cluster.self-heal-daemon off affects it
03:28 cp0k JoeJulian: my plan of action is to see if I can rm the files that are showing up as split-brain; if I can get the number of split-brain entries to go down, then I'll work all night if I have to to clean everything up manually
03:28 JoeJulian cp0k: It won't go down. Again, those are log entries. If you check the uniqueness of the gfids, you may not have as much as you think.
03:30 cp0k JoeJulian: okay, so maybe I misunderstood your blog post on split brain then...I am under the impression that deleting the file and the gfid from the .glusterfs makes things better
03:30 JoeJulian Right. Pick one that's good and delete the other one.
03:31 JoeJulian What I'm saying is that "gluster volume heal $vol split-brain" may list the same file over and over and over again every time it's been identified as split brain during self-heal runs or during attempted client accesses.
03:32 JoeJulian s/vol split/vol info split/
03:32 glusterbot What JoeJulian meant to say was: What I'm saying is that "gluster volume heal $vol info split-brain" may list the same file over and over and over again every time it's been identified as split brain during self-heal runs or during attempted client accesses.
03:33 cp0k JoeJulian: okay, but what if I dont care about the file and just delete it from all bricks
03:33 JoeJulian Should be fine.
03:33 cp0k and also delete the gfid in .glusterfs
03:33 JoeJulian I assumed that's what you meant.
03:33 cp0k yea
03:34 JoeJulian The file will stay in the "info split-brain" report though. Just no new entries will be added for that file.
03:34 cp0k I have tried that earlier, the only number of entries that I was able to make go down was for "gluster volume heal volname info"
03:34 cp0k right, I noticed that
03:34 rfortier joined #gluster
03:35 cp0k so if I get the number of 'gluster volume heal volname info" down to zero, I should be all good?
03:35 JoeJulian There are two things you can do. Stop all glusterd and start them again. That will clear that log temporarily, or just keep track of the timestamp where you last thought you were done fixing split-brain and see if anything new happens after that.
03:36 cp0k and is there any way to reset the "info split-brain" report so that it does a fresh lookup? the entries almost all being at 1023 scared the crap out of me earlier....I thought I had the whole system corrupted
03:36 cp0k JoeJulian: gotcha, I figured restarting glusterd may reset that :)
03:37 cp0k JoeJulian: do you know what the reason may be behind only a gfid coming up in the split-brain output? rather than a path to the file?
03:38 ajha joined #gluster
03:41 cjanbanan joined #gluster
03:41 JoeJulian Look in $brick/.glusterfs/indices/xattrop. That's where the index is for self-heal. Note they're all gfid based. The gfid represents the cluster-wide inode number so they can be consistent. A self-heal can be performed against just the gfid entry and it will heal the file at the same time. The report, unless the self-heal daemon has crawled the actual file associated with that inode, has no way of knowing what filename is associated.
03:41 JoeJulian Further, what if that inode is associated with two different (hardlink) filenames? Which one would be correct to report?
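
Since the gfid file under .glusterfs is a hardlink to the same inode, one common way to map a bare gfid back to a path on a brick looks like this (the brick path and gfid are made-up placeholders):

    # find the other directory entries sharing the gfid file's inode,
    # skipping the .glusterfs tree itself
    find /srv/brick1 -samefile \
        /srv/brick1/.glusterfs/fa/ce/faceb00c-dead-beef-0000-000000000000 \
        -not -path '*/.glusterfs/*'
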
03:46 cp0k JoeJulian: I honestly have no idea
03:46 itisravi joined #gluster
03:46 JoeJulian That's because there's no right answer. :D
03:47 cp0k :)
03:47 shubhendu joined #gluster
03:48 cp0k Im playing with removing the files that report to be in need of healing from the bad bricks
03:48 badone joined #gluster
03:48 cp0k then running stat on them, to see if that would heal it up
03:49 JoeJulian Yep, that would work.
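
As a rough sketch of that cleanup (brick path, file name and gfid below are placeholders), on each brick holding the copy being discarded:

    # note the file's gfid first, then remove the file
    getfattr -n trusted.gfid -e hex /srv/brick1/path/to/stale-file
    rm /srv/brick1/path/to/stale-file
    # remove the matching hardlink under .glusterfs/<first two hex>/<next two hex>/<gfid>
    rm /srv/brick1/.glusterfs/fa/ce/faceb00c-dead-beef-0000-000000000000
    # then stat the file through a client mount to trigger a fresh self-heal
    stat /mnt/gluster/path/to/stale-file
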
03:49 cp0k https://github.com/gluster/gluster​fs/blob/master/doc/split-brain.md
03:49 glusterbot Title: glusterfs/doc/split-brain.md at master · gluster/glusterfs · GitHub (at github.com)
03:49 cp0k this has been awesome documentation btw
03:50 JoeJulian Cool. I'm going to obsolete it tomorrow.
03:50 cp0k oh, your going to release those new notes tomorrow? :)
03:50 JoeJulian That's what I'm working on right now.
03:50 cp0k awesome! perfect timing for me heh
03:51 cp0k btw, I called RedHat to ask them how much technical support would cost...they told me that I would hear back from a sales rep soon....its been 4 days with not a single email / call
03:51 JoeJulian I guess they made enough money...
03:51 cp0k any idea what these guys charge for support?
03:51 cp0k yes, exactly :(
03:51 JoeJulian No clue.
03:52 JoeJulian I know more than they do, so I'm not really their target customer.
03:52 cp0k with all the support you provide here, why not work for them and get paid? :)
03:52 JoeJulian Then I'd HAVE to do it and all the fun would be gone.
03:55 JoeJulian cp0k: Wasn't it you that ran my python script and saw no dirty files?
03:56 cp0k no, that was not me
03:57 cp0k so Im looking at a file thats coming up in gluster volume heal volname info, its only on one of two storage nodes
03:57 cp0k when I run a md5sum on it, I get input / output error
03:57 JoeJulian What's the client log say?
03:58 mtanner_ joined #gluster
03:58 cp0k Unable to self-heal contents of
03:58 cp0k background  data self-heal failed on
03:58 cp0k (possible split-brain). Please delete the file from all but the preferred subvolume.- Pending matrix:  [ [ 0 2 ] [ 14 0 ] ]
04:01 cp0k Cant seem to find the gfid in the .glusterfs tree
04:01 cp0k http://fpaste.org/79714/21438013/
04:01 glusterbot Title: #79714 Fedora Project Pastebin (at fpaste.org)
04:03 bala joined #gluster
04:04 DV__ joined #gluster
04:05 satheesh2 joined #gluster
04:05 davinder joined #gluster
04:05 ira joined #gluster
04:07 ppai joined #gluster
04:07 kevein_ joined #gluster
04:11 cjanbanan joined #gluster
04:13 JoeJulian Uh-oh... I think I killed cp0k...
04:14 ndarshan joined #gluster
04:14 sahina joined #gluster
04:14 bala joined #gluster
04:16 RameshN joined #gluster
04:23 cp0k joined #gluster
04:23 cp0k oops, got kicked off, hope I didn't miss any messages
04:26 rfortier joined #gluster
04:29 JoeJulian Only a pm.
04:29 JoeJulian did you get that?
04:30 cp0k yes, I got it, thanks
04:30 cp0k I am not the CTO / CEO of the company I work for, but will be sure to run it by my boss tomorrow
04:39 cjanbanan joined #gluster
04:42 jporterfield joined #gluster
04:42 qdk joined #gluster
04:47 nshaikh joined #gluster
04:52 prasanth joined #gluster
04:54 KORG joined #gluster
04:55 KORG joined #gluster
04:57 Frankl JoeJulian: cluster.data-self-heal: off didn't work in my environment
04:58 Frankl I have set the option off using gluster v set and used two clients, one mounting without any options and the other using --xlator-option=afr.data-self-heal=on
04:58 cp0k JoeJulian: With some trial and error, I am happy to report that I am getting the number of split-brain entries down :D
04:59 Frankl JoeJulian: the one with off option still could trigger the self-heal. Anything I missed?
05:00 rjoseph joined #gluster
05:09 cjanbanan joined #gluster
05:13 kdhananjay joined #gluster
05:19 cp0k joined #gluster
05:32 XpineX__ joined #gluster
05:39 cjanbanan joined #gluster
05:44 jporterfield joined #gluster
05:56 cp0k joined #gluster
05:56 ColPanik joined #gluster
06:00 cjanbanan joined #gluster
06:01 prasanth joined #gluster
06:02 ColPanik I've "inherited" a two-brick replicated setup of Gluster 3.3.2 and noticed that the SHD doesn't report as running.  Both bricks have been restarted and rebooted within the last few days, but no luck.  Last activity in the logs is from several months ago.  suggestions?
06:02 ppai joined #gluster
06:03 ColPanik ^^^ last activity in the SHD logs was several months ago
06:09 Philambdo joined #gluster
06:10 vpshastry joined #gluster
06:16 haomaiwa_ joined #gluster
06:20 raghu joined #gluster
06:21 RameshN joined #gluster
06:24 dusmantkp_ joined #gluster
06:30 benjamin_____ joined #gluster
06:32 hagarth joined #gluster
06:34 vimal joined #gluster
06:35 cjanbanan joined #gluster
06:38 rfortier joined #gluster
06:47 jporterfield joined #gluster
06:50 dusmantkp_ joined #gluster
06:51 sahina joined #gluster
07:00 cjanbanan joined #gluster
07:05 lalatenduM joined #gluster
07:09 ColPanik joined #gluster
07:16 saurabh joined #gluster
07:17 ngoswami joined #gluster
07:17 jporterfield joined #gluster
07:18 harish joined #gluster
07:27 jtux joined #gluster
07:32 haomaiwa_ joined #gluster
07:37 rastar joined #gluster
07:40 mgebbe_ joined #gluster
07:47 cjanbanan joined #gluster
07:50 rgustafs joined #gluster
07:50 ekuric joined #gluster
07:53 cjanbanan joined #gluster
07:56 morse_ joined #gluster
07:56 glusterbot` joined #gluster
07:57 ctria joined #gluster
07:59 fsimonce joined #gluster
08:01 ndevos NuxRo: yesterday I found out that you also need this patch: https://forge.gluster.org/cloudstack-gluster/cloudstack/commit/a03505787c24e00c7a9e3a79f441417d4d3e1d10?format=patch
08:02 rossi_ joined #gluster
08:04 msvbhat joined #gluster
08:04 ndevos @learn cloudstack as Support for Primary Storage on Gluster is coming, see http://blog.nixpanic.net/2014/02/setting-up-test-environment-for-apache.html for setting up a test environment.
08:04 glusterbot ndevos: The operation succeeded.
08:05 eseyman joined #gluster
08:05 romero joined #gluster
08:05 jporterfield joined #gluster
08:06 cfeller joined #gluster
08:07 franc joined #gluster
08:07 franc joined #gluster
08:09 rgustafs joined #gluster
08:10 qdk joined #gluster
08:11 keytab joined #gluster
08:11 cyberbootje joined #gluster
08:12 vpshastry joined #gluster
08:12 Philambdo joined #gluster
08:13 hagarth joined #gluster
08:13 davinder joined #gluster
08:14 dusmantkp_ joined #gluster
08:15 junaid joined #gluster
08:15 rgustafs joined #gluster
08:18 keytab joined #gluster
08:22 hchiramm__ joined #gluster
08:26 cjanbanan joined #gluster
08:26 hybrid512 joined #gluster
08:28 sahina joined #gluster
08:30 kdhananjay joined #gluster
08:31 kanagaraj joined #gluster
08:34 rjoseph1 joined #gluster
08:38 badone joined #gluster
08:46 vpshastry1 joined #gluster
08:47 rastar joined #gluster
08:48 ppai joined #gluster
08:48 bala joined #gluster
08:49 RameshN joined #gluster
08:49 ndarshan joined #gluster
08:50 hagarth joined #gluster
08:51 vpshastry2 joined #gluster
08:52 shubhendu joined #gluster
08:55 andreask joined #gluster
08:56 Nev___ joined #gluster
08:57 dusmantkp_ joined #gluster
09:01 liquidat joined #gluster
09:06 DV__ joined #gluster
09:17 khushildep joined #gluster
09:21 Norky joined #gluster
09:37 Slash joined #gluster
09:39 ngoswami joined #gluster
09:40 16WAAURZD joined #gluster
09:51 sahina joined #gluster
09:53 Rydekull joined #gluster
09:54 RameshN joined #gluster
09:55 vpshastry1 joined #gluster
09:55 dusmantkp_ joined #gluster
10:01 davinder joined #gluster
10:03 mgebbe joined #gluster
10:03 rfortier joined #gluster
10:06 badone joined #gluster
10:22 mgebbe_ joined #gluster
10:23 sahina joined #gluster
10:26 kanagaraj joined #gluster
10:30 ngoswami joined #gluster
10:32 RameshN joined #gluster
10:35 Norky with v3.4, "gluster create" now wants bricks *not* to be the root directory of a filesystem, i.e. one is encouraged to create a subdirectory in a FS, and use that as the brick
10:36 Norky I can't find anything about that in the Release Notes, is there anywhere I can read about the rationale for that change?
10:36 vpshastry joined #gluster
10:37 Norky I can think of one problem it solves, but I wanted the developers' reasoning
10:37 glusterbot New news from newglusterbugs: [Bug 1021686] refactor AFR module <https://bugzilla.redhat.com/show_bug.cgi?id=1021686>
10:39 andreask Norky: I'm not a developer but what I think it's all about having a chance to detect a not mounted brick filesystem
10:40 Norky that'd be another advantage
10:45 benjamin_____ joined #gluster
10:53 calum_ joined #gluster
10:55 ndevos Norky: yes, andreask is right, having a brick on /bricks/disk/data where /bricks/disk is a mountpoint prevents starting the brick process in case /bricks/disk/data is not available (/bricks/disk failed to mount?)
10:56 ndevos Norky: it is not so much a developers thing, more like an enforcing of best practise
11:01 Frankl_ joined #gluster
11:01 fsimonce` joined #gluster
11:02 romero joined #gluster
11:03 rossi_ Norky: the directory is created by gluster automatically, btw, no need to create it by hand: gluster volume create gv0 node:/brick/somedir - somedir will be created
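
So the recommended layout looks roughly like this (device, mount point, node and volume names are placeholder assumptions):

    mkfs.xfs /dev/sdb1
    mkdir -p /bricks/disk1
    mount /dev/sdb1 /bricks/disk1
    # the brick is a subdirectory of the mount point: if /bricks/disk1 ever
    # fails to mount, /bricks/disk1/data is absent and the brick process
    # refuses to start instead of silently writing into the root filesystem
    gluster volume create gv0 replica 2 node1:/bricks/disk1/data node2:/bricks/disk1/data
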
11:03 Norky whatever the case, I couldn't find it documented anywhere :)
11:05 ColPanik_ joined #gluster
11:07 glusterbot` joined #gluster
11:09 divbell_ joined #gluster
11:13 nikk joined #gluster
11:14 cfeller_ joined #gluster
11:14 purpleidea joined #gluster
11:14 jporterfield_ joined #gluster
11:14 vincent_1dk joined #gluster
11:14 purpleidea joined #gluster
11:19 romero_ joined #gluster
11:22 msvbhat joined #gluster
11:26 smithyuk1 joined #gluster
11:27 shubhendu joined #gluster
11:42 ProT-0-TypE joined #gluster
11:44 kdhananjay joined #gluster
11:45 vpshastry joined #gluster
11:47 xavih_ joined #gluster
11:48 kdhananjay kd
11:48 aravindavk joined #gluster
11:52 bala joined #gluster
11:53 lalatenduM joined #gluster
11:54 smithyuk1 Hi guys, having some problems with gluster since upgrading 3.3.0 to 3.4.2. Posted in here the other day asking about a localhost bug which this chap also seems to have (http://www.gluster.org/pipermail/gluster-users/2013-October/037758.html). Just tried to add a new brick (successful probe) but got an error "volume add-brick: failed: Host is not in 'Peer in Cluster' state"; if I tried to restart glusterd it failed, followed instructions here to fix
11:54 smithyuk1 (https://www.gluster.org/community/documentation/index.php/Resolving_Peer_Rejected). After putting gluster into debug mode we noticed that it was rejecting the peer because it couldn't find the IP supplied. Now I suspect this is all caused by the localhost issue that we're seeing with our other peers. Any idea how we would proceed? TL;DR: [3.4.2] localhost bug means we can't add new peers/bricks
11:54 glusterbot Title: [Gluster-users] Extra work in gluster volume rebalance and odd reporting (at www.gluster.org)
11:54 harish joined #gluster
11:59 ndarshan joined #gluster
12:00 ppai joined #gluster
12:05 Norky rossi_, I didn't realise that, thank you
12:07 Norky I was wondering about adopting this layout in existing volumes. A brick move seems onerous, would stopping the volume, moving brick contents to within subdirs and editing the volfile be a sensible approach?
12:11 vpshastry joined #gluster
12:13 participation joined #gluster
12:13 itisravi joined #gluster
12:16 ppai joined #gluster
12:19 participation Hello everyone, would this be the right place to ask about gluster and if it is suitable for my scenario?
12:19 xavih joined #gluster
12:20 ndevos participation: yes, this is the right place for it
12:23 edward1 joined #gluster
12:27 participation ndevos, thank you! I am wondering if I can use gluster and how to go about doing so.
12:28 participation We are a tiny two-man design agency that we run from my home. For storage I have two synology ds1511+, a mac osx server with some 10 external usb drives ranging from 2 to 4 tb. We are nearly running out of space on our 'working' synology and thus also on our 'backup' synology. I looked into expanding the synology devices with another 5 disks each. I read about gluster and it occurred to me that gluster might provide a unified storage
12:28 participation volume across all three servers and their hard drives. We are going to spend money on either a server to run gluster and a bunch of hard drives on or 2 synology expansion units with hard drives. The question is, can we use gluster in this manner, as a kind of overlay-filesystem on our devices/disks?
12:29 jporterfield joined #gluster
12:30 ndevos participation: hmm, Synology is a NAS right?
12:30 participation yes, that is correct
12:30 participation the units we have are 5 bay units set up in raid 5
12:31 participation Or actually, in synology parlance, synology hybrid raid
12:31 Norky I don't know if it is possible to have gluster use network filesystems as its 'bricks'. The normal way to run gluster is this: one or more gluster servers (usually Linux computers) with locally-attached storage that become 'bricks' and form the gluster volume which is then available to clients from all of the gluster servers
12:33 participation Yes, that I what I read and thought. But I was unable to find anything related to, well, resharing network mounts.
12:33 ndevos participation: in your case, you would want to run gluster on the NAS boxes themselves...
12:33 Norky Synology deivs I have seen are ARM based and don't have much memory
12:34 Norky devices*
12:34 Norky gluster can run on ARM, but I dunno if such devices will have sufficient resources
12:34 ndevos participation: I'm pretty sure you can not use an NFS-export as brick, you could use iscsi devices provided by your NAS, if it supports iscsi
12:35 Norky ahh,, yeah, that might be an option
12:35 * ndevos runs his home archives on arm+gluster, but those dont need much performance or have many clients anyway
12:38 participation ndevos, I see. Unfortunately, I think that synology locks the devices down somewhat and they are not very powerful. Ours have 3 gb ram but slow ARM processors. That is why I was thinking about running a much more powerful server that I would mount all the different mountpoints/drives/iscsi on and reshare this as a unified network filesystem.
12:38 participation ndevos, synology supports iscsi
12:39 participation ndevos we don't have many clients (including home use, 10 at most) but we do need performance as near to gigabit linespeed as possible
12:39 tdasilva joined #gluster
12:40 ndevos participation: a small server with access to the (xfs formatted) iscsi disks provided by a NAS should work for you - but you dont really need gluster if you only have one server
12:41 ndevos participation: you can use iscsi, put LVM over them and format them and export it over nfs or cifs, the LVM volumegroup can get extended with more iscsi devices when you need to
12:42 participation ndevos, we have three that I would like to make use of. 2 synology devices with 5 drives each (plus a bunch of usb drives) and a mac osx server with again a bunch of usb drives attached. Currently some data is backed up / distributed but space on the one drive might be nearly full and space on the other might be nearly empty.
12:43 participation ndevos or am I misunderstanding what you meant?
12:44 nshaikh joined #gluster
12:45 ndevos participation: if all systems (also your mac osx server) can export disks (or a bigger raid drive) as iscsi, you have the option to use gluster on another server, or use the iscsi disks as physical volumes for a big logical volume which you can export over nfs or cifs
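
A bare-bones sketch of that iSCSI + LVM route (device names and paths are placeholders; setting up the iSCSI initiator itself is left out):

    # each NAS exports a LUN over iSCSI; they appear locally as /dev/sdb, /dev/sdc, ...
    pvcreate /dev/sdb /dev/sdc
    vgcreate storage /dev/sdb /dev/sdc
    lvcreate -l 100%FREE -n shared storage
    mkfs.xfs /dev/storage/shared
    mount /dev/storage/shared /export/shared
    # grow later by adding another LUN:
    #   pvcreate /dev/sdd; vgextend storage /dev/sdd
    #   lvextend -l +100%FREE /dev/storage/shared; xfs_growfs /export/shared
    # then export /export/shared over NFS or Samba
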
12:46 micu joined #gluster
12:53 rastar joined #gluster
12:56 participation ndevos thanks a lot, that was exactly the answer that I was hoping / looking for. Now I need to figure out how 'big' my new to build gluster server needs to be. I was thinking about buying a dl585 g5 or something like that from ebay and filling it up with drives.
12:59 prasanth joined #gluster
13:04 ctria joined #gluster
13:06 vpshastry joined #gluster
13:08 ngoswami joined #gluster
13:17 benjamin_____ joined #gluster
13:17 rastar joined #gluster
13:25 wcchandler is there a way to force a redistribution of data?  i have 12 4tb bricks in a distributed-replicate volume -- i have 2 drives at 100% use, 2 more at 97% then 2 at 9% and 2 at 3%.
13:27 wcchandler nevermind, it looks like it's a single "flat" file that's ~3.7tb on those volumes...
13:32 overclk_ joined #gluster
13:33 eclectic joined #gluster
13:33 aurigus joined #gluster
13:33 aurigus joined #gluster
13:35 orion7644 joined #gluster
13:35 FooBar_ joined #gluster
13:35 Rydekull_ joined #gluster
13:35 mkzero_ joined #gluster
13:35 johnmark_ joined #gluster
13:36 solid_liq joined #gluster
13:36 lava_ joined #gluster
13:36 saltsa_ joined #gluster
13:36 solid_liq joined #gluster
13:36 kanagaraj joined #gluster
13:41 jag3773 joined #gluster
13:42 B21956 joined #gluster
13:43 divbell joined #gluster
13:44 mibby joined #gluster
13:44 eseyman joined #gluster
13:45 codex joined #gluster
13:46 delhage joined #gluster
13:46 verdurin joined #gluster
13:46 NeatBasis joined #gluster
13:46 masterzen joined #gluster
13:46 johnmwilliams_ joined #gluster
13:47 tomased joined #gluster
13:47 georgeh|workstat joined #gluster
13:48 Norky participation, as an HP reseller, the DL5xx series are often more computer than you need, a DL3xx might be cheaper and have the same or greater number of drive slots
13:48 marcoceppi_ joined #gluster
13:48 tziOm joined #gluster
13:48 james joined #gluster
13:48 smithyuk1_ joined #gluster
13:50 [o__o] joined #gluster
13:50 Norky I'm wondering for your case with a single server if you even need gluster, aggregating disks/iSCSI disks with LVM and sharing with plain NFS/SMB might do the job as well...
13:52 aquagreen joined #gluster
13:53 jporterfield joined #gluster
13:56 chirino joined #gluster
13:57 jag3773 joined #gluster
14:02 mrfsl joined #gluster
14:04 bennyturns joined #gluster
14:09 davinder joined #gluster
14:10 cfeller joined #gluster
14:11 RameshN joined #gluster
14:14 sroy joined #gluster
14:15 mrfsl During a brick replace on Gluster 3.4.2 does self heal have to complete before the new replica disk is used? My concern is that there is too much change so that the self heal will actually never finish. Can anyone lend insight or additional insight into the self-heal process.
14:15 mrfsl ?
14:17 mrfsl My self-heal has been running for 21 days trying to replicate 3.23 TiB
14:19 ctria joined #gluster
14:20 dusmantkp_ joined #gluster
14:21 primechuck joined #gluster
14:30 theron joined #gluster
14:30 kkeithley joined #gluster
14:33 smithyuk1_ Just as an FYI, fixed the peer rejected/only localhost showing in rebalances. Had to run `gluster volume reset myvol` on all volumes (despite having nothing reconfigured) and it resolved all of the issues.
14:35 jag3773 joined #gluster
14:39 dusmantkp_ joined #gluster
14:39 liquidat joined #gluster
14:40 liquidat joined #gluster
14:49 REdOG joined #gluster
14:51 bugs_ joined #gluster
14:51 kaptk2 joined #gluster
14:57 rpowell joined #gluster
15:02 cp0k joined #gluster
15:10 cp0k joined #gluster
15:10 hagarth joined #gluster
15:11 kmai007 joined #gluster
15:13 kmai007 morning folks, whats on the agenda today/
15:13 gmcwhistler joined #gluster
15:17 aquagreen joined #gluster
15:19 ajha joined #gluster
15:21 gmcwhistler joined #gluster
15:23 saltsa joined #gluster
15:23 steve_d joined #gluster
15:25 cp0k joined #gluster
15:25 steve_d Anyone familiar with nfs.trusted-sync behaviour? specifically in a multinode replica cluster, does NFS send ack back to the client when data is received in memory on whichever host is providing nfs services, or does it send ack when that data has been replicated in memory of all the replica member nodes?
15:26 steve_d I suppose the question could also be, does the data have to be on disk of one of the nodes, before it is replicated to the other nodes?
15:28 gmcwhistler joined #gluster
15:29 lmickh joined #gluster
15:29 adam__ joined #gluster
15:32 glusterbot New news from resolvedglusterbugs: [Bug 1062674] Write is failing on a cifs mount with samba-4.1.3-2.fc20 + glusterfs samba vfs plugin <https://bugzilla.redhat.com/show_bug.cgi?id=1062674>
15:33 bala joined #gluster
15:38 ColPanik joined #gluster
15:39 jobewan joined #gluster
15:42 ColPanik Any thoughts on why glustershd wouldn't start on gluster restart or reboot of a machine?  I've inherited a 3.3.2 two brick replicated setup and can't seem to get the SHD to start
15:42 ColPanik Last activity from the logs was several months ago
15:42 kmai007 what packages are installed?
15:43 kmai007 i had the same issue last week and it was missing the newer glusterfs-libs
15:43 jobewan joined #gluster
15:43 kmai007 rpm -qa|grep glusterfs
15:44 ColPanik glusterfs-3.3.2-2.el6.x86_64
15:44 kmai007 there should be moer than just 1 package
15:44 kmai007 i could be wrong, i'm on glusterfs-3.4-2-1. and they have dependencies
15:46 ColPanik that's an interesting thing to look at, I can see from the bash history that the prev. admin had some problems installing (tried a couple different rpms, then added the epel)
15:46 kmai007 so which flavor linux are you running?
15:46 kmai007 rhel
15:46 ColPanik Centos
15:47 ColPanik (getting version, I'm more familiar with Ubuntu)
15:47 jmarley joined #gluster
15:47 ColPanik Centos 6.4
15:48 ColPanik it seems a little weird, because there was activity at some point
15:49 kmai007 i'm no expert, but just relaying what i encountered last week with glusterd not starting up
15:49 kmai007 looks like 3.3.2-2 has more than just 1 package to make glusterd run
15:49 kmai007 http://download.gluster.org/pub/gluster/glusterfs/3.3/3.3.2/RHEL/epel-6Server/x86_64/
15:49 kmai007 i believe you will need a total of 3 packages, glusterfs, glusterfs-server, glusterfs-fuse installed
15:49 glusterbot Title: Index of /pub/gluster/glusterfs/3.3/3.3.2/RHEL/epel-6Server/x86_64 (at download.gluster.org)
15:49 ColPanik and I haven't found any docs that suggest that the SHD can be disabled
15:50 kmai007 SHD?
15:50 ColPanik SHD => self heal daemon
15:51 kmai007 it can
15:51 kmai007 as a set feature on the volume
15:51 ndevos ColPanik: I thought the SHD was introduced with 3.4, are you sure it is in 3.3?
15:51 * ndevos isn't sure
15:52 ColPanik pretty sure it was 3.3.. "Proactive self-healing – GlusterFS volumes will now automatically restore file integrity after a replica recovers from failure."
15:52 ColPanik http://www.gluster.org/community/documentation/index.php/About_GlusterFS_3.3
15:52 glusterbot Title: About GlusterFS 3.3 - GlusterDocumentation (at www.gluster.org)
15:53 ndevos ColPanik: ah, ok :)
15:54 kmai007 you can see all the features in the gluster cli
15:54 kmai007 gluster volume set help
15:55 kmai007 Option: cluster.self-heal-daemon
15:55 kmai007 Default Value: off
15:55 kmai007 Description: This option applies to only self-heal-daemon. Index directory crawl and automatic healing of files will not be performed if this option is turned off.
15:55 kmai007 but that is on 3.4.1 GA
15:55 ColPanik kmai007: yea, I actually did see that setting at some point.  I believe we had it set "on" in the configs, but I'm trying to dig up the file to look and confirm
15:57 ColPanik as a side question, there's not a way to make gluster print out current values for those volume options, is there?
15:58 kmai007 gluster volume info <volume>
15:58 kmai007 should list what features are on or off
15:59 kmai007 but to know the defaults, no, if its not listed, it wasn't modified
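
So a quick check, as a sketch with a placeholder volume name: if cluster.self-heal-daemon doesn't show up under the volume's reconfigured options it is still at its default, and it can be switched on explicitly:

    gluster volume info myvol | grep -i self-heal
    gluster volume set myvol cluster.self-heal-daemon on
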
16:00 zwevans joined #gluster
16:00 japuzzo joined #gluster
16:01 NuxRo ndevos: cheers, will test when I get some time these days
16:02 ndevos NuxRo: do you need a 4.3 backport of that patch? I'm not sure it applies cleanly
16:05 NuxRo ndevos: I'd love a 4.3 backport of it as a matter of fact :)
16:05 vpshastry joined #gluster
16:05 NuxRo hopefully it will fix the unbootable vm issue that i have currently
16:05 ColPanik kmai007: ok, found it... "/var/lib/glusterd/glustershd/glustershd-server.vol" has a bunch of *-heal-* options that are all either "on" or "yes" in our setup
16:06 ColPanik 'option self-heal-daemon on' seems like the most relevant
16:06 ColPanik it's just strange because there is some log activity, just really old.  I'm not even seeing an indication that it's starting and then crashing
16:07 ndevos NuxRo: if the vm is not booting, you probably can workaround it by converting the qcow2 image to raw: mv orig orig.qcow2 ; qemu-img convert -O raw orig.qcow2 orig
16:08 NuxRo NuxRo: well, that's not good ... is there any proper fix in sight?
16:08 NuxRo lol, meant ndevos
16:08 ndevos NuxRo: with that patch, qemu tries to use any image on gluster as qcow2, but without it, it uses the raw-format
16:08 NuxRo talking to myself ...
16:08 kmai007 i've been told to never modify that .vol file explicitly
16:09 NuxRo ah roger, i want the qcow2 thingy
16:09 kmai007 b/c the same vol file needs to be known among all bricks
16:09 daMaestro joined #gluster
16:09 ndevos NuxRo: thats what the patch addresses :)
16:09 NuxRo lovely
16:09 kmai007 and then the client when mounts up the vol. also pulls that file into memory
16:09 * ndevos looks for hos 4.3 branch
16:09 NuxRo let me know when you have something that applies cleanly to 4.3
16:09 ndevos s/hos/his/
16:09 glusterbot What ndevos meant to say was: * ndevos looks for his 4.3 branch
16:10 NuxRo now i have a big itch to start using this in production, ndevos :)
16:10 ndevos NuxRo: that would be awesome!
16:11 mgebbe joined #gluster
16:15 NuxRo I'll definitely do some testing anyway, i really like the idea
16:15 NuxRo thanks a lot!
16:16 ColPanik kmai007: ok, it hasn't been modified AFAIK, just making the point that all of the stuff that looks related to the self healer looks to be enabled
16:16 kmai007 gotcha, can someone help me interpret this
16:17 kmai007 Please delete the file from all but the preferred subvolume.- Pending matrix:  [ [ 0 1 ] [ 2 0 ] ]
16:17 ColPanik if I try to do a "volume heal <vol> info" it just tells me that I should check the self healer's logs because it isn't running.  If I look at those, the last output is several months old
16:17 kmai007 "gluster volume status <volume>"
16:17 kmai007 what does that tell you
16:18 jag3773 joined #gluster
16:18 jbrooks joined #gluster
16:21 ColPanik Status of volume: xxxxx
16:21 ColPanik Gluster process                     Port    Online  Pid
16:21 ColPanik ------------------------------------------------------------------------------
16:21 ColPanik Brick xxx01:/srv/xxx                24011   Y       2194
16:21 ColPanik Brick xxx02:/srv/xxx                24012   Y       2207
16:21 ColPanik NFS Server on localhost             38467   N       N/A
16:21 ColPanik Self-heal Daemon on localhost       N/A     N       N/A
16:21 ColPanik NFS Server on xxx02                 38467   N       N/A
16:21 ColPanik Self-heal Daemon on xxx02           N/A     N       N/A
16:21 ColPanik (xxx's replacing host/volume names)
16:22 ColPanik this is from server 01
16:23 Slash joined #gluster
16:23 theron joined #gluster
16:25 kmai007 strange, everything appears normal
16:25 pdrakeweb joined #gluster
16:26 kmai007 i guess to start from scratch, you can stop glusterd
16:26 kmai007 clear out your logs
16:26 kmai007 and restart glusterd to see
16:26 kmai007 only if things are not appearing to work
16:26 kmai007 i hate for you to stop it and not get it running again
16:27 VerboEse joined #gluster
16:28 ColPanik kmai007: yea, we had an incident a few days ago when we realized that the SHD wasn't running. We did several restarts of the gluster processes and the volumes, but haven't seen the SHD start once.
16:29 kmai007 so for your volume do you have "glusterd and glusterfsd" active PIDs ?
16:29 kmai007 on the brick
16:31 ColPanik I think so.  I see one glusterd process and then two additional processes which contain arguments suggesting they're from each of two volumes we're sharing out from gluster
16:32 ColPanik one thing we did notice was that it appears the previous guy had added in two peers that never existed.  I think it might have been preemptive for hardware that he anticipated having at some point?  Anyway, those were showing up in the peer list as Disconnected
16:32 kmai007 oh
16:32 ColPanik I actually force removed those last night.  I wonder if that was messing up quorum
16:32 kmai007 gluster peer status
16:33 kmai007 shows what you need?
16:33 ndevos NuxRo: this backport compiles, not tested any further: https://forge.gluster.org/cloudstack-gluster/cloudstack/commit/babb0c21e64330aa1b4fd2a386c1c79e872dd5c0?format=patch
16:33 ColPanik right now it does, but previously it showed two disconnected peers
16:34 kmai007 i wonder if the vol files are different now in your group.
16:34 kmai007 it shouldn't be if they show connected
16:34 kmai007 each brick displays the correct peers?
16:35 sprachgenerator joined #gluster
16:35 ColPanik let me double check
16:37 ColPanik yes, each of two servers are now showing each other correctly as peers.  But before last night, they were showing up as having three peers, which included the 'preemptive nodes' that I referenced above.
16:37 samppah @glossary
16:37 glusterbot samppah: A "server" hosts "bricks" (ie. server1:/foo) which belong to a "volume"  which is accessed from a "client"  . The "master" geosynchronizes a "volume" to a "slave" (ie. remote1:/data/foo).
16:38 ColPanik and nothing has been restarted since I did that
16:39 ColPanik so, maybe that's the next thing to try.  I didn't think quorum would be an issue until 3.4, but I think I did see some documentation that mentioned quorum is a thing in 3.3 as well
16:39 kmai007 fuser /var/log/glusterfs/glusterfs/glustershd.log does that have an active PID?
16:44 vpshastry left #gluster
16:47 rpowell1 joined #gluster
16:54 Matthaeus joined #gluster
16:56 ikk joined #gluster
17:03 T0aD joined #gluster
17:04 eryc joined #gluster
17:06 crazifyngers joined #gluster
17:06 harish joined #gluster
17:06 Norky joined #gluster
17:06 cjanbanan joined #gluster
17:06 keytab joined #gluster
17:06 Philambdo joined #gluster
17:06 cyberbootje joined #gluster
17:06 haomaiwa_ joined #gluster
17:06 XpineX joined #gluster
17:06 ultrabizweb joined #gluster
17:06 m0zes joined #gluster
17:06 NuxRo joined #gluster
17:06 wgao_ joined #gluster
17:06 semiosis joined #gluster
17:06 RobertLaptop joined #gluster
17:06 JordanHackworth joined #gluster
17:06 overclk joined #gluster
17:06 divbell_ joined #gluster
17:06 sprachgenerator joined #gluster
17:06 pdrakeweb joined #gluster
17:06 jbrooks joined #gluster
17:06 zwevans joined #gluster
17:06 jmarley joined #gluster
17:06 ColPanik joined #gluster
17:06 kaptk2 joined #gluster
17:06 bugs_ joined #gluster
17:06 liquidat joined #gluster
17:06 kkeithley joined #gluster
17:06 primechuck joined #gluster
17:06 sroy joined #gluster
17:06 davinder joined #gluster
17:06 smithyuk1_ joined #gluster
17:06 Guest39322 joined #gluster
17:06 tziOm joined #gluster
17:06 marcoceppi_ joined #gluster
17:06 tomased joined #gluster
17:06 codex joined #gluster
17:06 B21956 joined #gluster
17:06 lava joined #gluster
17:06 aurigus joined #gluster
17:06 xavih joined #gluster
17:06 romero_ joined #gluster
17:06 vincent_1dk joined #gluster
17:06 purpleidea joined #gluster
17:06 badone joined #gluster
17:06 mtanner_ joined #gluster
17:06 saltsa joined #gluster
17:09 ColPanik kmai007: not sure what just happened with the massive disconnect, but did you see those replies I sent?
17:10 rpowell joined #gluster
17:10 ProT-0-TypE joined #gluster
17:10 johnmwilliams__ joined #gluster
17:11 KyleG1 joined #gluster
17:11 KyleG joined #gluster
17:12 KyleG joined #gluster
17:14 zerick joined #gluster
17:15 chirino joined #gluster
17:15 kmai007 nothing
17:16 ColPanik k, will repeat
17:17 ColPanik re: your fuser command suggestion.
17:17 ColPanik I ran it once and got nothing back
17:17 ColPanik however, I ran it a few more times in rapid succession and got this:
17:17 masterzen joined #gluster
17:17 ColPanik Cannot stat file /proc/2194/fd/225: No such file or directory
17:17 ColPanik then a few more times and got nothing
17:17 ColPanik then got "Cannot stat file /proc/2194/fd/199: No such file or directory"
17:17 ColPanik then nothing again
17:17 ColPanik # ps -A | grep 2194
17:17 ColPanik 2194 ?        14-11:56:54 glusterfsd
17:18 ColPanik also, the last thing in the SHD log is:
17:18 Guest39322 left #gluster
17:18 ColPanik [2013-08-10 10:10:33.672795] E [rpcsvc.c:1155:rpcsvc_program_unregister_portmap] 0-rpc-service: Could not unregister with portmap
17:18 kmai007 man that log is old
17:18 ColPanik yea, right?
17:19 kmai007 yeh so the process that wrote to the log use to be something else
17:19 kmai007 i had issues with the logrotate.d/glusterfs script
17:19 kmai007 so i created my own
17:20 kmai007 but here is what i get when i fuser my logfile
17:20 kmai007 fuser /var/log/glusterfs/glustershd.log
17:20 kmai007 [root@omdx1b50 ~]# ps -ef|grep 46448
17:20 kmai007 root     36935 36913  0 11:20 pts/0    00:00:00 grep 46448
17:20 kmai007 root     46448     1  4 Feb20 ?        03:36:14 /usr/sbin/glusterfs -s localhost --volfile-id gluster/glustershd -p /var/lib/glusterd/glustershd/run/glustershd.pid -l /var/log/glusterfs/glustershd.log -S /var/run/2d14fce83d6140d3dc064f7b6b94e7ed.socket --xlator-option *replicate*.node-uuid=1636dc66-d726-46ec-958b-8a5746477601
17:20 kmai007 [root@omdx1b50 ~]#
17:21 kmai007 do you get all that in your output?
17:24 ColPanik no, but I don't think I would because nothing is consistently accessing glustershd.log, so no pid to feed into the grep of ps.  Separately, I know there's no glustershd process running
17:24 ColPanik # ps -efH | grep shd
17:24 ColPanik root      1750     1  0 Feb18 ?        00:00:00   /usr/sbin/sshd
17:24 ColPanik root     22162  1750  0 10:40 ?        00:00:00     sshd: root@pts/0
17:24 ColPanik root     22594 22166  0 12:24 pts/0    00:00:00         grep shd
17:27 kmai007 i do not believe there is a glustershd process
17:27 kmai007 i just know of 2
17:27 kmai007 glusterd glusterfsd
17:27 kmai007 the heal daemon i dont believe gets an active PID
17:29 ColPanik gotcha.  The grep I pasted should still catch anything with the string glustershd, so I don't think any processes are running
17:31 kmai007 i suppose whats next is to reboot 1 brick
17:31 kmai007 and see if it starts up
17:32 kmai007 is it chkconfig on? to start after reboot?
17:32 KyleG left #gluster
17:33 ColPanik kmai007: can you suggest a way to check the "chkconfig on"?
17:33 orion7644 chkconfig --list
17:34 kmai007 chkconfig --list|grep gluster
17:34 Mo_ joined #gluster
17:34 ColPanik # chkconfig --list | grep gluster
17:34 ColPanik glusterd       0:off  1:off  2:off  3:off  4:off  5:off  6:off
17:34 ColPanik glusterfsd     0:off  1:off  2:off  3:off  4:off  5:off  6:off
17:35 ColPanik but we do start manually with a separate scrit
17:35 ColPanik script
17:36 kmai007 i see
17:36 kmai007 well i have glusterd on, and it will then proceed to execute glusterfsd as needed
17:36 kmai007 on reboot
17:37 kmai007 sounds like you've looked everywhere, maybe its time to reboot the brick if you're confident
17:37 semiosis ,,(processes)
17:37 glusterbot The GlusterFS core uses three process names: glusterd (management daemon, one per server); glusterfsd (brick export daemon, one per brick); glusterfs (FUSE client, one per client mount point; also NFS daemon, one per server). There are also two auxiliary processes: gsyncd (for geo-replication) and glustershd (for automatic self-heal). See http://goo.gl/F6jqx for more information.
17:38 semiosis ColPanik: whats the problem?
17:38 sputnik13 joined #gluster
17:42 cfeller joined #gluster
17:42 ColPanik semiosis: in short, our self healing daemon hasn't been running for a really long time.  I haven't found any evidence that it's explicitly disabled, and also don't see any evidence that it's starting and dying.  A few of us inherited this setup, so it's kind of a forensics job
17:43 refrainblue joined #gluster
17:43 ColPanik kmai007: thanks, I really appreciate all of your feedback.  It's possible that pulling those disconnected peers out will change something on this reboot
17:43 kmai007 np, goodluck
17:44 daMaestro joined #gluster
17:44 lmickh joined #gluster
17:44 theron joined #gluster
17:44 aquagreen joined #gluster
17:44 georgeh|workstat joined #gluster
17:44 mibby joined #gluster
17:44 DV__ joined #gluster
17:44 hybrid512 joined #gluster
17:44 hchiramm__ joined #gluster
17:44 qdk joined #gluster
17:44 P0w3r3d joined #gluster
17:44 Slasheri joined #gluster
17:44 ColPanik semiosis: we're running 3.3.2
17:45 semiosis ColPanik: afaik restarting glusterd should cause it to launch the missing shd process
17:45 semiosis then if the shd is still missing, there should be something about that in the glusterd log file (etc-glusterfs-glusterd.log) or the shd log
17:46 ikk joined #gluster
17:47 ColPanik yea, the last thing from shd's log was "[2013-08-10 10:10:33.672795] E [rpcsvc.c:1155:rpcsvc_program_unregister_portmap] 0-rpc-service: Could not unregister with portmap"
17:47 ColPanik obviously from a very long time ago
17:47 JoeJulian Is /var/log full?
17:47 semiosis restart portmap?
17:48 ColPanik JoeJulian: /var/log is not full
17:49 ColPanik semiosis: Is there a good way to determine if portmap is running before I try a restart?
17:49 kmai007 service portmap status
17:50 kmai007 nope thats not it
17:50 ColPanik # service portmap status
17:50 ColPanik portmap: unrecognized service
17:50 bugs_ sometimes portmap is named rpcinfo
17:51 kmai007 rpcinfo -p
17:52 kmai007 looks like its port 111
17:52 ColPanik # rpcinfo -p
17:52 ColPanik program vers proto   port  service
17:52 ColPanik 100000    4   tcp    111  portmapper
17:52 ColPanik 100000    3   tcp    111  portmapper
17:52 ColPanik 100000    2   tcp    111  portmapper
17:52 ColPanik 100000    4   udp    111  portmapper
17:52 ColPanik 100000    3   udp    111  portmapper
17:52 ColPanik 100000    2   udp    111  portmapper
17:52 kmai007 dont paste too much else you'll get da boot
17:52 ColPanik *sorry* such a noob :(
17:52 ColPanik thank you
17:52 kmai007 pastie
17:52 kmai007 i believe, that what i was warned before
17:53 kmai007 no i'm still the noob
17:53 kmai007 http://fpaste.org/ is what i use
17:53 glusterbot Title: New paste Fedora Project Pastebin (at fpaste.org)
17:53 kmai007 pastie, for some reason my proxy blocks it, must the name intended for other usage...LOL
17:53 ColPanik hahaha
17:54 marcoceppi_ kmai007: try http://paste.ubuntu.com ?
17:54 glusterbot Title: Ubuntu Pastebin (at paste.ubuntu.com)
17:55 ColPanik semiosis: do you see anything weird with portmapper/rpcinfo?  I'd seen it referenced, but only in connection with the NFS share.
17:55 kmai007 thanks marcoceppi_
17:58 semiosis ,,(paste)
17:58 glusterbot For RPM based distros you can yum install fpaste, for debian and ubuntu it's pastebinit. Then you can easily pipe command output to [f] paste [binit] and it'll give you a URL.
17:59 semiosis hi marcoceppi_
17:59 marcoceppi_ o/ semiosis
18:00 semiosis so feature freeze is upon us.  any idea if the glusterfs MIR will/did make it in?
18:00 marcoceppi_ semiosis: no idea, but was a MIR opened?
18:00 * marcoceppi_ checks lp
18:00 semiosis https://blueprints.launchpad.net/ubuntu/+spec/servercloud-p-glusterfs-mir
18:00 glusterbot Title: GlusterFS MIR : Blueprints : Ubuntu (at blueprints.launchpad.net)
18:00 semiosis that mir has been open since before precise :)
18:00 semiosis https://bugs.launchpad.net/ubuntu/+source/glusterfs/+bug/1274247
18:00 glusterbot Title: Bug #1274247 “[MIR] Glusterfs” : Bugs : “glusterfs” package : Ubuntu (at bugs.launchpad.net)
18:00 marcoceppi_ semiosis: it looks like 3.4.2 is in trusty
18:01 semiosis i put it in universe
18:01 semiosis http://packages.ubuntu.com/search?keywords=glusterfs&searchon=sourcenames&suite=trusty&section=main - no results
18:01 glusterbot Title: Ubuntu – Package Search Results -- glusterfs (at packages.ubuntu.com)
18:01 semiosis people want it in main so that qemu will be built with glusterfs support
18:01 marcoceppi_ semiosis: I'll ping james page about it
18:02 semiosis thanks.  i've done that once a week for the last two or three :)
18:02 marcoceppi_ in order to be in main it's got to go through a security review, which takes a while
18:02 semiosis https://bugs.launchpad.net/cloud-archive/+bug/1246924
18:02 glusterbot Title: Bug #1246924 “qemu not built with GlusterFS support” : Bugs : ubuntu-cloud-archive (at bugs.launchpad.net)
18:02 semiosis see those last two LP bugs i linked
18:03 semiosis afaict it's waiting for a security review
18:03 semiosis no indication if that's begun
18:03 kmai007 semiosis: does the volume heal <vol> info healed output ever roll off?
18:03 kmai007 is it every 10 mins when the process kicks off?
18:03 semiosis kmai007: idk
18:04 kmai007 semiosis: sir, does the heal output roll off?
18:04 kmai007 i.e. heal info healed
18:04 semiosis s/idk/i dont know/
18:04 glusterbot What semiosis meant to say was: kmai007: i dont know
18:05 kmai007 iok
18:05 semiosis marcoceppi_: i made a ppa for qemu with gluster, for the impatient. but would be nice to have it in trusty
18:05 kmai007 s/iok/its ok
18:06 semiosis hehe
18:06 marcoceppi_ semiosis: most definitely
18:06 kmai007 dang i don't have power
18:06 marcoceppi_ it'd make about a million things easier
18:06 semiosis @ppa
18:06 glusterbot semiosis: The official glusterfs packages for Ubuntu are available here: 3.4 stable: http://goo.gl/u33hy -- 3.5 QA: http://goo.gl/Odj95k -- introducing QEMU with GlusterFS 3.4 support: http://goo.gl/7I8WN4
18:16 daMaestro joined #gluster
18:16 marcoceppi_ semiosis: http://i.imgur.com/BJOCxpG.gif
18:16 semiosis lmao
18:17 semiosis marcoceppi_: will you by chance be in SF for devnation (or redhat summit)?
18:18 marcoceppi_ semiosis: no, I won't, not sure if anyone else is going though
18:19 semiosis ok
18:20 plarsen joined #gluster
18:26 nueces joined #gluster
18:33 rossi_ joined #gluster
18:40 aquagreen joined #gluster
18:43 flrichar joined #gluster
18:44 jporterfield_ joined #gluster
18:46 ctria joined #gluster
18:47 cjanbanan joined #gluster
18:50 kmai007 please advise me, when I reboot a brick, is there a time i should wait before rebooting the next brick?
18:50 kmai007 or can i just move on to the next brick once glusterd is back up
18:52 psyl0n joined #gluster
18:58 tdasilva left #gluster
19:04 hchiramm__ joined #gluster
19:06 nueces joined #gluster
19:07 psyl0n joined #gluster
19:17 JoeJulian kmai007: When heal info is empty.
19:25 larsks joined #gluster
19:27 larsks I am getting "permission denied" from a file on a gluster (fuse) mount that seems unrelated to file permissions.  If I kill the process that had it open (a qemu instance), it suddenly starts working.
19:27 larsks This is the backing store to an instance in openstack that was migrated from one compute host to another.
19:30 psyl0n joined #gluster
19:31 sputnik13 joined #gluster
19:40 purpleidea joined #gluster
19:40 purpleidea joined #gluster
19:40 kmai007 thanks JoeJulian
19:46 andreask joined #gluster
19:49 sputnik13 joined #gluster
19:58 jurrien joined #gluster
19:59 redbeard joined #gluster
20:01 ColPanik One (of many) things that I'm unclear on is portmap/rpcbind and whether it's only necessary for nfs.  Are there any good docs on verifying the proper setup?
20:01 ColPanik most of the stuff I've seen is just howto's saying that "it needs to be installed so you can run NFS"
20:02 kmai007 are you wanting to mount via NFS?
20:02 ColPanik no, not presently.  Just tugging at threads that might help explain the weird SHD behavior
20:03 kmai007 if you can mount a volume from another server as NFS, then you have nothing to worry
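For anyone chasing the same question later, a quick way to check that side of things (a sketch; myvol is a hypothetical volume name, and the nfs/shd subcommands are as on the 3.4-era CLI):

    rpcinfo -p                          # rpcbind should list portmapper, mountd and nfs registrations
    gluster volume status myvol nfs     # is gluster's built-in NFS server online for this volume?
    gluster volume status myvol shd     # same check for the self-heal daemon ColPanik is chasing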
20:04 ColPanik I'll give that a shot, out of curiosity, do you mind sharing your "rpcinfo -p" results?
20:05 kmai007 ok
20:05 kmai007 rpcinfo -p
20:05 kmai007 program vers proto   port  service
20:05 kmai007 100000    4   tcp    111  portmapper
20:05 kmai007 100000    3   tcp    111  portmapper
20:05 kmai007 100000    2   tcp    111  portmapper
20:05 kmai007 100000    4   udp    111  portmapper
20:05 kmai007 100000    3   udp    111  portmapper
20:05 kmai007 100000    2   udp    111  portmapper
20:05 kmai007 100227    3   tcp   2049  nfs_acl
20:05 kmai007 100005    3   tcp  38465  mountd
20:05 kmai007 100005    1   tcp  38466  mountd
20:05 kmai007 100003    3   tcp   2049  nfs
20:06 ColPanik and it looks like you're actively using nfs?
20:06 orion7612 joined #gluster
20:06 kmai007 i don't remember, i think so, just for testing purposes, but the preferred is FUSE
20:06 kmai007 mounting
20:18 ColPanik ok, so I just tried mounting from my local machine.  I confirmed ping to the gluster box (and I'm also ssh'd into it).  NFS mounting fails with 'no route to host'.  Probably makes sense because the NFS daemon is not running according to gluster volume status: NFS Server on localhost 38467 N N/A
20:18 ColPanik last nfs log entries on this box are from the same day as the SHD's last log messages
20:19 kmai007 glusterNFS is different than NFS
20:19 ColPanik nfs not running isn't a primary concern, unless it's due to the same issue that's keeping the SHD down
20:19 kmai007 if you have a port and pid from 'gluster volume status <vol>'
20:19 andreask joined #gluster
20:19 kmai007 then you should be able to mount that volume as NFS from any client capable of NFS
20:20 ColPanik yea, that's what the output is from.  There's a port, but the Online column is 'N' and the Pid column is 'N/A'
20:20 kmai007 what is your mount command on your client?
20:21 ColPanik sudo mount -vv -t nfs SERVER01:DATA DATA
20:22 ColPanik mount.nfs: mount(2): No route to host
20:22 ColPanik mount.nfs: trying text-based options 'vers=4,addr=XXX.XXX.XXX.XXX,clientaddr=XXX.XXX.XXX.XXX'
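Worth noting, with the usual hedging: gluster's built-in NFS server speaks NFSv3 over TCP only, and the trace above shows the client negotiating v4 first. So even once the NFS service is actually online (per the volume status output above it currently isn't), the mount usually needs the version pinned. A sketch reusing SERVER01 and DATA from the attempt above, with /mnt/DATA as a hypothetical mountpoint:

    # vers=3 stops the NFSv4 negotiation attempt; nolock avoids depending on a separate lock manager
    sudo mount -t nfs -o vers=3,tcp,nolock SERVER01:/DATA /mnt/DATA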
20:25 kmai007 i suppose on your client you dont have firewall rules do you? is there anything in /etc/hosts.allow
20:26 ColPanik nope, all commented out
20:27 Mo_ joined #gluster
20:27 larsks kmai007: IANAGE ("...not a gluster expert"), but -- I think that if Gluster says the NFS server is offline, it's not a client or a firewall problem.
20:27 kmai007 i didn't read his output
20:28 kmai007 it rolled off my buffer
20:28 kmai007 i wonder if my session on irc is flaky, i'm not getting replies quickly, but i see disconnects in abundance.
20:29 kmai007 joined #gluster
20:29 hagarth joined #gluster
20:29 ColPanik yea, and it also seems that shd and nfs were last active on the same day
20:29 ColPanik so, seems like they could be related
20:30 ColPanik but IAANAGE ;)
20:30 kmai007 so is your goal to mount your volumes, or possibly fix a disabled SHD ?
20:31 kmai007 just making sure i dont' misunderstand
20:31 ColPanik definitely getting the SHD back online, but also understanding why it's been down for so long
20:32 ColPanik just seems like one of those things that shouldn't go unchecked for too long
20:32 kmai007 so you have a 2 bricks ?
20:32 ColPanik yea
20:32 kmai007 each of them will not output 'gluster volume heal <vol> info' information?
20:33 ColPanik correct, they say that the self healer isn't running, and to check its (shd's) logs
20:33 cp0k joined #gluster
20:33 kmai007 and you've rebooted both bricks?
20:34 ColPanik a few days ago, yes
20:34 kmai007 try on both bricks
20:34 kmai007 service glusterd restart
20:34 kmai007 and see if you get some heal info then
20:35 cp0k JoeJulian: after a few hours of work, I am now at 0% split brain :D
20:36 cp0k JoeJulian: next fun part is to add new bricks in
20:37 kmai007 cp0k: congrats, my fav. exercise is to find the inode from the gfid:<@#$@$@#$@$#@$#$@$@$@#@$$>
20:37 kmai007 i'm goofy like that
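The gfid-to-path exercise kmai007 is joking about is usually done through the hard link gluster keeps under .glusterfs on each brick; a rough sketch for a regular file (directories use a symlink instead), with a hypothetical brick path and gfid:

    BRICK=/export/brick1                          # hypothetical brick directory
    GFID=01234567-89ab-cdef-0123-456789abcdef     # hypothetical gfid from the heal output
    # the .glusterfs entry is a hard link to the real file, so both names share an inode
    INUM=$(stat -c %i "$BRICK/.glusterfs/${GFID:0:2}/${GFID:2:2}/$GFID")
    find "$BRICK" -inum "$INUM" -not -path '*/.glusterfs/*'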
20:37 cp0k JoeJulian: after adding the new bricks, should I first issue a fix layout command and then migrate the existing data?
20:37 cp0k or just do both via http://gluster.org/community/documentation/index.php/Gluster_3.2:_Rebalancing_Volume_to_Fix_Layout_and_Migrate_Existing_Data ?
20:37 glusterbot Title: Gluster 3.2: Rebalancing Volume to Fix Layout and Migrate Existing Data - GlusterDocumentation (at gluster.org)
20:37 cp0k kmai007: thats a funny lookin gfid :)
20:38 _dist joined #gluster
20:38 cp0k kmai007: I just rm'd all the gfid's showing up in my split brain and that fixed me up
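For readers landing here later: what cp0k describes is roughly the usual manual split-brain cleanup, sketched below with hypothetical names. It assumes you have already decided which replica's copy is the bad one; deleting the wrong side loses data.

    # on the brick whose copy is being discarded
    BRICK=/export/brick1                 # hypothetical brick directory
    BAD=vm-images/disk0.img              # hypothetical path of the split-brained file inside the volume
    INUM=$(stat -c %i "$BRICK/$BAD")     # note the inode before deleting
    rm -f "$BRICK/$BAD"
    find "$BRICK/.glusterfs" -inum "$INUM" -delete    # drop the matching gfid hard link too
    # then let self-heal recreate the copy from the surviving replica,
    # e.g. by stat'ing the file through a client mount or running:
    gluster volume heal myvol full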
20:38 kmai007 yes, it is, that is my expression, after cleaning up that split-pea state
20:38 cp0k hehe
20:39 kmai007 i was so scurred, i wanted to ensure i removed it all, though i found it was repetitive in the split-brain output per brick
20:39 cp0k what version of Gluster are you running?
20:39 kmai007 glusterfs 3.4.1
20:39 kmai007 going to update it soon
20:39 kmai007 finally just got all my clients on 3.4.2-1
20:39 cp0k kmai007: yep, tell me about it, piping 'sort -u' to the list of files allowed me to shrink the output of info split-brain down by ALOT!
20:40 _dist semiosis: Just got your message, I'll give your build a try later tonight on my home test machine (likely in a nested setup, but that'll be fine)
20:40 cp0k I just recently upgraded to 3.4.2
20:40 kmai007 so that brings me to this question that i dont' got answered
20:40 cp0k JoeJulian mentioned yesterday that he will be publishing a post today on split-brain :)
20:41 kmai007 the heal info "healed|heal-failed|split-brain"
20:41 kmai007 does it ever roll off, or only when glusterd is restarted?
20:41 kmai007 didn't*
20:41 cp0k kmai007: info split-brain only resets after restarting glusterd on the storage node
20:42 cp0k kmai007: "gluster volume heal volname info" updates on the fly :)
20:42 kmai007 no wonder, i was chizzlin' off that mount of split-brain, and i never saw it shrink, until i restarted glusterd
20:42 kmai007 mountain*
20:42 cp0k kmai007: yea, I was in the same spot...JoeJulian helped by suggesting to restart glusterd
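To summarise the distinction the two of them just worked out (myvol is a hypothetical volume name; behaviour as of the 3.4 releases they are running):

    gluster volume heal myvol info               # live list of entries still waiting to be healed
    gluster volume heal myvol info split-brain   # cumulative log; entries stay listed even after being fixed
    service glusterd restart                     # on the storage node, resets that cumulative listing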
20:43 cp0k kmai007: do you know how you even got your data in a split-brain state? cause I sure hells don't
20:43 kmai007 so only a few more times of manual labor before i feel adventurous to script all this......<----LAZY
20:43 cjanbanan joined #gluster
20:44 kmai007 my hunch on how it split: my clients recently disconnected at like 2AM, and in the FUSE logs there are tons of messages about
20:44 larsks So, I'm having some odd permissions problems on a gluster volume.  The fileystem reports "permission denied" for files that should be globally readable...and starts working as soon as the process with the file open stops.  Any takers?
20:44 kmai007 dht- no subvolumes found = -1
20:45 cp0k kmai007: so you are thinking that your client prematurely disconnected while it was writing the files?
20:45 kmai007 larsks: did you check to see in the client /var/log/glusterfs/<mnt>.log and see if there is addtional details?
20:45 cp0k kmai007: causing gluster to get confused ?
20:46 kmai007 cp0k: yes, i see it logging disconnected, but never reconnected....
20:46 larsks kmai007: It confirms a permission denied error: [2014-02-24 20:45:04.522814] W [fuse-bridge.c:915:fuse_fd_cbk] 0-glusterfs-fuse: 270: OPEN() /f1b6b7d9-1afc-49dc-b7e5-80b0fd33bb50/disk => -1 (Permission denied)
20:46 kmai007 so to fix it, i unmount/remount it, and its all great, but it leaves my 4 bricks in a confused state
20:47 kmai007 do you see the file in the brick and in the client?
20:47 kmai007 i'm no expert, just somebody who is ambitious about solving his gluster problems
20:47 larsks kmai007: Yes, it's in both places. I can read it directly from the brick directory, *and* I can read it on another client.  The problem is only on the client that has it open.
20:47 larsks ...and it's not selinux :)
20:47 kmai007 so how do you get out of that funk?  unmount /remount ?
20:48 kmai007 does that fix it for that client?
20:48 larsks This is the backing file for a libvirt instance.  If I "virsh destroy" that instance and then "virsh start" it, everything works from that point on.
20:48 larsks No touching the filesystem required.
20:48 kmai007 yeh man im not adventurous to run kvms off gluster yet, just using it as storage
20:49 kmai007 sorry
20:49 larsks No worries.
20:49 larsks I keep hoping one of the gluster devs will poke their head into the conversation :).
20:50 kmai007 its this and the gluster newsletter, to be resourceful
20:50 kmai007 give them a shout, im sure somebody on earth can answer
20:51 kmai007 cp0k: yo
20:51 kmai007 cp0k: volume heal vol. info heal-failed, did that require you to take any action? or just focus on split-brain ?
20:53 cp0k kmai007: I just focused on the split-brain
20:56 kmai007 thanks
21:03 hagarth joined #gluster
21:05 mkzero joined #gluster
21:06 mattappe_ joined #gluster
21:09 rossi_ joined #gluster
21:13 kmai007 can someone describe how backupvol-server works in fstab?
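The question goes unanswered in the channel, so for reference: the fuse mount helper in the 3.3/3.4 packages accepts a backupvolfile-server option, which only controls where the volfile is fetched from at mount time; once mounted, the client talks to every brick directly, so it is not a data-path failover setting. A hedged fstab sketch with hypothetical hostnames and volume name:

    # /etc/fstab
    server1:/myvol  /mnt/myvol  glusterfs  defaults,_netdev,backupvolfile-server=server2  0 0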
21:14 semiosis _dist: welcome back
21:15 sputnik13 joined #gluster
21:15 _dist semiosis: thanks :) I've been getting all my stuff migrated over to kvm on my gluster replicate. Up till 3 last night working on it (downtime of systems has an inverse relationship to humans)
21:16 REdOG joined #gluster
21:18 cp0k fellas, tomorrow I am going to be adding brand new bricks to my gluster 3.4.2 setup....after doing so, would it be better to combine the fix layout and data migration all in one step? or safer to do it as a two step process?
21:18 cp0k http://gluster.org/community/documentation/index.php/Gluster_3.2:_Rebalancing_Volumes
21:18 glusterbot Title: Gluster 3.2: Rebalancing Volumes - GlusterDocumentation (at gluster.org)
21:24 RayS joined #gluster
21:57 cp0k anyone?
21:59 fidevo joined #gluster
22:01 ultrabizweb yes
22:02 ultrabizweb oh I just saw your question so I am ne to glusterfs so I don't know the answer to your question sorry.
22:03 ultrabizweb new to glusterfs
22:04 cjh973 joined #gluster
22:04 cjh973 does anyone know if i can talk s3 to the swift interface for gluster?  i couldn't seem to find this searching around
22:04 rossi_ cp0k: I think i read here that the two step process is the 'old' method and the one step process is the 'new' method
22:05 rossi_ cp0k: I could imagine that it is OK to first fix-layout and when the gluster volume is less busy do the rebalance.
22:06 rossi_ cp0k: never tried single step, only the fix-layout then rebanalance  on a non-replicated distributed volume
22:07 semiosis rebanalance?  i like it!
22:20 psyl0n joined #gluster
22:25 cp0k sounds good, thanks. I am wondering if there is any harm in doing the steps separately so that gluster does not trip over itself
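The commands behind the two approaches rossi_ describes, for reference (myvol is a hypothetical volume name; syntax as on the 3.3/3.4 CLI):

    # two-step: lay out the new bricks first, migrate data later when the volume is quieter
    gluster volume rebalance myvol fix-layout start
    gluster volume rebalance myvol status
    # single step: fix the layout and migrate existing data in one pass
    gluster volume rebalance myvol start
    gluster volume rebalance myvol status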
22:32 rpowell joined #gluster
22:38 psyl0n joined #gluster
22:49 ikk joined #gluster
22:50 mrfsl left #gluster
22:53 cp0k any advantages to doing single step over the two-step of fix layout and rebalance? is one way safer over the other?
22:57 qdk joined #gluster
22:59 ColPanik kmai007: We're going to do a little testing in a non-production environment to see if we can recreate the conditions.  I'll try to make it back and post details when I know more
23:00 ColPanik thanks very much for all your help, everyone
23:02 semiosis Oh!  I get it, you're Colonel Panik
23:02 semiosis too late
23:10 japuzzo joined #gluster
23:15 jiqiren joined #gluster
23:21 jiqiren am i entering a world of hurt upgrading from gluster 3.3.0 to 3.4.2 ?
23:28 tg2_ joined #gluster
23:48 hagarth joined #gluster
