IRC log for #gluster, 2013-06-11


All times shown according to UTC.

Time Nick Message
01:06 jag3773 joined #gluster
01:18 semiosis @later tell realdannys1 sorry i got pulled afk in the middle of trying to help
01:18 glusterbot semiosis: The operation succeeded.
01:19 majeff joined #gluster
01:21 puebele joined #gluster
01:23 harish joined #gluster
01:35 hagarth joined #gluster
01:45 bala joined #gluster
01:51 plarsen joined #gluster
01:52 majeff1 joined #gluster
01:54 bala joined #gluster
01:58 portante joined #gluster
01:58 joelwallis joined #gluster
02:10 harish joined #gluster
02:24 hjmangalam1 joined #gluster
02:55 kevein joined #gluster
03:11 majeff joined #gluster
03:11 bharata joined #gluster
03:32 mohankumar joined #gluster
03:46 kevein_ joined #gluster
03:56 nueces joined #gluster
03:59 shylesh joined #gluster
04:07 hjmangalam1 joined #gluster
04:14 sgowda joined #gluster
04:32 lalatenduM joined #gluster
04:34 ngoswami joined #gluster
04:43 hjmangalam1 joined #gluster
04:46 DarkestMatter joined #gluster
04:46 ccha joined #gluster
04:48 hchiramm__ joined #gluster
04:52 majeff joined #gluster
04:53 lalatenduM joined #gluster
04:53 vpshastry joined #gluster
05:05 psharma joined #gluster
05:10 deepakcs joined #gluster
05:18 hchiramm__ joined #gluster
05:31 majeff1 joined #gluster
05:37 satheesh joined #gluster
05:45 shireesh joined #gluster
05:46 shireesh_ joined #gluster
05:47 aravindavk joined #gluster
05:47 bala joined #gluster
05:51 raghu joined #gluster
05:58 nightwalk joined #gluster
06:06 Guest64643 joined #gluster
06:06 pkoro joined #gluster
06:09 hchiramm__ joined #gluster
06:11 rgustafs joined #gluster
06:18 vimal joined #gluster
06:22 jtux joined #gluster
06:26 majeff joined #gluster
06:27 sohoo joined #gluster
06:28 hchiramm__ joined #gluster
06:33 StarBeast joined #gluster
06:39 glusterbot New news from newglusterbugs: [Bug 844584] logging: Stale NFS messages <http://goo.gl/z72b6>
06:45 majeff joined #gluster
06:45 ricky-ticky joined #gluster
06:50 ekuric joined #gluster
06:52 john1700 joined #gluster
06:55 guigui1 joined #gluster
07:03 satheesh joined #gluster
07:12 shylesh joined #gluster
07:13 sgowda joined #gluster
07:15 hchiramm__ joined #gluster
07:16 abyss^_ Hi, I get something like this: split brain found, aborting selfheal of /backups/app1/backup_env.inc. How can I repair this? If I understand correctly, I should remove the bad file... but how do I do that correctly?
07:16 abyss^ I have gluster 3.2
07:17 hybrid512 joined #gluster
07:23 majeff joined #gluster
07:24 manik joined #gluster
07:28 rotbeard joined #gluster
07:33 CheRi joined #gluster
07:35 shylesh joined #gluster
07:36 CheRi joined #gluster
07:37 sgowda joined #gluster
07:54 koubas joined #gluster
08:04 spider_fingers joined #gluster
08:10 rgustafs joined #gluster
08:11 dobber_ joined #gluster
08:11 Guest64643 joined #gluster
08:14 rb2k joined #gluster
08:31 hchiramm__ joined #gluster
08:37 atrius joined #gluster
08:52 Hchl joined #gluster
08:52 jbrooks joined #gluster
09:01 manik joined #gluster
09:07 CheRi joined #gluster
09:11 harish joined #gluster
09:11 tziOm joined #gluster
09:14 mustafa joined #gluster
09:15 ujjain joined #gluster
09:15 Hchl joined #gluster
09:34 Hchl joined #gluster
09:34 ramkrsna joined #gluster
09:34 ramkrsna joined #gluster
09:57 realdannys1 joined #gluster
09:57 realdannys1 Did I miss anyones answer to this? sorry - http://pastie.org/8031926
09:57 glusterbot Title: #8031926 - Pastie (at pastie.org)
10:06 jbrooks joined #gluster
10:09 ctria joined #gluster
10:19 dxd828 joined #gluster
10:31 ngoswami joined #gluster
10:34 manik joined #gluster
10:35 hagarth joined #gluster
10:40 mooperd joined #gluster
10:55 Hchl joined #gluster
10:56 realdannys1 Can anyone help with this? Full question and status so far here - http://pastie.org/private/bnfuo6qtwlxyjjsxaz55da
10:56 glusterbot <http://goo.gl/Yc8TB> (at pastie.org)
10:57 tziOm joined #gluster
11:16 Hchl joined #gluster
11:18 social__ how is it with gluster volumes
11:18 social__ I can call gluster volume create on any peer I want but is there some sort of locking so I can put it into puppet? Or should I always select one node as master and run commands only from it?
11:24 tziOm Is any work being done on small file (stat) performance of glusterfs?
11:26 tziOm compared to "normal" nfs, cached nfs reads of 100 20-50k files take ~65ms with gluster (nfs) but only 2ms with the kernel nfs server...
11:26 tziOm that is > 30 times slower
11:36 Hchl joined #gluster
11:47 puebele1 joined #gluster
11:48 tziOm Is there any documentation on setting/configuring translators via the cli?
11:49 rb2k hmm, I seem to be in a place where I can't remove a brick, no matter what I try
11:49 rb2k https://gist.github.com/rb2k/2b566d32b7eb821c74a3/raw/a78987e803b789615c1434a37b71edf3228313ed/gistfile1.txt
11:49 glusterbot <http://goo.gl/prTk9> (at gist.github.com)
11:50 rb2k with all of the options, I get a different error
11:51 satheesh joined #gluster
11:51 chirino joined #gluster
12:01 kkeithley1 joined #gluster
12:02 rotbeard joined #gluster
12:02 kkeithley1 nick kkeithley_
12:02 edward1 joined #gluster
12:05 bulde joined #gluster
12:19 joelwallis joined #gluster
12:21 vpshastry joined #gluster
12:21 vpshastry left #gluster
12:25 realdannys1 Is there a forum to post questions to?
12:26 realdannys1 I'm not sure I'm ever going to get a volume successfully created otherwise :(
12:27 mohankumar joined #gluster
12:39 hagarth joined #gluster
12:40 glusterbot New news from newglusterbugs: [Bug 973183] Network down an up on one brick cause self-healing won't work until glusterd restart <http://goo.gl/w2yKX>
12:45 mynameisbruce joined #gluster
12:45 kkeithley_ this is the place, but I suspect that many of the regular volunteer good doobies are at Red Hat Summit this week.
12:45 realdannys1 ahh ok
12:46 realdannys1 Maybe i'll try again with different linux distro on the EC2 instance
12:46 joelwallis joined #gluster
12:46 shireesh_ joined #gluster
12:46 kkeithley_ and/or it's still a bit early, especially for the ones on the left coast
12:48 kkeithley_ You're using Amazon instances right? What was your answer to firewall ports open?
12:50 kkeithley_ which Linux dist did you use?
12:53 rastar joined #gluster
12:53 realdannys1 Yes that's right, if you look in that paste I copied everything from last night. I opened all the ports that I was told to in the docs, so let me just check - it was 111, 24007, 24008, 24009, 24010 - I don't think I needed to open that many
12:54 realdannys1 and I used Amazon's 64bit Linux dist to start with
12:54 kkeithley_ I'm on a different machine and I don't have the scrollback from last night
12:54 realdannys1 everything worked fine, including peer probe and status, until I go to create the volume, when it pauses and nothing happens
12:55 realdannys1 Ah try this @kkeithley_ for a recap - Full question and status so far here - http://pastie.org/private/bnfuo6qtwlxyjjsxaz55da
12:55 glusterbot <http://goo.gl/Yc8TB> (at pastie.org)
12:55 kkeithley_ @ports
12:55 glusterbot kkeithley_: glusterd's management port is 24007/tcp and 24008/tcp if you use rdma. Bricks (glusterfsd) use 24009 & up. (Deleted volumes do not reset this counter.) Additionally it will listen on 38465-38467/tcp for nfs, also 38468 for NLM since 3.3.0. NFS also depends on rpcbind/portmap on port 111.
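For an EC2 setup of that vintage, the port list glusterbot gives maps to roughly the following iptables rules (a sketch only; the brick-port range is an assumption sized for a handful of bricks, and an EC2 security group would need the same ports opened):

    # glusterd management (24008 only matters if rdma is used)
    iptables -A INPUT -p tcp -m multiport --dports 24007,24008 -j ACCEPT
    # one brick port per brick, counting up from 24009; range sized generously here
    iptables -A INPUT -p tcp -m multiport --dports 24009:24024 -j ACCEPT
    # gluster's built-in NFS (v3) plus NLM and rpcbind/portmap
    iptables -A INPUT -p tcp -m multiport --dports 38465:38468,111 -j ACCEPT
    iptables -A INPUT -p udp --dport 111 -j ACCEPT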
12:57 social__ hmm https://bugzilla.redhat.com/show_bug.cgi?id=841617 < I can reproduce this on 3.3.1 and 3.3.2 but can't on 3.4.0
12:57 glusterbot <http://goo.gl/CBD2r> (at bugzilla.redhat.com)
12:57 glusterbot Bug 841617: high, medium, ---, rabhat, CLOSED WORKSFORME, after geo-replication start: glusterfs process eats memory until OOM kills it
12:57 social__ is there any plan to have this fixed in 3.3.2?
12:58 kkeithley_ pastie.org is not responding
12:58 realdannys1 Grrr! I closed that page too - can't I just use pastebin? At least it's reliable!
12:59 kkeithley_ sure, use pastebin
12:59 kkeithley_ for fpaste
12:59 kkeithley_ or dpaste
12:59 kkeithley_ s/for/or/
12:59 glusterbot What kkeithley_ meant to say was: or fpaste
13:01 realdannys1 looks like entire pastie site is down
13:01 realdannys1 hold on, ill just copy all the bits...
13:03 bulde joined #gluster
13:04 realdannys1 @kkeithley_ http://pastebin.com/Ha64sD9q
13:06 kkeithley_ social__: I reopened the bz 841617.
13:09 kkeithley_ realdannys1: anything in /var/log/gluster/cli.log when you issue the volume create cmd?
13:10 kkeithley_ er, /var/log/glusterfs/cli.log
13:10 realdannys1 Only the mention "Unable to parse create volume CLI"
13:11 social__ kkeithley_: btw quick reproducer : mount gluster volume somewhere and run on it  sysbench --num-threads=1 --max-requests=0 --init-rng=on --max-time=0 --test=fileio --file-total-size=$((1*10000))M --file-num=10000 --file-test-mode=seqwr run
13:11 kkeithley_ oh, sorry, that's already in your pb
13:12 ehg joined #gluster
13:13 kkeithley_ weird, where are you getting your glusterfs bits. (sorry if you answered this already)
13:14 kkeithley_ Does Amazon's Linux use rpms or .debs?
13:17 majeff joined #gluster
13:18 realdannys1 rpms I believe, its based on rhel
13:19 kkeithley_ okay, so which version of glusterfs is it installing? (rpm -q glusterfs)
13:20 majeff1 joined #gluster
13:21 satheesh joined #gluster
13:22 realdannys1 glusterfs-3.3.1-1.el6.x86_64
13:22 jack_ joined #gluster
13:23 kkeithley_ hmm, that should be good.
13:23 kkeithley_ is there a /var/lib/glusterd/vols/gv0/gv0.*.vol on either machine?
13:24 ollivera joined #gluster
13:29 realdannys1 nope
13:29 realdannys1 nothing in vols
13:30 realdannys1 on either machine
13:32 kkeithley_ very strange. At this point all I can think of is try another Linux dist. I know people have used EC2 successfully w/ RHEL/CentOS/SL, Fedora, and Ubuntu.
13:33 Hchl joined #gluster
13:34 kkeithley_ Maybe their rpclib is borked. Otherwise I can't imagine how it could be unable to parse the volume CLI.
13:38 realdannys1 yeah really weird - I know, according to one of the guides I read, I had to change the rhel source to even install it as it wouldn't see the packages, i'll trash the instances and start again
13:38 realdannys1 just as another question do I HAVE to use two instances? Can't I just use one for now and expand to a second instance at a later date
13:38 realdannys1 ?
13:39 kkeithley_ sure, you can use a one-node-one-brick "cluster"
13:39 dewey joined #gluster
13:39 kkeithley_ You can add the second brick at any time
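One way that path could look, as a minimal sketch - gv0, the host names, and the brick paths are all placeholders, and the replica count is raised only when the second node arrives:

    gluster volume create gv0 host1:/export/brick1
    gluster volume start gv0
    # later, mirror the data onto a second node by raising the replica count
    gluster peer probe host2
    gluster volume add-brick gv0 replica 2 host2:/export/brick1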
13:39 realdannys1 If I explain my scenario a bit - the data isn't really too critical, we're using gluster as a temporary storage location but one that is mounted and accessed by every instance that EC2 fires up
13:40 kkeithley_ Not sure what you mean by change the rhel source. Do you mean you had to add a yum repo to get glusterfs?
13:40 kkeithley_ @yum
13:40 glusterbot kkeithley_: The official community glusterfs packges for RHEL/CentOS/SL (and Fedora 17 and earlier) are available here http://goo.gl/s077x
13:40 realdannys1 see whats happening is we have our site and 90% of the static assets are hosted on S3 - but we have user uploaded zip files. Now these get processed, unzipped, sometimes mp3s encoded, etc etc. We can't do this on S3, the files would need to be downloaded again to the server wasting bandwidth to do the processing then uploaded back to S3 again - not even efficient really.
13:41 realdannys1 now I thought EBS would be a shared attachable drive across multiple EC2 instances but for some odd reason it isn't. So we were left with this issue of having uploaded files on different instances when we needed them to be stored in a temporary universal location really
13:42 kkeithley_ seems reasonable
13:42 vpshastry joined #gluster
13:43 realdannys1 step forward gluster - create a 30gb EC2 which mounts to all the auto launched instances. Now also push the processing to the 30gb ec2 with gluster, so our script would trigger the processing on the gluster ec2 so all the unzipping encoding and moving to S3 would happen there
13:43 kkeithley_ that is to say, using glusterfs on an EC2 seems reasonable
13:43 realdannys1 and not on the publicly accessed web servers
13:44 realdannys1 because at the moment we publish some uploaded content and it uses all the resources on the web server which is a bit daft
13:44 spider_fingers left #gluster
13:47 purpleidea joined #gluster
13:47 purpleidea joined #gluster
13:50 MrNaviPacho joined #gluster
13:52 vpshastry left #gluster
13:53 andrewjsledge joined #gluster
13:53 Hchl joined #gluster
13:55 aliguori joined #gluster
14:00 jack joined #gluster
14:13 tziOm Is any work being done on improving the speed of glusterfs stat calls?
14:14 jack_ joined #gluster
14:20 matiz I have a question about 'cluster.min-free-disk'. I set 'cluster.min-free-disk: 10%' on a gluster volume in 'Distribute' mode. My bricks have different sizes. The smallest brick has exceeded the 10% limit, and new files still land on that brick.
14:21 MrNaviPa_ joined #gluster
14:21 bugs_ joined #gluster
14:22 daMaestro joined #gluster
14:23 matiz is this good behavior? I would think that, when the limit on a brick is exceeded, new files should land on the other bricks where the limit is not exceeded?
14:24 hjmangalam1 joined #gluster
14:25 shylesh joined #gluster
14:32 jack joined #gluster
14:32 tqrst- I have a folder that can't be removed: 'rm -rf somefolder' -> 'rm: cannot remove `somefolder': Directory not empty'. I checked all bricks and their "somefolder" really is empty. What could be causing this?
14:35 tqrst- oops, shell issue, nevermind
14:40 portante joined #gluster
14:47 matiz anyone know this issue?
14:50 kkeithley_ matiz, it's a known issue. It's not good behavior, I think we all agree about that.
14:55 matiz kkeithley_: so, is there a plan to fix this issue?
14:55 matiz does cluster.min-free-disk work in the other modes of gluster?
14:56 bivak you can also set cluster.min-free-disk in GBs
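For reference, both forms are set the same way; the volume name here is a placeholder and the exact unit strings accepted can vary by release:

    gluster volume set myvol cluster.min-free-disk 10%
    gluster volume set myvol cluster.min-free-disk 50GB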
14:58 waldner joined #gluster
14:58 waldner joined #gluster
14:59 bivak we also run into this issue matiz
14:59 kkeithley_ I believe it's being worked on, yes
14:59 bivak today I tested with gluster 3.4, and it looks better
15:02 bivak 3.4 is a beta release
15:02 realdannys1 kkeithley_ what I meant by fix it for Amazon Linux was this… #install glusterfs repo
15:02 realdannys1 wget -P /etc/yum.repos.d http://download.gluster.org/pub/gluster/glusterfs/LATEST/EPEL.repo/glusterfs-epel.repo
15:02 glusterbot <http://goo.gl/5beCt> (at download.gluster.org)
15:02 realdannys1 #fix it for amazon linux
15:02 realdannys1 sed -i 's/$releasever/6/g' /etc/yum.repos.d/glusterfs-epel.repo
15:03 realdannys1 It appears I have to do the same for CentOS 6, otherwise yum install glusterfs just says no package available
15:04 kkeithley_ realdannys1: yes, that's correct. If you use EPEL on CentOS you'll get glusterfs-3.2.7.  Use the glusterfs.org repo instead
15:04 kkeithley_ which you're already doing.
15:04 realdannys1 I just got - glusterfs-3.3.1-1.el6.x86_64  is that ok?
15:05 realdannys1 I didn't have to do this, this time - sed -i 's/$releasever/6/g' /etc/yum.repos.d/glusterfs-epel.repo
15:05 realdannys1 sed -i 's/$releasever/6/g' /etc/yum.repos.d/glusterfs-epel.repo
15:05 bivak this is the location of the repo file; http://download.gluster.org/pub/gluster/glusterfs/3.3/3.3.1/CentOS/glusterfs-epel.repo
15:05 glusterbot <http://goo.gl/Y70il> (at download.gluster.org)
15:06 nueces joined #gluster
15:06 nueces left #gluster
15:11 kkeithley_ You can use those. LATEST is a symlink to 3.3/3.3.1
15:12 kkeithley_ Those are the "release" rpms that are built by Jenkins.
15:15 hjmangalam1 joined #gluster
15:15 kkeithley_ Or you can get the latest Fedora Koji built RPMS from the repo at http://download.gluster.org/pub/gluster/glusterfs/repos/YUM/glusterfs-3.3/  They're the same for the most part, but have some things that haven't or hadn't made it into the source when 3.3.1 was released, e.g. hardened builds, updated UFO (which I expect you don't care about), etc.
15:15 glusterbot <http://goo.gl/9tJvd> (at download.gluster.org)
15:16 kkeithley_ Your choice
15:17 kkeithley_ Correction: the repo at http://download.gluster.org/pub/gluster/glusterfs/3.3/3.3.1/ is also Fedora Koji built RPMs, but it's frozen at 3.3.1-1.
15:17 glusterbot <http://goo.gl/ZO2y1> (at download.gluster.org)
15:17 kkeithley_ Use either one.
15:18 bala joined #gluster
15:20 kkeithley_ oh look, 3.4.0beta3 is out
15:26 dobber_ joined #gluster
15:27 MrNaviPa_ joined #gluster
15:27 jthorne joined #gluster
15:43 hjmangalam1 joined #gluster
15:47 edong23 joined #gluster
15:52 wgao joined #gluster
16:03 edward2 joined #gluster
16:04 ollivera_ joined #gluster
16:05 lanning_ joined #gluster
16:05 mohankumar__ joined #gluster
16:05 purpleid1a joined #gluster
16:07 vpshastry joined #gluster
16:07 vpshastry left #gluster
16:12 ultrabizweb joined #gluster
16:13 hjmangalam1 joined #gluster
16:14 edoceo joined #gluster
16:18 jclift_ joined #gluster
16:18 mynameisbruce joined #gluster
16:41 portante joined #gluster
16:44 Hchl joined #gluster
16:44 joelwallis joined #gluster
16:51 G________ joined #gluster
17:02 lalatenduM joined #gluster
17:03 bambi23 joined #gluster
17:14 hjmangalam joined #gluster
17:14 marmoset asking again in case anyone has suggestions
17:14 marmoset I'm trying to do a replace-brick (3.3.1) and it seems to give up after a random amount (one time after ~1.8TB, the other after only 42G) with the receiving brick having a glusterfs process using 100% cpu on one core, but not doing anything.  status says it completed, but it didn't.  Any ideas?
17:17 hjmangalam joined #gluster
17:22 BBenB joined #gluster
17:24 BBenB Hey I was wondering if my understanding is correct because i'm having trouble getting NFS mounting to work... to mount a glusterfs as an NFS mount you start up the normal glusterd service and it should start up its own NFS server that exposes the volume as an export... is this right?
17:30 kkeithley_ BBenB: Yes, gluster runs its own NFS server. When you start the volume there will be a glusterfs process and a glusterfsd process (usually one for each brick).  glusterfsd is the gluster native server, glusterfs is the NFS server.
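A minimal client-side sketch of mounting through that built-in server, assuming a host named server1 and a volume gv0 (gluster's NFS server only speaks NFSv3, and rpcbind must be running on the server):

    mkdir -p /mnt/gv0
    mount -t nfs -o vers=3,mountproto=tcp server1:/gv0 /mnt/gv0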
17:34 BBenB thanks... I found the process and the log file which has errors that hopefully will guide me towards what isn't working
17:37 5EXAATM10 joined #gluster
17:40 edong23 joined #gluster
17:42 hajoucha joined #gluster
17:42 MrNaviPa_ joined #gluster
17:46 StarBeast joined #gluster
17:49 hjmangalam joined #gluster
17:53 lpabon joined #gluster
17:54 BBenB i suspect it is impossible, but can you run the gluster nfs server and system NFS at the same time?
17:54 BBenB system NFS server*
17:56 BBenB nevermind found wiki entry that says it is impossible
18:00 neofob left #gluster
18:01 ultrabizweb joined #gluster
18:08 rb2k joined #gluster
18:09 kkeithley_ Generally speaking that's correct. I believe there was a theory once that you could run both, that's why we used to run the gluster nfs server on a non-standard port. I've never seen it work though.
18:10 BBenB ahh interesting... I haven't seen anything about people running both
18:11 JoeJulian BBenB: I've never heard of anyone giving it any significant effort.
18:12 BBenB i'm using gluster in conjunction with MIT starcluster which is a tool for launching clusters on amazon EC2... which relies heavily on NFS server... I might be able to make one of the nodes be the glusterfs master because I think they don't need nfs server just the client... but ideal would be for it to run on the cluster master
18:13 JoeJulian There is no master.
18:14 JoeJulian But mounting nfs locally is not a good idea. There are memory race conditions that crop up.
18:22 y4m4 joined #gluster
18:22 BBenB The way I have things setup currently (and this might be stupid i'm experimenting) is all the servers have their local(ephemeral in Amazon terms) disks combined into a single gluster distributed volume and all of them have that volume mounted... then computer jobs are submitted to SGE(Sun Grid engine/job manager) from the Starcluster master server (the master server runs SGE server and also serves up various NFS mounts required for starcl
18:25 JoeJulian That sounds like the right model. We should see if we could get Starcluster involved in supporting the fuse client.
18:25 BBenB The reason why I started to look at NFS gluster mounting is because I ran the same process, 1 with a software hack to manually distribute data to the various nodes to get around EC2 not really having large super fast storage and 1 that just wrote everything to gluster... my hack took 3 hours and gluster took 4 hours... thus me looking into optimizing/tweaking gluster because my feeling is that it should be closer to the same performance..
18:27 BBenB Someone made a plugin to run gluster... starcluster seems to be working on adding it to the mainline... i'm not using it currently mostly because I didn't think to look until I had gluster working myself :-/
18:29 BBenB I also know that some parts of the pipeline write out way too many really small files... which I know can be a performance hit to gluster and really any distributed filesystem... I'm working to get rid of some of that, but parts of it will probably have to remain due to 3rd party tools that I can't modify :-/
18:29 BBenB anyways I will say the fuse client is really easy to use on starcluster... my only reason for looking at NFS is I saw suggestions online that it might help performance
18:31 BBenB though my implementation is lazy... starcluster normally supports nodes dying or getting shutdown... I don't and probably won't just because it would be a pain and/or very expensive... amazon persistent disk space is slow and/or expensive so I have just accepted that if a single node dies I have to restart the computation and store everything on the temp disk except the final results
18:32 BBenB sorry if this is too much info... you seemed somewhat interested and I love to talk about this stuff
18:33 JoeJulian No problem. I'm multi-tasking. We're at Red Hat Summit discussing the Gluster Community.
18:34 BBenB https://gist.github.com/dmachi/2853872 <--- gluster plugin for starcluster
18:34 glusterbot Title: Gluster Plugin for StarCluster (at gist.github.com)
18:36 rob__ joined #gluster
18:36 JoeJulian Hmm, need to tag dmachi and see if we can get him to put this up at forge.gluster.org...
18:37 Hchl joined #gluster
18:37 BBenB yea
18:37 MrNaviPa_ joined #gluster
18:38 bambi23 joined #gluster
18:49 bstr_work joined #gluster
18:51 rb2k joined #gluster
18:55 Hchl joined #gluster
18:56 MrNaviPacho joined #gluster
19:01 hajoucha joined #gluster
19:02 JoeJulian m0zes was mentioned.....
19:02 hlieberman joined #gluster
19:03 m0zes whar
19:03 m0zes what about?
19:04 JoeJulian We're at Red Hat Summit and you're one of the Universities we need to get involved as part of the Gluster Community organization.
19:05 mooperd joined #gluster
19:09 BBenB That plugin also seems to support installing gluster on the fly... if starcluster were to accept it I would assume they would pre-install it in their images.  I modded one of starcluster's images to have it preinstalled
19:11 JoeJulian I've reached out to Justin Riley to see if he wants to meet up while we're in town.
19:12 ferringb mmm
19:12 jdarcy joined #gluster
19:12 ferringb in the process of restoring a set of bricks that went down, w/ that content being a bit stale.  Things have quieted down, but still haven't been able to get output from `volume heal $vol info`
19:12 ferringb known issue, or where should I be looking to figure out what's going on here?
19:13 JoeJulian Try restarting all your glusterd. I've had that happen and that's a workaround I've had work.
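The check and the workaround JoeJulian describes look roughly like this on an EL-style box (the volume name is a placeholder):

    gluster volume heal vol01 info
    gluster volume heal vol01 info split-brain
    # workaround when the info command hangs with no output
    service glusterd restart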
19:13 ferringb self-heal daemon, metadata, entry, and data are all explicitly forced on volume wise; so it's definitely churning, just can't easily tell the state of it without looking at .glusterfs
19:13 ferringb JoeJulian: don't suppose you were the one who filed the ticket that self-heal doesn't actually add the gfid linkages to .glusterfs ?
19:14 ferringb because I recall the "restart glusterd" solution being mentioned there. ;)
19:14 JoeJulian doesn't sound familiar
19:14 ferringb hmm
19:14 JoeJulian I've filed a lot of bugs though. :D
19:15 ferringb heh
19:15 Elektordi joined #gluster
19:15 ferringb what sort of self-heal rate can one realistically expect?
19:15 ferringb say 2 bricks, AFR keeping them in sync, needing to rebuild one from scratch
19:16 ferringb I know the rates at the drive level, but gluster seems to go a fair bit slower than that
19:16 ferringb also; inherited setup, so it's a bit tricky to pull perf stats for this in a clean setup, hence the questions. ;)
19:17 ferringb JoeJulian: also, don't suppose you've seen the FD leak for volume definition reloads?  Cause that one is fun. :)
19:18 hajoucha hi, is it possible to set up a gluster volume striped between two servers and later add another two servers to replicate the first two?
19:18 hajoucha or one must setup all 4 servers (2 stripes, 2 replicate) at the beginning..?
19:18 hajoucha * Disconnected (Connection timed out)
19:18 * jdarcy looks for an fd leak.
19:19 ferringb look in /proc/*/fd/ ; if sub 3.4, look for deleted files in /tmp
19:19 ferringb then just go cat some of the fds, and recognize older volume definitions. :)
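A rough one-liner for spotting those leaked, already-unlinked volfiles on a brick server (assuming the brick daemons are named glusterfsd, as ferringb notes later):

    for p in $(pgrep glusterfsd); do ls -l /proc/$p/fd | grep '(deleted)'; done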
19:20 ferringb hajoucha: yes, doable.
19:20 ferringb stacked afr/replicate basically
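What ferringb describes would look roughly like the following; server and brick names are placeholders, and whether add-brick accepts a replica-count change on a striped volume should be verified against the CLI help for the release in use:

    gluster volume create sv0 stripe 2 server1:/export/b1 server2:/export/b1
    gluster volume start sv0
    # later: add one mirror brick per stripe leg
    gluster volume add-brick sv0 replica 2 server3:/export/b1 server4:/export/b1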
19:22 jdarcy ferringb: Are you seeing that with client volfiles, or server?  That'll narrow the search.
19:23 ferringb oh, pardon
19:23 ferringb server
19:23 ferringb haven't checked client
19:23 jdarcy Thanks.  Still looking...
19:23 ferringb we're runnin 3.3.2qa2/qa3
19:23 ferringb also
19:23 ferringb jdarcy: if you screw up the config file in any way- say via a filter that adds some junk into it, note the line number that's mentioned in the los
19:23 ferringb *logs
19:25 jdarcy ferringb: You're using volfile filters?
19:25 * jdarcy had skipped those code paths as irrelevant.
19:25 ferringb no
19:26 ferringb well, rephrasing
19:26 ferringb for a bit, yes, but it's unrelated
19:26 ferringb the brick glusterfsd's are what are leaking fd's btw
19:26 ferringb jdarcy: if you've got questions, I'm looking at an instance of it riht now
19:26 ferringb *right now.  really need a new keyboard.
19:27 jdarcy If you could paste the volfile somewhere, that might give me some more clues.
19:27 ferringb can do, but it's fairly standard
19:27 hajoucha ferringb: thanks!
19:27 ferringb $vol-fuse.vol suffice?
19:27 ferringb or do you want me to scrape a copy straight out of one of the deleted handles?
19:28 jdarcy One of the "leftovers" in /tmp would be most useful.
19:28 hajoucha btw. has anyone experienced troubles with "cp" from client to gluster mounted space? We tried 3.4-beta and have random errors "Bad file descriptor".
19:28 hajoucha this bug seems to be related to caching.
19:29 ferringb http://dev.gentoo.or/~ferringb/vol01-fuse.vol
19:29 joelwallis joined #gluster
19:29 ferringb jdarcy: offhand, there really isn't anything to be gotten from digging through those fds
19:29 * ferringb already has; it's just a leaked fd for each time a `gluster volume set <whatever>` has been invoked
19:30 ferringb although it's a nice, albeit pretty horrible, way of getting a historical view at what has changed in the volume configuration and when. :)
19:30 jdarcy Hey, free volfile versioning.  ;)
19:30 ferringb as mentioned, this is 3.3.2qa2 or qa3; I'm not ruling out that being involved
19:30 jag3773 joined #gluster
19:31 Staples84 joined #gluster
19:32 ferringb related, if you're looking through that xlator stack, feel free to make suggestions for improvement.
19:33 * ferringb isn't a huge fan of the fact it's 2 stacks, w/ AFR=2 w/in each stack and dht layered over
19:33 ferringb means that if you lose one brick server, the pair gets its ass handed to it via 2x load- instead, should've been distributed across the misc brick servers (dependent on rack location)
19:36 ferringb random question also; how many folk here are running stable versions, or how many are running snapshots of trunk (or betas) ?
19:36 jdarcy I think so too, BTW.  http://hekafs.org/index.php/2012/07/multi-ring-hashing/
19:36 glusterbot <http://goo.gl/THqFD> (at hekafs.org)
19:37 * ferringb gets the vibe its the latter
19:37 ferringb jdarcy: haven't had time (yet) to look at hekafs closely, although have been noticing it show up in the gluster googling
19:37 jdarcy Just doing "replicate between any two bricks" means O(n^2) translators, but there are more feasible compromises.
19:37 ferringb yep
19:38 ferringb would need to do something more like dht w/ AFR stated inline- something I could've sworn gluster could do back in '08 or so
19:38 * ferringb distinctly recalls, in either the unify or afr xlator, the ability to control the level of redundancy per file
19:39 jdarcy I think there was something like that, but it was a simple fan-out.  Wasn't really crash-proof.
19:39 jdarcy Send to N, hope for the best.  ;)
19:39 ferringb prolly.  just rung a bell from my last stint maintaining a gluster stack. ;)
19:40 jdarcy Still trying to figure out why server volfiles would *ever* be in /tmp.
19:41 ferringb that's fixed
19:42 ferringb don't ask me which version has it, but I recall seeing that crap get moved to /var/lib/gluster/ in a later commit than what we're running
19:42 jdarcy I thought you said the old volfiles were showing up in /tmp.
19:44 Hchl joined #gluster
19:47 Elektordi Hi everybody! Just a quick question regarding a recent gluster setup. Has anyone ever had the error "0-fuse: inode not found"?
19:48 MrNaviPacho joined #gluster
19:48 mooperd joined #gluster
19:48 Elektordi It appends sometimes in the middle of the night, and then the mountpoint become unusable (all fs access inside the fuse mountpoint returns "Input/output error")
19:49 Elektordi *happends
19:49 ferringb jdarcy: oh, sorry; they are
19:50 ferringb jdarcy: clarifying; glusterd's transferance of the volume file goes to /tmp, gets read, then unlink'd.  it however never closes the fd, thus you can spot them hanging around in /proc/$pid/fd/ space
19:50 ferringb jdarcy: that bug is separate from where it's storing the content- I know they moved the content in a later commit from /tmp/ to /var/lib/gluster/
19:51 ferringb also
19:51 ferringb jdarcy: 71496826955cacac37abfd5fd017340a04988971
19:51 ferringb jdarcy: "glusterfsd: Fixed fd leak due to use of tmpfile()" <-- being the subject line, so looks like it's resolved in master
19:52 jdarcy Ah, OK.
19:53 ferringb so... what version are folks running? :)
19:54 marmoset i/part
19:54 marmoset left #gluster
19:54 ferringb cause in looking at >=3.3.2qa2, I'm seeing commits that look rather like our setup should have that running. ;)
19:54 jdarcy Practically always whatever's on master, but I don't think I count.  ;)
19:54 ferringb how's the stability been?
19:55 ferringb (dependent on your testing obviously, just wondering what the general experience has been)
19:56 jdarcy I haven't really been tracking the 3.3 branch TBH.  I think 3.4 is looking pretty good, master not so much (but that's expected).
19:56 jdarcy If it weren't for Red Hat QA, I think the bug workload for 3.4 would be *way* down from previous releases, despite many more users.
19:58 mooperd joined #gluster
19:58 andreask joined #gluster
19:59 jclift_ jdarcy: The new rdma stuff in master seems to be more resilient than the previous rdma stuff.
19:59 jclift_ I haven't been flogging the living heck out of it (concentrating on other things atm), but at least it's not breaking all the time on trivial things with rdma transport. :D
20:00 jdarcy jclift_: Oh good.  That's one of the newest pieces.
20:00 jclift_ Yeah, agreed. :)
20:00 hajoucha ferringb: we have tried 3.3.2, but failed to get rdma working, now we have 3.4-beta, rdma OK, but experience some troubles with "cp" - when copying large files onto mounted gluster volume.
20:01 jdarcy I don't think any other patch of that magnitude would have been allowed in so late in a release, but *so* many people wanted it.
20:02 hajoucha the cp problem may not be inherently in gluster, but rather FUSE thing. Unfortunately, gluster is affected by it.
20:02 nwood joined #gluster
20:03 jclift_ jdarcy: Oh random thought... with that Gluster filter stuff that lets people change .vol files when they're regenerated... do you reckon that could be used to insert a whole new translator (ie custom written glupy thing) into a .vol file?
20:03 jclift_ Haven't had a chance to try it out yet, but I'm hopeful... will probably take a look at it next week if I get this packstack stuff finished soon
20:04 jdarcy jclift_: Yes.  I'd rather not encourage too much of that, though.  It's really a stopgap kind of solution; the *right* one is to get support integrated into the regular CLI/volfile-generation machinery.
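As a sketch of the stopgap jdarcy describes: glusterd runs any executable it finds in its filter directory against each volfile it regenerates, passing the volfile path as the first argument, so a filter can splice a custom (e.g. glupy) translator stanza in before the file is served. The directory path and the edit below are illustrative assumptions, not a documented API:

    #!/bin/sh
    # hypothetical filter script; glusterd invokes every executable in its
    # filter directory (typically something like /usr/lib*/glusterfs/<version>/filter/)
    # with the path of the freshly generated volfile as $1
    VOLFILE=$1
    # edit $VOLFILE in place here (sed/awk/python) to splice in a custom
    # translator stanza before glusterd hands the file to clients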
20:05 jclift_ So... it's a bad idea to write a blog post about it even if it's mentioned as a stop gap?
20:05 nwood hey guys, i have a rather interesting problem that I am hoping to get some help with
20:05 * jclift_ thought it'd be a big improvement over manually starting each daemon
20:05 jclift_ Non-optimal, but at least makes things possible... ;D
20:06 jdarcy jclift_: It's way better than starting each daemon by hand, that's for sure.
20:06 nwood i am running gluster 3.4 beta1 on debian wheezy with a distributed replicated volume ontop of zfs bricks
20:06 jclift_ k
20:06 jclift_ jdarcy: Well, I'll see how it goes with my glusterflow stuff when I get a chance to get back into that
20:06 nwood this is then tied to hadoop via a custom gluster fs plugin. when I run the terasort benchmark, the bricks crash
20:07 nwood any thoughts?
20:07 jclift_ hajoucha: Which OS are you running Gluster on?
20:07 nwood debian 7 (wheezy)
20:07 hajoucha jclift_: fedora 18
20:07 jdarcy nwood: What do you mean by "custom glusterfs plugin"?
20:08 nwood the glusterfs plugin currently maintained by redhat does not handle permissions so I added inheritance
20:08 jclift_ hajoucha: Any interest with trying Gluster built from git?  It's extremely easy to build on Fedora/RHEL/CentOS boxes
20:08 nwood hadoop streaming works, iozone cluster testing works, all bricks appear stable
20:08 nwood then terasort causes the bricks to crash
20:08 hajoucha jclift_: would do it, no problem.
20:09 jclift_ hajoucha: http://www.gluster.org/community/do​cumentation/index.php/CompilingRPMS
20:09 glusterbot <http://goo.gl/aXOjy> (at www.gluster.org)
20:09 jclift_ hajoucha: You can pretty cut-n-paste the instructions the whole way through.  Takes about 5-10 mins (max) including compile time. :D
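Condensed, the steps on that wiki page boil down to something like this on a Fedora/EL box (a sketch; build prerequisites and the exact make targets may differ between branches):

    git clone https://github.com/gluster/glusterfs.git && cd glusterfs
    ./autogen.sh && ./configure
    make dist
    cd extras/LinuxRPM && make glusterrpms
    # the RPMs land in extras/LinuxRPM/ and install with yum/rpm as usual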
20:09 jdarcy nwood: Do you get any core files when the bricks crash?
20:10 dbruhn joined #gluster
20:10 hajoucha jclift_: great! Will test that tomorrow
20:10 hajoucha jclift_: sure, no problem with compilation at all.
20:10 nwood jdarcy: where might these be located? I see brick crash logs and I see readv failed errors in other logs
20:11 ferringb hajoucha: curious, 3.4 got the protocol level version awareness?
20:11 ferringb cause that was a big one in my list of desirables
20:11 hajoucha jclift_: we have been hacking into cp sources of coreutils to see why it gets bad file descriptor with gluster. However, the only thing we could come out with is that cache problem - the crashes look really random.
20:11 jclift_ hajoucha: Cool.  If the problems still exist with that, it means more urgent for us to look at.  Trying stuff with the beta release rpms is useful, but there might be some bugs that are already fixed in latest git.  The compiling approach kind of sorts that out. :D
20:11 dbruhn I have a server with two bricks that is part of a 10x2 distributed replicated system. Two of the bricks were offline for a couple of days waiting for a new raid controller. Do I need to do anything to fix them being out of sync manually?
20:11 jclift_ hajoucha: Interesting.  Keep us up to date. :D
20:12 hajoucha jclift_: we will make a bugreport.
20:12 jclift_ +1
20:12 ferringb dbruhn: I'd be curious what the perf/behaviour is like when you restore the bricks
20:13 ferringb our stack- 2TB per brick, lot of files (more I suspect than gluster's underlying gfid .glusterfs bits like), it basically cpu dos's the damn brick
20:13 jdarcy nwood: You could try /var/log/core, at least on a RHEL/Fedora style system.
20:13 ferringb reaching in and suppressing its desire to use spinlocks, instead using straight pthread mutex locks, it behaves a bit better, but still
20:13 hajoucha jclift_: ok, will let you know tomorrow when I get back to the servers. They are cut off outside network for another reason...
20:14 jclift_ hajoucha: No worries at all. :)
20:14 jdarcy ferringb: I don't suppose you've submitted a patch for the spinlock/mutex conversion...?
20:14 dbruhn it is for sure putting a load on the server with the two bricks after it's been brought back up
20:14 dbruhn but seems to be handling it fine
20:17 dbruhn The machines are 2x4core 2.4ghz/ intel e5-2609's
20:17 dbruhn and 16GB of ram per machine
20:18 ferringb *cough*
20:18 ferringb jdarcy: was busy getting the dumb thing back into quasi HA. ;)
20:19 ferringb jdarcy: it was a hack to sidestep the issue and bring the brick back; I'm not sure it's the correct fix, although I don't think using spinlocks by default was sane
20:19 ferringb dbruhn: how many bricks per machine?
20:19 dbruhn 2 @ 2.7TB
20:19 nwood jdarcy: no core dumps unfortunately. deeper in the logs I see "is not a valid port identifier"
20:19 dbruhn 10 machines, 20 bricks
20:20 ferringb dbruhn: roughly similar hardware, just each machine having 8 bricks @ 2TB
20:20 dbruhn I am running QDR IB over RDMA on the backside too
20:20 ferringb individually bringing brick by brick back, and shunting client load off of it allowed it to come back- doing anything else however (even single brick) effectively led to an outage
20:20 dbruhn and the drives are all 15K SAS
20:20 nwood jdarcy: other useful info is that I am running across QDR infiniband. I am using the tcp,rdma transport type but mounting with tcp
20:20 ferringb yeah, my hardware is significantly crappier. :)
20:24 Hchl joined #gluster
20:24 daMaestro joined #gluster
20:25 nwood jdarcy: i also see "reading from socket failed. Error (No data available)"
20:25 neofob joined #gluster
20:27 jdarcy nwood: Some of those messages are what we'd expect when the other side of the connection died.
20:28 nwood jdarcy: client side?
20:28 jdarcy nwood: I don't have any quick ideas how to isolate that further.  I suggest filing a bug report so we can get some more focused eyeballs on it.
20:41 badone_ joined #gluster
20:42 y4m4 joined #gluster
20:43 yosafbridge joined #gluster
20:51 jack_ joined #gluster
20:55 StarBeast joined #gluster
21:13 StarBeas_ joined #gluster
21:39 Hchl joined #gluster
21:57 realdannys1 well I've tried another EC2 distro but still no luck at volume create - I wonder if I'm not partitioning the EBS volume correctly
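For the EBS-partitioning doubt, brick preparation is usually no more than this (a sketch; the device name /dev/xvdf, the mount point, and the volume/host names are all assumptions):

    mkfs.xfs -i size=512 /dev/xvdf
    mkdir -p /export/brick1
    mount /dev/xvdf /export/brick1
    gluster volume create gv0 ec2-host1:/export/brick1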
22:04 m0zes joined #gluster
22:11 Hchl joined #gluster
22:16 realdannys1 gluster hates me
22:21 mooperd joined #gluster
22:23 realdannys1 Can anyone see from the cli.log what's going wrong with my volume create? http://pastebin.com/yPtUzbvk
22:23 glusterbot Please use http://fpaste.org or http://dpaste.org . pb has too many ads. Say @paste in channel for info about paste utils.
22:24 realdannys1 paste for those that don't like paste bin - although theres no ads with adblock :) http://fpaste.org/18073/37098943/
22:24 glusterbot Title: #18073 Fedora Project Pastebin (at fpaste.org)
22:36 rb2k joined #gluster
23:26 StarBeast joined #gluster
23:31 fidevo joined #gluster
23:46 jbrooks joined #gluster
23:48 Hchl joined #gluster
23:54 RobertLaptop joined #gluster
23:57 realdannys1 Can anyone see from the cli.log what's going wrong with my volume create?  http://fpaste.org/18073/37098943/
23:57 glusterbot Title: #18073 Fedora Project Pastebin (at fpaste.org)
