IRC log for #gluster, 2013-02-21


All times shown according to UTC.

Time Nick Message
00:00 tqrst JoeJulian: restarting helped
00:00 Ryan_Lane will gluster not do a recursive delete?
00:04 JoeJulian iirc, the rm program actually handles the recursion.
00:04 sjoeboo joined #gluster
00:05 JoeJulian ... so there would be no way for gluster to know that a recursive delete was attempted.
00:11 semiosis JoeJulian: i'm beginning to suspect something changed between ubuntu precise 12.04 and precise 12.04.1 which affects the ability of glusterfs clients to resolve 'localhost' at boot
00:11 semiosis trying to test this theory
00:14 semiosis JoeJulian: this seems to be new since 12.04.1: http://pastie.org/6264314
00:14 glusterbot Title: #6264314 - Pastie (at pastie.org)
00:15 semiosis that's a client failing to mount from localhost at boot time.  glusterd is running but the client cant find it because 'localhost' isnt resolvable
00:16 semiosis if i add a sleep after starting glusterd, to hold up the client mount, no problem -- confirming the upstart blocking works.
00:16 semiosis trying to reproduce this now on a vanilla precise install from the april '12 image, so far no luck.  though it's easy to reproduce on a current 12.04.1 image
00:16 JoeJulian weird
00:17 JoeJulian Is localhost not in /etc/hosts?
00:17 semiosis these systems are configured the same except one has the latest precise image 12.04.1, the other has the original april '12 image
00:17 semiosis hosts, fstab, gluster config -- all the same
00:18 semiosis ec2 images
00:18 JoeJulian Well if none of them have localhost in /etc/hosts then it would be a network services sequence issue. If it is, then it would have to be a glibc issue, I think.
00:19 semiosis great info, thanks
00:19 semiosis i've done about a dozen reboots and never got that DNS error on the original precise image, so i'm reasonably sure that this problem was introduced by an update since then
00:20 JoeJulian Seems like a reasonable assumption.
00:21 semiosis maybe solution is to recommend literally 127.0.0.1 in fstab when intention is to mount from localhost
00:21 semiosis s/e s/e one s/
00:21 glusterbot What semiosis meant to say was: i've done about a dozen reboots and never got that DNS error on the original precise image, so i'm reasonably sure that this problem was introduced by an update one since then
00:21 semiosis no
00:21 semiosis s/e s/e one s/
00:21 glusterbot What semiosis meant to say was: maybe one solution is to recommend literally 127.0.0.1 in fstab when intention is to mount from localhost
00:21 semiosis yes
00:25 semiosis @later tell ThatGraemeGuy ok a better idea seems to be using literally '127.0.0.1:volname' instead of 'localhost:volname' in fstab, no change to upstart job needed.  still narrowing in on this issue but it seems to have been introduced by an update since precise was released.  i can reproduce this problem on 12.04.1 but not on original 12.04 from april!
00:25 glusterbot semiosis: The operation succeeded.
00:25 JoeJulian Sounds like a plan.
00:27 semiosis will have to try this out on quantal too when i have a chance.  for now just dropping a note in the PPA description.
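For reference, a minimal sketch of the fstab entry being suggested here (the volume name "myvol" and the mount point are placeholders, not from the log):
    127.0.0.1:myvol  /mnt/myvol  glusterfs  defaults,_netdev  0 0
Using the literal loopback address sidesteps name resolution entirely, so the mount no longer depends on the resolver being ready at boot.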
00:32 yinyin joined #gluster
00:33 bala joined #gluster
00:39 nueces joined #gluster
00:46 semiosis JoeJulian: trying to replicate this with an nfs mount... if dns resolution isnt working at boot time, should affect other things besides gluster
00:53 semiosis meh, a long chain of dependencies leads from net-device-up IF=lo to mounting nfs clients
00:53 semiosis think thats my solution though :)
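A hypothetical upstart stanza illustrating the dependency semiosis describes, i.e. holding a mount job until the loopback interface has come up (the job name is made up and the exact events depend on the packaging):
    # /etc/init/mounting-glusterfs.override (hypothetical)
    start on (local-filesystems and net-device-up IFACE=lo)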
00:58 yinyin joined #gluster
01:39 bala joined #gluster
01:44 _pol joined #gluster
02:06 _pol joined #gluster
02:18 yinyin joined #gluster
02:24 kevein joined #gluster
02:31 tomsve joined #gluster
02:37 bala joined #gluster
02:41 nueces joined #gluster
02:48 bala joined #gluster
02:56 a2 joined #gluster
02:58 sahina joined #gluster
03:01 lala joined #gluster
03:05 pipopopo_ joined #gluster
03:07 disarone joined #gluster
03:08 raven-np joined #gluster
03:22 sgowda joined #gluster
03:32 lala joined #gluster
03:33 bala joined #gluster
03:45 anmol joined #gluster
03:57 lala joined #gluster
04:03 satheesh1 joined #gluster
04:07 sripathi joined #gluster
04:08 Ryan_Lane joined #gluster
04:11 pai joined #gluster
04:18 aravindavk joined #gluster
04:21 Humble joined #gluster
04:30 shylesh joined #gluster
04:32 yinyin joined #gluster
04:35 vpshastry joined #gluster
04:35 bulde joined #gluster
04:39 Humble joined #gluster
04:49 timothy joined #gluster
04:49 deepakcs joined #gluster
04:49 raven-np joined #gluster
04:49 deepakcs I am having a problem registering my email ID with review.gluster.org. Whom should i send a mail to ask for help?
04:50 deepakcs is there any ID like infra@gluster.org ?
04:50 deepakcs I keep getting this error when i register my email ID : "Application Error
04:50 deepakcs Server Error
04:50 deepakcs Identity in use by another account
04:52 deepakcs But in http://review.gluster.org/#settings,contact i only see my gmail.com ID, not the other one which I am trying to register. So not sure why it says "already in use"
04:52 glusterbot Title: Gerrit Code Review (at review.gluster.org)
04:55 raven-np1 joined #gluster
04:55 yinyin joined #gluster
04:56 vshankar joined #gluster
04:57 _br_ joined #gluster
04:58 hagarth joined #gluster
04:59 Humble joined #gluster
05:05 anmol joined #gluster
05:08 vpshastry joined #gluster
05:09 raghu joined #gluster
05:20 _pol joined #gluster
05:22 yinyin joined #gluster
05:23 tomsve joined #gluster
05:25 mohankumar joined #gluster
05:31 dbruhn joined #gluster
05:34 raven-np joined #gluster
05:40 raven-np1 joined #gluster
05:42 badone joined #gluster
05:51 theron joined #gluster
05:59 lala joined #gluster
06:03 bulde joined #gluster
06:15 yinyin joined #gluster
06:27 rastar joined #gluster
06:28 ricky-ticky joined #gluster
06:35 rastar left #gluster
06:37 timothy joined #gluster
06:39 rastar joined #gluster
06:47 vikumar joined #gluster
06:50 Nevan joined #gluster
06:51 ekuric1 joined #gluster
06:52 nullsign joined #gluster
06:52 ramkrsna joined #gluster
06:54 bala joined #gluster
07:14 jtux joined #gluster
07:22 ngoswami joined #gluster
07:23 sripathi joined #gluster
07:25 hagarth joined #gluster
07:25 guigui3 joined #gluster
07:27 raven-np joined #gluster
07:38 mooperd joined #gluster
07:43 anmol joined #gluster
07:43 clag_ joined #gluster
07:51 bulde joined #gluster
08:00 jtux joined #gluster
08:01 theron joined #gluster
08:06 ctria joined #gluster
08:07 ctrianta joined #gluster
08:09 Humble joined #gluster
08:13 rgustafs joined #gluster
08:15 ThatGraemeGuy joined #gluster
08:21 ThatGraemeGuy semiosis: got your messages, thanks. I tried 127.0.0.1:volname in fstab, and i still get 'dns resolution failed on host 127.0.0.1', which is beyond weird :-/
08:28 tomsve joined #gluster
08:30 raven-np joined #gluster
08:34 badone joined #gluster
08:34 mooperd joined #gluster
08:35 rwheeler joined #gluster
08:35 rwheeler joined #gluster
08:37 rwheeler joined #gluster
08:40 Staples84 joined #gluster
08:42 glusterbot New news from resolvedglusterbugs: [Bug 764706] volume set for 'cache-min-file-size' succeeds even if 'min-file size' is greater than 'max-file-size' <http://goo.gl/q6ev4>
08:45 raghu joined #gluster
08:46 rwheeler joined #gluster
08:49 tryggvil joined #gluster
08:50 WildPikachu joined #gluster
08:51 pai joined #gluster
08:52 anmol joined #gluster
08:53 tjikkun_work joined #gluster
08:57 mooperd joined #gluster
08:57 lh joined #gluster
08:57 lh joined #gluster
09:12 lkoranda_ joined #gluster
09:23 tryggvil joined #gluster
09:24 Staples84 joined #gluster
09:27 duerF joined #gluster
09:28 gbrand_ joined #gluster
09:31 atrius_away joined #gluster
09:36 dobber_ joined #gluster
09:39 bala joined #gluster
09:52 bulde joined #gluster
10:08 shireesh joined #gluster
10:10 raven-np joined #gluster
10:26 Norky joined #gluster
10:27 hagarth joined #gluster
10:29 andreask joined #gluster
10:47 ninkotech_ joined #gluster
10:48 mooperd joined #gluster
10:52 satheesh joined #gluster
10:57 rotbeard joined #gluster
11:11 venkat joined #gluster
11:11 sas joined #gluster
11:11 sas #haskell
11:12 sas venkat, ping
11:17 raven-np joined #gluster
11:27 steinex left #gluster
11:30 hagarth joined #gluster
11:40 satheesh joined #gluster
11:41 lh joined #gluster
11:43 dec joined #gluster
11:44 bala joined #gluster
11:55 shireesh joined #gluster
11:56 DWSR joined #gluster
11:56 DWSR joined #gluster
12:04 overclk joined #gluster
12:08 ngoswami joined #gluster
12:09 atrius_away joined #gluster
12:19 andreask joined #gluster
12:32 edward1 joined #gluster
12:33 Staples84 joined #gluster
12:39 balunasj joined #gluster
12:43 glusterbot New news from resolvedglusterbugs: [Bug 904005] tests: skip time consuming mock builds for code-only changes <http://goo.gl/h0kIz>
12:50 hagarth joined #gluster
12:55 aliguori joined #gluster
13:05 venkat joined #gluster
13:09 dustint joined #gluster
13:10 ThatGraemeGuy left #gluster
13:17 gbrand_ joined #gluster
13:27 GabrieleV joined #gluster
13:37 ThatGraemeGuy joined #gluster
13:44 ThatGraemeGuy joined #gluster
13:46 raven-np1 joined #gluster
13:47 bennyturns joined #gluster
13:50 DWSR joined #gluster
13:50 DWSR joined #gluster
13:50 raven-np joined #gluster
13:53 bulde joined #gluster
13:55 flowouffff hi
13:55 glusterbot flowouffff: Despite the fact that friendly greetings are nice, please ask your question. Carefully identify your problem in such a way that when a volunteer has a few minutes, they can offer you a potential solution. These are volunteers, so be patient. Answers may come in a few minutes, or may take hours. If you're still in the channel, someone will eventually offer an answer.
13:55 raven-np joined #gluster
13:56 ngoswami joined #gluster
13:56 raven-np1 joined #gluster
14:10 raven-np joined #gluster
14:14 rwheeler joined #gluster
14:15 raven-np2 joined #gluster
14:16 raven-np3 joined #gluster
14:24 tryggvil joined #gluster
14:24 raven-np joined #gluster
14:40 raven-np1 joined #gluster
14:43 raven-np joined #gluster
14:44 H__ on 3.2.5->3.3.1 upgrade, what process populates all the .glusterfs/ hardlinks ? (and when does it run)
14:48 ThatGraemeGuy joined #gluster
14:55 stopbit joined #gluster
14:55 raven-np joined #gluster
14:55 jdarcy joined #gluster
15:04 tqrst is it normal for self-heal to fill up a replaced brick at only about 1 gigabyte per minute on a gigabit network?
15:04 tqrst I'm kind of hoping the answer is no
15:05 H__ I don't know. Does the brick have many small files ?
15:06 tqrst it's a pretty non-uniform mix
15:07 tqrst we're going to be purchasing 4T disks instead of 2T in the future, which means replacing a dead brick could take almost 3 days :\
15:07 tqrst at this rate anyhow
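(As a quick check of that estimate: a 4 TB brick at roughly 1 GB per minute is about 4000 minutes, i.e. a little under 3 days of healing.)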
15:09 lala joined #gluster
15:10 lala_ joined #gluster
15:11 flrichar joined #gluster
15:20 bala joined #gluster
15:22 bugs_ joined #gluster
15:23 H__ Need help -> my getfattr (version 2.4.46) gives NO output on getfattr -m . -d -e hex /mnt/gluster/somefile
15:23 tqrst is /mnt/gluster/ your brick or your mounted volume?
15:26 H__ mounted volume
15:26 H__ I'm trying to reproduce http://www.gluster.org/2012/09/what-is-this-new-glusterfs-directory-in-3-3/
15:26 glusterbot <http://goo.gl/sOR50> (at www.gluster.org)
15:26 tqrst those xattrs are only set on the brick, not on the mounted volume
15:27 tqrst so you would do getfattr -m . -d -e hex /mnt/mybrick/somefile on a brick that has the file
15:27 tqrst (in that tutorial, /data/glusterfs is a mounted brick)
15:28 H__ argh. yes. thanks !
15:29 tqrst glad I could be useful for once here :p
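To illustrate the distinction, a hypothetical run against a brick path (the path, volume name and hex values are invented; the xattr names are the ones gluster 3.3 sets on bricks):
    getfattr -m . -d -e hex /data/glusterfs/brick1/somefile
    # file: data/glusterfs/brick1/somefile
    trusted.afr.myvol-client-0=0x000000000000000000000000
    trusted.afr.myvol-client-1=0x000000000000000000000000
    trusted.gfid=0xdeadbeefdeadbeefdeadbeefdeadbeef
The same command against the FUSE mount point shows none of these, since gluster keeps its internal trusted.* attributes on the bricks.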
15:33 aliguori joined #gluster
15:35 atrius joined #gluster
15:44 lala_ joined #gluster
15:50 hagarth joined #gluster
15:53 raven-np joined #gluster
15:56 ctria joined #gluster
15:59 jclift_ joined #gluster
16:00 gbrand_ joined #gluster
16:03 jiffe98 is there a good primer on handling locked files?  I've had troubles running self heals recently because of locked uuids and haven't been able to unlock them, ended up restarting gluster services on the locking servers
16:03 aliguori_ joined #gluster
16:03 guigui3 left #gluster
16:04 sjoeboo joined #gluster
16:05 daMaestro joined #gluster
16:06 rastar joined #gluster
16:13 tqrst weird - "gluster volume rebalance myvol start" errors out with "Rebalance on myvol is already started", but "gluster volume rebalance myvol status" shows "in progress" on localhost, "stopped" on 3 nodes, and "not started" on my other ~10 nodes.
16:34 nueces joined #gluster
16:37 _Bryan_ So anyone up for a unique thought this morning?
16:44 Humble joined #gluster
16:46 nocko After adding two bricks to a distributed-replicated 3.3 volume, (dht) linked files produce "Invalid argument" when read on. xattrs and log messages are here: https://gist.github.com/nocko/5006065 , vol-file is here: https://gist.github.com/nocko/5006090 . Unable to upgrade to 3.3.1 due to https://bugzilla.redhat.com/show_bug.cgi?id=893778 . Advice?
16:46 glusterbot <http://goo.gl/NLoE3> (at bugzilla.redhat.com)
16:46 glusterbot Bug 893778: unspecified, unspecified, ---, kkeithle, ASSIGNED , Gluster 3.3.1 NFS service died after writing bunch of data
16:48 nocko I've solved similar problems before by removing the link files... but that was pre-3.3 and I think perhaps the associated .glusterfs/xx/xx/gfid.linkfile may complicate the issue.
16:49 nocko I should mention that the problems started with a rebalance fix-layout, and grew progressively worse as it ran.
16:49 aliguori joined #gluster
16:50 nocko Presumably, the dht ranges were updated leading clients to look for files where they were not... creating link files that (for some reason) cannot be followed (even though they point to the correct replica).
16:53 nocko I've just confirmed that removing the link, causes it to immediately be created (as one would expect after a fix-layout).
16:56 nocko For fun... if you read the file from a fuse mount on a host serving a brick it not only works (where a non-brick host fuse mount fails), but the client can (for a time) read the file as well and the link file is gone.
16:59 rsevero_ joined #gluster
16:59 vpshastry left #gluster
17:02 nocko The link file is now present on the other brick, not gone...
17:05 ekuric left #gluster
17:06 jruggiero joined #gluster
17:09 nocko Ug. EXT4 (in recent kernels) broken for months, 3.3.1 with broken NFS, the last two rebalance operations have triggered bugs. A very boring distribute-replicate glusterfs setup is a full time job... It's like whack-a-mole; what glusterfs hole am I going to fall into this month?
17:30 dbruhn joined #gluster
17:37 puebele joined #gluster
17:42 xian1 joined #gluster
17:47 Guest38217 joined #gluster
17:48 Ryan_Lane joined #gluster
17:49 hagarth joined #gluster
17:53 badone joined #gluster
17:57 puebele joined #gluster
18:14 _pol joined #gluster
18:15 _pol joined #gluster
18:33 romero joined #gluster
18:35 tqrst nocko: similar feelings here
18:39 nocko I found a case where my bug isn't associated with the presence of a link file.
18:43 plarsen joined #gluster
18:45 Humble joined #gluster
18:46 disarone joined #gluster
18:49 nocko The client log shows a "conservative merge" starting on the parent directory... that eventually leads to the "invalid argument".
18:49 ekuric joined #gluster
18:55 _br_ joined #gluster
19:00 _br_ joined #gluster
19:05 Humble joined #gluster
19:05 sara joined #gluster
19:06 sara hi, i have a very big problem
19:06 sara with glusterfs
19:06 sara on a client machine running openvz
19:06 sara on the master node all works ok, but on the client
19:06 sara the glusterfs process sits at 100% cpu
19:06 sara please help me
19:06 sara :(
19:07 sara i've tried everything: reinstall, reconfigure the volume
19:07 sara change version
19:08 sara still CPU at 100%
19:08 sara connecting with the native gluster client over fuse
19:10 jclift_ johnmark: ^^^ Any ideas?
19:11 sara please help me, i've been trying to fix this for 3-4 days
19:11 jclift_ sara: I'm not clueful enough to help yet.  But hopefully some of the other people here will have ideas.
19:12 sara jclift_: thx :) i home find help on this channel
19:12 sara hope find*
19:13 rsevero_ sara: Is your volume replicated? Isn't the client sinchronizing the replicas? Which Gluster version are you using?
19:13 sara no, at the moment distribute; i changed from replicate to distribute thinking maybe that would fix the problem
19:13 sara the problem is the same on replicate and distribute
19:14 sara at this moment:  652 root      20   0  660m 296m 2092 S 117.3 11.6 175:56.92 glusterfs
19:14 sara gluster serves webpage files
19:17 Ryan_Lane joined #gluster
19:18 sara the problem is not on the node but on the client machine
19:19 avati sara: is your website very busy?
19:20 andreask joined #gluster
19:20 sara hmm, i think it's not very busy at this moment. before the upgrade glusterfs cpu was ~40-50%
19:20 sara after the upgrade it's more than 99-110%
19:20 lpabon joined #gluster
19:23 sara i upgraded from 3.3.0 to 3.3.1 and tried to fix this by going back to 3.3.0, but the problem still exists
19:23 sara now i have 3.3.1
19:24 avati can you find out what processes are using the mountpoint when there is high cpu? with 'lsof -n /mnt/glusterfs'
19:25 sara many lines
19:26 sara nginx   761         root   90r   REG    0,84  from 100 to 1082 gif files
19:26 sara and php-fpm
19:26 sara gif, jpg files, php files
19:26 sara many results
19:27 sara lots of files
19:27 sara sorry for my english
19:30 sara php-fpm 1040       portal  cwd    DIR   0,84     4096  9484853208555337207 /home/portal/pages php-fpm 1036 portal    5r   REG   0,84    31060 10917668751297631436 /home/portal/pages/includes/auth.php
19:30 dbruhn joined #gluster
19:30 sara etc
19:30 haakond_ joined #gluster
19:32 sara i have centos and rpm pack installed at this moment
19:32 sara on node server, and lient server
19:33 sara lient -> client
19:33 sara i haven't slept in a long time
19:33 sara ;)
19:37 avati sara: what is the mount point? /home/portal?
19:37 sara the mount point is /home; portal is a user on the system
19:38 avati php is an intensive workload. it's not too surprising to see high cpu usage
19:38 avati you could improve things a bit by mounting with '--attribute-timeout=60 --entry-timeout=60 --negative-timeout=60'
19:38 luckybambu joined #gluster
19:39 sara mounting with gluster fuse, like that?
19:40 sara with these attributes?
19:40 avati yes
19:40 sara ok one moment i try this :)
19:41 avati those parameters also mean that if your volume is read/write, then other clients cannot see changes made by one for up to 60 seconds
19:41 gbrand_ joined #gluster
19:41 avati so you may want to have a separate volume for read-only data and mount that with these parameters
19:43 sara mount -t glusterfs --attribute-timeout=60 --entry-timeout=60 --negative-timeout=60 192.168.1.10:/PORTAL /home - this is ok? i neve ruse this parametr on mount
19:43 avati hmm, with 'mount -t glusterfs' it would be a little different
19:43 sara ruse-> use
19:43 sara hmm?
19:44 avati mount -t glusterfs -oattribute-timeout=60,entry-timeout=60,negative-timeout=60 192.168.1.10:/PORTAL /home
19:44 avati or
19:44 avati glusterfs --server 192.168.1.10 --volfile-id PORTAL --attribute-timeout=60 --entry-timeout=60 --negative-timeout=60 /home
19:44 sara ok one moment :)
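For a persistent setup the same options can go in /etc/fstab; a sketch using the server and volume from the conversation (negative-timeout is left out here since, as the attempts below show, the 3.3.1 client does not accept it):
    192.168.1.10:/PORTAL  /home  glusterfs  defaults,_netdev,attribute-timeout=60,entry-timeout=60  0 0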
19:46 Ryan_Lane joined #gluster
19:46 sara unknown option negative-timeout (ignored) - but mounted
19:46 sara first option
19:46 sara one moment
19:46 sara i test
19:47 avati try second command
19:47 sara ok
19:48 sara glusterfs: unrecognized option '--server'
19:50 penglish Where should I look when it says "starting volume has been unsuccessful" ? The log doesn't have any additional information
19:51 sara one moment
19:51 wN joined #gluster
19:51 sara on log after umount 0 information
19:52 penglish actually, there's something here: [2013-02-21 11:52:24.616833] W [socket.c:1494:__socket_proto_state_machine] 0-socket.management: reading from socket failed. Error (Transport endpoint is not connected), peer (127.0.0.1:1022)
19:52 glusterbot penglish: That's just a spurious message which can be safely ignored.
19:53 penglish it seems to be climbing ports.. starting far below 1022, and working its way up
19:53 penglish glusterbot: thanks
19:53 glusterbot penglish: you're welcome
19:54 penglish or how about this one: [2013-02-21 11:51:29.921101] W [rpc-transport.c:606:rpc_transport_load] 0-rpc-transport: missing 'option transport-type'. defaulting to "socket"
19:55 sara how do i use this option with the mount.glusterfs command
19:55 sara ?
19:56 avati it is --volfile-server, not --server, sorry! :(
19:56 rotbeard joined #gluster
19:56 sara i tried this, maybe i wrote it wrong. ok, one moment, i'll try copying it from you ;)
19:57 sara glusterfs: unrecognized option '--negative-timeout=60'
19:57 sara :(
19:57 sara glusterfs --volfile-server=192.168.1.10 --volfile-id PORTAL --attribute-timeout=60 --entry-timeout=60 --negative-timeout=60 /home
19:58 sara and return glusterfs: unrecognized option '--negative-timeout=60'
19:58 avati uh oh, what version are you running?
19:58 penglish nevermind: /etc/hosts had an incorrect IP address, while DNS was right
19:58 penglish correct
19:58 penglish the errors could have been more helpful. :-D
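A sketch of the kind of /etc/hosts entries that avoid this class of problem (the hostname and the 192.0.2.x address are placeholders):
    127.0.0.1    localhost
    192.0.2.10   gluster1.example.com  gluster1
Keeping this file in sync with DNS matters because the resolver typically consults /etc/hosts before DNS, so a stale entry silently overrides a correct DNS record.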
19:58 sara glusterfs --version glusterfs 3.3.1 built on Oct 11 2012 21:49:36
19:59 sara from rpm
19:59 sara at this moment
20:00 avati hmm, maybe negative-timeout came in master branch after release-3.3 branch
20:01 avati try without that for now
20:01 sara ok
20:01 avati you can upgrade to 3.4beta bits and get --negative-timeout feature that should help php apps quite a bit
20:02 sara mount fine
20:02 sara but
20:02 sara cpu still very high
20:03 sara hmm
20:03 sara do i need to upgrade the master node too?
20:04 sara 3.4beta? i try alpha
20:04 sara i dont see beta
20:04 sara ?
20:04 avati yeah alpha for now.. beta next week
20:05 sara ok :)
20:05 sara onlu on client server
20:05 sara ?
20:05 sara only*
20:05 sara update
20:05 sara will fine?
20:05 sara i can upgrade in 2-4 minutes :) i think
20:05 avati both client and server
20:06 sara hmm ok 2 max 10 minutes i hope
20:06 sara one moment
20:07 avati i suggest you try a more stable version of 3.4 if this is a production environment
20:07 avati instead of trying alpha
20:07 avati or try it in a staging/test environment first, run your load against it, and then move it to production
20:08 sara one moment i try this
20:09 zaitcev joined #gluster
20:11 sara_ joined #gluster
20:11 sara_ sory i lost connection
20:11 avati 12:39 < avati> i suggest you try a more stable version of 3.4 if this is a production environment
20:11 avati 12:39 < avati> instead of trying alpha
20:11 avati 12:40 < avati> or try it in a staging/test environment first, run your load against it, and then move it to production
20:12 sara_ on alpha i can mount
20:13 sara_ but cpu still high
20:14 sara_ glusterfs --volfile-server=192.168.1.10 --volfile-id PORTAL --attribute-timeout=60 --entry-timeout=60 --negative-timeout=60 /home - work ok, cpu ~100%
20:16 Humble joined #gluster
20:21 sjoeboo joined #gluster
20:22 sara_ I need to go smoke, I'm so nervous :) I'll be gone for 2 minutes max. I hope you can help me fix this problem, I've been fighting with it for a long time, and I finally think the gluster specialists are here :) and I see that they actually are ;)
20:23 wN joined #gluster
20:29 doubleb joined #gluster
20:29 sara joined #gluster
20:29 sara i'm back, sorry i lost my connection again
20:30 _pol_ joined #gluster
20:30 sara i connect via the webclient
20:30 sara freenode
20:30 glusterbot New news from newglusterbugs: [Bug 913699] Conservative merge fails on client3_1_mknod_cbk <http://goo.gl/ThGYk>
20:31 nocko Hey, that's me!
20:34 sara avati: back to 3.3.1 version?
20:35 doubleb Can someone tell me if there is any memory size recommendation for running gluster?
20:35 avati sara: i would recommend so
20:35 sara ok
20:36 sara i'm back now. do you have any other ideas to fix this problem?
20:37 avati it would require a much deeper profiling of your workload
20:37 avati i need to join a conf call now
20:37 avati bbl
20:37 doubleb I have a 10x2 node gluster distributed-replicated system and added 4 new nodes. Rebalancing regularly ends with a failed status and I see out-of-memory errors in the log.
20:38 doubleb Each node is 12TB in size and has 4GB of memory now.
20:40 sara_ joined #gluster
20:41 sara_ i back
20:42 _pol joined #gluster
20:43 tqrst doubleb: I'm rebalancing a 25x2=50T volume, and have yet to see a glusterfsd above 0.1% memory (these servers have 48-128G of ram)
20:45 sara joined #gluster
20:45 sara i'm back, lost my connection again
20:46 doubleb If i check the memory usage of the gluster daemon, it consumes no more than 20%, but the whole system gradually runs out of memory
20:46 tqrst sounds like a bug
20:46 tqrst what version are you on?
20:46 doubleb gluster 3.3.1  on centos 6.3 64bit
20:47 doubleb xfs (inode64) bricks
20:47 doubleb bricks are 5 disks md raid arrays
20:47 sara any ideas?
20:48 doubleb I thought that gluster keeps the list of already-processed files in memory
20:49 doubleb trst: i searched for memory-leak-related bugs, but did not find any
20:49 doubleb sorry tqrst instead of trst
20:51 saraa joined #gluster
20:51 saraa hi, i'm back, from mIRC this time
20:51 saraa the connection is still up, i hope ;)
20:51 saraa avati: any ideas?
20:53 cjohnston_work joined #gluster
20:56 sjoeboo joined #gluster
20:57 badone joined #gluster
21:11 jdarcy joined #gluster
21:19 rubbs ok, I messed something up. I created an 8-node replica 2 volume and then mounted it. I ran `touch test` and then did `ls`. It's been hanging on ls for over 5 minutes now. Any ideas on where I can start troubleshooting?
21:27 jclift_ left #gluster
21:28 saraa any ideas anyone?
21:28 nocko doubleb: Maybe inode/dentry cache is eating all your memory (long-shot)? cat /proc/meminfo (plz)
21:29 nocko doubleb: Do you have a lot of small files / directories?
21:31 doubleb nocko: 65,000 directories and file sizes of 1MB~20MB
21:31 dbruhn joined #gluster
21:32 nocko cat /proc/meminfo ?
21:32 nocko Anything unexpectedly huge?
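A couple of standard (not gluster-specific) checks for whether kernel slab caches such as dentries and inodes are eating the memory:
    grep -E 'Slab|SReclaimable|SUnreclaim' /proc/meminfo
    slabtop -o -s c | head -n 15    # largest slab caches, printed once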
21:33 sjoeboo joined #gluster
21:34 doubleb nocko: there is nothing else huge. Where should I post the cat output?
21:34 nocko https://gist.github.com/ ?
21:34 glusterbot Title: Gists (at gist.github.com)
21:35 doubleb nocko: https://gist.github.com/anonymous/5008466
21:35 glusterbot Title: gist:5008466 (at gist.github.com)
21:36 doubleb nocko: this is from a node after a failed rebalance.
21:36 doubleb If I issue a 'service glusterd restart' rebalance restarts
21:37 nocko Does it take longer to run out of memory the second time?
21:41 doubleb I do not know; after each start, the rebalance runs for days before it fails.  See this: https://gist.github.com/anonymous/5008522 (this is the situation after 5 months)
21:41 glusterbot Title: gist:5008522 (at gist.github.com)
21:42 doubleb nocko: I mean I added the last bricks 5 months ago, but the rebalancing still has not finished
21:42 nocko Yeah, the timescales on these operations often make troubleshooting super-tough.
21:45 nocko Can you post the OOM error message? Does gluster die after failing to allocate or does the OOM killer eat some of the processes?
21:46 nocko meminfo from a node that's been running a while would be helpful info as well.
21:54 doubleb nocko: https://gist.github.com/anonymous/5008615
21:54 glusterbot Title: gist:5008615 (at gist.github.com)
21:56 doubleb nocko: after restarting failed node: https://gist.github.com/anonymous/5008632
21:56 glusterbot Title: gist:5008632 (at gist.github.com)
22:02 _pol joined #gluster
22:03 _pol joined #gluster
22:05 nocko doubleb: Hrm. Nothing is ringing any bells for me. You should submit a bug report; hopefully one of the developers can take a look at it.
22:09 _br_ joined #gluster
22:14 doubleb nocko: thank you. One last question: in your opinion, if I double the ram, would it make the running period longer (i.e. would the failed state occur more rarely)?
22:14 nocko Almost certainly.
22:15 nocko Even adding a little swap may get you by if the volume is near enough done rebalancing.
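A minimal sketch of adding a temporary swap file on such a node (size and path are arbitrary):
    dd if=/dev/zero of=/swapfile bs=1M count=4096
    chmod 600 /swapfile
    mkswap /swapfile
    swapon /swapfile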
22:15 _br_ joined #gluster
22:15 nocko Although the rebalance progress didn't seem too promising on that volume rebalance status you posted.
22:18 nocko crond triggered the OOM, so the system had been running for some interval without gluster requesting any allocations. If gluster had suddenly requested all the memory at once, I'd have expected to see glusterfs trigger the OOM or fail the allocation itself. This suggests to me that the leak (wherever it may be) is probably quite slow.
22:30 nueces joined #gluster
22:37 doubleb nocko: yes, the leak is quite slow.
22:41 y4m4 joined #gluster
22:52 raven-np joined #gluster
23:01 glusterbot New news from newglusterbugs: [Bug 905933] GlusterFS 3.3.1: NFS Too many levels of symbolic links/duplicate cookie <http://goo.gl/YA2vM> || [Bug 907202] Gluster NFS server rejects client connection if hostname is specified in rpc-auth <http://goo.gl/cxxJg>
23:02 hattenator joined #gluster
23:14 mooperd joined #gluster
23:15 glusterbot New news from resolvedglusterbugs: [Bug 799861] nfs-nlm: cthon lock test hangs then crashes the server <http://goo.gl/6WCTV> || [Bug 800735] NLM on IPv6 does not work as expected <http://goo.gl/OvsQB> || [Bug 802767] nfs-nlm:unlock within grace-period fails <http://goo.gl/oHwe1> || [Bug 803180] Error logs despite not errors to user (client) <http://goo.gl/OwIxy> || [Bug 803637] nfs-nlm: lock not honoured if tried from
23:27 sjoeboo joined #gluster
23:37 _pol joined #gluster
23:38 sjoeboo joined #gluster
23:38 _pol joined #gluster
23:42 _br_ joined #gluster
23:44 _br_ joined #gluster
