IRC log for #gluster, 2013-01-13


All times shown according to UTC.

Time Nick Message
00:05 yinyin joined #gluster
00:22 phreek joined #gluster
00:22 phreek Hi guys, I just added 2 bricks to a distributed volume. The bricks were just formatted (ext4), nothing in them. I started a rebalance operation and now im seeing lots of duplicate files/directories
00:22 phreek any ideas?
00:23 phreek :(
00:24 phreek 'ls' on the gluster volume appears to freeze the client now too
00:25 phreek the counter for 'scanned' on the rebalance is going up, but no actual rebalanced files
00:37 Ryan_Lane joined #gluster
00:38 Ryan_Lane none of my self-heal daemons are running. I have no idea why. any ideas?
00:42 phreek sorry - no idea.
00:42 phreek im dealing with duplicate files on my distributed volume myself
00:42 phreek and ive no idea whats up
00:42 phreek lol
00:42 phreek i did add-brick, then rebalance start
00:42 phreek and it all went haywire
00:45 phreek bullshit :)
00:46 Ryan_Lane yeah, so far from my use of gluster in numerous ways, the only feeling that I get from it is that it's incredibly unstable.
00:47 phreek yeah
00:47 phreek i dont know, maybe i did something wrong, but i pretty much followed the admin guide
00:47 phreek i mean, its 2 simple commands
00:48 phreek dont think i fucked it up
00:48 phreek it just decided to fuck itself up
00:48 phreek sad thing is i have 7 TB worth of data in that volume
00:49 phreek like, now its showing its scanning files for rebalancing, but its not rebalancing anything
00:49 phreek wtf
00:51 phreek [2013-01-12 19:49:27.098891] I [dht-rebalance.c:1629:gf_defrag_status_get] 0-glusterfs: Files migrated: 0, size: 0, lookups: 787308, failures: 0
00:53 phreek this is a 210 TB volume
00:53 phreek im assuming this may take long...but shouldnt it be migrating files already?
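The expand-and-rebalance sequence phreek describes maps roughly onto the commands below. This is only a sketch with a hypothetical volume name and brick paths, not his exact commands; the "scanned" counter he mentions is the lookup count shown by the status command.

    # add new, empty bricks to an existing distributed volume
    gluster volume add-brick bigvol server5:/export/brick1 server6:/export/brick1

    # start moving existing files onto the new bricks
    gluster volume rebalance bigvol start

    # check progress (files scanned/looked up vs. files actually migrated)
    gluster volume rebalance bigvol status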
01:04 semiosis it's been quite stable for me over the last couple years
01:04 semiosis Ryan_Lane: self heal daemon log files?
01:04 Ryan_Lane nothing in them
01:05 Ryan_Lane over the past year and a half I've had almost every one of my filesystems go belly-up
01:05 phreek semiosis: Any ideas on my issue?
01:05 Ryan_Lane nearly every one of my volumes is split-brained
01:05 phreek any help would be greatly appreciated
01:06 semiosis phreek: see ,,(ext4) -- are you using an affected kernel?
01:06 glusterbot phreek: Read about the ext4 problem at http://goo.gl/PEBQU
01:06 yinyin joined #gluster
01:06 phreek hmm
01:06 phreek im using gentoo
01:06 semiosis phreek: xfs is usually recommended, with inode size 512, fwiw
01:06 phreek kernel 3.7.0
01:06 Ryan_Lane does this also affect ubuntu?
01:06 semiosis yep you're bit by the bug then
01:06 phreek i am?
01:07 phreek ah crap
01:07 semiosis Ryan_Lane: afaik any linux kernels newer than 3.3.0, and also rh/cent kernels with the new ext4 code backported
01:07 Ryan_Lane ok
01:08 phreek lol
01:08 phreek so that must be why
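On the xfs recommendation above: bricks are commonly formatted with 512-byte inodes so that gluster's extended attributes fit inside the inode. A sketch, with a hypothetical device path; existing ext4 bricks would need their data migrated off before being reformatted.

    # format a brick with 512-byte inodes, as commonly recommended for glusterfs
    mkfs.xfs -i size=512 /dev/sdb1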
01:09 Ryan_Lane semiosis: so, gluster volume status shows a pid for all of the self-heal daemons
01:09 Ryan_Lane but none are online
01:10 Ryan_Lane seems I have a core file matching the same timestamp as the glustershd.log
01:10 phreek if i downgrade to 3.2.9 i guess it should work as expected? or is it fucked now that i tried expanding + rebalancing on a newer kernel with the "bug"
01:12 semiosis Ryan_Lane: all your servers self heal daemons dumped core at the same time?
01:12 Ryan_Lane no
01:12 Ryan_Lane each one has a core file at the same time their shd died
01:12 Ryan_Lane which is likely right when they started
01:13 Ryan_Lane -_-
01:13 Ryan_Lane no symbols
01:14 Ryan_Lane I really need to switch to your version of the glusterfs packages
01:15 phreek semiosis: downgrading to 3.2.36
01:15 phreek is that old enough? :)
01:16 semiosis you tell me :)
01:17 phreek well im not sure!
01:17 phreek lol
01:19 khushildep joined #gluster
01:26 Ryan_Lane hm. well, the split-brain in the volume I'm debugging is actually all for directories
01:26 Ryan_Lane does gluster count bad timestamps as a meta-data split brain?
01:29 semiosis well split brain is determined by xattrs
01:31 semiosis afaik gluster will try to heal any differences, including timestamps... however if more than one replica has xattrs that indicate unsynced changes that's split brain and it cant proceed to heal
01:32 Ryan_Lane I'm confused as to how that would be a problem on directories
01:32 Ryan_Lane especially if the directories in question have the same permissions and ownership
01:33 semiosis have you looked at the ,,(extended attributes) on those directories?
01:33 glusterbot (#1) To read the extended attributes on the server: getfattr -m .  -d -e hex {filename}, or (#2) For more information on how GlusterFS uses extended attributes, see this article: http://goo.gl/Bf9Er
01:33 Ryan_Lane lemme see
01:33 semiosis if it says split brain, then more than one of the replicas should have non-zero afr attribs
01:33 semiosis as explained in jdarcy's article there
01:35 Ryan_Lane ah. size and quota xattrs are different
01:38 Ryan_Lane and yes, indeed there are some that have non-zero afr attrs
01:39 Ryan_Lane how would I go about fixing that?
01:39 Ryan_Lane for files that's easy enough, but it seems dangerous to delete an entire directory
01:40 semiosis i highly recommend jdarcy's article glusterbot just linked to understand how the afr attribs work
01:40 Ryan_Lane well, I read it
01:40 Ryan_Lane that doesn't really tell me much from the perspective of fixing the issue
01:40 semiosis to summarize, when afr attrib is zero, it means that inode (file or dir) is not aware of any unsynchronized changes
01:41 semiosis when non-zero it means that inode has changes that were not synced
01:41 Ryan_Lane such as timestamp or other xattrs?
01:42 semiosis good question, idk for sure but i would assume any/all kinds of changes count
01:42 Ryan_Lane so, as to fixing the issue, I still don't understand how
01:42 Ryan_Lane the directory exists on all 4 servers
01:43 semiosis in the case of directories you'd want to make sure the owners, perms, and children are the same -- not the content of the children, just their names
01:44 semiosis then you can zero-out or delete the afr xattrs from one of the replicas... this will allow self heal to proceed checking & syncing from the other replica
01:44 Ryan_Lane ah
01:44 Ryan_Lane ok
01:50 semiosis btw, directories exist on all bricks in a glusterfs volume
01:50 Ryan_Lane yep
01:50 semiosis the directory tree is the same on every brick
01:52 semiosis hopefully at least
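A sketch of the directory fix semiosis describes, using a hypothetical brick path and volume name (the real afr attribute names follow the pattern trusted.afr.<volume>-client-<N>): compare the replicas first, then clear the pending-change attributes on one replica only.

    # on each server, inspect the afr xattrs on that brick's copy of the directory
    getfattr -m . -d -e hex /export/brick1/path/to/dir

    # after confirming owner, permissions, and child names match across replicas,
    # remove (or zero out) the afr xattrs on ONE replica so self-heal can sync
    # from the other side
    setfattr -x trusted.afr.myvol-client-0 /export/brick1/path/to/dir
    setfattr -x trusted.afr.myvol-client-1 /export/brick1/path/to/dir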
01:55 khushildep_ joined #gluster
01:57 Ryan_Lane semiosis: if that xattr is removed completely on each node, will it be a problem?
02:00 semiosis hard to say
02:00 Ryan_Lane is it something that gets properly reset at some point?
02:01 semiosis the afr xattrs get reset to zero when self heal syncs all replicas of a file/dir
02:02 Ryan_Lane it seems to be based on client
02:02 semiosis if your self heal daemon isnt working then you'll have to ,,(repair) the old fashioned way
02:02 glusterbot http://goo.gl/uA812
02:02 Ryan_Lane yeah. I did that
02:02 Ryan_Lane it didn't seem to set the attribute
02:02 semiosis client logs?
02:03 Ryan_Lane nothing in them for that directory
02:03 semiosis is the client connected to all the replica bricks?
02:03 Ryan_Lane yes
02:03 semiosis hrm, that doesnt add up
02:04 semiosis if the client is connected to all bricks, and you stat a file whose afr xattrs aren't zero across all replicas, it should heal the file & fix the attrs
02:04 semiosis iirc
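The "old fashioned" repair referenced above is usually a full stat of the volume from a client mount, so every file and directory gets looked up (and healed as a side effect). A sketch, assuming a hypothetical mount point:

    # walk the whole volume from a connected client to trigger self-heal on everything
    find /mnt/myvol -noleaf -print0 | xargs -0 stat > /dev/null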
02:05 Ryan_Lane what if that xattr isnt set at all?
02:05 Ryan_Lane same case?
02:05 semiosis you mean doesnt exist?  should be created
02:05 Ryan_Lane yeah
02:05 Ryan_Lane ok
02:05 semiosis same as if you made a new replicate volume with one full brick & one empty brick... self heal would go through & replicate it all & add xattrs to the files on the full brick
02:06 Ryan_Lane it's properly updating the directory when I touch it
02:06 Ryan_Lane or chmod or chown
02:06 semiosis on all bricks?
02:06 yinyin joined #gluster
02:06 Ryan_Lane but it isn't setting afr
02:06 Ryan_Lane yes
02:06 semiosis weird
02:07 semiosis maybe thats some new behavior in 3.3 i'm not familiar with
02:07 semiosis tbh haven't run 3.3 in prod yet :/
02:07 Ryan_Lane the xattrs don't match, though
02:07 semiosis is your self heal daemon running again?
02:07 Ryan_Lane no
02:07 semiosis Ryan_Lane: what are the xattrs?
02:08 Ryan_Lane http://pastebin.com/dC5y0BHs
02:08 glusterbot Please use http://fpaste.org or http://dpaste.org . pb has too many ads. Say @paste in channel for info about paste utils.
02:08 edong23 joined #gluster
02:09 Ryan_Lane sorry
02:09 Ryan_Lane that was slightly off
02:09 semiosis ah, no afr attribs anywhere
02:09 Ryan_Lane http://pastebin.com/gZNafanC
02:09 glusterbot Please use http://fpaste.org or http://dpaste.org . pb has too many ads. Say @paste in channel for info about paste utils.
02:09 Ryan_Lane well, I removed them
02:09 semiosis ah hm
02:09 Ryan_Lane that's why I was asking if it would re-set them
02:10 semiosis well last time i tried this, it did
02:10 semiosis but that was pre-3.3, no self heal daemon
02:10 semiosis try starting the self heal daemon maybe?
02:10 Ryan_Lane how?
02:10 semiosis restart glusterd would be my guess
02:11 Ryan_Lane yeah. I've tried that
02:11 semiosis heh :(
02:11 Ryan_Lane it doesn't start the self-heal daemon
02:11 Ryan_Lane looks to me like it segfaults when it tries
02:12 Ryan_Lane this is the last thing in the log: http://pastebin.com/1EDWAswa
02:12 glusterbot Please use http://fpaste.org or http://dpaste.org . pb has too many ads. Say @paste in channel for info about paste utils.
02:15 Ryan_Lane I'm going to switch to your ubuntu packages, which will also upgrade me from 3.3.0 to 3.3.1 and see if that helps
02:15 semiosis since glusterd is trying to launch the shd, i'd check the glusterd log too
02:16 semiosis but a stack trace without any surrounding logs is kinda tough to troubleshoot, i dont know the code
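For reference, the usual ways to get the 3.3 self-heal daemon running again look roughly like this; service names vary by distro and packaging, and none of it helps if the daemon segfaults on startup, as it appears to here.

    # restart the management daemon, which respawns glustershd
    service glusterd restart          # or glusterfs-server, depending on the packages

    # or force-start the volume, which relaunches any missing shd/brick processes
    gluster volume start myvol force

    # check whether the self-heal daemon now shows as online
    gluster volume status myvol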
02:31 raven-np joined #gluster
03:01 dustint joined #gluster
03:02 dustint joined #gluster
03:06 atrius joined #gluster
03:07 yinyin joined #gluster
03:50 khushildep_ joined #gluster
03:51 dustint joined #gluster
03:53 glusterbot New news from newglusterbugs: [Bug 894674] Container listing of objects can starve out other requests being handled by the same worker <http://goo.gl/3OYdp>
03:55 m0zes @ppa
03:55 khushildep joined #gluster
03:55 glusterbot m0zes: The official glusterfs 3.3 packages for Ubuntu are available here: http://goo.gl/7ZTNY
03:55 yinyin joined #gluster
04:03 dustint joined #gluster
04:24 dustint joined #gluster
04:30 raven-np1 joined #gluster
05:44 dustint joined #gluster
05:48 dustint joined #gluster
06:01 dustint joined #gluster
06:06 dustint_ joined #gluster
06:21 raven-np joined #gluster
06:22 greylurk joined #gluster
06:59 cyr_ joined #gluster
07:28 hagarth joined #gluster
07:32 yinyin joined #gluster
07:55 ctria joined #gluster
08:13 Ryan_Lane joined #gluster
08:14 jjnash joined #gluster
08:14 nightwalk joined #gluster
08:30 greylurk joined #gluster
08:47 raven-np1 joined #gluster
08:58 Ryan_Lane joined #gluster
09:23 yinyin joined #gluster
09:42 yinyin joined #gluster
09:58 gbrand_ joined #gluster
11:00 rags_ joined #gluster
11:03 rags joined #gluster
11:39 NashTrash joined #gluster
12:13 cyr_ joined #gluster
12:28 yinyin joined #gluster
12:55 rags_ joined #gluster
12:59 yinyin joined #gluster
13:30 mohankumar joined #gluster
13:52 chirino joined #gluster
13:57 yinyin joined #gluster
14:09 chirino joined #gluster
14:35 NashTrash joined #gluster
14:59 chirino joined #gluster
15:57 eightyeight joined #gluster
16:06 red_solar joined #gluster
16:48 chirino joined #gluster
16:49 chirino joined #gluster
17:40 rags joined #gluster
17:46 rags joined #gluster
18:10 zaitcev joined #gluster
18:25 greylurk joined #gluster
18:52 testarossa joined #gluster
18:56 rags joined #gluster
18:58 rags_ joined #gluster
19:01 RobertLaptop joined #gluster
19:02 theron joined #gluster
19:08 JoeJulian_ joined #gluster
19:09 n8whnp joined #gluster
19:10 elyograg joined #gluster
19:16 Maxzrz joined #gluster
19:17 neofob joined #gluster
19:23 rags_ joined #gluster
19:51 rags joined #gluster
19:53 joeto joined #gluster
20:00 NashTrash joined #gluster
20:00 NashTrash Hello
20:00 glusterbot NashTrash: Despite the fact that friendly greetings are nice, please ask your question. Carefully identify your problem in such a way that when a volunteer has a few minutes, they can offer you a potential solution. These are volunteers, so be patient. Answers may come in a few minutes, or may take hours. If you're still in the channel, someone will eventually offer an answer.
20:03 NashTrash I am trying to remove bricks (3 because I have replica = 3).  The command completes (with errors reported) but there are still files on the bricks and they still show as part of the volume.
20:08 JoeJulian_ NashTrash: The files will not be removed, but the volume info should be updated. Which version is this?
20:24 NashTrash JoeJulian_: We are running 3.3.1
20:25 NashTrash Most of the data (280MB out of 281MB) got moved off of the bricks.
20:25 NashTrash All of the directories are still there
20:26 JoeJulian_ It's been forever since I've looked at that command (and I've never ended up using it) but iirc, you remove-brick {brick-list} start, then watch the remove-brick...status until it's complete, then remove-brick...commit.
20:27 NashTrash Ah.  Ok.  The commit actually removes it from the cluster?
20:27 NashTrash Er, volume
20:27 JoeJulian_ Right
20:29 NashTrash Do you happen to know why some of the files give an error when being moved off of the bricks?  I see the same errors when I try to rebalance the volume.
20:30 JoeJulian_ You'd have to check the brick and rebalance logs. I've seen that too and have yet to figure it out.
20:33 NashTrash Ok.  Thanks.
20:33 NashTrash For those files they all get the same three error lines:
20:33 NashTrash attempt to set internal xattr: trusted.afr.volname-client-0: Operation not permitted
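The remove-brick flow JoeJulian outlines looks roughly like the following; a sketch with hypothetical volume and brick names, where the three bricks of one replica set are given together since the volume is replica 3.

    # start draining data off the bricks being removed
    gluster volume remove-brick myvol srv1:/export/brick3 srv2:/export/brick3 srv3:/export/brick3 start

    # watch until the migration reports completed
    gluster volume remove-brick myvol srv1:/export/brick3 srv2:/export/brick3 srv3:/export/brick3 status

    # only then detach the bricks from the volume definition
    gluster volume remove-brick myvol srv1:/export/brick3 srv2:/export/brick3 srv3:/export/brick3 commit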
20:38 Ryan_Lane joined #gluster
20:59 gbrand__ joined #gluster
21:00 gbrand___ joined #gluster
21:01 andreask joined #gluster
21:15 NashTrash1 joined #gluster
21:17 rags_ joined #gluster
21:19 jbrooks joined #gluster
21:36 Ryan_Lane joined #gluster
21:49 smellis anyone know what changes 3.4 might bring in terms of performance for kvm vm images?
21:49 JoeJulian iirc, 3x faster for reads, 2x faster for writes.
21:50 JoeJulian I suspect the real winner would be in iops.
21:53 eightyeight joined #gluster
22:26 smellis hmm, i may try that, is there anything I should know about upgrading from 3.3.1 to 3.4qa6?
22:37 JoeJulian smellis: Probably.... I haven't tried it yet and there's usually nothing documented before it's actually released.
23:55 jbrooks joined #gluster
