IRC log for #gluster, 2013-04-24


All times shown according to UTC.

Time Nick Message
00:45 portante|ltp joined #gluster
01:06 twx joined #gluster
01:12 H__ joined #gluster
01:12 H__ joined #gluster
01:22 d3O joined #gluster
01:28 russm_ left #gluster
01:35 dustint joined #gluster
01:38 kevein joined #gluster
01:47 bala joined #gluster
01:51 bharata joined #gluster
02:12 jag3773 joined #gluster
02:14 nickw joined #gluster
02:23 zykure joined #gluster
02:26 d3O joined #gluster
02:27 d3O left #gluster
02:32 d3O_ joined #gluster
02:36 d3O_ left #gluster
02:53 theron joined #gluster
02:57 nickw joined #gluster
03:07 vshankar joined #gluster
03:18 fidevo joined #gluster
03:42 aravindavk joined #gluster
03:48 dmojoryder1 joined #gluster
04:00 sgowda joined #gluster
04:01 itisravi joined #gluster
04:25 zykure joined #gluster
04:25 shylesh joined #gluster
04:30 hchiramm_ joined #gluster
04:32 JoeJulian @which brick
04:32 glusterbot JoeJulian: To determine on which brick(s) a file resides, run getfattr -n trusted.glusterfs.pathinfo $file through the client mount.
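A minimal sketch of the lookup glusterbot describes, assuming a FUSE mount at /mnt/gv0 (the mount point and file name are placeholders):

    # ask the client which brick(s) hold this file; run against the mount, not the brick
    getfattr -n trusted.glusterfs.pathinfo /mnt/gv0/some/dir/file.txt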
04:32 jclift JoeJulian: You're up late?
04:33 JoeJulian Yeah, have some split-brain to heal and a couple servers to do kernel upgrades on.
04:33 JoeJulian Bummer... trusted.glusterfs.pathinfo doesn't work on split-brain files.
04:36 jclift Good luck. :)
04:37 * jclift is finally getting some brain traction happening with this Glupy stuff
04:37 jclift Sometimes doing stuff when I can't sleep actually works
04:37 JoeJulian Wish I had time to work with it. I've got ideas, but no time to try them.
04:38 jclift It's been super hard to figure out how to get it to actually come together and do anything at all
04:38 jclift But, finally got it "functional"
04:38 jclift No idea about debugging it properly though yet
04:38 JoeJulian Just don't write bugs.
04:38 jclift Heh
04:39 jclift Given a bit deeper understanding (next day or two), I'll write up a newbie level "how-to" guide so other people don't have to go through the same amount of pain
04:39 JoeJulian That would definitely be cool.
04:40 jclift If I wasn't both a Python novice and a Gluster internals novice, probably this would have been (and still be) a bunch easier. :)
04:40 JoeJulian True. What's your language of choice?
04:40 jclift bash
04:40 jclift Unfortunately, no joking. :/
04:41 jclift Used to do C coding, but wasn't ever any better than "average" at it.
04:41 JoeJulian Yeah, hard to work library interfaces with bash though...
04:41 jclift I like trying out ideas to see if concepts work
04:41 JoeJulian I'm in the same boat with C
04:42 jclift I really don't like having to then take said idea and make it into a fully operational product/project/etc (boring as heck).
04:42 JoeJulian Of course, everything I do and know is self-taught. Maybe I could have done better with some formal education.
04:42 jclift Yeah
04:42 jclift Same.  And not particularly fussed about it.
04:42 jclift My main thing to improve after Python is my maths skills.
04:42 JoeJulian I'm leading industries, so me neither. :)
04:43 jclift :)
04:43 jclift Hmm, gunna hit the sack finally now.  Will pick this Glupy stuff up in morning.
04:44 jclift 'nite dude, and hope it goes well. :)
04:44 JoeJulian later
05:03 itisravi joined #gluster
05:06 bala joined #gluster
05:07 chirino joined #gluster
05:10 _pol joined #gluster
05:14 vpshastry joined #gluster
05:18 bulde joined #gluster
05:23 hagarth joined #gluster
05:33 zykure joined #gluster
05:38 lalatenduM joined #gluster
05:58 raghu joined #gluster
06:01 bulde1 joined #gluster
06:03 d3O joined #gluster
06:06 36DAAGEZ4 joined #gluster
06:12 rotbeard joined #gluster
06:15 rastar joined #gluster
06:23 77CAAZ3QU joined #gluster
06:26 ollivera joined #gluster
06:26 d3O joined #gluster
06:27 vimal joined #gluster
06:27 ricky-ticky joined #gluster
06:33 shireesh joined #gluster
06:34 d3O left #gluster
06:35 ctria joined #gluster
06:40 kevein joined #gluster
06:41 satheesh joined #gluster
06:41 JoeJulian gah! I can't believe I did that... I rebooted a machine that still had ext4 bricks. Grrr....
06:42 puebele joined #gluster
06:51 zykure joined #gluster
06:53 hagarth joined #gluster
06:56 samppah JoeJulian :(
06:57 JoeJulian Oh well.. I wanted to convert those bricks and move them anyway...
07:03 VeggieMeat joined #gluster
07:03 bulde joined #gluster
07:06 jiffe98 joined #gluster
07:06 Shdwdrgn joined #gluster
07:12 hybrid512 joined #gluster
07:13 rb2k joined #gluster
07:20 hagarth joined #gluster
07:21 bala joined #gluster
07:26 * JoeJulian ponders why I can't just replace-brick 60 bricks in rapid succession instead of 5 minutes of "Connection failed. Please check if gluster daemon is operational." between them...
07:27 JoeJulian ... ok 2 minutes but still...
07:41 hybrid512 joined #gluster
07:44 dobber_ joined #gluster
07:47 rastar joined #gluster
07:49 stickyboy joined #gluster
08:00 exbeanwen joined #gluster
08:01 spider_fingers joined #gluster
08:03 saurabh joined #gluster
08:03 ngoswami joined #gluster
08:12 exbeanwen joined #gluster
08:14 exbeanwen ^_^
08:15 samppah :O
08:15 hagarth :O
08:26 d3O joined #gluster
08:26 d3O left #gluster
08:30 ujjain joined #gluster
08:36 rruban joined #gluster
08:36 manik joined #gluster
08:46 kevein joined #gluster
08:50 zhashuyu joined #gluster
09:02 exbeanwen left #gluster
09:12 duerF joined #gluster
09:15 vpshastry1 joined #gluster
09:17 ngoswami joined #gluster
09:21 hybrid512 joined #gluster
09:35 H__ How does one find out *which* replace-brick is in progress ?
09:36 H__ other than catting /var/lib/glusterd/vols/*/rbstate
09:52 bala joined #gluster
10:03 davis_ H__: does "gluster volume replace-brick FOO status"  work
10:03 davis_ ?
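For reference, a hedged sketch of the full 3.3 status syntax, which also names the source and destination bricks (volume and brick paths are placeholders):

    # report progress of an in-flight replace-brick operation
    gluster volume replace-brick FOO server1:/bricks/old server2:/bricks/new status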
10:06 davis_ I'm running Red Hat Storage Server 6.2 (most recent), based on Gluster 3.3.0.7rhs. I have what I believe to be a split-brain. "gluster volume heal FOO info" reports only gfid's, rather than e.g. the fully resolved filename. What does this signify? Also, how do I fix it? (1 brick per each of 4 nodes, running in distributed/AFR)
10:12 guigui1 joined #gluster
10:18 hagarth joined #gluster
10:20 bala joined #gluster
10:22 bulde joined #gluster
10:27 ricky-ticky joined #gluster
10:33 yinyin joined #gluster
10:40 manik joined #gluster
10:50 sgowda joined #gluster
11:02 ngoswami joined #gluster
11:08 17SACTXZD joined #gluster
11:14 H__ is cluster.min-free-disk on 3.3.1 (or release-3.3 branch head) still only in percentage as the docs claim or is this format "1GB" also allowed ?
11:19 sgowda joined #gluster
11:19 lpabon joined #gluster
11:21 y4m4 joined #gluster
11:34 chirino joined #gluster
11:36 edward1 joined #gluster
11:49 vpshastry1 joined #gluster
11:50 guigui3 joined #gluster
11:57 rb2k quick question
11:57 rb2k with 3.3.X, when I mount a volume
11:57 rb2k The fuse client thingy will fetch the .vol file and connect to all available hosts.
11:57 rb2k Correct?
11:58 rb2k and the vol file is generated when doing a "gluster volume create"
11:58 saurabh joined #gluster
12:17 aliguori joined #gluster
12:18 valqk joined #gluster
12:18 valqk hi guys
12:19 valqk I need some help. I have a split-brained 2 brick glusterfs 3.3.0 (3.3.0-ppa1~lucid3). My gluster command coredumps when I do volume heal uservol info split-brain. What should I do and how do I fix the split-brain. I'll enable quorum afterwards
12:19 valqk I have > 500 files and I should find automated solution
12:20 valqk if the info didn't crash I'd have written a script but...
12:21 valqk also is it safe to have 3.3.1 server and 3.3.0 clients?
12:27 valqk anyone?
12:28 ctria joined #gluster
12:36 yongtaof joined #gluster
12:38 manik joined #gluster
12:39 satheesh joined #gluster
12:40 bennyturns joined #gluster
12:47 _BuBU joined #gluster
12:47 _BuBU Hi
12:47 glusterbot _BuBU: Despite the fact that friendly greetings are nice, please ask your question. Carefully identify your problem in such a way that when a volunteer has a few minutes, they can offer you a potential solution. These are volunteers, so be patient. Answers may come in a few minutes, or may take hours. If you're still in the channel, someone will eventually offer an answer.
12:49 _BuBU I've an initial gluster install with only one box with about 4.5To of datas on it I've installed months ago
12:49 _BuBU and I just added new brick
12:49 _BuBU and launch the healing stuff.. so it synchronize all datas to the new brick
12:50 _BuBU the problem is that it seems to eat most of the resources to do the sync
12:50 valqk someone on my split-brain issue?
12:50 yongtaof When we use geo-replication of glusterfs 3.3 we got the following error
12:50 yongtaof E [mem-pool.c:503:mem_put] (-->/usr/lib64/glusterfs/3.3.0.5rhs/xlator/cluster/replicate.so(afr_setxattr_wind_cbk+0xe6) [0x7f72ca9122f6] (-->/usr/lib64/glusterfs/3.3.0.5rhs/xlator/cluster/replicate.so(afr_setxattr_unwind+0xe3) [0x7f72ca90fde3] (-->/usr/lib64/glusterfs/3.3.0.5rhs/xlator/cluster/distribute.so(dht_err_cbk+0xfd) [0x7f72ca6c811d]))) 0-mem-pool: invalid argument
12:51 _BuBU when doing lsof -n|grep myglusterdir|wc -l I get about 1640 opened files by glusterds
12:51 _BuBU glusterfs
12:51 jclift_ joined #gluster
12:51 yongtaof how about the load you have
12:52 _BuBU 24 :(
12:52 _BuBU on the box where the 4To are
12:52 valqk _BuBU, maybe a ionice will do the job?
12:53 _BuBU this is a 4cores 3Ghz+16Go ram
12:53 _BuBU the other one is 8core 3Ghz+16Go ram
12:54 _BuBU valqk: on which process should I need to do and what are the best practice on class/level ?
12:55 ctria joined #gluster
12:55 lala_ joined #gluster
12:56 valqk _BuBU, see which process is eating up the io and try to ionice it
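A hedged sketch of that suggestion; the pgrep pattern is an assumption and should be matched to the actual brick process:

    # find the glusterfsd serving the busy brick and lower its IO priority
    PID=$(pgrep -f 'glusterfsd.*myglusterdir' | head -1)
    ionice -c2 -n7 -p "$PID"    # best-effort, lowest priority; use -c3 for idle-only IO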
12:59 JoeJulian davis_: heal $vol info reporting just gfids means that those gfids (inodes) are out of sync and are scheduled for healing. I haven't figured out why they don't heal yet, though I just had some myself last night.
13:00 JoeJulian davis_: It doesn't necessarily mean split brain, see "heal $vol info split-brain" for the list of most recently found split-brain files.
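A quick sketch of the two queries JoeJulian contrasts (volume name is a placeholder):

    gluster volume heal FOO info               # entries queued for self-heal; may list bare gfids
    gluster volume heal FOO info split-brain   # only entries actually flagged as split-brain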
13:00 satheesh joined #gluster
13:00 JoeJulian ~mount host | rb2k
13:00 glusterbot JoeJulian: Error: No factoid matches that key.
13:01 yongtaof some body know what's the problem?
13:01 JoeJulian valqk: Known problem with 3.3.0. Upgrade to 3.3.1
13:01 valqk JoeJulian, yeah I already did on one of the machines
13:01 yongtaof >/usr/lib64/glusterfs/3.3.0.5rhs/xlator/cluster/distribute.so(dht_err_cbk+0xfd) [0x7f72ca6c811d]))) 0-mem-pool: invalid argument
13:01 valqk the glusterd didn't broke... maybe it'd be ok to upgrade both
13:02 yongtaof E [mem-pool.c:503:mem_put] (-->/usr/lib64/glusterfs/3.3.0.5rhs/xlator/cluster/replicate.so(afr_setxattr_wind_cbk+0xe6) [0x7f72ca9122f6] (-->/usr/lib64/glusterfs/3.3.0.5rhs/xlator/cluster/replicate.so(afr_setxattr_unwind+0xe3) [0x7f72ca90fde3] (-->/usr/lib64/glusterfs/3.3.0.5rhs/xlator/cluster/distribute.so(dht_err_cbk+0xfd) [0x7f72ca6c811d]))) 0-mem-pool: invalid argument
13:02 valqk JoeJulian, is there any possibility to recover split-brained files without deleting them? I read here and I see rm -f http://joejulian.name/blog/fixing-split-brain-with-glusterfs-33/
13:02 glusterbot <http://goo.gl/FPFUX> (at joejulian.name)
13:02 JoeJulian yongtaof: I haven't seen that. Have you tried asking a Red Hat engineer?
13:03 davis_ JoeJulian, Hmmm. Fair enough. I had a feeling I might have jumped the gun with saying "split-brain".
13:03 davis_ JoeJulian, thanks, btw.
13:03 JoeJulian ~mount server | rb2k
13:03 glusterbot rb2k: (#1) The server specified is only used to retrieve the client volume definition. Once connected, the client connects to all the servers in the volume. See also @rrnds, or (#2) Learn more about the role played by the server specified on the mount command here: http://goo.gl/0EB1u
13:04 mohankumar joined #gluster
13:04 yongtaof JoeJulian it seems like this bug https://bugzilla.redhat.com/show_bug.cgi?format=multiple&id=844324
13:04 glusterbot <http://goo.gl/FDeQI> (at bugzilla.redhat.com)
13:04 yongtaof but I can't find the fix
13:04 JoeJulian valqk: Yes, if you want it not to crash, upgrade both (or more if you have more)
13:04 rb2k JoeJulian: thanks!
13:05 JoeJulian valqk: ... and no. you cannot heal split-brain without manually deciding which version of the file in question is the sane one.
13:05 NeatBasis joined #gluster
13:06 JoeJulian davis_: You're welcome. :) What I did was to delete the file out of the .glusterfs tree on the bricks in question. I didn't have the time to try to go any deeper. I'm pretty certain I caused mine when I was deleting files for an actual split-brain.
13:07 JoeJulian bug 844324
13:07 glusterbot Bug http://goo.gl/XqVTp medium, high, ---, rabhat, CLOSED INSUFFICIENT_DATA, core: possible memory leak
13:07 JoeJulian :/
13:07 valqk JoeJulian, so I should delete the file in question on second machine (the one that left behind and have bad files) only and then heal again?
13:08 davis_ JoeJulian, - Yes, that's how I started trying to fix them -- the files had the same md5sum on the bricks, so I blew the relevant .glusterfs file away too. Seemed to work :)
13:08 JoeJulian valqk: yes, make sure you follow those instructions or you'll end up like davis_ and me with gfid files hanging around.
13:09 valqk the one I've pasted?
13:09 JoeJulian Yes
13:09 valqk ok 10x
13:10 JoeJulian yongtaof: All I can suggest is opening a new bug and including the core dump Amar was asking for. Be sure to reference the previous bug too. Go to the following link to file a bug report.
13:10 glusterbot http://goo.gl/UUuCq
13:11 JoeJulian yongtaof: ... or go through your Red Hat support. They'll do all the bug reporting for you.
13:12 satheesh joined #gluster
13:16 JoeJulian Just to be clear, I'm not trying to make you go away, yongtaof, just thinking that if I were paying the amount of money you are for support, I'd make them work for it. :D
13:16 valqk JoeJulian, should I stop the glusterfsd when I delete them? I think not (they are marked split-brained?) but wanted to ask
13:16 JoeJulian valqk: No need.
13:17 valqk JoeJulian, thanks
13:17 yongtaof It's not crashed so no core dump
13:18 valqk oh and something else. in the link they say to delete the files from the mounted dir, not from the physical dir....yes?
13:18 valqk I suppose I have to delete them from the physical dir so the gfd can sync them again?
13:20 _BuBU is it possible to stop the healing ?
13:20 JoeJulian valqk: I thought I specified to delete from the brick...
13:20 JoeJulian Now you're going to make me re-read my own page...
13:20 valqk :-D
13:21 JoeJulian _BuBU: perhaps you just want to reduce the number of simultaneous background self-heals.
13:21 koubas joined #gluster
13:21 valqk JoeJulian, it's not stated clearly but on third read I suppose you're meaning from the fs on the brick :)
13:21 JoeJulian _BuBU: you can do that by setting cluster.background-self-heal-count
13:22 JoeJulian valqk: Yes, thank you, I'll look at rewording that.
13:22 _BuBU JoeJulian: ok but for an already running self-heal ?
13:23 valqk JoeJulian, sorry for asking stupid questions, just want to be sure before rm :) not afterwards
13:23 JoeJulian _BuBU: I would expect that any currently healing file will finish (but not necessarily) but yes, you can reduce the count on a life healing session.
13:23 JoeJulian valqk: If you're worried, you could always just mv it somewhere.
13:24 JoeJulian somewhere outside of the brick, that is.
13:24 JoeJulian s/life/live/
13:24 glusterbot What JoeJulian meant to say was: _BuBU: I would expect that any currently healing file will finish (but not necessarily) but yes, you can reduce the count on a live healing session.
13:24 valqk JoeJulian, good idea though but I have > 1500 files.. :-D
13:24 valqk anyway I got it :)
13:24 hybrid5121 joined #gluster
13:25 bulde joined #gluster
13:25 vpshastry1 joined #gluster
13:25 JoeJulian .... life healing you'd need a guru for...
13:25 _BuBU JoeJulian: I just did: gluster volume set customers cluster.background-self-heal-count 0
13:25 JoeJulian NO!
13:26 JoeJulian 0???
13:26 JoeJulian The first file your client comes across that needs healing it'll just hang there waiting for the file to be healed.
13:26 valqk JoeJulian, and last one :) I have a list with both /user/.config/chromium/Default/Bookmarks.bak and <gfid:9c8a02f9-7337-44f9-b812-880c5961ce69> because I'm writing a script - should I ignore lines with <gfid:> and read only files?
13:27 _BuBU JoeJulian: so which value should I need to set ?
13:27 JoeJulian valqk: I would at first. Once you're sure the files are healed, the gfid files are probably superfluous and can just be deleted.
13:28 hybrid5121 joined #gluster
13:28 JoeJulian _BuBU: Something high enough to keep your clients satisfied with their ability to open files, but low enough to satisfy their performance needs.
13:28 valqk JoeJulian, aha. OK. I'm deleting just the files and gfids by your manual then I'll ask again. :) thanks again!
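A hedged sketch of the per-file cleanup being discussed, run on the brick holding the bad copy; BRICK and BAD are placeholders, and the file is moved aside rather than deleted, per JoeJulian's suggestion:

    BRICK=/gluster/brick1
    BAD=user/.config/chromium/Default/Bookmarks.bak
    # the gfid xattr names the hardlink under .glusterfs that has to go too
    HEX=$(getfattr -n trusted.gfid -e hex "$BRICK/$BAD" | awk -F= '/trusted.gfid/{print $2}')
    G=${HEX#0x}
    GFIDFILE="$BRICK/.glusterfs/${G:0:2}/${G:2:2}/${G:0:8}-${G:8:4}-${G:12:4}-${G:16:4}-${G:20:12}"
    mkdir -p /root/splitbrain-quarantine
    mv "$BRICK/$BAD" /root/splitbrain-quarantine/
    rm -f "$GFIDFILE"
    # afterwards, stat the file through a client mount (or wait for glustershd) to re-sync from the good copy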
13:28 JoeJulian I usually shoot for a self-healing load of around 20
13:29 semiosis it's a background self heal, client shouldn't block, but will work with the "good" replica, right?
13:29 JoeJulian semiosis: "should"
13:29 semiosis heh ok
13:30 JoeJulian I still haven't figured out all the conditions that makes it block on self-heals.
13:30 dustint joined #gluster
13:30 JoeJulian Mostly sure it's just writes...
13:32 hybrid512 joined #gluster
13:33 partner hmph, surprisingly the open file problem doesn't repeat today when starting the rebalance again
13:34 JoeJulian definition of insanity
13:37 partner :)
13:38 partner well this just proves you can't test properly until in production, i never had 3+ million files on my testing instances. yes, i should have had but i didn't
13:39 theron joined #gluster
13:39 koubas hi, i see answers for most of the questions i came with, answered just minutes ago, great :) But i still have some: 1) if i deleted files without deleting the corresponding gfid file (don't use Joe's script with wildcards!!!), will the gfid files stay on a brick forever, will they appear somewhere in the log / gluster heal info? Should i delete them myself using e.g. "find -links 1 -type f ...." ?  2) What does "healing failed" (file stays inaccessible on clients) exactly mean? How to solve it? Thanks
13:40 _BuBU JoeJulian: indeed 20 seems a good value too for me :)
13:40 _BuBU thx for your help
13:42 JoeJulian koubas: They'll probably appear somewhere. If the file was split-brain and you left one good copy on some server, it'll probably get re-hardlinked to the filename and turn it split-brain again.
13:42 ndevos ,,(ports) is going to need a change whan 3.4 is released: http://review.gluster.org/4840
13:42 glusterbot glusterd's management port is 24007/tcp and 24008/tcp if you use rdma. Bricks (glusterfsd) use 24009 & up. (Deleted volumes do not reset this counter.) Additionally it will listen on 38465-38467/tcp for nfs, also 38468 for NLM since 3.3.0. NFS also depends on rpcbind/portmap on port 111.
13:42 JoeJulian ndevos: Yeah... That's going to get messy...
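A hedged iptables sketch matching the port list glusterbot gives above; the brick range upper bound is an assumption and depends on how many bricks each server has ever hosted:

    iptables -A INPUT -p tcp --dport 24007:24008 -j ACCEPT   # glusterd management (24008 only for rdma)
    iptables -A INPUT -p tcp --dport 24009:24024 -j ACCEPT   # one glusterfsd port per brick, counting up from 24009
    iptables -A INPUT -p tcp --dport 38465:38468 -j ACCEPT   # gluster NFS and NLM
    iptables -A INPUT -p tcp --dport 111 -j ACCEPT           # rpcbind/portmap for NFS
    iptables -A INPUT -p udp --dport 111 -j ACCEPT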
13:43 JoeJulian koubas: "find -links 1 -type f ...." sounds like a good idea. That should work.
13:44 JoeJulian Unless, of course, it already has other hardlinks, but that would probably break anyway.
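A sketch of the orphan check being endorsed, run directly on a brick (path is a placeholder); a link count of 1 means the gfid file has lost its companion in the real directory tree, but review the list before deleting anything:

    find /gluster/brick1/.glusterfs -type f -links 1 -print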
13:44 JoeJulian What does "healing failed" mean? I wish I knew. Try touching the file through the client and see what shows up in the client log.
13:46 hybrid512 joined #gluster
13:47 ehg hi. we're experiencing weirdness with directory quotas, e.g. quotas not being enforced sometimes and quotas needing an umount on the client side to be enforced. has anyone got any ideas?
13:48 koubas JoeJulian: to make it clear, i mean files listed by "v heal $vol info heal failed"
13:49 H__ i'm running release-3.3 head in my test setup and I cannot add data to it, nor can I erase data from it. I see lots of d--------- directories and "E [marker.c:2076:marker_setattr_cbk] 0-vol01-marker: Operation not permitted occurred during setattr of <nul>" in all the brick logs.
13:50 Supermathie [2013-04-24 09:50:33.794788] I [afr-self-heal-data.c:712:afr_sh_data_fix] 0-gv0-
13:50 Supermathie replicate-0: no active sinks for performing self-heal on file /fleming1/db0/ALTU
13:50 Supermathie S_flash/archivelog/2013_04_22/.o1_mf_1_1093__1366653909363181_.arc
13:51 Supermathie D'oh, sorry about multi-line paste. Any idea about that ^ ?
13:51 Supermathie One of the files that had an error in https://bugzilla.redhat.com/show_bug.cgi?id=955753. I have tons of these.
13:51 glusterbot <http://goo.gl/oeaKn> (at bugzilla.redhat.com)
13:51 glusterbot Bug 955753: high, unspecified, ---, vraman, NEW , NFS SETATTR call with a truncate and chmod 440 fails
13:51 JoeJulian "no active sinks" means there's a source that's marked as having pending updates for a replica. That replica is not online.
13:52 Supermathie All volumes are online. This file shows up in both copies as needing healing. It's not split brained.
13:52 JoeJulian ehg: There's a whole re-working of the quota system being designed. Not sure where that is in the development process.
13:53 ehg JoeJulian: ah, thanks. would it be helpful to submit a bug report?
13:53 JoeJulian Sure. Worst that can happen is they close it as a duplicate.
13:54 JoeJulian Supermathie: Which log is that "no active sinks" in?
13:54 ehg ok, i'll do that then :)
13:54 JoeJulian koubas: That's how I understood the question.
13:56 Supermathie /var/log/glusterfs/glustershd.log:[2013-04-24 09:50:33.794788] I [afr-self-heal-data.c:712:afr_sh_data_fix] 0-gv0-replicate-0: no active sinks for performing self-heal on file /fleming1/db0/ALTUS_flash/archivelog/2013_04_22/.o1_mf_1_1093__1366653909363181_.arc
13:57 JoeJulian H__: Looks like marker's being called with a null filename (or something in the hash). Have you looked in marker.c to see what it's doing?
13:57 Supermathie JoeJulian: The copies are actually in sync, but glusterfs got confused when:
13:57 Supermathie [2013-04-22 13:57:22.073354] W [client3_1-fops.c:707:client3_1_truncate_cbk] 0-gv0-client-9: remote operation failed: Permission denied
13:58 Supermathie [2013-04-22 13:57:22.073496] W [client3_1-fops.c:707:client3_1_truncate_cbk] 0-gv0-client-8: remote operation failed: Permission denied
13:58 guigui1 joined #gluster
13:59 lh joined #gluster
13:59 lh joined #gluster
13:59 JoeJulian Supermathie: EPERM, iirc, is the default error that gets returned if something breaks and no other error is indicated. I'd have to look at the source to be sure.
13:59 Supermathie Seriously, see the bug I linked. It explains things.
14:01 JoeJulian I've been monitoring that. No clue on it though and I haven't really put any effort into it as I don't use nfs.
14:01 koubas JoeJulian: ok, tanks for your answers
14:01 13WAA0PVV joined #gluster
14:02 sjoeboo joined #gluster
14:03 JoeJulian Supermathie: But glustershd shouldn't have anything to do with that and should be able to heal, or at least clear the heal status, of a file.
14:03 JoeJulian Have you checked the ,,(extended attributes)?
14:03 glusterbot (#1) To read the extended attributes on the server: getfattr -m .  -d -e hex {filename}, or (#2) For more information on how GlusterFS uses extended attributes, see this article: http://goo.gl/Bf9Er
14:05 JoeJulian koubas: You're welcome. I "shell" be happy to help again if you need it. (Ok, I'm reaching for a pun, but hey... where else can you go from tanks? :D )
14:06 Supermathie JoeJulian: They've got the same gfid and pathinfo. gluster just seems to lose its mind over them.
14:06 bugs_ joined #gluster
14:07 JoeJulian No trusted.afr stuff? That does sound interesting.
14:08 neofob left #gluster
14:08 JoeJulian pathinfo?
14:08 Supermathie JoeJulian: Was that to me? Trying to read trusted.afr on the file just crashed the mount.
14:09 JoeJulian First, that shouldn't happen. Second, I meant directly on the bricks, sorry for the confusion.
14:10 JoeJulian Hmm, can't duplicate the crash on my volume, but I use fuse mounts.
14:11 Supermathie same fuse mount
14:11 Supermathie Sorry, not crashed, wrong logfile. Just hung hard.
14:11 Supermathie Direct on the bricks:
14:11 Supermathie trusted.afr.gv0-client-0=0x000000010000000000000000
14:11 Supermathie trusted.afr.gv0-client-1=0x000000010000000000000000
14:11 satheesh joined #gluster
14:11 Supermathie trusted.afr.gv0-client-0=0x000000010000000000000000
14:11 Supermathie trusted.afr.gv0-client-1=0x000000010000000000000000
14:12 JoeJulian Yeah, that's split-brain
14:13 JoeJulian Since you're sure they're both good, you can just use setfattr to set those keys to all zeros to clean that.
14:13 JoeJulian I would add that to your bug report, btw.
14:14 JoeJulian Sounds like a race condition
14:14 Supermathie Presence of trusted.afr means that a brick has an unpropagated change?
14:14 JoeJulian Correct.
14:14 JoeJulian Well, a non-zero value
14:14 Supermathie non-zero ... yeah ;)
14:15 JoeJulian Check that second link glusterbot mentioned on extended attributes for the details.
14:15 Supermathie So that probably got set when the "remote operation failed" on each brick
14:15 JoeJulian probably, or it failed because it was already set.
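A hedged sketch of the manual reset JoeJulian mentioned above, only for files whose copies have been verified identical; run on each brick against the brick path (key names follow the trusted.afr output pasted earlier):

    # zero the pending-change counters so afr stops treating the file as split-brain
    setfattr -n trusted.afr.gv0-client-0 -v 0x000000000000000000000000 /gluster/brick1/path/to/file
    setfattr -n trusted.afr.gv0-client-1 -v 0x000000000000000000000000 /gluster/brick1/path/to/file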
14:15 koubas JoeJulian: haha, ok, maybe I'll leave my T28 in the garage... if you answer my bonus question ;) Is it enough to set cluster.background-self-heal-count only to all of my volumes? It seems that it gets projected only to $vol-fuse.vol files and not to glustershd-server.vol. And i blame glustershd to be responsible for io/cpu killing my bricks :/
14:15 JoeJulian pranithk will know
14:17 JoeJulian koubas: That's an interesting find and I was not aware of that. I'll have to experiment...
14:17 Supermathie JoeJulian: Not likely at all - immediately before the failed RPC was a successful write call. And the second brick shouldn't have ANY unreplicated changes, writes were only going to the first node.
14:17 glusterbot New news from newglusterbugs: [Bug 956245] 3.3git-fdde66d does not accept new files, does not allow to erase directory trees <http://goo.gl/TSPmy> || [Bug 956247] Quota enforcement unreliable <http://goo.gl/YlMA6>
14:18 JoeJulian Supermathie: nah, writes are going to both. The nfs service is just an nfs translation for a gluster client. That client writes to both replicas (unless you're saying that one was down during the write)
14:18 hagarth Supermathie: have you enabled quota or geo-replication for marker to be active?
14:18 JoeJulian hagarth! :D
14:19 Supermathie JoeJulian: Both were up, but both returned the permission denied on the write failure
14:19 Supermathie hagarth: no
14:19 hagarth JoeJulian: ltns! :D
14:20 JoeJulian Yeah, neither of us has been staying up late enough.
14:20 Supermathie And I have about 100 files in this state, all exact same conditions.
14:21 JoeJulian Supermathie: out of curiosity, does "gluster volume heal $vol info split-brain" list those 100 files?
14:21 hagarth JoeJulian: possible. I have been in a read-only mode of late over here as well.
14:21 Supermathie JoeJulian: Nope, split-brain list is empty
14:22 JoeJulian hagarth: read-only doesn't sound all that fun...
14:22 hagarth JoeJulian: yeah, been quite busy with a bunch of things..
14:25 koubas_ joined #gluster
14:25 manik joined #gluster
14:36 nicolasw joined #gluster
14:42 _BuBU quick question: I've raid5 for sata disks. What is the best IO scheduler to use with glusterfs ?
14:42 _BuBU using CFQ for timebeing.
14:43 vincent_vdk joined #gluster
14:43 JoeJulian I've heard conflicting arguments. I prefer deadline.
14:44 Supermathie _BuBU: deadline is generally the preferred scheduler. HW RAID? BBWBC?
14:44 _BuBU HW
14:44 _BuBU Dell Perc5
14:45 Supermathie BBWBC?
14:45 Supermathie deadline for sure, either way :p
14:45 _BuBU ok thx :)
14:45 Supermathie RHEL? CentOS? "tuned-adm profile throughput-performance"
14:46 Supermathie Or latency-performance... don't know how big of a difference it makes.
14:46 _BuBU ubuntu
14:47 daMaestro joined #gluster
14:47 _BuBU ubuntu server of course :)
14:48 _BuBU using ArchLinux as desktop.
14:51 andrewjs1edge joined #gluster
14:51 nicolasw how about using the raw hdds without any RAID? is deadline also recommended?
14:54 Supermathie I'd probably just turn off all the data safety mechanisms and go deadline :D
14:54 nicolasw :P
14:55 Supermathie (actually, that's kind of what I'm doing on one setup, except SSD or Fusion and not SATA. Yeah, deadline.)
14:55 nicolasw how much perf improvement did you get?
14:56 Supermathie That's where I started. I doubt I'd get any - the SSDs are way way faster than glusterfs so far.
14:56 nicolasw yes sure
14:57 aravindavk joined #gluster
15:00 _BuBU left #gluster
15:10 spider_fingers left #gluster
15:24 partner hmph, stupid be, of course it doesn't leave files now open as its not moving anything..
15:24 partner me*
15:26 jag3773 joined #gluster
15:28 sjoeboo joined #gluster
15:30 lalatenduM joined #gluster
15:39 rastar joined #gluster
15:40 Umarillian joined #gluster
15:42 jbrooks joined #gluster
15:50 awickhm joined #gluster
15:50 awickhm left #gluster
15:50 awickhm joined #gluster
15:50 awickhm left #gluster
15:50 awickhm joined #gluster
15:53 sjoeboo joined #gluster
15:53 DEac- joined #gluster
15:54 _pol joined #gluster
15:55 _pol joined #gluster
15:57 Umarillian If I am unable to write to a volume but able to write to the bricks locally as root where would be the best place to check in the logs for the cause? Apologies, I am quite new to gluster.
15:59 JoeJulian Umarillian: The client logs in /var/log/glusterfs is usually a good place to start.
15:59 H__ On add-brick i get "is already part of a volume", but it really is not. Nothing extra in logs. What can I check ?
16:00 Rocky_ joined #gluster
16:00 JoeJulian Umarillian: "gluster volume status", and "gluster volume heal $vol info split-brain" might be informative as well.
16:00 JoeJulian ~reuse brick | H__
16:00 glusterbot H__: To clear that error, follow the instructions at http://goo.gl/YUzrh or see this bug http://goo.gl/YZi8Y
16:01 H__ JoeJulian: I'll try, but it was never part of a volume. this is new hardware
16:01 Supermathie ~reuse brick | glusterbot
16:01 glusterbot glusterbot: To clear that error, follow the instructions at http://goo.gl/YUzrh or see this bug http://goo.gl/YZi8Y
16:01 * Supermathie whistles innocently
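For reference, a sketch of what those instructions boil down to (brick path is a placeholder; only run this on a brick you genuinely intend to reuse or re-add):

    setfattr -x trusted.glusterfs.volume-id /path/to/brick
    setfattr -x trusted.gfid /path/to/brick
    rm -rf /path/to/brick/.glusterfs
    # then restart glusterd and retry the add-brick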
16:03 nueces joined #gluster
16:05 Umarillian Crazy; do you need to mount the volume locally for it to be writeable remotely?
16:05 Umarillian JoeJulian: Thanks.
16:06 JoeJulian Umarillian: nope
16:07 JoeJulian Umarillian: Did you find anything in your log? Feel free to use fpaste or dpaste to share the log if you need help diagnosing this.
16:08 Umarillian Well the instant I mounted the volume locally all of a sudden I can now write to it through NFS and on another server.
16:08 Umarillian I mounted it with glusterfs client on the server itself and then immediately I was able to write to it from wherever. Hopefully it's just a coincidence and something else occured.
16:08 Umarillian occurred*
16:09 JoeJulian strange
16:10 awickhm I have a glusterfs+swift box I'm trying to integrate with cloudstack. I have the url,account,username,and key entered correctly but keep getting a 401 error in /var/log/messages on the swift node. Are there any additional permissions that need to be added into the proxy-server.conf file to allow a remote host to access the volume?
16:10 _pol joined #gluster
16:11 satheesh joined #gluster
16:11 andrewjs1edge hi all, any tips on being able to get AWS Ubuntu 12.04.1 glusterfs to survive a reboot? mountall is causing glusterfs daemon to not start at all.
16:11 H__ JoeJulian: did that on both nodes, (the two setfattr, there was no .glusterfs) and restarted both glusterd. Still does not work : E [glusterd-brick-ops.c:1435:glusterd_op_add_brick] 0-: Unable to add bricks and E [glusterd-op-sm.c:2806:glusterd_op_ac_commit_op] 0-management: Commit failed: -1
16:12 H__ I hope you have more ideas where I can look for the problem :)
16:15 JoeJulian awickhm: I'm running on 2 hours sleep and am feeling really lazy... ;) Check http://github.com/joejulian/ufopilot . I had a quick walkthrough on getting a working swift volume in the readme.
16:15 glusterbot Title: joejulian/ufopilot · GitHub (at github.com)
16:15 JoeJulian andrewjs1edge: You're using the ,,(ppa)?
16:15 glusterbot andrewjs1edge: The official glusterfs 3.3 packages for Ubuntu are available here: http://goo.gl/7ZTNY
16:18 semiosis andrewjs1edge: mountall is causing glusterfs daemon to not start at all?  what?
16:18 semiosis could you please explain that more or provide some logs (on pastie.org)
16:18 JoeJulian H__: Can you fpaste that log? There's lines leading up to that error that would tell me more.
16:20 manik1 joined #gluster
16:20 andrewjs1edge semiosis: I believe that's what is going on - getting into AWS instance to look is impossible, and getting the system log from the console shows mountall errors
16:21 andrewjs1edge semiosis: killed those instances and trying again. Using your PPAs :).
16:21 H__ JoeJulian: http://fpaste.org/itGa/   I see something interesting; one brick has a .glusterfs now, and a glusterfsd servicing it.The other side does not. And also does not start one on 'start glusterd'
16:21 glusterbot Title: Viewing Paste #294368 (at fpaste.org)
16:22 jds2001 joined #gluster
16:22 Azrael joined #gluster
16:22 semiosis andrewjs1edge: try adding 'nobootwait' to your fstab options
16:22 semiosis andrewjs1edge: for the glusterfs mounts
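A sketch of the kind of fstab line being suggested (server, volume and mount point are placeholders):

    server1:/myvol  /mnt/myvol  glusterfs  defaults,_netdev,nobootwait  0  0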
16:23 H__ a 'gluster volume info' *does* show the new brickpair
16:23 andrewjs1edge semiosis: about to go back through it - I'll let you know
16:26 JoeJulian H__: What's the command you're trying to execute?
16:27 H__ an add-brick
16:28 H__ with a replica pair on two different nodes
16:28 JoeJulian I mean the actual command. Looking to pair the syntax with what I'm reading in the source code.
16:33 H__ JoeJulian: oh sorry, here it is : gluster volume add-brick vol01 stor3-idc1-lga:/gluster/f stor4-idc1-lga:/gluster/f
16:49 _pol joined #gluster
16:50 _pol joined #gluster
16:53 rwheeler joined #gluster
17:06 andrewjs1edge semiosis: is there a preference as to boot order? I set the nobootwait per your recommendation and now they both come up. However, the replication isn't replicating.
17:06 andrewjs1edge semiosis: sorry, sort of new to this system
17:06 theron joined #gluster
17:07 andrewjs1edge semiosis: peer status shows connected
17:07 zaitcev joined #gluster
17:08 rwheeler joined #gluster
17:10 _pol joined #gluster
17:11 hagarth joined #gluster
17:11 _pol joined #gluster
17:24 Keawman joined #gluster
17:24 H__ JoeJulian: I got it to work ! I did multiple things to get at this : add all volume nodes into all /etc/hosts . They're all properly in DNS already so this should not be needed. A glusterd stop and start then resulted in a glusterfsd for the brick half that did not have one.
17:26 Keawman Does anyone know how to remove/delete geo-replication
17:26 H__ Then 4 other new bricks needed a 'stop glusterd' *before* the two setfattrs (none had a .glusterfs/). The xattrs they had must have been the result of the initial failed add-brick with all replica pairs in one go.
17:28 theron joined #gluster
17:29 Keawman i screwed up on my geo-replication setup somehow and the status is defunct, but i'm unable to clear it out or remove it via commands that i can find
17:30 H__ my last geo-replication attempt was with 3.2.5, which failed after 45 hours crawling the tree and then starting the crawl all over again. So i cannot help you there
17:30 Keawman wow so you are saying geo-replication is crap to begin with?
17:31 H__ it supposedly works on 3.3.1 and later. but i did not try that yet
17:31 Keawman ah ok...i'm experimenting with 3.4alpha
17:33 H__ cool, let me know how it goes ;-) (I hope you have/test with about a million directories)
17:44 Umarillian Is there a method of tracking IO on each device built into gluster or is it best to rely on third party applications for that?
17:49 mtanner_ joined #gluster
17:49 Supermathie Umarillian: Lots! volume top <volname> ...
17:49 aravindavk joined #gluster
17:49 Supermathie Umarillian: volume profile <volname> info
17:50 Supermathie Ohh... nice.
17:50 Supermathie %-latency   Avg-latency   Min-Latency   Max-Latency   No. of calls         Fop
17:50 Supermathie 13.02     114.81 us       8.00 us 1664199.00 us        1117609        READ
17:50 Supermathie 33.07      64.58 us      13.00 us 3594803.00 us        5045416    FXATTROP
17:50 Supermathie 44.37     173.17 us      16.00 us 1278912.00 us        2524485       WRITE
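A sketch of how output like that is produced (volume name is a placeholder; profiling has to be started before info returns anything):

    gluster volume profile gv0 start
    gluster volume profile gv0 info      # per-brick FOP latency table like the one pasted above
    gluster volume top gv0 read-perf     # other top sub-commands include open, read, write, opendir, readdir, write-perf
    gluster volume profile gv0 stop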
17:53 Supermathie errr... so if I were to look in gluster source to change the order of operations performed by gluster in a multi-property NFS SETATTR call, where should I start?
17:54 Umarillian Oh that is great.
17:58 chirino joined #gluster
18:04 JoeJulian ~hack | Supermathie
18:04 glusterbot Supermathie: The Development Work Flow is at http://goo.gl/ynw7f
18:04 JoeJulian Supermathie: Then I'd ask in #gluster-dev or the gluster-devel mailing list
18:09 H__ Hi JoeJulian :) what is 3.3.1's rebalance status in your opinion ? I need to make space on some nearly filled bricks fast. What about reading files from the nearly-full brick, erasing those on the gluster volume (and thus also on the bricks) and then reinserting them in the gluster volume so they get DHT's "hopefully" to the other bricks ?
18:10 semiosis andrewjs1edge: you'll need to look at your client log files to see whats going wrong.  check /var/log/glusterfs/the-mount-point.log, feel free to pastie.org it here too
18:11 andrewjs1edge semiosis: thanks - found out I was the problem :). All is working now. Thanks for your help.
18:11 semiosis andrewjs1edge: peer status usually isnt much help (at least right off the bat) diagnosing client issues.  volume status may be, but the client log is the best place to start.
18:11 semiosis andrewjs1edge: oh great. yw
18:17 mtanner_ joined #gluster
18:29 Keawman semiosis, would you have any idea on how to remove geo-replication configs
18:29 semiosis no, sorry
18:30 Keawman i have two failed attempts showing in the status and can't find any info on how to clear them out and start over
18:30 Supermathie Keawman: volume geo-replication <srcvol> stop
18:30 Keawman Supermathie, I did try that thanks though
18:31 Keawman basically one attempt is faulty and the other is defunct under their statuses
18:36 Supermathie Keawman: Yeah... at that point I'd stop gluster and hack away with vi
18:37 Supermathie But that's me.
18:42 lpabon joined #gluster
18:44 JoeJulian joined #gluster
18:59 jskinner_ joined #gluster
19:27 t35t0r joined #gluster
19:27 t35t0r joined #gluster
19:41 rwheeler joined #gluster
19:54 partner hmm ok rebalance is moving files again and lsof shows deleted files and open files count keeps growing
19:55 partner and all of them seem to be the ones rebalanced elsewhere, no traces of access to those files on client side
19:55 partner so, is it normal or is there something i should tune not to run out of file handlers as i did yesterday?
19:57 partner 3.3.1-1 in wheezy still running here, closing to 2 million processed files shortly
20:00 _pol joined #gluster
20:01 Rorik joined #gluster
20:12 JoeJulian ran out of file handles???? That does seem strange.
20:12 JoeJulian The rebalance takes place on the server where you started the rebalance.
20:14 partner it happened yesterday and it is happening again, lsof keeps piling deleted files (which, with random selections, i confirm to be moved to other brick)
20:15 partner i'm now just continuing from where i was forced to stop yesterday as the receiving server with new brick actually ran out of file handlers after 800k or so
20:16 partner open count is 13408, rebalanced files is 12503 - and the value before i started was roughly 900
20:16 partner so it matches
20:18 partner from yesterday. when i stopped rebalance the receiving new server almost immediately dropped the open files count to "zero" but the old one kept them for ~12 hours
20:19 partner and then it went towards zero with a steep curve. i don't have enough knowledge to have any idea why it took so long on the source, i even stopped the volume and umounted clients and tried to get the number down with no success (though lsof no longer listed many open files)
20:22 partner i can upload few graphs somewhere if it helps to visualize anything..
20:23 partner but other than that if nobody has any idea i guess i have the burden of collecting the evidence and trying to repeat the issue first on my side and then share the results
20:24 JoeJulian Have you filed a bug on it already?
20:25 partner nope as i want first know who to "blame", i always point the finger to myself first
20:26 partner i don't have anything concrete other than the above explanation, did 16 hours day yesterday fixing the production so i'm a bit tired to file detailed reports right now :o
20:26 JoeJulian No, that sounds like a bug. File something generally about rebalance leaving filehandles open and using up all the kernel's available handles.
20:27 JoeJulian You can always add details later. This sounds like it might be something where Pranith or one of those guys looks at it and slaps themselves in the forehead.
20:31 partner https://bugzilla.redhat.com/show_bug.cgi?id=928631 - seems its filed already
20:31 glusterbot <http://goo.gl/3Xruz> (at bugzilla.redhat.com)
20:31 glusterbot Bug 928631: urgent, high, ---, kaushal, ASSIGNED , Rebalance leaves file handler open
20:41 _pol joined #gluster
20:47 jag3773 joined #gluster
20:53 brunoleon joined #gluster
21:06 partner added a comment but the original is exactly what i experienced too
21:07 partner reminds, were there any "official" fix for the debian problem of not mounting on the client side without manual tweaks to random places?
21:08 partner Joe you probably remember the case as you blogged about it but not sure if anything was actually done to somehow fix the packaging perhaps
21:09 semiosis partner: use tab completion to get full nicknames, when one's own nickname is complete (and spelled correctly) in a message the client highlights it
21:10 semiosis irc client*
21:11 semiosis about the debian mounting problem, there's not (afaik) a single problem that prevents everyone from mounting at boot.  there are specific cases where mounts fail at boot, but that is hardly a universal thing
21:11 semiosis if you could provide client logs showing a failed mount attempt at boot we could help you resolve that issue
21:12 semiosis or if you have a link to a specific problem (bug id, mailing list post, etc) then I'll take a look and see if I can find any updates on that specific issue
21:14 partner semiosis: my keyboard is lacking tab button, i need some extra effort to get that pressed.. :)
21:14 semiosis ouch thats not good, sorry to hear that
21:14 partner well old laptop already, not the only one missing..
21:14 JoeJulian Ctrl-I
21:15 semiosis joe
21:15 semiosis didnt work
21:15 JoeJulian Well that's lame...
21:15 semiosis hahaha
21:15 partner semiosis: nevertheless you were around when we debugged that here and bunch of us were able to produce it in squeeze/wheezy debians, vanilla ones
21:15 semiosis hmm
21:15 partner i don't remember anymore if any bug reports were filed, its month or so ago
21:16 semiosis more than a month!
21:16 semiosis i remember being unable to reproduce the problem on debians, though i did run into some weird intermittent issue on ubuntus
21:17 JoeJulian It has to do with a glusterfs mount depends on glusterd being started, but the volume isn't necessarily started fast enough for the mount to proceed so the mount may fail.
21:18 JoeJulian What's needed is some way of waiting for the status of the volume(s) to be all "Y" before the gluster-server job is considered finished.
21:19 semiosis ohhhh well in that case
21:19 JoeJulian Hehe
21:19 semiosis if *all* your bricks are local to the client, why bother using glusterfs at all :P
21:19 partner umm the client side doesn't have glusterd running
21:19 partner no no
21:19 partner remote client
21:19 semiosis a purely remote client failed to mount at boot?  that's unlikely
21:19 JoeJulian In my case, two servers that were both clients.
21:20 semiosis JoeJulian: why couldnt the client get the volinfo from localhost then find bricks on the other server?
21:20 semiosis in my tests that worked fine
21:20 JoeJulian It worked if the other server was up, but if both servers were down....
21:21 semiosis if both servers are down, your clients are SOL
21:21 semiosis yes i admit this is a small annoyance
21:22 partner ok, i read you don't believe me once again, it took some time last time to convince too :)
21:22 semiosis but imho hardly a blocking issue for regular deployments
21:22 JoeJulian And there was something about if I had one server running, booted the other server that was also a client, then rebooted the other server after it was mounted, something broke. It's been too long now.
21:22 semiosis partner: could you find the irc logs?  i'd like to take another look if you could find the info
21:22 partner not blocking but without putting a separate mount command to say rc.local you simply don't get your volume mounted on client
21:22 partner semiosis: sure, just a sec
21:22 partner you had it all nicely on web..
21:23 semiosis theres a logs link in the /topic, just type /topic to see it
21:24 semiosis oh look there's a 3.4 alpha3 in ,,(qa releases)
21:24 glusterbot The QA releases are available at http://bits.gluster.com/pub/gluster/glusterfs/ -- RPMs in the version folders and source archives for all versions under src/
21:26 partner semiosis: that might work: http://irclog.perlgeek.de/gluster/2013-01-30#i_6393955
21:26 glusterbot <http://goo.gl/0pUqi> (at irclog.perlgeek.de)
21:27 H__ Question : I need to make space on some nearly filled bricks fast. Would this work -> read files from the nearly-full brick, erase those on the gluster volume and then reinsert them in the gluster volume. Would this effectively rebalance them over all bricks ?
21:27 JoeJulian No, the hash would still put the file on the same brick unless you've done a fix-layout or a targeted fix-layout
21:28 semiosis JoeJulian: would glusterfs allocate the new file creation on another brick?  i thought it tried to do that when bricks were nearly full
21:28 semiosis there's an option for that right?
21:28 H__ there's new bricks added
21:29 H__ a rebalance fix-layout takes days too
21:29 H__ how does one do a targeted fix-layout ?
21:29 JoeJulian @google gluster targeted fix layout
21:29 glusterbot JoeJulian: [Gluster-users] Targeted fix-layout?: <http://goo.gl/UEtCx>; Administration Guide - Using Gluster File System: <http://goo.gl/bzF5B>; HekaFS » GlusterFS Algorithms: Distribution: <http://goo.gl/8lBlP
21:29 glusterbot JoeJulian: distribution/>; Configuration of High-Availability Storage Server Using GlusterFS ...: <http://goo.gl/cOQm4>; Full Text Bug Listing: <http://goo.gl/7sXSj>; Setting Up Clients - Red Hat Customer Portal: <https://access.redhat.com/site/documentation/en- (1 more message)
21:29 JoeJulian @meh
21:29 glusterbot JoeJulian: I'm not happy about it either
21:30 JoeJulian I should have used @lucky
21:30 JoeJulian Jeff darcy explained how on that first link
21:30 JoeJulian s/d/D/
21:30 glusterbot What JoeJulian meant to say was: Jeff Darcy explained how on that first link
21:32 * semiosis finally getting the 3.4 alpha uploaded to the ,,(ppa)
21:32 glusterbot The official glusterfs 3.3 packages for Ubuntu are available here: http://goo.gl/7ZTNY
21:32 semiosis https://launchpad.net/~semiosis/+archive/ubuntu-glusterfs-3.4 actually
21:32 glusterbot <http://goo.gl/u33hy> (at launchpad.net)
21:33 H__ so, a trusted.distribute.fix.layout on every directory that holds files on the nearly full brick pair. And then the move out of gluster volume and back in ?
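A hedged sketch of that targeted fix-layout, using the xattr key H__ names above as described in Jeff Darcy's mailing-list post; the exact key and value are assumptions worth verifying against that post before running in production. It is set per directory through a client mount:

    # recalculate the layout for just this directory so new files can land on the new bricks
    setfattr -n trusted.distribute.fix.layout -v "anything" /mnt/vol01/path/to/dir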
21:35 JoeJulian Ah, semiosis, here's what I was dealing with: a boot-order issue with lucid that was causing the client to try to mount before the filesystems were mounted. That, in turn, caused the gluster-server upstart job to trigger as well. Since the brick was not yet mounted when glusterd started, this caused the brick server to exit which caused the client to only be connected to the remote server.
21:35 semiosis JoeJulian: lucid!  running cobol on that ;)
21:35 JoeJulian Hehe, no, that was a contract job.
21:36 * johnmark snickers
21:36 johnmark if I were a Cobol engineer, I would have it made!
21:36 JoeJulian Isn't lucid newer than Fedora 6 anyway? :D
21:36 semiosis lucid had "issues" with the boot ordering, it's been much refined since then, i wouldnt spend a minute troubleshooting upstart mount problems on lucid
21:36 johnmark because there are so few of us, I would *never* be out of work
21:37 JoeJulian cobol engineer = oxymoron
21:37 JoeJulian cobol artist, maybe. hacker, definitely
21:37 partner semiosis: anyways, the problem is real and reported by several persons so is there something i can do for you to prove the case on debian side? i hate to think many people do rc.local glue to get past the issue.
21:38 partner i guess i'll first provide the logs from failed mount on boot ?
21:38 JoeJulian +1
21:38 partner booting..
21:39 jbrooks joined #gluster
21:40 semiosis partner: dont worry about what problems other people may or may not have had.  if you are having a problem, we will try to help.  get us those logs :)
21:41 partner i can figure out my problems but if i see it affects more than me i usually raise the issue
21:42 semiosis partner: also whats your fstab line currently?  do you have the _netdev?
21:42 H__ JoeJulian: I don't get this. I've added new bricks and they're being used by new files. No rebalance fix-layout was done. What would make a filename re-appear on the same brick when it is removed and later remade in the changed volume ?
21:43 Supermathie H__: Oohhh... lemme take a stab at this... the DHT entries on the underlying directory still controlled the layout and hadn't been fixed?
21:44 semiosis H__: can new client mounts write to the new bricks, but old/existing client mounts not?
21:44 JoeJulian "No rebalance fix-layout was done."? And it's using the new bricks? That's impossible... <sigh> See joejulian.name/blog/dht-misses-are-expensive/ where I show how the hash calculation is done and used
21:45 H__ Supermathie: no fix-layout command has been used, yet the new bricks are being used just fine.
21:45 H__ semiosis: afaik all clients can write data to the new bricks
21:46 semiosis H__: it's possible (though maybe unlikely) for a file to hash to the same brick as before you expanded
21:46 JoeJulian But if the has ranges haven't been reallocated, it's impossible.
21:46 JoeJulian s/has/hash/
21:46 glusterbot What JoeJulian meant to say was: But if the hash ranges haven't been reallocated, it's impossible.
21:48 BSTR joined #gluster
21:48 H__ I have added several brick pairs , and I see data arrive on those bricks. This is 3.3.1 . I assumed this was intended DHT behaviour, that's why i proposed my targetted rebalance
21:48 JoeJulian Unless, maybe, it's rolling over because the max free space has been reached and it's rolling over to the new bricks... I suppose that could be it.
21:49 H__ that would mean the nearly full bricks do not grow any further; which they do (sadly)
21:49 JoeJulian Well, if a file on that brick grows it's still going to use up space on that brick.
21:50 H__ not the particular file, new files.
21:50 JoeJulian You can check the directory hash allocations by reading their ,,(extended attributes)
21:50 glusterbot (#1) To read the extended attributes on the server: getfattr -m .  -d -e hex {filename}, or (#2) For more information on how GlusterFS uses extended attributes, see this article: http://goo.gl/Bf9Er
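A sketch of the check JoeJulian means, run against the same directory on each brick (path is a placeholder); the trusted.glusterfs.dht value ends with the start and end of the 32-bit hash range assigned to that brick for that directory:

    getfattr -m . -d -e hex /gluster/brick1/some/dir
    # compare the trusted.glusterfs.dht ranges across bricks; gaps or stale ranges explain skewed placement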
21:50 jclift_ It sounds like in theory something that's impossible, in practise is happening. :/
21:50 JoeJulian Theoretically, you can even modify them by hand if you really want.
21:51 JoeJulian jclift_: Definition of insanity.
21:51 jclift_ We used to get similar in Cloud projects... we call them unexpected corner cases :)
21:51 jclift_ But, it's pretty common for theory != reality :(
21:52 H__ I just want to prevent my bricks from filling up. Last rebalance took 45 days, and data is twice as much now. and I don't have enough time for a full rebalance to even start.
21:52 jclift_ "In theory this should work" -> "Where'd my filesystem go?"
21:53 JoeJulian So what process could you do that would tell less time and still produce the desired results? It would seem to me that walking the directory tree is walking the directory tree. Moving a percentage of your files is still moving a percentage of your files (with the exception being that if you move them off the volume and back on, it'll take even longer).
21:54 JoeJulian s/tell/take
21:54 JoeJulian How did I even type that... <boggle/>
21:56 H__ trying to parse that ;-)
21:58 semiosis partner: have a theory, trying to reproduce now
21:58 H__ i have one brick-pair that used up much more space than the others. So the volume said X% free and I assumed that'd go for all bricks too, which it does except for this one brick-pair
21:59 JoeJulian I understand. I blame jdarcy for not building his concentric ring rebalance.
21:59 partner semiosis: sorry, i'm getting slow here, 1 AM already and i'm very tired, i better continue tomorrow or so, nothing urgent though, was just wondering if its me or others or what (i didn't start this discussion as can be seen from the irc logs :)
22:00 semiosis i'll leave you a note here if i find anything
22:01 partner thanks
22:01 semiosis yw
22:02 JoeJulian H__: Hmm... speaking of jdarcy, he threw some info in here the other day of a special naming convention that could be used to place files on specific dht subvolumes. Theoretically, you could copy your biggest files to that special name, then rename them to their original filename
22:03 H__ very interesting
22:04 JoeJulian Could get you past your critical stage and allow you the time for the complete rebalance (assuming you have enough filehandles to deal with bug 928631 )
22:04 glusterbot Bug http://goo.gl/3Xruz urgent, high, ---, kaushal, ASSIGNED , Rebalance leaves file handler open
22:06 partner semiosis: actually i managed to reproduce it and its again fuse-related..
22:06 partner [2013-04-25 01:03:11.075250] E [mount.c:598:gf_fuse_mount] 0-glusterfs-fuse: cannot open /dev/fuse (No such file or directory)
22:06 semiosis ohhh now we're making some progress :D
22:06 partner [2013-04-25 01:03:11.075311] E [xlator.c:385:xlator_init] 0-fuse: Initialization of volume 'fuse' failed, review your volfile again
22:07 semiosis i just tried to reproduce (again) and wasnt able to
22:07 partner pretty much what is described in Joes blog post regarding the issue: http://joejulian.name/blog/glusterfs-volumes-not-mounting-in-debian-squeeze-at-boot-time/
22:07 glusterbot <http://goo.gl/t6PY4> (at joejulian.name)
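A hedged workaround sketch for the /dev/fuse error above, in the spirit of that blog post: make sure the fuse module is loaded before boot-time mounts run (Debian/Ubuntu style; verify against your init setup):

    modprobe fuse                  # load it now
    echo fuse >> /etc/modules      # and at every boot, before fstab mounts are attempted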
22:07 H__ There's an estimated 30M files on this volume. I doubt I have enough filehandles
22:07 semiosis took my vanilla squeeze vm, set up a pure remote client mount in fstab (with _netdev) and it mounted at boot time
22:08 H__ I just did a blunt test on production : find a file stored on the issue brick, move that out of gluster, move it back in, and it did get stored on other bricks !
22:08 partner semiosis: our machines are not "vanilla" but nothing special added either, patches from upstream of course up to date
22:09 JoeJulian H__: Well there you go.
22:12 semiosis partner: thats weird... i did not need to do anything like JoeJulian describes for glusterfs mounts to work... they just work
22:12 semiosis thats what i mean by vanilla
22:13 partner i recall that was the case last time too...
22:13 partner weird
22:14 JoeJulian semiosis: Probably one difference is lvm. I found a xen bug that made the xfs partition break when directly on the virtualized device. I had to create an lvm partition first. That made xfs not be mounted by the time glusterfs tried to mount, which started glusterd.
22:15 * JoeJulian can't type today and is about to give up and go play some FPS.
22:15 partner but we are still on client end which is not aware of any lvms or such so its different case
22:15 semiosis JoeJulian: glusterfs mounts start glusterd only in my hacked lucid upstart job, that's waaay deprecated now.  (for the record)
22:15 rb2k joined #gluster
22:16 JoeJulian +1
22:16 JoeJulian despite it being lts... ;)
22:16 semiosis LTS considered harmful
22:16 * JoeJulian ducks
22:17 partner previous lts was ~ok, stopped using since..
22:18 partner semiosis: anyways, nothing urgent, proven some people do suffer from it but the exact details to produce are still missing, i can try to install another pure clean machine and try things out
22:19 partner and document out all the stuff
22:19 semiosis proof = you tell me what to do to produce that behavior
22:19 semiosis and i verify your proof by following your instructions & seeing that behavior
22:19 partner yeah
22:19 partner exactly
22:19 semiosis so, not proven yet :)
22:20 semiosis though i'd like to find such a proof, so we can fix it
22:20 partner enough for me, there's several reports out there and i trust my ass ;)
22:20 semiosis thanks for your patience helping me work through this
22:20 semiosis those reports aren't enough to actually fix the problem though.  just "something didnt work"
22:21 partner yeah i know, it just makes me feel i'm not alone and that drives me to find the root cause
22:21 semiosis great, hope i can help
22:26 Jippi joined #gluster
22:27 semiosis @later tell JuanBre i updated the 3.4 ppa with alpha3 (released monday) - https://launchpad.net/~semiosis/+archive/ubuntu-glusterfs-3.4/+packages
22:27 glusterbot semiosis: The operation succeeded.
22:28 semiosis JoeJulian: still around?
22:28 JuanBre semiosis: great! I will try to update them tomorrow
22:28 partner alright i guess i should finally give in, 01:30 AM..
22:29 semiosis JuanBre: excellent.  let me know how it goes.  you can use glusterbot like i just did to leave me a message.
22:29 partner seeing from the graphs i should not run out of filehandlers before morning so i shall leave the rebalance running over the night
22:30 JuanBre semiosis: are there any "best practices" to upgrade gluster ?
22:30 semiosis uhhhh
22:30 JuanBre semiosis: other than backing up configuration files
22:30 semiosis there is the ,,(3.3 upgrade notes)
22:30 glusterbot http://goo.gl/qOiO7
22:31 semiosis not sure what upgrade you're doing
22:31 semiosis and keep in mind you're asking someone who is *still* running 3.1 in prod
22:31 semiosis it... just... keeps... working...
22:32 JoeJulian yep
22:32 semiosis JoeJulian: pm
22:58 wN joined #gluster
23:08 duerF joined #gluster
23:11 fleducquede joined #gluster
23:21 fidevo joined #gluster
23:51 slabgrha joined #gluster
23:52 slabgrha anyone have any patience for some n00b questions?
23:52 slabgrha i have a use case and want to know if i'm heading in the right direction with gluster
