IRC log for #gluster, 2014-06-26


All times shown according to UTC.

Time Nick Message
00:03 bala joined #gluster
00:06 Peter2 joined #gluster
00:06 Peter2 I am on 3.5 with 6 nodes config
00:06 Peter2 rebooted one node and not able to rejoin :(
00:07 Peter2 [2014-06-25 23:17:37.533337] E [glusterd-store.c:1979:glusterd_store_retrieve_volume] 0-: Unknown key: brick-0
00:07 Peter2 [2014-06-25 23:17:37.533373] E [glusterd-store.c:1979:glusterd_store_retrieve_volume] 0-: Unknown key: brick-1
00:07 Peter2 [2014-06-25 23:17:37.533391] E [glusterd-store.c:1979:glusterd_store_retrieve_volume] 0-: Unknown key: brick-2
00:07 Peter2 [2014-06-25 23:17:37.533406] E [glusterd-store.c:1979:glusterd_store_retrieve_volume] 0-: Unknown key: brick-3
00:07 Peter2 [2014-06-25 23:17:37.533421] E [glusterd-store.c:1979:glusterd_store_retrieve_volume] 0-: Unknown key: brick-4
00:07 Peter2 3.5.0
00:07 Peter2 anyone has idea?
00:08 doc|holliday joined #gluster
00:09 doc|holliday found out I had booted the wrong image (with v3.4.4), so 3.4.4 clients do indeed work with 3.4.2 servers
00:10 vpshastry joined #gluster
00:10 doc|holliday and 3.4.2 does *not* leak -- only 3.4.4
00:10 Peter2 good to know. Is 3.4 still more stable than 3.5?
00:12 doc|holliday not sure. haven't tried 3.5
00:20 lpabon joined #gluster
00:22 bene2 joined #gluster
00:27 sjm left #gluster
00:28 zerick joined #gluster
00:31 jag3773 joined #gluster
00:35 Peter2 anyone able to successfully reboot a 3.5.0 server and rejoin the cluster?
00:42 bala joined #gluster
00:53 baojg joined #gluster
01:00 jag3773 joined #gluster
01:01 baojg_ joined #gluster
01:33 mjsmith2 joined #gluster
01:34 JoeJulian Peter2: those are false errors.
01:35 Peter2 but i wasn't able to rejoin the cluster after restarting
01:35 Peter2 but i think 3.5.1 resolved it
01:35 Peter2 i am progressively upgrading all the nodes
01:35 Peter2 it works!
01:35 Peter2 3.5.1 rocks
01:35 JoeJulian excellent
01:40 vpshastry joined #gluster
01:42 harish joined #gluster
01:49 Peter2 quotad.log still having error
01:49 Peter2 [2014-06-26 01:49:29.008343] E [client_t.c:305:gf_client_ref] (-->/usr/lib/x86_64-linux-gnu/libgfrpc.so.0(rpcsvc_handle_rpc_call+0x275) [0x7f84d3c1e9a5] (-->/usr/lib/x86_64-linux-gnu/glusterfs/3.5.1/xlator/features/quotad.so(quotad_aggregator_lookup+0xbb) [0x7f84ce704e5b] (-->/usr/lib/x86_64-linux-gnu/glusterfs/3.5.1/xlator/features/quotad.so(quotad_aggregator_get_frame_from_req+0x71) [0x7f84ce7048a1]))) 0-client_t: null client
01:50 Peter2 in 3.5.1
01:50 Peter2 what does that mean?
01:50 Peter2 is that a bug or my filesystem error?
01:59 kdhananjay joined #gluster
02:05 Peter2 and i still getting these on the brick log
02:05 Peter2 [2014-06-26 01:51:17.848085] E [index.c:267:check_delete_stale_index_file] 0-gfs-index: Base index is not createdunder index/base_indices_holder
02:32 DV joined #gluster
02:41 MacWinner quick question on upgrade process for 3.5.1 from 3.5.0 if i'm using yum repos.. should I stop gluster services on each node that I upgrade prior to running "yum update"?  or will the yum update go through the work of restarting services etc
02:45 baojg joined #gluster
02:58 bharata-rao joined #gluster
03:07 troj joined #gluster
03:07 siel joined #gluster
03:11 verdurin joined #gluster
03:14 troj joined #gluster
03:16 jag3773 joined #gluster
03:19 vpshastry joined #gluster
03:21 dusmant joined #gluster
03:23 Bullardo joined #gluster
03:27 nbalachandran joined #gluster
03:30 rejy joined #gluster
03:37 atinmu joined #gluster
03:50 baojg joined #gluster
03:51 RameshN joined #gluster
04:02 gmcwhistler joined #gluster
04:03 shubhendu__ joined #gluster
04:08 baojg joined #gluster
04:08 baojg joined #gluster
04:10 itisravi joined #gluster
04:10 confusedp3rms joined #gluster
04:19 ndarshan joined #gluster
04:20 lalatenduM joined #gluster
04:24 rastar joined #gluster
04:24 jcsp joined #gluster
04:27 nshaikh joined #gluster
04:35 Peter2 on 3.5.1 when i did gluster volume heal <vol> full -> success
04:35 Peter2 then gluster volume heal <vol> info -> failed
00:35 Peter2 does that mean the heal failed??
04:40 AaronGr joined #gluster
04:42 vpshastry joined #gluster
04:46 Intensity joined #gluster
04:50 ramteid joined #gluster
04:50 shylesh__ joined #gluster
04:58 vimal joined #gluster
04:59 rjoseph joined #gluster
05:00 psharma joined #gluster
05:04 kdhananjay joined #gluster
05:07 saurabh joined #gluster
05:10 ppai joined #gluster
05:17 dusmant joined #gluster
05:17 davinder15 joined #gluster
05:18 meghanam joined #gluster
05:18 meghanam_ joined #gluster
05:19 theron joined #gluster
05:25 aravindavk joined #gluster
05:28 kanagaraj joined #gluster
05:28 ppai joined #gluster
05:28 prasanthp joined #gluster
05:29 JoeJulian MacWinner: The yum update will restart glusterd. It will restart all the services if you're using fedora or RHEL7. To use the new client, the mountpoint will need unmounted and mounted again.
05:30 MacWinner JoeJulian, cool.. I ended up just rebooting because I had a kernel update.. didn't realize rhel7 is out now
05:30 JoeJulian Yeah, came out about a week or so ago.
05:31 JoeJulian waiting on CentOS 7 but it doesn't look like it'll take as long as 6 did.
05:31 MacWinner i'm on centos6.5.. will probably stick here for the near future
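For reference, a minimal sketch of the rolling-upgrade sequence JoeJulian describes above, assuming an EL6 server using the gluster yum repo and a fuse client mounted at a hypothetical /mnt/gv0:

    # on each server, one node at a time
    yum -y update glusterfs glusterfs-server glusterfs-fuse
    service glusterd status || service glusterd start   # the package %post normally restarts glusterd itself

    # on each client, remount to pick up the new client version
    umount /mnt/gv0 && mount -t glusterfs server1:/gv0 /mnt/gv0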
05:33 MacWinner JoeJulian, do you have any experience with GridFS?  I was thinking of migrating some stuff over to it..  use it as a caching layer as well as a pseudo redunancy layer..
05:34 JoeJulian Nope
05:34 JoeJulian mongodb huh...
05:35 nbalachandran joined #gluster
05:40 hagarth joined #gluster
05:48 bala1 joined #gluster
05:48 kshlm joined #gluster
05:48 nishanth joined #gluster
05:50 dusmant joined #gluster
05:53 aravindavk joined #gluster
05:54 ws2k33 joined #gluster
06:01 Intensity joined #gluster
06:07 ricky-ti1 joined #gluster
06:36 baojg joined #gluster
06:45 aravindavk joined #gluster
06:46 ekuric joined #gluster
06:48 ctria joined #gluster
06:48 stickyboy Wow, been watching my failed brick heal for a few days now... up to yesterday I had No. of entries healed: 1467413 ... No. of heal failed entries: 1756680
06:48 stickyboy This morning I came in and found:  failed: 0
06:48 stickyboy w00t.
06:48 stickyboy Heal all teh things!
06:55 hchiramm_ joined #gluster
06:57 jonathanpoon joined #gluster
06:58 jonathanpoon hey guys, I have 2 servers where each server has 2 bricks.  If I set the replication value to 2, how do I ensure the data is replicated on two different servers?
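The question above goes unanswered in the log. For reference, replica sets are formed from consecutive bricks in the order given to volume create, so interleaving the two servers keeps each copy on a different machine. A sketch with hypothetical host and brick paths:

    # replica pairs are (server1:/export/brick1, server2:/export/brick1)
    # and (server1:/export/brick2, server2:/export/brick2)
    gluster volume create gv0 replica 2 \
        server1:/export/brick1 server2:/export/brick1 \
        server1:/export/brick2 server2:/export/brick2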
07:03 Nightshader joined #gluster
07:04 haomaiwang joined #gluster
07:05 ndarshan joined #gluster
07:08 kdhananjay joined #gluster
07:08 haomai___ joined #gluster
07:10 eseyman joined #gluster
07:12 glusterbot New news from newglusterbugs: [Bug 1113403] Excessive logging in quotad.log of the kind 'null client' <https://bugzilla.redhat.com/show_bug.cgi?id=1113403>
07:13 fraggeln JoeJulian: not at all, I'm grateful for all the help. for now we have focused on reads/writes using cp. today I will install a real application and see how it performs.
07:17 hchiramm_ joined #gluster
07:27 ktosiek joined #gluster
07:34 mbukatov joined #gluster
07:54 fsimonce joined #gluster
07:55 calum_ joined #gluster
08:13 liquidat joined #gluster
08:16 _polto_ joined #gluster
08:42 Slashman joined #gluster
08:50 monotek joined #gluster
08:50 monotek left #gluster
09:12 glusterbot New news from newglusterbugs: [Bug 1113460] Distributed volume broken on glusterfs-3.5.1 <https://bugzilla.redhat.com/show_bug.cgi?id=1113460>
09:14 glusterbot New news from resolvedglusterbugs: [Bug 1101942] Unable to peer probe 2nd node on distributed volume (3.5.1-0.1.beta1) <https://bugzilla.redhat.com/show_bug.cgi?id=1101942>
09:22 qdk joined #gluster
09:27 itisravi_ joined #gluster
09:29 ndevos kshlm: bug 1101942 seems to be something with peer probing, I've changes the subject and component for that now
09:29 glusterbot Bug https://bugzilla.redhat.com:443/show_bug.cgi?id=1101942 urgent, unspecified, ---, rwheeler, CLOSED DUPLICATE, Unable to peer probe 2nd node on distributed volume (3.5.1-0.1.beta1)
09:29 ndevos uh, bug 1113460
09:29 glusterbot Bug https://bugzilla.redhat.com:443/show_bug.cgi?id=1113460 urgent, unspecified, ---, kparthas, NEW , peer probing fails on glusterfs-3.5.1
09:34 karnan joined #gluster
09:39 chirino joined #gluster
09:42 glusterbot New news from newglusterbugs: [Bug 1113460] peer probing fails on glusterfs-3.5.1 <https://bugzilla.redhat.com/show_bug.cgi?id=1113460> || [Bug 1113476] [SNAPSHOT] : gluster volume info should not show the value which is not set explicitly <https://bugzilla.redhat.com/show_bug.cgi?id=1113476>
09:43 qdk joined #gluster
09:50 shubhendu__ joined #gluster
09:50 fraggeln that sounds like a nasty bug :D
09:57 RameshN joined #gluster
09:59 itisravi joined #gluster
10:00 Nopik joined #gluster
10:02 kdhananjay joined #gluster
10:03 Nopik hi.. i have some glusterfs installation with few bricks, and a number of clients which do write to that fs a lot. i added new brick, but rebalancing adds so much load to the system that clients work 10x slower and queue too much. is there any way to make rebalance (or at least removing brick) less heavy?
10:03 Nopik if i change the priority of the processes, it helps, but not much
10:04 kaushal_ joined #gluster
10:04 Nopik also, is it possible to somehow make a brick readonly, so no new data will be added to it? that would probably solve my problem, as i could slowly move the data out of that brick. right now the stop brick is slower than the writes, so despite of brick stopping, it gets more and more data
10:05 fraggeln there is an option at least to replace a brick, or remove.
10:06 baojg joined #gluster
10:07 Nopik yeah, that is what i'm using. but it virtually kills the performance, so the whole things becomes useless
10:08 _polto_ joined #gluster
10:08 _polto_ joined #gluster
10:08 Nopik i mean, i have a number of jobs queued which are writing to the fs and they need to be served in a timely fashion. the moment i start replace brick, remove brick, rebalance, they start to take 10x more time, and they are still faster than the brick stop, so the amount of data on the brick still grows
10:15 atinmu joined #gluster
10:19 ricky-ticky1 joined #gluster
10:30 deepakcs joined #gluster
10:32 kkeithley1 joined #gluster
10:33 shubhendu joined #gluster
10:39 hagarth joined #gluster
10:43 FooBar Nopik: i have the same problem here.... healing or rebalancing kills performance, so much so that application performance goes out the window
10:49 fraggeln Nopik, FooBar: what is the speed between your nodes? 1gbit? 10gbit?
10:52 lalatenduM joined #gluster
10:55 dusmant joined #gluster
10:56 kaushal_ joined #gluster
11:00 capri Nopik, you could rsync your data to the new brick
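For the brick-draining idea Nopik mentions, the supported sequence is remove-brick start/status/commit, which migrates files off the brick before it is removed; it still adds rebalance-style load, as noted above. A sketch with hypothetical names:

    gluster volume remove-brick gv0 server5:/export/brick1 start
    gluster volume remove-brick gv0 server5:/export/brick1 status   # watch data migration progress
    gluster volume remove-brick gv0 server5:/export/brick1 commit   # only once status reports completed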
11:06 haomaiwa_ joined #gluster
11:07 atinmu joined #gluster
11:10 rjoseph joined #gluster
11:15 harish_ joined #gluster
11:15 spandit joined #gluster
11:19 dusmant joined #gluster
11:22 jag3773 joined #gluster
11:24 LebedevRI joined #gluster
11:27 hchiramm__ joined #gluster
11:35 ninkotech joined #gluster
11:37 kdhananjay joined #gluster
11:41 edward1 joined #gluster
11:43 glusterbot New news from newglusterbugs: [Bug 1113543] Spec %post server does not wait for the old glusterd to exit <https://bugzilla.redhat.com/show_bug.cgi?id=1113543>
11:53 pkoro joined #gluster
11:53 B21956 joined #gluster
12:01 marbu joined #gluster
12:02 dusmant joined #gluster
12:03 baojg joined #gluster
12:04 diegows joined #gluster
12:05 baojg_ joined #gluster
12:05 baojg_ joined #gluster
12:16 sjm joined #gluster
12:18 ws2k33 hello, when i would have a remote location, can geo-replication go both ways ?
12:18 ppai joined #gluster
12:18 ws2k33 so what i change in location a will be synced to location b and what i change in location b will be synced to location a ?
12:19 chirino joined #gluster
12:20 Thilam joined #gluster
12:21 chirino_m joined #gluster
12:32 ws2k3 joined #gluster
12:32 ws2k3 hello
12:32 glusterbot ws2k3: Despite the fact that friendly greetings are nice, please ask your question. Carefully identify your problem in such a way that when a volunteer has a few minutes, they can offer you a potential solution. These are volunteers, so be patient. Answers may come in a few minutes, or may take hours. If you're still in the channel, someone will eventually offer an answer.
12:33 ws2k3 i read something about multi-master being moved from 3.4 to 3.5, but is this feature already in 3.5 ?
12:34 tdasilva joined #gluster
12:35 RameshN joined #gluster
12:46 rwheeler joined #gluster
12:47 sroy joined #gluster
12:49 theron joined #gluster
12:50 theron_ joined #gluster
12:52 bennyturns joined #gluster
12:56 baojg joined #gluster
13:04 ctria joined #gluster
13:04 mbukatov joined #gluster
13:16 VerboEse joined #gluster
13:16 julim joined #gluster
13:19 dusmant joined #gluster
13:20 coredump joined #gluster
13:22 calum_ joined #gluster
13:36 bchilds joined #gluster
13:37 Alex____1 joined #gluster
13:39 TvL2386 joined #gluster
13:46 vpshastry joined #gluster
13:46 sadbox joined #gluster
13:48 hagarth joined #gluster
14:02 lmickh joined #gluster
14:09 gmcwhistler joined #gluster
14:10 prasanthp joined #gluster
14:14 rjoseph joined #gluster
14:21 Ark joined #gluster
14:23 jobewan joined #gluster
14:24 ctria joined #gluster
14:28 daMaestro joined #gluster
14:29 wushudoin joined #gluster
14:31 mjsmith2 joined #gluster
14:36 shubhendu joined #gluster
14:44 semiosis ws2k3: there's no multi-master geo-replication yet.  maybe 3.7 last I heard
14:46 _polto_ joined #gluster
14:48 m0zes multi-master replication is hard.
14:49 m0zes much harder if there can be delays transferring changes to either side
14:49 primechuck joined #gluster
14:51 theron_ joined #gluster
14:53 ctria joined #gluster
14:53 theron joined #gluster
14:56 ws2k3 yeah i can believe that that is pretty hard
14:56 ws2k3 cause that means you sometimes have to merge multiple changes maybe
14:57 theron joined #gluster
14:58 ws2k3 but in the current situation something similar can happen: communication between 1 glusterfs node and the rest is broken, clients make changes to files through that node, and then the rest of the cluster is connected again...
14:58 deepakcs joined #gluster
14:58 m0zes merging changes in binary files is impossible unless the storage sync daemon understands the file type. the best you could hope for would be getting two different copies and then having to merge the changes manually.
15:00 m0zes ws2k3: and in that situation, if the file is modified on both replicas the file goes splitbrain and cannot be accessed by the clients when the link is restored.
15:00 sjm joined #gluster
15:02 rsavage joined #gluster
15:02 rsavage hello all
15:02 rsavage I am somewhat new to gluster so please forgive my questions.
15:03 ws2k3 m0zes and how does glusterfs solve that?
15:03 rsavage Does glusterfs currently have a way to provide backups of gluster volumes?  I.e. crash-consistent backups?
15:05 m0zes ws2k3: iirc you can set a policy for which node to keep the files from if it splitbrains. otherwise it is a manual process of either picking the one you want to keep or merging the file changes yourself...
15:07 nshaikh joined #gluster
15:09 ws2k3 hmm that sucks
15:11 chirino joined #gluster
15:13 m0zes split-brains in a local setup are less likely to happen, simply because the links between the clients and servers are going to be more stable. if a server goes away it is because the server probably crashed, not because the network broke and is letting some clients talk to one server and others to a different server.
15:14 bala1 joined #gluster
15:14 m0zes not saying it doesn't happen, it just happens less frequently than with multi-master communication over the WAN.
15:24 ws2k3 and does it work when one glusterfs node goes offline and it gets online 3 days later ?
15:24 ws2k3 does it then automatically resync with the cluster ?
15:27 m0zes yes, that would be fine. if the node itself goes down (files haven't changed on *both* replicas) the self heal daemon will take care of it and bring the cluster back to a fully synced state;
15:29 lpabon joined #gluster
15:30 ws2k3 what do you mean, files haven't changed on both replicas?
15:30 ws2k3 when i have a 4 node cluster and one goes down, of course all kinds of files will change on the remaining 3 node cluster
15:30 kshlm joined #gluster
15:32 ws2k3 so when the node that went down comes up again then the self heal daemon will sync all the changes from the cluster to the node that just came up ?
15:34 ekuric left #gluster
15:35 m0zes most people set up gluster with replica 2. this means that 2 bricks will contain the same files. if one of those bricks goes away (i.e. not writable in *any* fashion) the self-heal daemon will fix it when the brick comes back.
15:36 m0zes s/not writeable/hasn't been written to/
15:36 glusterbot m0zes: Error: I couldn't find a message matching that criteria in my history of 1000 messages.
15:36 m0zes s/not writable /hasn't been written to /
15:36 rsavage What are people doing to backup glusterfs volumes?
15:36 glusterbot What m0zes meant to say was: most people set up gluster with replica 2. this means that 2 bricks will contain the same files. if one of those bricks goes away (i.e. hasn't been written to in *any* fashion) the self-heal daemon will fix it when the brick comes back.
15:37 jiffe98 there seems to be some problem with glusterfs and apache-mpm-itk, if the directory the vhost is trying to access is chmod 700, it works fine on local disk but I get a permissions error over gluster regardless of whether I mount using fuse or nfs
15:38 rsavage What are people doing to backup glusterfs volumes?  Looks like glusterfs volume snapshots are not ready yet...
15:38 ctria joined #gluster
15:39 jiffe98 it sounds like they're using the new linux capability CAP_DAC_READ_SEARCH to access the .htaccess file
15:39 m0zes rsavage: I use bacula. I've used geo-replication in the past, though
15:39 rsavage m0zes, bacula isn't that just a normal file-level backup?
15:39 m0zes rsavage: yep.
15:40 m0zes snapshots != backup ;)
15:40 rsavage mozes, I know that
15:41 rsavage I was hoping for a way to automatically do crash-consistent backups via a snap-to-dump approach, so I could then restore
15:42 ndevos snapshots are coming in glusterfs 3.6 :)
15:43 lmickh joined #gluster
15:44 MacWinner joined #gluster
15:48 lpabon joined #gluster
15:51 semiosis rsavage: i snapshot my servers & their attached EBS volumes using the EC2 API call CreateImage
15:52 semiosis which are backups, for me
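One way to invoke the CreateImage call semiosis mentions is the AWS CLI; a sketch with a hypothetical instance id:

    # images the instance and snapshots its attached EBS volumes (including the brick volumes)
    aws ec2 create-image --instance-id i-0123456789abcdef0 \
        --name "gluster-server1-$(date +%F)" --no-reboot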
16:01 kdhananjay joined #gluster
16:03 wushudoin left #gluster
16:06 kanagaraj joined #gluster
16:14 mjsmith2_ joined #gluster
16:23 lpabon_test joined #gluster
16:24 mjsmith2 joined #gluster
16:24 irated Does gluster keep a reserve on each stripe to prevent failure?
16:25 sputnik13 joined #gluster
16:25 cmtime joined #gluster
16:28 sputnik13 joined #gluster
16:35 ramteid joined #gluster
16:39 diegows joined #gluster
16:41 theron joined #gluster
16:42 semiosis irated: i dont understand.  could you please rephrase using ,,(glossary)?
16:42 glusterbot irated: A "server" hosts "bricks" (ie. server1:/foo) which belong to a "volume"  which is accessed from a "client"  . The "master" geosynchronizes a "volume" to a "slave" (ie. remote1:/data/foo).
16:44 irated semiosis, in a 4 brick setup and each brick has a striped volume, is there a natural rebalance when it hits the min-free disk
16:44 semiosis no
16:45 semiosis gluster places files on bricks according to a hash of the filename, not bytes or blocks
16:45 semiosis and there's no automatic rebalance anyway
16:45 irated Does it rebalance on its own when one stripe is full?
16:45 irated brick*
16:46 semiosis still no
16:47 semiosis although it will try to place new files on a brick with that isn't almost full
16:51 SpeeR_ joined #gluster
16:53 irated yeah
16:53 irated there is no default free-min
16:53 irated min-free
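For reference on the two knobs discussed above: min-free-disk only steers new file placement, and a rebalance has to be triggered by hand. A sketch with a hypothetical volume name:

    gluster volume set gv0 cluster.min-free-disk 10%   # avoid placing new files on bricks below 10% free
    gluster volume rebalance gv0 start
    gluster volume rebalance gv0 status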
16:56 mjsmith2 joined #gluster
16:59 cmtime I have a major splitbrain problem with two nodes.  Tracing the symlinks in .glusterfs I am running into a problem with the symlink being to many symlink's to follow.  So they look broken when they really are not.  After running a full heal I have 1024 splitbrain gfid files on both host.  This was caused by failure of the brick on one of the two nodes.  In failed-heal I have zero files.  So my question is this.  Could I simply wipe out the .g
17:03 JoeJulian cmtime: it was cut off but it's possible that wiping .glusterfs would work as long as all the files have their correct metadata in the extended attributes. That won't heal split-brain however. Split brain is caused by the same file on two different bricks both having extended attributes claiming there are writes pending for the other server.
17:04 JoeJulian cmtime: That's where you just have to pick the good one and delete (or mv off brick) the other. ,,(split-brain)
17:04 glusterbot cmtime: (#1) To heal split-brain, use splitmount. http://joejulian.name/blog/glusterfs-split-brain-recovery-made-easy/, or (#2) For additional information, see this older article http://joejulian.name/blog/fixing-split-brain-with-glusterfs-33/
17:04 VonNaturAustreVe joined #gluster
17:05 pureflex joined #gluster
17:07 cmtime JoeJulian: it seems daunting; with both servers having 1024 split-brain entries, tracking down the 1024 files seems impossible at this point.
17:07 * JoeJulian needs a better tool than "heal info". It shows files that are actively being written, not just files that need healed. There also needs to be a way for the self-heal daemons to show which files they're currently working on.
17:08 JoeJulian cmtime: Another option would be to just wipe the "bad" brick, depending on how big the brick is.
17:08 cmtime 73T  7.4T   66T  11% /brick0  So 7.4T to resync a 2nd time.
17:13 cmtime JoeJulian: both bricks are listing 1024 gfids.  I am assuming my problem brick is causing them both to throw the errors up.  Only a small percentage of files on this gluster matter; I would call it beta in a work environment.  So if I do wipe the brick out this time, should I rsync the files skipping the .glusterfs and then turn glusterd back on and let selfheal solve the .glusterfs dir?
17:13 JoeJulian No. Just let self-heal do the whole thing.
17:14 JoeJulian Why duplicate the function of self-heal with rsync?
17:14 cmtime I did the first time but we ran into this.
17:14 JoeJulian df doesn't tell you.
17:15 JoeJulian du -s if you want to check sizes
17:15 JoeJulian er, -b
17:15 JoeJulian du -b
17:20 cmtime JoeJulian: do a du -b on /brick0/ ?  I probably have a billion files
17:21 JoeJulian You can't use df because replication will create sparse copies.
17:22 JoeJulian semiosis: You might find that bit of info interesting. ^ Just confirmed it yesterday.
17:23 cmtime Running the du -sb on both servers
17:24 SpeeR_ is there a way to list the file that a hex gfid in the .glusterfs points to?
17:26 JoeJulian cmtime: Yesterday, I asked management how much time they wanted to give me to determine that the replica was valid. Starting from that number, I offered a couple of plans to spot-check a percentage of the files (actually in this case it was all of the files, but only the first and last N Gb) where I changed the size of the validation to match the time given.
17:26 JoeJulian ~ gfid resolver | SpeeR_
17:26 glusterbot SpeeR_: https://gist.github.com/4392640
17:27 SpeeR_ excellent, thanks Joe]
17:27 SpeeR_ Joe
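The gist glusterbot links implements the usual trick; a minimal sketch of the same idea, with a hypothetical brick path and gfid. Regular files in .glusterfs are hardlinks (match by inode), directories are symlinks (just readlink them):

    BRICK=/brick0
    GFID=5e53cbb4-34ff-4a32-a2f0-1d5b4ed8aaaa        # hypothetical
    find "$BRICK" -samefile "$BRICK/.glusterfs/${GFID:0:2}/${GFID:2:2}/$GFID" -not -path "*/.glusterfs/*"
    readlink "$BRICK/.glusterfs/${GFID:0:2}/${GFID:2:2}/$GFID"   # for directory gfids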
17:27 semiosis why does replication create sparse copies?
17:28 JoeJulian Probably because I don't use "whole file" self-heal.
17:28 JoeJulian When most of the file matches (is all 0s), it doesn't write that bit.
17:28 cmtime Maybe I should use that instead of the default self-heal?
17:28 J_Man joined #gluster
17:29 JoeJulian Since I have some images that are 10Tb, "full" is not a reasonable option for me.
17:30 cmtime This is a 12 node 2x replica setup talking over 40G infiniband TCP for now.   In this case it is office files and img/video content.
17:31 JoeJulian cmtime: Those 1024 entries... have you noticed if they're the same file (or set of files) over and over?
17:31 JoeJulian info...split-brain is a log, not a list.
17:32 cmtime JoeJulian: not sure, we can try to look at that.  But they look unique, and trying to follow more than a few under centos sucks big time.
17:33 cmtime [root@gf-osn09 brick0]# du -sb . = 8033653120654   . <Good/Bad> [root@gf-osn10 brick0]# du -sb . = 8033645303296   .
17:34 JoeJulian cmtime: Are you saying the gfid is a symlink under .glusterfs?
17:34 cmtime yes
17:34 JoeJulian that should mean it's a directory.
17:34 cmtime We hit the max symlinks
17:35 cmtime So using the tool does not work and we have to follow it by hand to find the file.
17:36 JoeJulian So if you ls -l that file, where does it point?
17:36 cmtime Let me post a example
17:37 cmtime http://fpaste.org/113549/38042631/
17:37 glusterbot Title: #113549 Fedora Project Pastebin (at fpaste.org)
17:38 cmtime It's not complete but gives you an idea of what I am facing.
17:39 cmtime In the end it points /brick0/office-xx-xx-backup-exclude/data-xxxxx/tuesday/Users/username/Application\ Data/vlc/art/
17:40 JoeJulian Ok, it is a directory then. split-brain directories are relatively easy to fix.
17:40 JoeJulian These are all on brick0?
17:41 cmtime ya
17:41 Peter2 joined #gluster
17:42 JoeJulian getfattr -m . -d -e hex /brick0/office-xx-xx-backup-exclude
17:44 calum_ joined #gluster
17:44 cmtime what server should I run that on?
17:44 cmtime the good the bad or both?
17:47 JoeJulian bad one
17:48 XpineX joined #gluster
17:50 Peter2 i still seeing # gluster volume heal gfs info
17:50 Peter2 Volume heal failed
17:50 cmtime JoeJulian: Here is what I got back http://fpaste.org/113550/80501514/
17:50 glusterbot Title: #113550 Fedora Project Pastebin (at fpaste.org)
17:50 Peter2 and no log entries in glustershd.log ….
17:51 Peter2 any idea why the heal keep failing?
17:51 JoeJulian yes
17:53 sjm joined #gluster
17:53 JoeJulian find /brick0/* -type d -exec setfattr -n trusted.afr.gf-osn-client-8 -v 0x000000000000000000000000 {} \; -exec setfattr -n trusted.afr.gf-osn-client-9 -v 0x000000000000000000000000 {} \;
17:53 JoeJulian then do a heal...full again.
17:54 JoeJulian (that's on the bad brick still)
17:54 theron joined #gluster
17:54 Matthaeus joined #gluster
17:54 Peter2 let me try
17:54 Peter2 thanks!
17:54 mjsmith2 joined #gluster
17:54 JoeJulian You're welcome
17:57 Peter2 and why do we need to do that?
18:00 vpshastry joined #gluster
18:03 JoeJulian Those are the keys that are used to determine if a self-heal is required. If a directory is marked as split-brain, most of the time it's just silly. If the permissions and owners match, what's left to heal? That's (for some reason) not done properly and it marks it as split-brain. I've been meaning to diagnose this and file a bug report but I've not had time.
18:03 glusterbot https://bugzilla.redhat.com/enter_bug.cgi?product=GlusterFS
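The keys JoeJulian refers to are the AFR changelog xattrs; each value is three 4-byte counters (data, metadata and entry operations pending against the other brick). A sketch of inspecting one, reusing the client-8 naming from the find command above:

    getfattr -m trusted.afr -d -e hex /brick0/some/dir
    # e.g. trusted.afr.gf-osn-client-8=0x000000000000000000000000
    # the 24 hex digits are three counters: data, metadata, entry pending; all zeros = clean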
18:03 cmtime JoeJulian: running that find
18:04 Peter2 i done w/ running the find
18:04 Peter2 still heal info returned failed
18:08 Peter2 i got a list of heal-failed
18:09 semiosis whats left to heal?  entries
18:09 semiosis ACLs perhaps
18:09 Peter2 Brick glusterprod006.shopzilla.laxhq:/data/gfs
18:09 Peter2 Number of entries: 72
18:09 Peter2 at                    path on brick
18:09 Peter2 -----------------------------------
18:09 Peter2 2014-06-26 01:44:05 /NetappFs2//marketing//sem//account-management/bing/keyword/bid/msz/output/Failure-119.txt
18:09 Peter2 joined #gluster
18:09 semiosis Peter2: use pastie.org for multiline pastes
18:10 Peter2 ok sorry
18:10 Peter2 http://pastie.org/9327598
18:10 glusterbot Title: #9327598 - Pastie (at pastie.org)
18:11 gomikemike joined #gluster
18:12 gomikemike is feasable to make bricks ontop of LVM LVs?
18:12 semiosis people do that
18:12 semiosis some people
18:12 semiosis not me, but people
18:13 gomikemike i would like to keep all data of a customer on the same brick
18:13 gomikemike and if they need more space, i can grow the LV and keep going
18:14 gomikemike the part that i have doubts is about the brick growing that way, will gluster support it
18:14 semiosis yep, works great
18:14 semiosis you can even do it online
18:14 semiosis once you finish expanding all the replicas the space is immediately available
18:15 JoeJulian I did that at $old_job to separate projects.
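A sketch of the online grow semiosis describes, assuming XFS-backed bricks and hypothetical VG/LV names:

    lvextend -L +100G /dev/vg_bricks/lv_customer1
    xfs_growfs /export/customer1            # grow the mounted filesystem online
    # repeat on every replica of that brick; the volume shows the new space once all copies are grown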
18:15 MacWinner joined #gluster
18:15 Peter2 how do i fix the heal-failed ?
18:17 gomikemike so, for security on the mount points, how can i restrict that part?
18:17 gomikemike I read ports get added as bricks get added so i was thinking of managing the ports per customer...is that somewhat sane?
18:19 cmtime JoeJulian: the find might take some time =P
18:20 Peter2 my find is done but still failed on heal
18:20 Peter2 is it normal to see heal-failed entries?
18:20 Peter2 how about the heal info failed?
18:23 theron joined #gluster
18:24 JoeJulian Peter2: the find just set up the heal to succeed. When you next do the heal, at least that should cure any directories that were failing.
18:25 _polto_ joined #gluster
18:25 Peter2 ic
18:25 pasqd hi
18:25 glusterbot pasqd: Despite the fact that friendly greetings are nice, please ask your question. Carefully identify your problem in such a way that when a volunteer has a few minutes, they can offer you a potential solution. These are volunteers, so be patient. Answers may come in a few minutes, or may take hours. If you're still in the channel, someone will eventually offer an answer.
18:25 Peter2 but how about the heal-failed entries?
18:27 pasqd anyone notice a gluster client synchronization problem due to high traffic between client<-->server? i cought gluster on braking connection with error: FSYNC() ERR => -1 (Transport endpoint is not connected)
18:27 * JoeJulian is frustrated that you still can't tell if a heal crawl is complete...
18:28 JoeJulian file a bug
18:28 glusterbot https://bugzilla.redhat.com/enter_bug.cgi?product=GlusterFS
18:28 pasqd i use gluster with openstack and when i spawn big instances, like 40-50gb other instances are going into read only mode.
18:29 Peter2 will do thanks!
18:29 JoeJulian That was actually for me. I just wanted the link
18:30 gomikemike
18:30 _dist joined #gluster
18:31 Mo__ joined #gluster
18:32 sjm joined #gluster
18:33 mjsmith2 joined #gluster
18:36 lmickh joined #gluster
18:41 jag3773 joined #gluster
18:44 glusterbot New news from newglusterbugs: [Bug 1113724] There is no way to tell if a heal crawl is in progress <https://bugzilla.redhat.com/show_bug.cgi?id=1113724>
18:46 JoeJulian pasqd: We're running openstack with cinder and nova images backed by glusterfs. I have not seen that issue, even when spawning multi TB volumes.
18:47 lmickh joined #gluster
18:47 mjsmith2 joined #gluster
18:47 Mo__ joined #gluster
18:47 _polto_ joined #gluster
18:47 theron joined #gluster
18:47 MacWinner joined #gluster
18:47 gomikemike joined #gluster
18:47 Peter2 joined #gluster
18:47 XpineX joined #gluster
18:47 J_Man joined #gluster
18:47 VonNaturAustreVe joined #gluster
18:47 ramteid joined #gluster
18:47 sputnik13 joined #gluster
18:47 kanagaraj joined #gluster
18:47 rsavage joined #gluster
18:47 primechuck joined #gluster
18:47 daMaestro joined #gluster
18:47 gmcwhistler joined #gluster
18:47 hagarth joined #gluster
18:47 sadbox joined #gluster
18:47 Alex joined #gluster
18:47 bchilds joined #gluster
18:47 sroy joined #gluster
18:47 ws2k3 joined #gluster
18:47 Thilam joined #gluster
18:47 edward1 joined #gluster
18:47 ninkotech joined #gluster
18:47 hchiramm__ joined #gluster
18:47 LebedevRI joined #gluster
18:47 harish_ joined #gluster
18:47 kkeithley_ joined #gluster
18:47 Nopik joined #gluster
18:47 qdk joined #gluster
18:47 AaronGr joined #gluster
18:47 jcsp joined #gluster
18:47 confusedp3rms joined #gluster
18:47 troj joined #gluster
18:47 verdurin joined #gluster
18:47 siel joined #gluster
18:47 DV joined #gluster
18:47 mjrosenb joined #gluster
18:47 sage__ joined #gluster
18:47 gts joined #gluster
18:47 ultrabizweb joined #gluster
18:47 hybrid512 joined #gluster
18:47 Kedsta joined #gluster
18:47 77CAAJHTE joined #gluster
18:47 swebb joined #gluster
18:47 firemanxbr joined #gluster
18:47 nhayashi joined #gluster
18:47 churnd joined #gluster
18:47 Georgyo joined #gluster
18:47 decimoe joined #gluster
18:47 d-fence joined #gluster
18:47 irated joined #gluster
18:47 cfeller joined #gluster
18:47 jvandewege joined #gluster
18:47 NuxRo joined #gluster
18:47 y4m4 joined #gluster
18:47 and` joined #gluster
18:47 systemonkey joined #gluster
18:47 eshy joined #gluster
18:47 koobs joined #gluster
18:47 doekia joined #gluster
18:47 jbrooks joined #gluster
18:47 rturk|afk joined #gluster
18:47 elico joined #gluster
18:47 msciciel joined #gluster
18:47 tziOm joined #gluster
18:47 Norky joined #gluster
18:47 partner joined #gluster
18:47 fraggeln joined #gluster
18:47 ThatGraemeGuy joined #gluster
18:47 yosafbridge joined #gluster
18:47 foster joined #gluster
18:47 social_ joined #gluster
18:47 Peanut joined #gluster
18:47 SFLimey joined #gluster
18:47 fim joined #gluster
18:47 lava joined #gluster
18:47 tom[] joined #gluster
18:47 JonathanD joined #gluster
18:47 tjikkun_ joined #gluster
18:47 morse joined #gluster
18:47 primusinterpares joined #gluster
18:47 romero joined #gluster
18:47 cyberbootje joined #gluster
18:47 the-me joined #gluster
18:47 l0uis joined #gluster
18:47 huleboer joined #gluster
18:47 n0de joined #gluster
18:47 tty00 joined #gluster
18:47 a2 joined #gluster
18:47 T0aD joined #gluster
18:47 fyxim_ joined #gluster
18:47 mtanner_ joined #gluster
18:47 eightyeight joined #gluster
18:47 johnmwilliams__ joined #gluster
18:47 pasqd joined #gluster
18:47 _jmp_ joined #gluster
18:47 masterzen joined #gluster
18:47 samkottler joined #gluster
18:47 lezo joined #gluster
18:47 osiekhan1 joined #gluster
18:47 Andreas-IPO joined #gluster
18:47 SpComb joined #gluster
18:47 atrius` joined #gluster
18:47 Ramereth joined #gluster
18:47 oxidane joined #gluster
18:47 mkzero joined #gluster
18:47 bfoster joined #gluster
18:47 lanning joined #gluster
18:47 rturk-away joined #gluster
18:47 k3rmat joined #gluster
18:47 uebera|| joined #gluster
18:47 capri joined #gluster
18:47 NCommander joined #gluster
18:47 codex joined #gluster
18:47 kke joined #gluster
18:47 mwoodson joined #gluster
18:47 twx joined #gluster
18:47 atrius joined #gluster
18:47 xavih joined #gluster
18:47 JustinClift joined #gluster
18:47 tru_tru joined #gluster
18:47 msvbhat joined #gluster
18:47 saltsa joined #gluster
18:47 Rydekull joined #gluster
18:47 al joined #gluster
18:47 Bardack joined #gluster
18:47 radez joined #gluster
18:47 delhage joined #gluster
18:47 klaas joined #gluster
18:47 Slasheri joined #gluster
18:47 ackjewt joined #gluster
18:47 FooBar joined #gluster
18:47 crashmag joined #gluster
18:47 neoice joined #gluster
18:47 Gugge joined #gluster
18:47 lkoranda joined #gluster
18:47 samppah joined #gluster
18:47 Kins joined #gluster
18:47 sman joined #gluster
18:47 muhh joined #gluster
18:47 fuz1on joined #gluster
18:47 txmoose joined #gluster
18:47 abyss_ joined #gluster
18:47 coredumb joined #gluster
18:47 eclectic joined #gluster
18:47 DanF joined #gluster
18:47 _NiC joined #gluster
18:47 mibby joined #gluster
18:47 jezier_ joined #gluster
18:47 * JoeJulian grumbles about impatience in the world.
18:47 AbrekUS joined #gluster
18:47 JoeJulian or was it a netsplit?
18:48 Peter2 netsplit
18:48 jiffe98 joined #gluster
18:48 Peter2 i though nobody likes me
18:48 JoeJulian I turned off joins and parts...
18:48 JoeJulian pasqd: We're running openstack with cinder and nova images backed by glusterfs. I have not seen that issue, even when spawning multi TB volumes.
18:48 Bardack joined #gluster
18:48 DanF joined #gluster
18:48 tru_tru joined #gluster
18:48 JustinClift joined #gluster
18:48 msvbhat joined #gluster
18:48 twx joined #gluster
18:48 Kins joined #gluster
18:48 edong23 joined #gluster
18:48 Peter2 i see this
18:48 Peter2 http://pastie.org/9327718
18:48 glusterbot Title: #9327718 - Pastie (at pastie.org)
18:49 Peter2 and heal info just returned failed
18:49 msvbhat joined #gluster
18:49 JustinClift joined #gluster
18:49 * JoeJulian whispers, "I see dead sas drives."
18:49 coredumb joined #gluster
18:49 JoeJulian where do you see that?
18:49 JoeJulian I may have to close my own ticket.
18:50 JoeJulian I don't get anything like that in 3.4
18:50 natgeorg joined #gluster
18:50 Peter2 gluster volume heal gfs statistics
18:50 JoeJulian Ah, not in 3.4
18:50 JoeJulian crap
18:50 Peter2 hmm
18:50 AbrekUS geo-replication fails to start with the following error:
18:50 AbrekUS [resource(slave):1044:inhibit] <top>: mount cleanup failure:
18:50 AbrekUS Traceback (most recent call last):
18:50 AbrekUS File "/usr/libexec/glusterfs/python/syncdaemon/resource.py", line 1042, in inhibit
18:50 AbrekUS self.cleanup_mntpt(mntpt)
18:50 AbrekUS File "/usr/libexec/glusterfs/python/syncdaemon/resource.py", line 1066, in cleanup_mntpt
18:50 AbrekUS os.rmdir(mntpt)
18:50 AbrekUS OSError: [Errno 16] Device or resource busy: '/tmp/gsyncd-aux-mount-3EaTKs'
18:50 semiosis AbrekUS: pastie.org for multiline pastes please
18:51 _VerboEse joined #gluster
18:51 AbrekUS and _master_ server has lots of empty /tmp/gsyncd-aux-mount-XXX dirs
18:51 vincent_1dk joined #gluster
18:51 suliba_ joined #gluster
18:51 AbrekUS semiosis: will do
18:51 sjm joined #gluster
18:51 semiosis JoeJulian: glusterbot doesnt kick flooders much these days
18:52 semiosis i miss that
18:52 JoeJulian yeah, that was broken and was locking down the channel incorrectly.
18:52 semiosis aww
18:52 Peter2 i got kicked earlier....
18:52 JoeJulian I need to plug in the other flood protection I downloaded...
18:52 semiosis Peter2: by freenode, not glusterbot
18:52 AbrekUS is geo-replication working in 3.5 at all?
18:52 Ark joined #gluster
18:53 JoeJulian AbrekUS: I've seen people using it.
18:53 JoeJulian Why is the mountpoint busy?
18:54 _NiC joined #gluster
18:55 AbrekUS JoeJulian: have no idea - dirs are created by geo-replication sub-system and they are empty
18:56 AbrekUS also, first I had to fix the path to gsyncd in gsyncd.conf, it was /nonexistent/gsyncd instead of /usr/libexec/glusterfs/gsyncd for some reason
18:56 sjm joined #gluster
18:57 AbrekUS now I'm getting "Device or resource busy" and, as result?, "failed to get the 'volume file' from server"
18:57 JoeJulian Let me know when you figure that out. It's causing that exception.
18:57 purpleidea joined #gluster
18:57 purpleidea joined #gluster
18:57 JoeJulian I assume you're familiar with lsof and fuser.
18:58 AbrekUS JoeJulian: I'm not expert in the Gluster internals :) lsof shows that these dirs are not used
18:59 tomased joined #gluster
18:59 JoeJulian "device or resource busy" when attempting to unmount a directory is a kernel thing. It's true regardless of the filesystem that's mounted.
18:59 RobertLaptop joined #gluster
19:00 tdasilva joined #gluster
19:00 AbrekUS JoeJulian: geo-replication creates these dirs, mounts remote? volume to them, then tries to unmount volume and fails because _geo-replication_ still uses? that dir?
19:01 JoeJulian I don't know what's using it. Maybe. That's what I was wondering. I'm not a huge geo-replication expert, but then again I don't think there's anyone that really is (except the developer maybe).
19:03 AbrekUS JoeJulian: these are temp dirs created by geo-replication, so it is unlikely that some other software/person hijacks them constantly :) looks like some kind of race condition in geo-replication code
19:03 JoeJulian probably
19:04 AbrekUS JoeJulian: do you know who the geo-replication developer is?
19:04 JoeJulian You could see if you can unmount it now...
19:05 AbrekUS JoeJulian: "mount" does not show them as mounted
19:06 JoeJulian hmm, maybe the exception is irrelevant?
19:06 Peter2 so how do we fix the heal-failed entries?
19:06 JoeJulian but I suppose you're saying that it's not functioning.
19:07 JoeJulian Csaba Henk <csaba@redhat.com> and Venky Shankar <vshankar@redhat.com>
19:07 AbrekUS JoeJulian: yes, it is in "faulty" state and I see repeated attempts to get volume file? and they are failing with same exception
19:07 JoeJulian Oh, of course, it can't unmount it because it never did mount it.
19:08 JoeJulian Guess that makes sense.
19:08 primusinterpares joined #gluster
19:08 JoeJulian Should it be able to contact the server at port 24007 to get the vol file?
19:08 sspinner joined #gluster
19:08 JoeJulian It will be trying to reach the master.
19:10 AbrekUS JoeJulian: "it" is slave or master? i.e. is slave trying to get vol file from master via port 24007?
19:11 Peter2 not sure if that's a bug: after i restarted the gluster service with the heal-failed brick, all the heal-failed entries are gone. but gluster volume heal info still gives "Volume heal failed"
19:11 JoeJulian Assuming that error was on the slave, the slave is probably trying to mount the volume from the master.
19:11 AbrekUS but, these dirs are created on _master_
19:12 JoeJulian Peter2: Not a bug. That's the only way to clear the split-brain list. Check etc-glusterfs-glusterd.vol.log to see why you're getting that failed message though.
19:12 JoeJulian AbrekUS: Ah, ok...
19:13 JoeJulian AbrekUS: Well then, that makes less sense. It should be able to reach itself I would expect. Is there a client log for tmp.* in /var/log/glusterfs ?
19:14 Peter2 JoeJulian: There were no log entries when i did the heal info
19:14 JoeJulian Peter2: cli.log then maybe?
19:15 Peter2 no
19:15 ghenry joined #gluster
19:15 Peter2 i checked all
19:15 jobewan joined #gluster
19:15 cmtime joined #gluster
19:15 cmtime JoeJulian: after that find finishes what should be my plan of attack with that split-brain?
19:15 JoeJulian wierd... the cli.log should at least show the attempt. Is /var/log full?
19:16 Peter2 filesystem are not full
19:16 glusterbot New news from resolvedglusterbugs: [Bug 1113724] There is no way to tell if a heal crawl is in progress <https://bugzilla.redhat.com/show_bug.cgi?id=1113724>
19:16 Peter2 i tail all logs when running the command
19:19 dblack joined #gluster
19:19 SpeeR_ joined #gluster
19:19 17SAALNDT joined #gluster
19:19 mortuar joined #gluster
19:19 burnalot joined #gluster
19:19 sauce joined #gluster
19:19 georgeh|workstat joined #gluster
19:19 silky joined #gluster
19:19 [o__o] joined #gluster
19:19 tg2 joined #gluster
19:19 pdrakeweb joined #gluster
19:19 marmalodak joined #gluster
19:19 ccha joined #gluster
19:19 lyang0 joined #gluster
19:19 weykent joined #gluster
19:19 kkeithley joined #gluster
19:19 ry joined #gluster
19:19 dblack joined #gluster
19:19 chirino joined #gluster
19:19 dblack_ joined #gluster
19:21 Intensity joined #gluster
19:22 mjsmith2 joined #gluster
19:23 sjm joined #gluster
19:26 Peter2 anyone on 3.5.1 able to run gluster volume heal <vol> info with luck?
19:32 _polto_ joined #gluster
19:32 _polto_ joined #gluster
19:37 AbrekUS Peter2: I just run "gluster volume heal volume1 info" and got "Number of entries: 0" for each brick
19:37 Peter2 u on 3.5.1 ?
19:37 AbrekUS Peter2: glusterfs 3.5.1 built on Jun 24 2014 15:09:43
19:38 Peter2 i got the same only when i run "gluster volume heal gfs info heal-failed"
19:38 Peter2 gluster volume heal gfs info gave me Volume heal failed :(
19:38 Peter2 did u upgrade from 3.5.0 or fresh install?
19:38 primechuck joined #gluster
19:39 AbrekUS Peter2: upgrade from 3.5.0
19:39 AbrekUS but geo-replication does not work at all for me :(
19:40 Peter2 hmm
19:40 AbrekUS will try to downgrade to 3.4.4 and try geo-replication
19:40 Peter2 me too upgrade from 3.5.0 but heal info doesn't seems work for me
19:40 AbrekUS what "doesn't work" means?
19:44 Peter2 it keep returning "Volume heal failed"
19:45 MacWinner joined #gluster
19:46 AbrekUS is it replicated volume?
19:46 Peter2 nope
19:46 Peter2 hmm it's a replica 2
19:46 Peter2 no geo
19:47 AbrekUS what "gluster pool list" shows?
19:48 Peter2 http://pastie.org/9327842
19:48 glusterbot Title: #9327842 - Pastie (at pastie.org)
19:50 AbrekUS gluster volume status <VOLUME> detail
19:51 Peter2 http://pastie.org/9327849
19:51 glusterbot Title: #9327849 - Pastie (at pastie.org)
19:51 AbrekUS gluster volume heal volume1 info heal-failed
19:52 AbrekUS s/volume1/gfs/
19:52 glusterbot What AbrekUS meant to say was: gluster volume heal gfs info heal-failed
19:52 Peter2 http://pastie.org/9327851
19:52 bene2 joined #gluster
19:52 glusterbot Title: #9327851 - Pastie (at pastie.org)
19:52 AbrekUS hmm... it failed list is empty
19:53 Peter2 right
19:53 Peter2 that's so weired
19:53 sjm joined #gluster
19:53 AbrekUS try to heal it again: gluster volume heal gfs full
19:54 Peter2 i did
19:55 Peter2 the crawl full finished and healed files
19:55 Peter2 but still keep returning failed
19:55 cmtime I read you need to restart glusterd to clear that out
19:55 Peter2 and logs has no entries
19:55 cmtime maybe not the same things
19:55 Peter2 let me try
19:55 Peter2 rolling restart glusterd?
19:56 cmtime Some post I think I remember reading said just that
19:57 dtrainor joined #gluster
19:59 cmtime JoeJulian: after that find finishes what should be my plan of attack with that split-brain?
19:59 Peter2 just restarted
19:59 Peter2 still the same...
19:59 AbrekUS is there (architecture) doc which explains how geo-replication works? i.e. "master runs command X on slave via SSH, this command connects back to master to get ..." etc.
20:00 _dist JoeJulian: while testing something else, I've noticed that regular files (at least for me) have that "always healing" behaviour. If I DD to a file, as long as writes are happening it will show up in heal info, that's not expected is it?
20:01 JoeJulian It will show up in heal info because of the way it marks files as dirty. The heal info checks for that marker and reports it, even if it's transitory.
20:03 _dist got it, so the question is, if a file is being written to a lot, how can I know if it's because the file is actually unhealthy or just because it's being written to?
20:06 _dist If I can't trust the heal info for when a VM is healed, I never know for sure when a brick is healthy again after an outage
20:07 _dist forget VM, it's any file, but VMs just have alot of IO is all
20:13 Supermathie joined #gluster
20:16 Supermathie Hello everyone, long time no see.
20:16 dtrainor joined #gluster
20:16 Supermathie I'm trying to add a new brick to my volume (with the goal of shortly converting from replicate to distribute-replicate) but am getting this error: https://gist.githubusercontent.com/Supermathie/e6a7a36d1bb1c259f236/raw/a32722c56d552f3a9dc5f18be4ca0d74d25ddf7f/gistfile1.txt
20:18 Supermathie and from the gluster cli: "volume add-brick: failed: "
20:18 Supermathie it makes me sad.
20:19 Supermathie volume info/status: http://paste.ubuntu.com/7707735/
20:19 glusterbot Title: Ubuntu Pastebin (at paste.ubuntu.com)
20:28 dtrainor joined #gluster
20:29 AbrekUS left #gluster
20:32 ndevos joined #gluster
20:32 ndevos joined #gluster
20:39 gomikemike anyone using gluster in AWS
20:40 gomikemike im trying to figure out how to handle the "security" part of allowing certain hosts to certain volumes
20:42 B21956 joined #gluster
20:44 semiosis i use gluster in ec2
20:44 semiosis the straightforward way would be to use iptables or ec2 security groups to limit access to the brick ports
20:45 gomikemike semiosis: do you enforce some restrictions on who can access different share?
20:45 Guest___ joined #gluster
20:45 semiosis no
20:45 gomikemike yea, i was thinking of going the secgrp/port way but wanted to know if there was a bettah way
20:46 gomikemike AND are the ports somehow tied to the bricks?? I would hate to have the server reboot and the ports get switched around
20:46 semiosis ,,(ports)
20:46 glusterbot glusterd's management port is 24007/tcp and 24008/tcp if you use rdma. Bricks (glusterfsd) use 24009 & up for <3.4 and 49152 & up for 3.4. (Deleted volumes do not reset this counter.) Additionally it will listen on 38465-38467/tcp for nfs, also 38468 for NLM since 3.3.0. NFS also depends on rpcbind/portmap on port 111 and 2049 since 3.4.
20:46 semiosis the port should be fixed when the volume is created and not change (or even be recycled if that volume is deleted) last time I checked
20:47 gomikemike ohh nice
20:50 gomikemike is there a gui for glusterfs?
20:50 gomikemike its not for me, its for my friend...
20:52 semiosis maybe ,,(ovirt)
20:52 glusterbot http://wiki.ovirt.org/wiki/Features/Gluster_Support
20:54 theron joined #gluster
20:56 primechuck joined #gluster
20:56 hagarth joined #gluster
20:56 Supermathie Hrm, regarding my error... might I need to turn nfs back on to add a volume?
20:58 mbukatov joined #gluster
21:01 coredump joined #gluster
21:17 gomikemike semiosis: so the "clients" all they need is access to the 49152 port (for first brick)
21:18 semiosis also 24007
21:19 gomikemike ok, so all SGs would be 24007, 49152+1 as we grow
21:19 gomikemike thanks
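A sketch of restricting the ports glusterbot listed with iptables (the same ranges work as EC2 security-group rules), assuming a hypothetical trusted client subnet:

    iptables -A INPUT -p tcp -s 10.0.1.0/24 --dport 24007:24008 -j ACCEPT   # glusterd management
    iptables -A INPUT -p tcp -s 10.0.1.0/24 --dport 49152:49200 -j ACCEPT   # brick ports (3.4+)
    iptables -A INPUT -p tcp --dport 24007:24008 -j DROP
    iptables -A INPUT -p tcp --dport 49152:49200 -j DROP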
21:20 Peter2 anyone notice 0-cli: Failed to create auxiliary mount directory /var/run/gluster/
21:20 Peter2 when first enable quota it failed
21:20 Peter2 why required to create auxiliary mount for quota?
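The quota commands appear to mount the volume at an auxiliary mountpoint under /var/run/gluster in order to crawl it, so a missing runtime directory makes the enable fail. A hedged workaround sketch, with a hypothetical path and limit:

    mkdir -p /var/run/gluster               # recreate the runtime dir (it can vanish if /var/run is tmpfs)
    gluster volume quota gfs enable
    gluster volume quota gfs limit-usage /projects 10GB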
21:25 lmickh joined #gluster
21:29 Peter2 i just rebuilt another gluster 3.5.1 fresh install and still getting heal info failed
21:29 Liquid-- joined #gluster
21:29 Peter2 anyone has clue what could cause that?
21:31 Peter2 any idea how to debug the heal fail?
21:33 dtrainor joined #gluster
21:36 _dist anyone know how to resolve a situation like " 0-management: Unable to get lock for uuid: b8bd7a0c-76f4-4df6-b149-18c8aa51953f, lock held by: 57749ac1-8a1e-4510-8d5f-f713d6c3e61a"
21:37 _dist I realize now my problem that I created the bug (and closed last week) for. It was that the client didn't have DNS for all the bricks
21:37 _dist that caused the servers and the clients to go log happy, (servers about xattr errors), (client about dns and other errors)
21:40 _dist so my bricks aren't happy about this lock, not sure how to clear/fix it or if it will resolve itself
21:40 primechuck left #gluster
21:41 _dist and all three bricks pretty much say "[glusterd-handler.c:1007:__glusterd_handle_cli_get_volume] 0-glusterd: Received get vol req" every other second or so
21:44 _dist seems like this issue http://toruonu.blogspot.ca/2012/12/gluster-stuck-with-lock.html <-- I don't really want to do that solution though :)
21:44 glusterbot Title: Ramblings on IT and Physics: gluster stuck with a lock (at toruonu.blogspot.ca)
21:44 ninkotech_ joined #gluster
21:45 ninkotech__ joined #gluster
21:45 mjsmith2_ joined #gluster
21:48 _polto_ joined #gluster
21:49 kshlm joined #gluster
21:49 _dist looks as though the lock did clear itself (as the article said)
21:50 _dist but heal info retriggers it.
21:53 Peter1 joined #gluster
21:57 Peter1 _dist: are u on 3.5.1?
21:57 Peter1 i wonder why my heal info is broken
21:58 sn joined #gluster
21:58 _dist Peter1: nope, 3.4.2 I'm hoping to release this lock without having to restart all nodes
21:58 Peter1 ic
21:58 Guest90905 hey guys quick question. If I make a gluster brick with gluster create without any replication. Can I add servers to replicate to later? Or will I have to make the brick all over again. Thanks
22:00 _dist right now I can't really run any commands it complains and tells me to wait
22:00 ron1n To be clear, this volume has no replication at all. I want to change that and make it replicate to a new server. Is that possible?
22:01 _dist ron1n: yes, when you increase replica to 2 it'll turn the volume into a replicate volume
22:01 Peter1 file a bug
22:01 ron1n _dist, Excellent thanks
22:01 glusterbot https://bugzilla.redhat.com/enter_bug.cgi?product=GlusterFS
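A sketch of the conversion _dist describes, with hypothetical names; for a single-brick volume, one new brick is added and a full heal then populates the new copy:

    gluster volume add-brick gv0 replica 2 newserver:/export/brick1
    gluster volume heal gv0 full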
22:02 _dist Peter1: I suppose, the problem is that I screwed up and let a client that didn't have DNS for all 3 bricks mount the volume
22:03 _dist yes it could have handled it better, but in the end it was an error on my part
22:03 Peter1 ic, i think i encounter a different issue that even i fresh create a volume, heal info report "Volume heal failed"
22:03 _dist I've not seen that, my current problem is that a brick is locking itself
22:05 cmtime joined #gluster
22:07 _dist does anyone know how to clear a lock when the error message is " 0-management: Unable to get lock for uuid: 57749ac1-8a1e-4510-8d5f-f713d6c3e61a, lock held by: 0086bd33-75be-4bdb-b6a9-82f3cec55e18" ?
22:08 Peter1 peer status ok?
22:08 _dist yeap
22:12 _dist I'm going to try keeping the brick that has the lock down for 30 min, see if that is the right spell to cast :)
22:15 SpeeR_ my nfs.log is growing very large, I've deleted it a few times, but glusterfsd has a lock on it, if I restart glusterfsd will that disconnect all connections?
22:15 glusterbot New news from newglusterbugs: [Bug 1113778] gluster volume heal info keep reports "Volume heal failed" <https://bugzilla.redhat.com/show_bug.cgi?id=1113778>
22:21 elico SpeeR_: simply yes.. but the clients should be configured to allow this kind of disruption.
22:22 SpeeR_ thanks elico
22:23 elico SpeeR_: what clients are you using? linux?
22:23 SpeeR_ heh, esx
22:23 SpeeR_ yeh
22:23 SpeeR_ linux
22:23 Ark joined #gluster
22:23 SpeeR_ so the timeouts are all set to 180 sec, so we should be good
22:23 _dist ok, I tried stopping the volume, but that doesn't seem to be enough
22:24 _dist but stopping all the servers isn't really an option...
22:29 _dist how does this lock affect me? how can I find out what is actually locked
22:37 B21956 joined #gluster
22:38 _dist having trouble finding statedump location
22:40 _dist guess I could manually set it
22:41 ron1n hey guys i'm having a problem setting the ownership of a gluster volume with gluster volume set gv0 storage.owner-uid=<uid>
22:41 ron1n I keep getting Usage: volume set <VOLNAME> <KEY> <VALUE>
22:42 ron1n storage.owner-uid is the key = uid is the value... what gives?
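The usage error above is because the gluster CLI takes the key and value as separate arguments rather than key=value. A sketch with an example uid:

    gluster volume set gv0 storage.owner-uid 36    # pick whatever uid/gid your application needs
    gluster volume set gv0 storage.owner-gid 36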
22:45 _dist so this is the easiest way to do it? :) https://access.redhat.com/site/documentation/en-US/Red_Hat_Storage/2.0/html/Administration_Guide/ch21s02.html
22:45 glusterbot Title: 21.2. Troubleshooting File Locks (at access.redhat.com)
22:46 _dist (online of course)
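The Red Hat document _dist links pairs a statedump (to find the held inodelk/entrylk and the gfid or path it sits on) with the clear-locks command. A sketch with hypothetical names; clearing a lock an application still holds can cause corruption, so use with care:

    gluster volume statedump gv0                    # dumps usually land under /var/run/gluster (see server.statedump-path)
    grep -B2 -A4 'inodelk' /var/run/gluster/*.dump.*
    gluster volume clear-locks gv0 /path/to/locked/file kind granted inode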
22:48 sjm joined #gluster
22:59 _dist The path in my statedump is a gfid, and I don't seem to be able to clear it, it complains no such file or directory
23:04 _dist is the "start" in an inodelk line the inode ?
23:05 _dist I'm beginning to suspect the inodelk file doesn't exist
23:12 siel joined #gluster
23:20 sjm joined #gluster
23:21 _dist all fixed, I'm not really sure if I'd consider this a bug, but it is important that all clients can resolve all bricks on their own
23:37 theron joined #gluster
23:40 sputnik13 joined #gluster
23:52 gildub joined #gluster
23:56 sjm left #gluster
23:59 sputnik13 joined #gluster
