
IRC log for #gluster, 2013-12-03


All times shown according to UTC.

Time Nick Message
00:07 g4rlic I shut off SELinux, and I was able to create a volume *once*.  Then I wiped out everything, same as I've been doing, and then nothing..  same error as before.
00:07 jbd1 joined #gluster
00:08 g4rlic http://pastebin.centos.org/6331/
00:08 g4rlic ^^ that's the error I get from cli.log
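
One frequent cause of a bare "volume create ... failed" when a brick directory has been wiped and reused is leftover GlusterFS extended attributes on the brick root; the real reason usually lands in the glusterd log (/var/log/glusterfs/etc-glusterfs-glusterd.vol.log) on one of the peers rather than in cli.log. A hedged cleanup sketch, assuming a hypothetical brick path /export/brick1:

    # clear the volume markers left behind by the previous create attempt
    setfattr -x trusted.glusterfs.volume-id /export/brick1
    setfattr -x trusted.gfid /export/brick1
    rm -rf /export/brick1/.glusterfs
    # then restart glusterd on the affected node and retry the create
    service glusterd restart
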
00:09 davidbierce joined #gluster
00:14 davidbierce joined #gluster
00:47 hchiramm_ joined #gluster
01:11 itisravi joined #gluster
01:20 jag3773 joined #gluster
01:29 shyam joined #gluster
01:48 MrNaviPacho joined #gluster
01:51 gergnz_ joined #gluster
02:02 Technicool joined #gluster
02:07 harish joined #gluster
02:29 social joined #gluster
02:48 kshlm joined #gluster
02:54 satheesh1 joined #gluster
02:54 hchiramm_ joined #gluster
03:06 lpabon joined #gluster
03:09 social_ joined #gluster
03:23 sgowda joined #gluster
03:25 vpshastry joined #gluster
03:29 ndarshan joined #gluster
03:33 davinder4 joined #gluster
03:43 shyam joined #gluster
03:45 kanagaraj joined #gluster
03:46 RameshN joined #gluster
03:47 itisravi joined #gluster
04:08 vpshastry joined #gluster
04:27 _BryanHm_ joined #gluster
04:32 DV_ joined #gluster
04:38 nshaikh joined #gluster
04:39 ngoswami joined #gluster
04:39 plarsen joined #gluster
04:42 satheesh joined #gluster
04:43 dylan_ joined #gluster
04:43 CheRi joined #gluster
04:47 MiteshShah joined #gluster
04:47 purpleidea joined #gluster
04:51 ngoswami joined #gluster
04:55 msolo joined #gluster
04:56 spandit joined #gluster
04:59 _BryanHm_ joined #gluster
05:02 satheesh1 joined #gluster
05:04 shylesh joined #gluster
05:13 anands joined #gluster
05:17 dusmant joined #gluster
05:20 shubhendu joined #gluster
05:20 bala joined #gluster
05:27 meghanam joined #gluster
05:27 meghanam_ joined #gluster
05:29 davinder4 joined #gluster
05:34 thundara_ left #gluster
05:34 thundara joined #gluster
05:36 hagarth joined #gluster
05:40 raghu joined #gluster
05:40 shruti joined #gluster
05:43 hchiramm_ joined #gluster
05:47 shubhendu joined #gluster
05:49 satheesh joined #gluster
05:49 vpshastry joined #gluster
05:53 kanagaraj joined #gluster
05:53 ppai joined #gluster
05:53 ndarshan joined #gluster
05:55 bala joined #gluster
05:57 vshankar joined #gluster
05:59 mohankumar joined #gluster
06:01 aravindavk joined #gluster
06:03 ricky-ti1 joined #gluster
06:07 hchiramm_ joined #gluster
06:07 bulde joined #gluster
06:13 shruti joined #gluster
06:16 RameshN joined #gluster
06:20 krypto joined #gluster
06:20 krypto joined #gluster
06:20 Atin joined #gluster
06:22 lalatenduM joined #gluster
06:27 hchiramm_ joined #gluster
06:32 Atin joined #gluster
06:32 ngoswami joined #gluster
06:35 bennyturns joined #gluster
06:36 msvbhat joined #gluster
06:36 kkeithley joined #gluster
06:41 psharma joined #gluster
06:44 shubhendu joined #gluster
06:45 sgowda joined #gluster
06:46 kanagaraj joined #gluster
06:46 vpshastry1 joined #gluster
06:47 bala joined #gluster
06:48 shyam joined #gluster
06:49 aravindavk joined #gluster
06:50 satheesh joined #gluster
06:53 vpshastry2 joined #gluster
06:57 saurabh joined #gluster
06:57 hchiramm_ joined #gluster
07:08 sgowda joined #gluster
07:09 davinder4 joined #gluster
07:14 ndarshan joined #gluster
07:15 RameshN joined #gluster
07:20 MiteshShah joined #gluster
07:26 jtux joined #gluster
07:28 geewiz joined #gluster
07:32 ctria joined #gluster
07:39 hchiramm_ joined #gluster
07:42 ndarshan joined #gluster
07:49 bulde joined #gluster
07:51 Nev__ joined #gluster
07:52 jtux joined #gluster
07:54 sgowda joined #gluster
08:00 vpshastry1 joined #gluster
08:11 keytab joined #gluster
08:12 _polto_ joined #gluster
08:16 satheesh joined #gluster
08:18 eseyman joined #gluster
08:18 spandit joined #gluster
08:24 shyam joined #gluster
08:31 franc joined #gluster
08:33 bentech4you joined #gluster
08:34 bentech4you Hi..
08:35 bentech4you glusterfs 3.3.1 running on a 2-node simple replica is having split-brain and data is being written to only one node. self heal occurs after a 10 minute interval.
08:35 bentech4you please advise.
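
For reference, the commands commonly used to inspect this on 3.3.x, assuming a hypothetical volume name myvol; the 10-minute gap roughly matches the self-heal daemon's periodic crawl, and a full heal can be kicked off by hand instead of waiting:

    gluster volume heal myvol info split-brain   # entries that need manual resolution
    gluster volume heal myvol info               # entries still waiting to be healed
    gluster volume heal myvol full               # trigger a full self-heal crawl now
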
08:36 bulde joined #gluster
08:40 mkzero is there any way in 3.4.1 to find the normal path for a gfid without crawling the brick with the id for the corresponding inode?
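
A common workaround for the question above, assuming a hypothetical brick at /export/brick1: for regular files the .glusterfs/<aa>/<bb>/<gfid> entry is a hard link, so matching on the inode avoids reading xattrs on every file, although it still walks the brick's directory tree; for directories the entry is a symlink and can simply be read. The gfid shown is a placeholder.

    # regular file: the named path shares the gfid file's inode
    find /export/brick1 -samefile /export/brick1/.glusterfs/aa/bb/aabbccdd-1122-3344-5566-77889900aabb
    # directory: the corresponding .glusterfs entry is a symlink to parent-gfid/name
    readlink /export/brick1/.glusterfs/aa/bb/aabbccdd-1122-3344-5566-77889900aabb
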
08:42 Nev__ Got a problem on glusterfs afr: when accessing a file with php fopen, the file is sent to the client, but the client gets a missing-file message... when the client requests it again, the file is found and sent. has someone got a solution for that?
08:44 shri joined #gluster
08:44 ndarshan joined #gluster
08:50 bala joined #gluster
08:59 vimal joined #gluster
09:00 calum_ joined #gluster
09:08 hchiramm_ joined #gluster
09:29 bulde joined #gluster
09:30 bala joined #gluster
09:36 atrius joined #gluster
09:47 psyl0n joined #gluster
09:55 bala joined #gluster
09:56 harish joined #gluster
09:56 bulde joined #gluster
09:59 keytab joined #gluster
10:43 samppah [2013-12-03 10:42:40.289395] W [client-rpc-fops.c:4137:client3_3_fsync] 0-dc1-sas1-vmstor1-client-1:  (71533fbb-5923-4cd5-a203-4a89dcfb138b) remote_fd is -1. EBADFD
10:44 samppah ^ any idea what this means? shows up in client log once in a minute
10:54 glusterbot hagarth: (help [<plugin>] [<command>]) -- This command gives a useful description of what <command> does. <plugin> is only necessary if the command is in more than one plugin.
11:31 dusmant joined #gluster
11:32 tziOm joined #gluster
11:36 edward1 joined #gluster
11:49 DV__ joined #gluster
11:52 baoboa joined #gluster
12:03 kkeithley1 joined #gluster
12:04 itisravi joined #gluster
12:14 kshlm joined #gluster
12:18 hagarth joined #gluster
12:27 _polto_ joined #gluster
12:27 _polto_ joined #gluster
12:37 ocholetras joined #gluster
12:37 ocholetras hi!
12:37 ocholetras Guys
12:37 ocholetras does anyone know
12:37 ocholetras if i can remove bricks while migrating their data to the other bricks in the volume
12:38 ocholetras in order to not lose anything?
12:38 ocholetras http://www.gluster.org/community/documentation/index.php/WhatsNew3.3 ---> here says " Remove-brick can migrate data to remaining bricks "
12:39 ocholetras we are on 3.4 now
12:42 mkzero ocholetra: it will migrate data. just takes a while, depending on your setup it can take several weeks
12:42 ocholetras every place in the docs says you will lose data O_o
12:43 ocholetras """ Data residing on the brick that you are removing will no longer be accessible at the Gluster mount
12:43 ocholetras point. Note however that only the configuration information is removed - you can continue to
12:43 ocholetras access the data directly from the brick, as necessary. """
12:43 ocholetras that is here: http://www.gluster.org/wp-content/uploads/2012/05/Gluster_File_System-3.3.0-Administration_Guide-en-US.pdf
12:44 ocholetras is the latest doc i could find
12:45 dylan_ joined #gluster
12:46 mkzero ocholetra: yeah, the docs could really need some polishing, especially for 3.4.x.. when you remove a brick with gl. vol rem.-brick *start* it will start a rebalance
12:47 eseyman joined #gluster
12:47 mkzero ocholetra: after that you can replace start with status to monitor the progress of the rebalance. once it says 'finished' instead of 'in progress' you replace start with commit and everything should be as expected - no missing data and the brick removed
12:49 mkzero as long as you type start you should be fine. as i understand it leaving the start out would remove the brick w/out the rebalance
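
A sketch of the sequence mkzero is describing, with hypothetical names (volume myvol, brick server1:/bricks/b1); leaving out "start" skips the rebalance, so the data on the brick would not be migrated:

    gluster volume remove-brick myvol server1:/bricks/b1 start    # begin draining data off the brick
    gluster volume remove-brick myvol server1:/bricks/b1 status   # repeat until it reports completed
    gluster volume remove-brick myvol server1:/bricks/b1 commit   # finalize once migration has finished
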
12:52 ocholetras aha
12:52 ocholetras good to know
12:52 ocholetras was trying to do a replace
12:52 ocholetras because what we want is to add a new bigger brick on same host
12:52 ocholetras and delete the smaller one
12:52 ocholetras but migrating data first to not loos anything
12:52 ocholetras loose
12:53 ocholetras https://bugzilla.redhat.com/show_bug.cgi?id=915742
12:53 ocholetras found this error
12:53 ocholetras I didn't quite get the "replace-disk" behaviour, i don't know if the brick you are using to replace the old one has to be on the volume or not.
12:53 ocholetras So i tried doing that with a brick outside the volume, just a directory
12:54 mkzero if you want a bigger brick.. is that new brick on the same storage-controller? if so, you could just resize the old brick ;)
12:55 ira joined #gluster
12:56 B21956 joined #gluster
12:59 glusterbot New news from newglusterbugs: [Bug 969461] RFE: Quota fixes <http://goo.gl/XFSM4>
13:00 ocholetras mkzero: yes but im testing operations on a testing environment
13:00 ocholetras to see what gluster is capable of
13:00 ocholetras so this could be a situation
13:00 ocholetras i want to replace a brick that is small with a new one
13:01 mkzero there are several ways to expand gluster - the easiest and fastest way really would be to expand the fs below the brick(s)
13:02 mkzero the other one would be a replace-brick, that just takes some time
13:03 mkzero if you use replication you could also replace one of the bricks, heal, then replace the other one and heal again
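
A hedged sketch of the replicated-brick swap mkzero mentions, with hypothetical names; on 3.4.x replace-brick data migration was already being phased out, so this variant swaps the brick definition and then relies on self-heal to repopulate the new, bigger brick from its replica partner:

    gluster volume replace-brick myvol server1:/bricks/small server1:/bricks/big commit force
    gluster volume heal myvol full     # copy everything over from the surviving replica
    gluster volume heal myvol info     # watch until the pending-heal list drains
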
13:03 ocholetras but when you say "replace"
13:04 ocholetras do you mean with volume replace-brick command?
13:04 ocholetras o to simulate disk faliure
13:04 ocholetras set volume-id attribute
13:04 ocholetras and healing
13:04 ocholetras like if you replaced the disk with a new one
13:04 ocholetras ?
13:08 hagarth joined #gluster
13:09 diegows joined #gluster
13:14 social hagarth: I got quite pissed off and I just started glusterd in valgrind >.< I should have reasonable output today when the memleak hits again
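
Roughly what that looks like in practice, assuming glusterd's -N/--no-daemon foreground mode and a node that can tolerate valgrind's slowdown (paths are the usual defaults, not verified here):

    service glusterd stop
    valgrind --leak-check=full --log-file=/var/tmp/glusterd-valgrind.log /usr/sbin/glusterd -N
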
13:15 dusmant joined #gluster
13:19 hagarth social: great, please let me know what you find.
13:21 samppah hagarth: hey! any idea what this means? http://pastie.org/8525399
13:21 samppah that warning keeps coming every minute to client log
13:24 hagarth samppah: is the afr client connected to all bricks?
13:24 hagarth scratch afr .. is the client connected to all bricks is more appropriate
13:25 samppah i'll check that.. it should be but we had some problems this morning with one of the nodes
13:27 hagarth samppah: seems close to this RHS bug - https://bugzilla.redhat.com/show_bug.cgi?id=870948
13:27 hagarth but that was closed since it was not reproducible
13:33 samppah hagarth: it looks like that warning was shown when client was connected to all bricks.. oddly enough it's not showing up anymore
13:35 chirino joined #gluster
13:38 hagarth samppah: interesting, did you notice any functional impact when the warnings were seen?
13:41 samppah hagarth: nothing on client side.. on server side we had problems with self healing after hardware failure but it was before those warnings
13:44 mkzero ocholetra: replace like replace the server but set IP/name, etc. to the old values, so the heal process(heal full or find from client) copies all files from the other brick of that replication-brick-group.
13:46 ocholetras ok mkzero change  disk same stuff
13:46 ocholetras change disk
13:46 ocholetras remount same mount point
13:46 ocholetras and restart gluster
13:46 ocholetras then it realizes that needs to heal
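
A hedged sketch of that "pretend the disk was replaced" route, assuming a hypothetical volume myvol with the fresh, empty filesystem mounted at the old brick path /bricks/b1; the volume-id value has to be copied from a healthy brick of the same volume, otherwise the brick process will refuse to start:

    # on a healthy node: read the volume id in hex
    getfattr -n trusted.glusterfs.volume-id -e hex /bricks/b1
    # on the repaired node: stamp the empty brick with the same id (value below is a placeholder)
    setfattr -n trusted.glusterfs.volume-id -v 0x0123456789abcdef0123456789abcdef /bricks/b1
    gluster volume start myvol force   # respawn the missing brick process
    gluster volume heal myvol full     # let self-heal repopulate the brick
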
13:47 ocholetras nother thing is that when i try to create a file bigger than the free space on glusterfs mountpoint at the client
13:47 ocholetras it launches a io error
13:50 getup- joined #gluster
14:02 rwheeler joined #gluster
14:02 ndk joined #gluster
14:03 vpshastry joined #gluster
14:06 zerick joined #gluster
14:06 mattapp__ joined #gluster
14:06 mkzero ocholetra: that seems about right?! files can never be bigger than the free space of the fs (except, of course, if the FS supports encryption)
14:16 japuzzo joined #gluster
14:28 rwheeler_ joined #gluster
14:29 plarsen joined #gluster
14:30 plarsen joined #gluster
14:31 ocholetras mkzero: it is a little bit ugly to end with an io error
14:31 ocholetras instead of
14:31 ocholetras no space left on device
14:32 ocholetras ^^
14:33 X3NQ joined #gluster
14:39 bennyturns joined #gluster
14:40 mkzero yeah, true, that could be handled better
14:49 mohankumar joined #gluster
15:01 davinder4 joined #gluster
15:02 dbruhn joined #gluster
15:11 _pol joined #gluster
15:12 kaptk2 joined #gluster
15:16 glusterbot semiosis: I'm not happy about it either
15:16 glusterbot semiosis: Error: You don't have the owner capability. If you think that you should have this capability, be sure that you are identified before trying again. The 'whoami' command can tell you if you're identified.
15:17 nooby joined #gluster
15:18 nooby does anyone know how to fix this error -> Initialization of volume 'management' failed, review your volfile again
15:18 nooby ?
15:20 Technicool joined #gluster
15:20 ndevos nooby: that error happens when /var/lib/glusterd/peers/ contains incorrect files - but there could be other reasons too
15:20 ndevos nooby: I've had it happening when /var was 100% full, glusterd could not update the peer-files correctly and on a reboot it failed to start
15:21 nooby same here
15:21 wushudoin joined #gluster
15:21 gmcwhistler joined #gluster
15:21 dbruhn I've had it corrupt the files when the partition containing /var/lib fills up too
15:22 nooby i had a 9gb glusterd logfile which filled up all the space
15:22 nooby gluster killed itself :)
15:22 dbruhn the gluster logs get really chatty when there is an issue, I've had some in excess of 16gb
15:24 kiwikrisp joined #gluster
15:26 nooby how can i fix this?
15:26 dbruhn Do you have a server that isn't messed up?
15:26 nooby the files in peers have a size of 0 bytes
15:26 nooby sure
15:27 dbruhn What i've done in the past is grab all the volume files from one of the good servers and replaced the ones on the bad server.
15:27 dbruhn make sure you keep your /var/lib/glusterd/glusterd.info file
15:27 dbruhn and you'll need to correct your peer files
15:28 dbruhn also /var/lib/glusterd/vols/<vol>/bricks will need to be adjusted if I remember right
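
A hedged outline of the recovery dbruhn describes, assuming a healthy peer reachable as goodnode (hypothetical) and stock paths; take a backup of /var/lib/glusterd before touching anything:

    cp -a /var/lib/glusterd /var/lib/glusterd.bak
    cp /var/lib/glusterd/glusterd.info /root/glusterd.info.keep        # this node's own UUID
    rsync -a goodnode:/var/lib/glusterd/vols/ /var/lib/glusterd/vols/  # volume and brick definitions
    cp /root/glusterd.info.keep /var/lib/glusterd/glusterd.info
    # recreate /var/lib/glusterd/peers/ so it lists every peer except this node itself,
    # then restart and verify
    service glusterd restart
    gluster peer status
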
15:29 MrNaviPacho joined #gluster
15:31 bugs_ joined #gluster
15:35 neofob joined #gluster
15:39 _BryanHm_ joined #gluster
15:40 georgeh|workstat joined #gluster
15:40 qstep joined #gluster
15:41 qstep does anyone know how to restrict access to a gluster volume?
15:45 glusterbot <http://goo.gl/AegWo7> (at access.redhat.com)
15:46 verywiseman joined #gluster
15:48 samppah qstep: i'm using iptables to allow access per volume
15:48 rwheeler joined #gluster
15:49 mattapp__ joined #gluster
15:52 mattap___ joined #gluster
15:54 qstep samppah thanks! but even if you restrict access to a volume (also using gluster volume set auth.allow ....), what prevents other people on the network from setting up a new volume and accessing that one?
15:55 kiwikrisp Need some help confirming a corruption problem. 2-node replicate-mode gluster providing a Storage Repository via NFS to XenServer. One of the .vhd (2TB) files seems unable to heal. I disconnected the SR and initiated the heal process 2 days ago, monitored via iotop and heal info. When the glusterfs processes disappeared from iotop i checked the heal info and the .vhd file was still listed. Waited a
15:55 kiwikrisp while but it didn't clear out of the heal info. Initiated the heal process again on the volume, which has been running for over a day with no write entries in iotop or volume top write. Any thoughts on what I'm missing? why this volume won't heal?
15:55 qstep so i would like to restrict acces to a volme (gluster supports that as pointed out above) but i also want to restrict access to the management layer preventing other users to gluster peer probe the servers. how can i do that?
16:00 dusmant joined #gluster
16:00 samppah qstep: afaik you need to do peer probe from host that's included in cluster
16:00 mattapp__ joined #gluster
16:01 samppah of course it's also possible to use iptables for that.. gluster management listens port 24007
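
Hedged examples of both layers mentioned above, for a hypothetical volume myvol and a trusted subnet 192.168.10.0/24; auth.allow gates access to the volume itself, while the iptables rules keep outside hosts away from the management port:

    gluster volume set myvol auth.allow 192.168.10.*
    iptables -A INPUT -p tcp --dport 24007 -s 192.168.10.0/24 -j ACCEPT
    iptables -A INPUT -p tcp --dport 24007 -j DROP
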
16:01 kkeithley_ ,,(ppa)
16:01 mattap___ joined #gluster
16:02 verywiseman when i run this `gluster volume create test-vol2  lab2:/stg2 lab3:/stg3 lab4:/stg4` , this error appears: "volume create: test-vol2: failed" , and no logs. what is the problem? it worked for me the first time with another volume name
16:04 qstep samppah i have 3 computers available. i will check that out now. so the way it's meant to be done is just blocking the ports using iptables? seems to me as if there would be a feature missing.
16:04 mattappe_ joined #gluster
16:05 verywiseman ok , i solved it :)_
16:06 _pol joined #gluster
16:06 qstep samppah you are right! Thanks for that! peer probing to an already existing cluster leads to:
16:06 qstep peer probe: failed: xxx.xxx.xxx.xxx is already part of another cluster
16:07 samppah qstep: good, thanks for the info :)
16:08 qstep Thanks for that! i didn't see this documented anywhere. that makes things a lot safer! And then it also makes sense to restrict access per volume using glusterfs's features. that means that iptables isn't needed at all.
16:08 mattap___ joined #gluster
16:12 mattapp__ joined #gluster
16:14 mattappe_ joined #gluster
16:18 _pol joined #gluster
16:19 dylan_ joined #gluster
16:24 vpshastry joined #gluster
16:30 vpshastry joined #gluster
16:34 glusterbot Title: #8523375 - Pastie (at pastie.org)
16:37 mattapp__ joined #gluster
16:37 StarBeast joined #gluster
16:39 mattap___ joined #gluster
16:44 jag3773 joined #gluster
16:48 dylan_ joined #gluster
16:48 jbd1 joined #gluster
16:56 getup- joined #gluster
16:59 Keawman joined #gluster
17:00 g4rlic anyone here have information on why `gluster volume create` would exit with -1 and say "volume create: aq-shared-vol: failed" ?
17:00 g4rlic This is on a fresh cluster, 3.4.1, running on CentOS 6.4
17:02 qstep joined #gluster
17:02 qstep i have trouble starting glusterd using systemctl
17:02 qstep systemctl start glusterd.service works fine
17:03 qstep but systemctl enable glusterd.service shows errors on the next reboot that glusterd couldn't be started.
17:03 brosner joined #gluster
17:03 skered- joined #gluster
17:05 davinder4 joined #gluster
17:07 mattapp__ joined #gluster
17:11 partner hello again, long time no see. i wonder if you might know the reason behind MKNOD / UNLINK permission denied issues? i recall seeing some bug listed, but what's weird is that this is the 6th server i've added to a distributed setup and now i'm struggling with almost no data reaching this server.. running on debian wheezy with gluster 3.3.2
17:12 LoudNoises joined #gluster
17:15 partner before upgrading to 3.4.1 or whatever will be the stable version at that point can you think anything to do to fix this situation?
17:15 qstep left #gluster
17:20 _polto_ joined #gluster
17:20 _polto_ joined #gluster
17:25 mattapp__ joined #gluster
17:27 mattapp__ joined #gluster
17:32 partner ups, made it private, doesn't matter: http://pastie.org/private/8d4raxhrbopof7djoyfxxg
17:35 JonnyNomad joined #gluster
17:41 Mo___ joined #gluster
17:42 vpshastry joined #gluster
17:42 glusterbot Title: nopaste.info - free nopaste script and service (at nopaste.info)
17:52 _pol joined #gluster
17:53 g4rlic Poking around in the code for the gluster client, I find that the reason it exits with no error message, is because the gluster servers themselves don't ever set one.
17:53 g4rlic So it just exits and says "Sorry, I haven't the foggiest idea why it doesn't work."
17:53 g4rlic This is frustrating.
17:54 shani joined #gluster
17:54 shani anyone there ?
17:54 shani im having a problem please
17:54 shani help me
17:55 shani the clients are disconnecting
17:55 shani after sometime
17:56 shani JoeJulian there ?
17:56 shani anybody here ?
17:57 g4rlic I think there's a lot of folks on holiday atm.  Channel's been pretty dead the last couple days.
17:57 shani i think so
17:57 shani can you help ?
18:00 Keawman shani, how do you know that they are disconnecting
18:00 shani actually the clients are getting disconnected
18:00 shani when they are using alot of resource
18:01 shani i have this in my Logs
18:01 shani :client_query_portmap_cbk] 0-datavol-client-0: failed to get the port number for remote subvolume
18:01 shani [2013-11-30 07:01:47.456361] I [client.c:1883:client_rpc_notify] 0-datavol-client-0: disconnected
18:04 raghu joined #gluster
18:04 shani and this error
18:04 shani client-0: remote operation failed: Stale file handle
18:05 g4rlic shani: probably not.  We're 10 seconds away from switching back to NFS.
18:05 shani i didnt understand
18:06 shani i wanna know how to fix it ?
18:07 dylan_ shani: i'm not an expert, but i'm thinking that the client might have lost the connection to the gluster node. maybe the volume port is not reachable from the client?
18:08 shani ports are already forwarded
18:08 shani actually all the clients and the servers are connected to a Gigabit switch
18:08 _pol joined #gluster
18:08 dylan_ so you are able to telnet to the port..?
18:09 shani yeah
18:09 g4rlic gluster peer status shows every node connected with unique UUID's?
18:10 shani yes
18:10 shani it happens to any client
18:10 shani when im under a huge traffic
18:10 shani its a high availability model
18:11 shani when Http requests are almost 50,000 i get this error
18:11 shani maybe glusterfs cant handle that huge sum of Input/output of data
18:11 g4rlic you're positive it's not a local brick I/O problem?
18:11 glusterbot Title: nopaste.info - free nopaste script and service (at nopaste.info)
18:12 shani yes
18:12 shani im forwarding the ports again
18:12 shani maybe i get luck this time
18:12 shani i;ll come later when im having the same problem again
18:23 skered- So I think we ran into this issue a while ago
18:23 skered- https://bugzilla.redhat.com/show_bug.cgi?id=764331
18:23 skered- and we turned off quota and everything just worked after that
18:24 skered- However, the bug is a dup of another and the other one isn't public
18:24 skered- How do I know if this was fixed and if so when?
18:25 skered- I see someone CCing themself on the bug 2 months ago even though it was closed 2 years ago
18:29 zaitcev joined #gluster
18:30 rotbeard joined #gluster
18:33 cogsu joined #gluster
18:33 partner can't access that either, must be super secret bug
18:33 g4rlic security bug?
18:33 partner Status: CLOSED DUPLICATE of bug 764339
18:34 partner sorry about that, was not supposed to paste anything
18:40 kkeithley_ they're both imports from the old gluster.com bugzilla. Off hand I don't see anything in 764339 that would necessitate it being locked/private, but I guess it came in that way so it stayed that way.
18:40 hagarth was accidentally marked as private. have unblocked it.
18:41 kkeithley_ JoeJulian: would you please restart glusterbot? Thanks
18:43 partner heh, i've had my share of fun trying to track how much its lagging behind :)
18:44 Keawman I'm using glusterfs-3.4.0-8.el6.x86_64.rpm from the gluster.org repos, but now i'm seeing that CentOS base has glusterfs.x86_64   3.4.0.36rhs-1.el6 does anyone know if this is a compatible upgrade or what issue i might run into if upgrading to the base version?
18:45 elyograg managed to duplicate https://bugzilla.redhat.com/show_bug.cgi?id=820518 ... looks like it's the cause of all my rebalance problems.
18:47 partner still trying to track my issue of not getting the newest brick into use, just loads of MKNOD permission denied errors and what not
18:50 skered- hagarth: Thanks
18:50 mattapp__ joined #gluster
18:51 partner its like the volume would be confused of its bricks and their hash shares
18:59 theron joined #gluster
18:59 kkeithley_ Keawman: it's compatible, but AFAIK CentOS, as with real rhel (updating my centos vm to confirm), only has client-side RPMs.  They're compatible in the sense that you could run a client box with rhs RPMs with a gluster.org 3.3 or 3.4 server. I wouldn't mix rhs and gluster.org RPMs on the same box, if only because if you run into problems it'll be easier to describe and debug.
19:02 g4rlic fwiw, when we had gluster 3.3 rpm's from Centos on the bricks, and gluster 3.4 RPM's on Fedora clients, we had all kinds of problems.  I wouldn't mix 3.3 and 3.4.
19:03 kkeithley_ CentOS has some other extras-like repo where they're going to have the full gluster.org set of RPMs too. kbsingh has more details.
19:03 kkeithley_ ,,(repo)
19:03 kkeithley_ @repo
19:04 kkeithley_ glusterbot, glusterbot, where art thou glusterbot
19:04 kbsingh kkeithley_: we paused that repo work, waiting to see what comes down the river with 6.5. now that it's there, we should have a chat at some point, work out an action plan and just do it
19:05 kkeithley_ indeed, yes. johnmark ^^^
19:10 ricky-ti1 joined #gluster
19:13 * tqrst prods glusterbot
19:14 kkeithley_ And just to confirm, the glusterfs-3.4.0.36rhs-1 RPMs in CentOS 6.5 are in fact client-side only.
19:19 cogsu joined #gluster
19:21 zerick joined #gluster
19:26 kkeithley_ For clients, use either. For servers you'll want to edit /etc/yum.repos.d/CentOS-Base.repo and add a line "exclude=glusterfs*" for the base repo, along with continuing to use the gluster.org repo.
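
A sketch of that repo edit on a server node; only the exclude line is the addition, the rest is the stock CentOS-Base.repo [base] stanza (repeat for [updates] if it also carries glusterfs packages):

    [base]
    name=CentOS-$releasever - Base
    mirrorlist=http://mirrorlist.centos.org/?release=$releasever&arch=$basearch&repo=os
    gpgcheck=1
    gpgkey=file:///etc/pki/rpm-gpg/RPM-GPG-KEY-CentOS-6
    exclude=glusterfs*
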
19:27 vpshastry left #gluster
19:30 _pol joined #gluster
19:38 bgpepi joined #gluster
19:43 semiosis i sent JoeJulian a friendly email asking for help with glusterbot
19:44 gdubreui joined #gluster
19:44 semiosis ,,(meh)
19:56 g4rlic left #gluster
19:57 LoudNoises joined #gluster
20:01 jag3773 joined #gluster
20:05 nhm joined #gluster
20:09 cjh973 joined #gluster
20:14 RedShift2 joined #gluster
20:17 calum_ joined #gluster
20:21 jdarcy joined #gluster
20:26 Keawman kkeithley, thank you that was the exact answer I was looking for. I think I will stay with the gluster repos for both clients and servers.
20:59 partner on 3.3.2 series the remove-brick should migrate the data off, right? the docs are not exactly up to date anywhere so its a bit guesswork/googling around and then trying to figure out which source to believe.. i could of course try it out on some testing env
21:00 partner not sure what's up with the sixth added brick but it's just not getting the data as it should, so i was thinking of maybe getting rid of it, cleaning up and adding it back
21:01 dbruhn partner, did you rebalance, or at least run fix-layout when you added the brick?
21:01 partner yeah, i've run fix-layout
21:02 dbruhn Did it finish?
21:03 partner added the fifth one not so long ago and it started to suck in files without any issues, and it's actually getting most of them currently as the rest are hitting their limits
21:03 dbruhn If you run a rebalance it will redistribute the files in the file system
21:04 partner according to their hashes yes
21:04 partner i'm getting lots of MKNOD/UNLINK permission denied entries to brick log
21:04 partner 19:32 < partner> ups, made it private, doesn't matter: http://pastie.org/private/8d4raxhrbopof7djoyfxxg
21:05 cjh973 joined #gluster
21:05 partner i've done this same operation five times in a row and only now having any issues with the actions, not sure why so i'm a bit puzzled on how to resolve
21:06 partner i mean, adding new bricks to distributed setup
21:06 glusterbot <http://goo.gl/5ohqd> (at joejulian.name)
21:07 dbruhn sounds like from the error you maybe have a permissions issue on your brick?
21:08 partner what sort of permission issue there can be?
21:08 dbruhn A lot, if you look at that log you posted it's failing to set the extended attributes
21:08 dbruhn Gluster uses extended attributes for a lot of its operations
21:09 partner yup
21:09 dbruhn check and make sure that the user gluster is running under has permissions to write/modify the data it's trying to store
21:09 dbruhn also
21:09 dbruhn have you disabled selinux? Assuming you are running a RHEL variant here.
21:10 partner debian wheezy on all 6 servers
21:10 B21956 joined #gluster
21:10 partner no selinux or anything such, all identical servers
21:10 dbruhn file system you are running under the bricks?
21:11 partner commands copypasted from the installation docs i've made, all the way from raid creation to filesystems and stuff
21:11 partner its xfs
21:12 gdavis331 joined #gluster
21:12 dbruhn what user is the gluster service running under?
21:14 dbruhn and more specifically the glusterfs service
21:14 partner runs under root
21:14 partner all of its services
21:15 partner installed from the packages available from download.gluster.org
21:15 dbruhn ok, what are the permissions on the brick directory?
21:15 dbruhn You'll have to forgive me I am in RHEL land so I have no idea what get's installed for the deb/ubuntu packages
21:16 partner brick permissions are the same on all servers, the disk / brick is root:root and the stuff under that is owned by the service user responsible for reading/writing the actual data
21:16 semiosis dbruhn: pretty much the same, slightly different initscript
21:17 dbruhn I assumed as much, but just wanted to be clear on where there might be a gap in knowledge.
21:17 jag3773 joined #gluster
21:17 partner the puzzling part here really is its all copypasted from the docs i've created for the fellow colleagues for installing a fresh glusterfs server to join this/these volume(s)
21:18 partner i wonder if i managed to do something wrong anyways at some point, i was very tired and having fever when i was forced to proceed with these.. just can't figure out what
21:20 dbruhn partner, you are talking about a set of documents that no one in this channel has seen, so I am not sure if there is a problem with the guide you are following or what. But the errors you are seeing clearly show a permissions error, so if you want we can try and work through the issue.
21:21 tqrst fix-layout aborts completely if a folder it was working on gets deleted from the volume in the background. Is there any way to resume fix-layout after it aborts? I'm not a huge fan of the "keep running it until it doesn't abort" option given that it takes ages for just one run through it.
21:21 tqrst (already filed a bug for this, just wondering if there is a workaround I could use in the meantime)
21:22 partner dbruhn: the document i am talking about is about our internal instructions on how this whole thing was set up, there is nothing fancy there, just a bunch of commands with comments with which you will end up doing identical setups
21:23 partner googling around i've found some similar cases, maybe not exactly the same but: https://bugzilla.redhat.com/show_bug.cgi?id=913699
21:24 partner comment 10 especially
21:25 partner dbruhn: its almost midnight but sure if you guys can help pointing out my mistake i'm more than happy to work through it
21:27 dbruhn partner, what version are you running?
21:27 partner somehow there's 276 GB of data there which is valid and can be accessed via the client mount; the puzzling part is why the rest of the data doesn't end up there, there should already be terabytes of stuff around
21:27 partner 3.3.2
21:28 partner there was remote window today to upgrade to 3.4.1 but i was lacking the firewall rules for new brick ports so i couldn't..
21:34 partner hmm it seems all the servers are currently complaining same stuff on brick logs MKNOD or UNLINK fails to permission denied
21:36 dbruhn thats no good
21:36 nikk_ joined #gluster
21:36 partner well at least its still online and storing files..
21:39 nikkk i'm setting up some gluster test hosts.. how do i prevent the delay/timeout associated with losing a server?
21:39 nikkk all clients hang until the timeout is reached
21:39 nikkk sorry to jump in with a question, been pounding my head on this :]
21:40 semiosis nikkk: use 'gluster volume set help' to see about changing options.  the one you're interested in is called network.ping-timeout
21:40 nikkk so just to clarify, if i lose one of the server nodes there will always be a delay on all clients, right?
21:41 nikkk i saw that option, also saw people saying not to set it too low
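
The knob in question, for a hypothetical volume myvol; the shipped default is 42 seconds, and very low values tend to cause exactly the spurious disconnects described here, so lowering it only modestly is the usual advice:

    gluster volume set myvol network.ping-timeout 30
    gluster volume info myvol    # reconfigured options are listed at the end
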
21:43 nikkk one other question and i'll be out of everyone's hair, it's concerning architecture - clients always connect to a single server, right?  i can't have host1 aware of more than one server.  just wondering what the point of having additional servers is other than for additional storage space.
21:44 dbruhn nikkk, with the fuse client it actually connects to all of the gluster peers that create a volume
21:45 dbruhn it connects to the first server and is provided a manifest of all of the servers that make up the volume as well as the brick information
21:47 elyograg on my testbed, I can't seem to mount via NFS.  Connection times out.  what PEBCAK things should I check?
21:48 elyograg gluster volume status shows that the NFS server is running on all machines.
21:49 nikkk dbruhn: alright that makes sense, i saw they were connected w/netstat.  i've had a hard time finding good documentation for gluster in general, it's mostly just basic how-to's so these little things get overlooked.
21:49 dbruhn I am with you there, and then once you know, you are past the need for the info so it gets forgotten.
21:49 semiosis elyograg: use mount options -t nfs -o tcp,vers=3
21:50 semiosis elyograg: on the server side, need to have rpcbind/portmap running & no other nfs server running for the gluster nfs server to work
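
Spelled out, the mount semiosis suggests (volume and host names hypothetical):

    mount -t nfs -o tcp,vers=3 server1:/testvol /mnt/testvol
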
21:50 elyograg semiosis: thanks.  mounted right up.
21:50 semiosis great
21:52 elyograg I was about to try that, although it seems like I didn't need to do that the last time.  been a while, could be wrong.
21:58 nikkk dbruhn: i'll just start making my own gluster wiki :]
21:59 dbruhn nikkk, do one better and just add to the one that's there!
21:59 abyss^ I'd like to move the first gluster server to another location (the whole VM, so the id, disk etc will not change) and turn off the second gluster server, then prepare a new gluster server (a new virtual machine) and synchronize the first server with that third server. Can I find any tutorial on how to do this properly? :)
21:59 dbruhn all you have to do is sign up on it to be able to edit and modify
22:00 dbruhn nikkk, here is the link http://www.gluster.org/community/documentation/index.php/Main_Page
22:01 nikkk dbruhn: yeah i had been through that, it doesn't really go too in depth but it got me to the point i'm at now
22:03 nikkk what do you think a safe value for network.ping-timeout is?  when i tested before i had reduced to 10 seconds and sometimes my clients would get disconnected for no apparent reason and i'd have to unmount/remount
22:03 nikkk (this was a few months ago)
22:04 theron joined #gluster
22:04 RicardoSSP joined #gluster
22:04 RicardoSSP joined #gluster
22:05 rwheeler joined #gluster
22:06 sac`away` joined #gluster
22:11 cogsu joined #gluster
22:13 cogsu joined #gluster
22:13 sprachgenerator joined #gluster
22:15 partner ohwell, i've ran out of ideas, it "works" so i shall see what happens once the remaining bricks fill up, fix-layout will run probably another few weeks (while it was no obstacle for the previous brick to be taken into use)
22:16 partner thanks for your assistance dbruhn & al
22:17 elyograg if I have entries in "gluster volume heal testvol info" how do I fix?  I can find plenty of info on fixing split-brain, but not this.
22:20 cogsu joined #gluster
22:20 elyograg I deleted all the files (via the fuse mount) that I know got screwed up by my rebalance attempt, I recreated bug 820518 or the bug that it duplicates.
22:20 neofob left #gluster
22:22 elyograg those deletes didn't get rid of the problem.
22:23 elyograg here's the heal info output: http://fpaste.org/58796/61093871/
22:27 mkzero joined #gluster
22:28 bet_ joined #gluster
22:29 SpeeR joined #gluster
22:33 _pol joined #gluster
22:34 _polto_ joined #gluster
22:35 theron joined #gluster
22:59 semiosis @qa releases
23:00 semiosis oh right
23:00 * semiosis observes 3.5.0qa2 is available - http://bits.gluster.org/pub/gluster/glusterfs/3.5qa2/
23:04 tqrst yay
23:05 tqrst before I waste my time doing it, has anyone written a script to anonymize gluster logs?
23:05 tqrst (something to scrub all paths/hostnames before submitting bug reports)
23:06 devoid joined #gluster
23:07 devoid what's the best way to do fstab entries for gluster?
23:08 diegows joined #gluster
23:10 bgpepi joined #gluster
23:10 devoid e.g. if I have 100 servers, hostnames server[1-100], and this entry in fstab:
23:10 devoid server1:7997:/test-volume /mnt/glusterfs glusterfs defaults,_netdev 0 0
23:10 tqrst devoid: round robin dns
23:11 devoid round robin will work? the gluster client won't freak out?
23:11 tqrst for dubious values of work
23:11 tqrst if you try to mount and the host name resolves to a server that happens to be dead, you're out of luck
23:12 elyograg you only have to talk to a shared name or IP address at mount time.  Once the FUSE client is connected, it downloads the volume info and connects directly to all the bricks in the volume.
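
A hedged fstab line matching that explanation, with hypothetical names: the hostname only matters at mount time, so a round-robin A record covers most failures, and backupvolfile-server= (where the mount.glusterfs script supports it) gives one extra fallback:

    gluster.example.com:/test-volume  /mnt/glusterfs  glusterfs  defaults,_netdev,backupvolfile-server=server2  0 0
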
23:12 tqrst but then you can just try again
23:12 tqrst also, what elyograg said
23:12 devoid ah ok
23:12 devoid elyograg, that was what I was concerned about
23:12 tqrst and with that many servers, P(server is dead) is pretty low unless you work at the same company as I do
23:13 devoid tqrst we wanted to avoid setting up a local dns, could we do haproxy for the initial mountpoint?
23:14 devoid or to put another way, elyograg, how do bricks indicate how to reach them?
23:14 elyograg What I would do is set up ucarp on some of the servers so they maintain a shared IP for mounting.  I mount virtually all clients via NFS, so I have to use this method.
23:14 tqrst devoid: not sure (and I have to go idle for a bit)
23:14 devoid ah ok
23:17 elyograg semiosis: what can you tell me about fixing problems shown in heal info?  Searching, I can only find info about split-brain, but split-brain shows no entries.
23:18 semiosis elyograg: not much
23:21 cogsu joined #gluster
23:23 semiosis elyograg: i dont have much experience with that command.  my production cluster is.... older
23:23 tqrst devoid: if you don't mind hackish things, you could always create a script /sbin/mount.glusterhack which tries to mount from server 1 through 100 until it finds one that works. Then just mount -t glusterhack [parameters].
23:24 dbruhn doesn't that command show the files that are needing healing, I think the only ones you need to worry about are the failed ones or the split-brain ones
23:24 tqrst (mount delegates -t foo to /sbin/mount.foo)
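
A rough, untested sketch of the helper tqrst has in mind, assuming hypothetical servers server1..server100 exporting the same volume; mount hands the device and mountpoint to /sbin/mount.glusterhack as its first two arguments:

    #!/bin/sh
    # /sbin/mount.glusterhack (hypothetical): try each server until one mounts
    target="$1"; mountpoint="$2"; shift 2
    volume="${target#*:}"                      # keep only the /volume part of host:/volume
    for i in $(seq 1 100); do
        if mount -t glusterfs "server$i:$volume" "$mountpoint" "$@"; then
            exit 0
        fi
    done
    echo "mount.glusterhack: no gluster server answered" >&2
    exit 1
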
23:24 dbruhn if you stat the files it should trigger a manual heal on the file and correct it
23:24 semiosis .... and check client log file for info on that
23:25 tqrst I'm somewhat tempted to do that, actually...
23:28 dbruhn tqrst, What would be really cool is if it could auto-populate on the first connection with all of the servers it can connect to, and update every time it connects.
23:28 semiosis how is this any different than RR-DNS?
23:28 dbruhn So first time you run the mount command it gets a list of servers it can connect to in the future.
23:28 tqrst semiosis: rr-dns doesn't keep trying if it fails
23:29 semiosis what?!
23:29 tqrst or does it?
23:29 semiosis hmmm
23:29 semiosis should check that
23:29 dbruhn nope rrdns just returns an ip address
23:29 tqrst I'm guessing it goes "gluster client resolves $hostname, tries to connect, fails, tries again on same ip or gives up"
23:29 dbruhn it just loops through the addresses on each request
23:29 duende joined #gluster
23:30 semiosis hmm, we've been telling people to use rr-dns for this for a long time, i assumed it worked
23:30 semiosis but never tried it myself
23:30 semiosis have any of your tried using rr-dns?
23:30 dbruhn you need to use something like netdev_ to make it retry, but I am not sure if that works for anything more than NFS
23:30 tqrst I always assumed it didn't work, but still use it to get rid of hardcoded server names
23:30 semiosis s/your/you/
23:30 dbruhn I used to use it as a poor mans load balancer for a website
23:31 dbruhn never for gluster
23:31 semiosis i mean, tried with glusterfs mount server address
23:31 duende hey guys.. quick question... correct me if im wrong... if i create a gluster replica volume.... is it load balanced? like... it does indeed choose which server has lesser load? or do i need other configurations for that?
23:31 semiosis i've heard that the gluster client can take advantage of rr-dns.  it's worth a try if anyone has time to do it
23:32 dbruhn duende, the client requests from all replicants and uses whatever data is presented first
23:32 semiosis duende: writes go to all replicas, reads are "intelligently" balanced
23:32 tqrst unfortunately all booked out on experimentation time - if I do anything involving poking around with gluster, it will be tracking down the rebalance memory leak :p
23:32 tqrst maybe one day I will be able to rebalance my volume
23:33 semiosis no matter how solid they can make rebalance, i will always think of it as throwing all my data up in the air like a game of 52 pickup
23:34 semiosis not appealing
23:34 tqrst it's still better than what I have right now, namely 10 bricks with 20% usage and 40 with >80%
23:34 duende semiosis: thanks!
23:34 ctria joined #gluster
23:34 semiosis tqrst: build a new cluster & migrate to it?
23:34 semiosis ok i know that's probably impractical
23:35 tqrst semiosis: I don't have 30 spare hard drives
23:36 tqrst my best guess was fix-layout + moving files off and copying them back, but even fix-layout never finishes running because of another glitch
23:37 tqrst on top of being silly
23:40 georgeh|workstat joined #gluster
23:41 tqrst maybe rebalance doesn't have a leak per se but does something silly like keeping the list of all processed files in memory - I do have 30-35 million files across 250k folders
23:43 partner sounds like my volume except there's probably 10 times the files :o
23:48 partner i was never able to rebalance due to bug with leaking the file handlers, thats now fixed, i shall try again once/if the fix-layout finishes
23:49 tqrst if you notice memory usage issues, please chime in to bug 985957
23:49 partner what version are we talking about?
23:49 tqrst all of them
23:49 partner ah, ok, i'll have a look
23:49 tqrst I've had that problem since 3.2.x through 3.4.1
23:54 georgeh|workstat joined #gluster
