IRC log for #gluster, 2014-03-03

All times shown according to UTC.

Time Nick Message
00:02 chirino joined #gluster
00:14 mattappe_ joined #gluster
00:37 mattappe_ joined #gluster
00:47 diegows joined #gluster
00:57 mattapperson joined #gluster
01:12 doekia do rpcbind and nfs-common have to be started before glusterfs-server/client or the other way around?
01:26 tokik joined #gluster
01:51 harish_ joined #gluster
02:02 kevein joined #gluster
02:05 cjanbanan joined #gluster
02:15 jporterfield joined #gluster
02:23 haomaiwang joined #gluster
02:34 jporterfield joined #gluster
02:41 harish_ joined #gluster
02:50 harish_ joined #gluster
02:55 bala joined #gluster
02:56 georgeh|workstat joined #gluster
03:02 mattapperson joined #gluster
03:19 bharata-rao joined #gluster
03:51 cjanbanan joined #gluster
03:53 gdubreui joined #gluster
03:57 ira joined #gluster
04:04 aravindavk joined #gluster
04:05 cjanbanan joined #gluster
04:06 CheRi joined #gluster
04:08 bala joined #gluster
04:19 hagarth joined #gluster
04:19 cjanbanan joined #gluster
04:22 mohankumar__ joined #gluster
04:24 ndarshan joined #gluster
04:29 jporterfield joined #gluster
04:31 shylesh joined #gluster
04:32 georgeh|workstat joined #gluster
04:33 shubhendu joined #gluster
04:34 mohankumar__ joined #gluster
04:34 cjanbanan joined #gluster
04:35 ppai joined #gluster
04:47 kaushal_ joined #gluster
04:48 mohankumar__ joined #gluster
04:53 nshaikh joined #gluster
04:56 mohankumar__ joined #gluster
04:57 davinder joined #gluster
05:00 cjanbanan joined #gluster
05:05 mohankumar__ joined #gluster
05:11 saurabh joined #gluster
05:13 mohankumar__ joined #gluster
05:14 rjoseph joined #gluster
05:16 vravn joined #gluster
05:23 meghanam joined #gluster
05:23 meghanam_ joined #gluster
05:26 aravindavk joined #gluster
05:26 mohankumar__ joined #gluster
05:33 jporterfield joined #gluster
05:37 sputnik1_ joined #gluster
05:37 sputnik13 joined #gluster
05:37 rahulcs joined #gluster
05:38 jporterfield joined #gluster
05:39 CheRi joined #gluster
05:40 CheRi joined #gluster
05:42 deepakcs joined #gluster
05:49 haomaiw__ joined #gluster
05:49 lalatenduM joined #gluster
05:51 hagarth joined #gluster
05:53 gdubreui joined #gluster
05:53 ajha joined #gluster
05:55 kdhananjay joined #gluster
05:57 cjanbanan joined #gluster
05:59 sprachgenerator joined #gluster
06:07 jporterfield joined #gluster
06:09 chirino_m joined #gluster
06:10 cjanbanan joined #gluster
06:12 ngoswami joined #gluster
06:13 vpshastry1 joined #gluster
06:13 glusterbot New news from newglusterbugs: [Bug 1065631] dist-geo-rep: gsyncd in one of the node crashed with "OSError: [Errno 2] No such file or directory" <https://bugzilla.redhat.com/show_bug.cgi?id=1065631>
06:14 vimal joined #gluster
06:15 raghu` joined #gluster
06:16 hchiramm_ joined #gluster
06:16 wgao joined #gluster
06:18 Philambdo joined #gluster
06:18 rahulcs joined #gluster
06:20 rastar joined #gluster
06:20 Alex "New news from new".
06:26 cjanbanan joined #gluster
06:26 hagarth joined #gluster
06:27 rahulcs joined #gluster
06:31 criticalhammer joined #gluster
06:33 rfortier joined #gluster
06:35 rfortier1 joined #gluster
06:42 cjanbanan joined #gluster
06:54 ngoswami joined #gluster
06:55 vimal joined #gluster
07:00 nshaikh joined #gluster
07:00 ngoswami joined #gluster
07:00 vimal joined #gluster
07:00 hagarth joined #gluster
07:03 criticalhammer Hi, anyone home?
07:03 criticalhammer No sleep for the wicked
07:04 aravindavk joined #gluster
07:07 RameshN joined #gluster
07:08 rahulcs joined #gluster
07:12 ngoswami joined #gluster
07:12 haomai___ joined #gluster
07:18 vimal joined #gluster
07:21 ThatGraemeGuy joined #gluster
07:21 CheRi joined #gluster
07:23 jtux joined #gluster
07:25 cjanbanan joined #gluster
07:27 rgustafs joined #gluster
07:30 pvh_sa joined #gluster
07:38 deepakcs joined #gluster
07:38 cjanbanan joined #gluster
07:41 rossi_ joined #gluster
07:42 ekuric joined #gluster
07:47 vravn joined #gluster
07:48 hagarth joined #gluster
07:52 ngoswami joined #gluster
07:53 ctria joined #gluster
08:03 aravindavk joined #gluster
08:05 jporterfield joined #gluster
08:05 sputnik13 joined #gluster
08:05 sputnik13net joined #gluster
08:08 nshaikh joined #gluster
08:10 chirino joined #gluster
08:11 ricky-ti1 joined #gluster
08:14 glusterbot New news from newglusterbugs: [Bug 1071800] 3.5.1 Tracker <https://bugzilla.redhat.com/show_bug.cgi?id=1071800>
08:15 keytab joined #gluster
08:17 criticalhammer Anyone here, alive, have any infiniband and glusterfs experiences?
08:21 CheRi joined #gluster
08:23 ngoswami joined #gluster
08:25 nshaikh joined #gluster
08:26 nshaikh joined #gluster
08:30 vimal joined #gluster
08:30 rahulcs joined #gluster
08:31 T0aD we're all dead
08:31 T0aD sorry
08:35 haomaiwa_ joined #gluster
08:36 ngoswami joined #gluster
08:37 nshaikh joined #gluster
08:44 rastar joined #gluster
08:45 lalatenduM joined #gluster
08:47 andreask joined #gluster
08:56 unlocksmith joined #gluster
08:57 circ-user-Ld79z joined #gluster
08:59 liquidat joined #gluster
09:01 morse joined #gluster
09:04 rahulcs joined #gluster
09:06 Pavid7 joined #gluster
09:09 X3NQ joined #gluster
09:12 hagarth joined #gluster
09:12 chirino_m joined #gluster
09:14 Norky joined #gluster
09:18 cjanbanan joined #gluster
09:20 aravinda_ joined #gluster
09:36 qdk joined #gluster
09:43 harish_ joined #gluster
09:52 gdubreui joined #gluster
09:58 aravinda_ joined #gluster
10:06 bala joined #gluster
10:07 hagarth joined #gluster
10:19 ricky-ticky joined #gluster
10:23 Slash joined #gluster
10:27 social hagarth: ping
10:34 davinder2 joined #gluster
10:36 cjanbanan joined #gluster
10:48 aravinda_ joined #gluster
10:50 YazzY hi guys
10:53 YazzY i was playing with gluster and KVM. I have two gluster nodes mounted on the KVM server with FUSE. Everything was running fine when I rebooted the secondary server
10:54 YazzY but when I rebooted the primary server with the secondary running, the virtual guest broke. The file system was totally damaged
10:54 reuss left #gluster
10:55 YazzY any idea why that happened? Isn't glusterfs suppose to give me a consistent file system to run VMs on?
10:59 gmcwhistler joined #gluster
11:00 jporterfield joined #gluster
11:03 gmcwhistler joined #gluster
11:03 jurrien joined #gluster
11:06 kanagaraj joined #gluster
11:10 andreask YazzY: sounds like the vm image was in use concurrently
11:12 hagarth joined #gluster
11:14 ricky-ticky joined #gluster
11:17 P0w3r3d joined #gluster
11:23 shyam joined #gluster
11:24 shyam left #gluster
11:25 shyam joined #gluster
11:33 YazzY andreask: it was in use, running
11:34 andreask I mean in use by more than one kvm process
11:34 YazzY andreask: it was running only on one server
11:36 YazzY i only have one KVM server running
11:39 andreask then it should not be damaged
11:39 aravindavk joined #gluster
11:40 YazzY andreask: but it was...
11:40 YazzY the entire boot sector was destroyed
11:41 YazzY I'm using glusterfs 3.4.2
11:45 social is there way for glusterfs brick to copy file/directory from another brick?
11:45 criticalhammer left #gluster
11:45 hchiramm__ joined #gluster
11:47 chirino joined #gluster
11:52 YazzY social: mount them and rsync
11:53 Pavid7 joined #gluster
11:54 liquidat_ joined #gluster
11:55 samppah YazzY: did you let self heal finish after booting first server?
11:55 ricky-ticky joined #gluster
11:59 rgustafs joined #gluster
12:01 liquidat joined #gluster
12:02 harish_ joined #gluster
12:07 rahulcs joined #gluster
12:10 calum_ joined #gluster
12:14 stickyboy I think I'm having an issue with locking... getting sporadic issues on my compute cluster with ~/.Xauthority* and ~/.cpan/.lock etc... if I wait a few minutes the errors go away.
12:14 Philambdo joined #gluster
12:22 CheRi joined #gluster
12:26 hagarth joined #gluster
12:30 edward1 joined #gluster
12:32 social hagarth: I'm looking into storage/posix.c. When there is an add-brick, the one taking care of subdir permissions is presumably the self-heal of dht/afr? If yes, shouldn't the storage/posix translator drop posix_do_chmod in reconfigure, and shouldn't self-heal be forced upon reconfigure even on the brick root? That way there wouldn't be a need to store the brick-gid and brick-uid options, and acls wouldn't break with add-brick
12:36 ccha joined #gluster
12:37 rwheeler joined #gluster
12:40 cjanbanan joined #gluster
12:41 bala joined #gluster
12:51 cjanbanan joined #gluster
12:51 CheRi joined #gluster
12:54 criticalhammer joined #gluster
12:56 criticalhammer anyone, alive, have working experience with QRD infiniband and gluster?
12:56 criticalhammer err QDR*
13:00 criticalhammer left #gluster
13:00 jclift criticalhammer: There are some people around..
13:00 jclift Doh
13:05 cjanbanan joined #gluster
13:08 rahulcs joined #gluster
13:10 bala joined #gluster
13:10 YazzY samppah: how do I check status of self heal?
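(Aside: on GlusterFS 3.4 the self-heal status checks that come up later in this log look like the following; "myvol" is a placeholder volume name.)

    gluster volume heal myvol info
    gluster volume heal myvol info heal-failed
    gluster volume heal myvol info split-brain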
13:12 ricky-ticky joined #gluster
13:19 ctria joined #gluster
13:32 vpshastry joined #gluster
13:37 sroy joined #gluster
13:37 davinder joined #gluster
13:38 Kins joined #gluster
13:44 nage joined #gluster
13:44 nage joined #gluster
13:45 atrius` joined #gluster
13:53 swat30 joined #gluster
13:54 doekia Got issue unable to mount gluster volume thru nfs client at startup ... Debian Wheezy... mount -a once booted does the job. Seems like startup dependency problem... anyone to advise?
14:05 ctria joined #gluster
14:20 japuzzo joined #gluster
14:23 keytab joined #gluster
14:24 bennyturns joined #gluster
14:24 andreask left #gluster
14:26 theron joined #gluster
14:29 aixsyd joined #gluster
14:30 shyam joined #gluster
14:31 aixsyd Hey guys - I'm having a bit of a speed issue with GlusterFS - I'm running KVM/QEMU VMs off a cluster, and some VMs boot up quickly, but others take 15-20 minutes. watching IOtop on the gluster nodes, its just sitting there churning away at reading data in the 10-20MB/s range.
14:32 aixsyd only thing I know ive done is set the number of threads for low, medium, normal and high processes to more than 16 - I have 8 cores. Is this, perhaps, a source of the slowness? =\
14:32 rahulcs joined #gluster
14:32 aixsyd wow, i set them to 64, each
14:33 keytab joined #gluster
14:39 rahulcs joined #gluster
14:39 aixsyd wow, reset the cluster, getting rid of those options, and holy hell is it quicker
14:41 rahulcs joined #gluster
14:42 an_ joined #gluster
14:56 rpowell joined #gluster
14:58 cjanbanan joined #gluster
14:58 aixsyd just to confirm - with GlusterFS sharing out NFS, there is no automatic failover in case of a brick failure, correct?
15:00 lalatenduM aixsyd, yes thats right. if you want failover u should try CTDB+NFS
15:01 lalatenduM aixsyd, wait, what do you mean by automatic failover in case of a brick failure? you mean automatic failover of gluster nodes? or "bricks"?
15:01 aixsyd either.
15:01 meghanam joined #gluster
15:02 meghanam_ joined #gluster
15:03 lalatenduM aixsyd, in case of brick failure (but not node failure) if you have a replicated volume , the other brick in the replica pair would serve the requests, so from application/client point of view there is no difference
15:03 aixsyd right right, but a node failure would cause issues
15:03 aixsyd specifically the node that is being actively read/written
15:03 aixsyd (if thats the one a client is connected to)
15:04 lalatenduM aixsyd, yes, as you have mounted the nfs export using the node's IP
15:04 lalatenduM aixsyd, in this scenario CTDB will help
15:04 aixsyd reading about CTDB now - sounds similar to heartbeat/pacemaker and drbd
15:04 lpabon joined #gluster
15:04 lalatenduM aixsyd, yes thats right
15:04 aixsyd any reason not to use those vs CTDB?
15:05 tdasilva joined #gluster
15:06 plarsen joined #gluster
15:06 Philambdo joined #gluster
15:06 lalatenduM aixsyd, I have not used heartbeat/pacemaker and drbd. So can't tell you about those
15:07 aixsyd Gotcha gotcha.
15:07 ndevos aixsyd: I use pacemaker for IP-failover in my home-setup that does not have samba - systems that provide samba shares usually have CTDB already, it's easier to re-use that
15:07 aixsyd ndevos: i'm looking to use NFS for virtual machine storage - that its hopefully a bit quicker than Gluster API/FUSE
15:08 ndevos aixsyd: qemu+libgfapi will give you best performance
15:08 lmickh joined #gluster
15:09 aixsyd er, as opposed to NFS
15:09 aixsyd over NFS?
15:10 ricky-ticky joined #gluster
15:10 rpowell1 joined #gluster
15:10 ndevos aixsyd: yeah, its like [VM/qemu]-[vfs]-[nfs/client]-[nfs/server]-[brick] vs [VM/qemu/libgfapi]-[brick]
15:10 aixsyd that makes sense
15:11 aixsyd and thats what i'm already using/doing.
15:12 doekia Got issue unable to mount gluster volume thru nfs client at startup ... Debian Wheezy... mount -a once booted does the job. Seems like startup dependency problem... anyone to advise?
15:12 ndevos I think the advice for VMs is to disable write-behind in the volume, have you done that too?
15:12 aixsyd ndevos: no i havent.
15:13 aixsyd ndevos: its "performance.write-behind off" ya?
15:14 aixsyd holy shit - i should NOT have done that while VMs were running
15:15 ndevos aixsyd: yes, sounds like it
15:15 aixsyd all ym VMs powered off
15:15 aixsyd *my
15:16 ndevos aixsyd: I think it causes a reload of the client, so in case of qemu+libgfapi it may trigger a restart of the process - but I dont really know how its handled
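(Aside: the write-behind option ndevos suggests is set per volume from any gluster peer; "myvol" is a placeholder volume name, and as the exchange above shows, applying it reloads the client graphs, so running VMs can be disrupted.)

    gluster volume set myvol performance.write-behind off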
15:16 lalatenduM doekia, it seems the nfs services on the client side are starting after it tries to mount it using /etc/fstab, are u using "_netdev, vers=3, proto=tcp " as the options?
15:16 aixsyd ndevos: it sure did :P
15:16 doekia yes i did
15:17 Pavid7 joined #gluster
15:17 doekia localhost:/www  /var/www        nfs vers=3,proto=tcp,nobootwait,defaults,_netdev      0 0
15:17 doekia the node is both gluster server and nfs client, if that has an impact
15:18 aixsyd ndevos: oh wow, i'm seeing better performance
15:18 ndevos doekia: its not advised to run an nfs-client and the gluster-nfs server on the same system - mounting localhost over nfs introduces conflicts between the locking-implementation of the nfs-client and server
15:18 ndevos aixsyd: nice!
15:19 doekia I tried shuffling the rc sequence, removing $network_fs from the init.d/glusterfs-server script (which makes no sense in my mind), but this time gluster starts before rpcbind and it is not good either
15:19 * ndevos disables write-behind in his CloudStack test system too now
15:19 aixsyd i was getting about 50MB/s read 30MB/s write. gonna test again once all the vms stop loading
15:19 lalatenduM doekia, yup agree with ndevos
15:21 doekia ?? it is done almost as per the documentation... each node has a replicated copy of the volume's brick and can, in a disaster scenario, live on its own
15:21 ndevos doekia: the issue is that both the nfs-server and the nfs-client need to register NLM (for locking) in the portmapper/rpcbind, only one is able to register at the same time, so the other breaks
15:22 doekia apart from the startup sequence I have seen no problem so far (and hope it will stay like that :-=)
15:22 ndevos doekia: the locking of files on nfs will likely not work in your environment, that can cause data corruption
15:23 doekia actually I have mount -a + apache2 start in my rc.local and unattended shutdown reboot ... seems not to cause any brick corruption ...
15:24 ndk joined #gluster
15:24 doekia To be honest my node sparsely writes on the volume ... and usually it is always the same
15:25 Peanut joined #gluster
15:26 doekia I was thinking that lock I case of NFS client be handled by the rpc nlockmanager
15:26 doekia then write occurs in some sort of atomic on the volume
15:27 doekia I case = in case sorry
15:28 aixsyd ndevos: 33MB/s write, 67MB/s read
15:28 aixsyd so about 17MB/s better read =O
15:28 ndevos aixsyd: thats impressive :)
15:28 aixsyd ndevos: very happy about that. thank you!
15:29 aixsyd and thats with running ZFS under GlusterFS
15:29 ndevos aixsyd: you're welcome!
15:29 seapasulli joined #gluster
15:29 ndevos what, zfs?!
15:30 jruggiero joined #gluster
15:30 jruggiero left #gluster
15:31 jobewan joined #gluster
15:32 lalatenduM aixsyd, zfs ?
15:33 doekia ndevos: where have you seen this recommendation not to run server/client nfs on the same node?
15:33 ndevos doekia: that is my recommendation ;)
15:34 doekia ndevos: ?? any real case you faced or in-depth knowledge of the code?
15:35 ndevos doekia: yes, the basics is that only one NLM implementation can get registered at rpcbind/portmap
15:35 rgustafs joined #gluster
15:36 doekia but I only got on... the local machine nfs client
15:36 ndevos doekia: the easiest is to mount the nfs export with "-o nolock", that at least keeps the NLM from gluster/nfs working correctly
15:37 ndevos doekia: locking is optional for NFSv3, but well, if you do not use it, you can get data corruption - locking available in the NFS-server does not help if an NFS-client does not use it
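(Aside: applied to the fstab line doekia pasted above, the nolock suggestion would look roughly like this, keeping the original paths and options.)

    localhost:/www  /var/www        nfs vers=3,proto=tcp,nolock,nobootwait,defaults,_netdev      0 0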
15:38 doekia does nolock also affect the nfs caching ? since the main reason to go nfs client vs native client was cache ... php app
15:38 doekia I am a bit lost with what you says...
15:39 doekia The machine mount the local gluster end-point
15:39 ndevos I dont think it affects caching, but not using locking will increase performance a little (no need to do the locking procedures)
15:39 doekia and the machine is the only one accessing this port/lock map service
15:40 ndevos well, the NFS-server registers a NLM handler at portmap/rpcbind, so does the NFS-client
15:40 doekia I'll do as you say... add the nolock option ... this howerver does not address my initial problem ... startup sequence
15:41 doekia You misunderstood me ... I have no nfs server at all ... gluster server presenting nfs aware volume
15:41 ndevos the NFS-client and NFS-server in the Linux kernel both use the same NLM implementation (lockd kernel module), and that works for these kernel NFS services
15:41 mkzero joined #gluster
15:42 ndevos the problem occurs when there are two different NLM implementations needed - Gluster/NFS has a userspace NLM implementation, and the NFS-client has the kernel-space lockd module
15:43 ndevos so, only Gluster/NFS *or* the Linux kernel NFS-client can register their NLM handler in rpcbind/portmap - requests to the not-registered handler will get lost
15:44 zerick joined #gluster
15:45 doekia I'm lost again ... I got the feeling there is no Gluster/NFS client ... only the gluster server, which exhibits an NFS end-point ... for the NFS client we rely on the Linux NFS-client ... am I wrong?
15:46 ndevos that is correct, there is a Gluster/NFS-server and a Linux kernel NFS-client
15:47 ndevos both communicate about locks through the NLM protocol, and rpcbind/portmapper is used to locate the NLM-protocol-handler (Gluster/NFS-server or lockd-kernel-module) through rpcbind/portmap
15:47 verdurin joined #gluster
15:48 theron joined #gluster
15:48 bennyturns joined #gluster
15:48 ndevos so, if the nfs-client needs to do some locking, the nfs-client checks in rpcbind/portmap what port the NLM-handler on the NFS-server is listening on
15:50 ndevos this will not work correctly when the kernel-module lockd is loaded, the NFS-client will try to connect to its own NLM-handler, and not to the one from the Gluster/NFS-server
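(Aside: a quick way to see which NLM handler actually got registered is to query rpcbind/portmap directly; rpcinfo is a standard tool and this is only a sketch of the check.)

    # look at which program answers for 'nlockmgr' (the NLM handler)
    rpcinfo -p localhost | grep -E 'nlockmgr|nfs'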
15:51 doekia sounds good. The nolock got implemented and the bench looks pretty similar ... no significant changes in either write/read/stat/delete
15:52 doekia I use this benchmark http://www.gluster.org/pipermail/gluster-users/2013-April/035931.html which is pretty good for my case and may be worth mentioning in the factoids
15:52 glusterbot Title: [Gluster-users] Low (<0.2ms) latency reads, is it possible at all? (at www.gluster.org)
15:52 rahulcs joined #gluster
15:53 RayS joined #gluster
15:54 doekia @latency
15:54 glusterbot doekia: 0.25 seconds.
15:54 doekia @low latency
15:55 doekia ~latency | doekia
15:55 glusterbot doekia: I do not know about 'latency', but I do know about these similar topics: 'latest'
15:55 doekia ~latest | doekia
15:55 glusterbot doekia: The latest version is available at http://download.gluster.org/pub/gluster/glusterfs/LATEST/ . There is a .repo file for yum or see @ppa for ubuntu.
16:04 rahulcs joined #gluster
16:04 ricky-ticky joined #gluster
16:07 Pavid7 joined #gluster
16:07 lmickh joined #gluster
16:07 toad_ joined #gluster
16:08 daMaestro joined #gluster
16:08 sprachgenerator joined #gluster
16:09 an_ joined #gluster
16:16 ricky-ticky joined #gluster
16:17 Guest19520 joined #gluster
16:20 saurabh joined #gluster
16:25 rwheeler joined #gluster
16:28 glusterbot New news from resolvedglusterbugs: [Bug 1003626] java.lang.NullPointerException is raised during execution of multifilewc mapreduce job <https://bugzilla.redhat.com/show_bug.cgi?id=1003626>
16:28 semiosis :O
16:29 semiosis anyone see this?  https://github.com/spajus/libgfapi-ruby
16:29 glusterbot Title: spajus/libgfapi-ruby · GitHub (at github.com)
16:37 kanagaraj joined #gluster
16:38 jclift semiosis: Interesting.  We should probably create a "Language Bindings" page on the wiki to list the various integration things
16:38 johnbot11 joined #gluster
16:41 rossi_ joined #gluster
16:42 johnbot11 joined #gluster
16:46 YazzY joined #gluster
16:46 YazzY joined #gluster
16:47 failshell joined #gluster
16:49 semiosis jclift: great idea.  i hereby deputize you to edit the wiki ;)
16:49 B21956 joined #gluster
16:50 semiosis jclift: or if you need me to do it, i can
16:54 jclift semiosis: Go for it.  I have to get down the street in a min, and not feeling well. (not a productive day :/)
16:54 semiosis ok will do.  feel better
16:58 glusterbot New news from resolvedglusterbugs: [Bug 916371] Hadoop benchmark TestDFSIO task fails with "java.io.IOException: cannot create parent directory: /mnt/gluster-hdfs0/benchmarks/TestDFSIO/io_data" <https://bugzilla.redhat.com/show_bug.cgi?id=916371> || [Bug 927284] Code Hygiene - GlusterFileSystem.lock() is an empty method <https://bugzilla.redhat.com/show_bug.cgi?id=927284> || [Bug 927288] Code Hygiene - GlusterFileSystem.re
16:59 shruti joined #gluster
16:59 shruti left #gluster
17:00 an_ joined #gluster
17:01 vpshastry joined #gluster
17:05 theron_ joined #gluster
17:06 semiosis http://www.gluster.org/community/documentation/index.php/Language_Bindings
17:06 glusterbot Title: Language Bindings - GlusterDocumentation (at www.gluster.org)
17:06 unlocksmith joined #gluster
17:11 mattappe_ joined #gluster
17:25 kanagaraj joined #gluster
17:29 rahulcs joined #gluster
17:32 Mo_ joined #gluster
17:33 rahulcs joined #gluster
17:36 rossi_ joined #gluster
17:38 calum_ joined #gluster
17:52 rahulcs joined #gluster
17:52 rossi_ joined #gluster
17:52 edward1 joined #gluster
17:55 edward1 joined #gluster
17:59 shyam left #gluster
17:59 failshel_ joined #gluster
18:03 failshell joined #gluster
18:08 cjanbanan joined #gluster
18:09 harish_ joined #gluster
18:13 pvh_sa joined #gluster
18:14 Matthaeus joined #gluster
18:19 cfeller joined #gluster
18:19 zaitcev joined #gluster
18:21 mattapperson joined #gluster
18:28 harish_ joined #gluster
18:29 aixsyd joined #gluster
18:29 aixsyd hey guys I'm getting a ton of the following on one of my clusters: "no active sinks for performing self-heal"
18:31 aixsyd halp D:
18:32 JoeJulian That means that the file shows that there are pending changes from the active replica that are needed on the inactive replica.
18:33 aixsyd gotcha - so it IS currently healing? but i'm seeing no activity lights on either node's hard drives, and IOtop on each is only reading in kb/s
18:34 JoeJulian No, it's not.
18:34 aixsyd no writes, according to iotop - maybe a few b/s
18:34 JoeJulian Because it's not connected to the brick that needs healing.
18:34 semiosis aixsyd: you have a brick down
18:34 JoeJulian Or if it's not down, it's blocked from the client.
18:34 aixsyd volume status is showing that theyre both online and connected
18:35 ThatGraemeGuy joined #gluster
18:35 aixsyd both nodes say both bricks are online
18:35 semiosis aixsyd: "no active sinks for performing self-heal" and "both nodes say both bricks are online" cant both be right
18:36 aixsyd lemme copypasta
18:36 semiosis what log produced "no active sinks for performing self-heal"?  a client log or a shd log?
18:36 aixsyd shd log
18:37 aixsyd http://fpaste.org/81953/13938717/
18:37 glusterbot Title: #81953 Fedora Project Pastebin (at fpaste.org)
18:37 semiosis kill the shd & restart glusterd on that server, that should restart the shd, which should give you better logs
18:37 aixsyd roger that
18:38 an_ joined #gluster
18:39 aixsyd still no activity lights on either node
18:39 JoeJulian That phrase doesn't work internationally... rogering something in England does not mean acknowledgment.
18:39 aixsyd JoeJulian:  LOL
18:40 Pavid7 joined #gluster
18:41 aixsyd new shd log: http://fpaste.org/81954/38720541/
18:41 glusterbot Title: #81954 Fedora Project Pastebin (at fpaste.org)
18:43 aixsyd this is really strange
18:43 aixsyd youd think if it was healing, the drives would be reading and or writing
18:45 vpshastry joined #gluster
18:51 rotbeard joined #gluster
19:00 diegows joined #gluster
19:03 JoeJulian aixsyd: line 2 I think shows that glustershd was not stopped. "pkill -f glustershd" then restart glusterd.
19:03 pvh_sa joined #gluster
19:03 JoeJulian That log should show it connecting to all the bricks.
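(Aside: spelled out as commands, JoeJulian's suggestion looks something like the sketch below; the service name differs between RPM-based systems and the Debian packaging mentioned earlier in the log.)

    # stop the self-heal daemon and restart the management daemon
    pkill -f glustershd
    service glusterd restart          # or: service glusterfs-server restart on Debian
    # the shd log should then show it reconnecting to every brick
    tail -f /var/log/glusterfs/glustershd.log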
19:04 aixsyd roger that
19:04 aixsyd er.. affirmative
19:05 aixsyd http://fpaste.org/81958/13938735/
19:05 aixsyd oy.
19:05 glusterbot Title: #81958 Fedora Project Pastebin (at fpaste.org)
19:06 aixsyd now it says SHD is not running on localhost, even though i set the daemon to on
19:07 aixsyd now its on, after a gluster-server restart
19:09 aixsyd still getting no activity lights, still readin and writing at sub-MB/s on both nodes, and getting the no active sinks again in the logs
19:16 tdasilva semiosis: can you add https://github.com/gluster/libgfapi-python to the language bindings page?
19:16 glusterbot Title: gluster/libgfapi-python · GitHub (at github.com)
19:16 RayS joined #gluster
19:21 aixsyd JoeJulian: it still isnt healing
19:22 chirino_m joined #gluster
19:22 aixsyd after stopping the VMs running on that cluster, theres now no IO activity in iotop at all.
19:27 aixsyd this is quite crappy :(
19:30 rahulcs joined #gluster
19:33 aixsyd well, i'm definitely stuck.
19:34 semiosis tdasilva: anyone can, even you!  but sure i'll do it for you
19:34 aixsyd nothing seems to be crawling now, either
19:35 tdasilva semiosis: just realized that after sending that msg...
19:35 semiosis done
19:35 tdasilva cool, thanks!
19:36 semiosis yw
19:36 semiosis johnmark is going to have a fit when he sees all these github links :(
19:36 aixsyd i just started a VM off this cluster, and node #1 is reading and writing at full speeds - node #2 is just dead. no reads or writes.
19:37 vu joined #gluster
19:37 semiosis aixsyd: is iptables allowing ,,(ports)?
19:37 glusterbot aixsyd: glusterd's management port is 24007/tcp and 24008/tcp if you use rdma. Bricks (glusterfsd) use 24009 & up for <3.4 and 49152 & up for 3.4. (Deleted volumes do not reset this counter.) Additionally it will listen on 38465-38467/tcp for nfs, also 38468 for NLM since 3.3.0. NFS also depends on rpcbind/portmap on port 111 and 2049 since 3.4.
19:38 aixsyd iptables is disabled
19:38 aixsyd again, volume status is saying both bricks are online
19:39 aixsyd i can ping each node from eachother by name, etc
19:40 aixsyd at this point i'm ready to kill the offending brick and do a complete heal - i dont know what else to check or try
19:40 semiosis can you connect to the glusterd & brick ports with telnet?
19:40 aixsyd one sec
19:40 Yaz joined #gluster
19:41 aixsyd yep
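(Aside: the connectivity check aixsyd just ran looks something like this; the hostname and volume name are placeholders, and the port numbers come from the glusterbot factoid above.)

    telnet node2 24007                # glusterd management port
    gluster volume status myvol       # lists the actual port of each brick (49152+ on 3.4)
    telnet node2 49152                # one of the brick ports from the status output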
19:41 Yaz Hello, anyone here have experience with infiniband and glusterfs?
19:41 semiosis hello
19:41 glusterbot semiosis: Despite the fact that friendly greetings are nice, please ask your question. Carefully identify your problem in such a way that when a volunteer has a few minutes, they can offer you a potential solution. These are volunteers, so be patient. Answers may come in a few minutes, or may take hours. If you're still in the channel, someone will eventually offer an answer.
19:41 semiosis Yaz: ^^
19:42 samppah <offtopic> last night I was watching the academy awards and wondering if danmons and glusterfs is "behind" the special effects of any of those movies.. eventually i fell asleep on the couch and had a dream that JoeJulian was awarded for his contribution to the gluster community.. it was weird as those thank you speeches got mixed into that dream </offtopic>
19:42 Yaz How sensitive is glusterfs to latency
19:42 semiosis samppah: rofl!
19:42 Yaz it looks like 40Gb infiniband costs are almost on par with 10GBe
19:43 semiosis Yaz: replication is sensitive to latency because it involves several round trips between the client & the servers
19:43 semiosis Yaz: this is amortized for larger writes
19:43 semiosis and reads
19:44 aixsyd semiosis: any harm to kill this brick and remake it and resilver?
19:44 Yaz hmmm
19:44 Yaz Thanks semiosis
19:44 aixsyd i just dunno which is a longer process - let it sit here and hope it eventually heals or kill it and reheal
19:45 aixsyd ((meanwhile, i have servers down, and management bitching))
19:46 semiosis Yaz: yw
19:46 semiosis aixsyd: hard to say
19:46 semiosis aixsyd: try making a new client mount point & creating a file in it.  check the log for that mount
19:46 aixsyd semiosis: this is just not good all around. nothing went down, nothing changed - and for a whole server's brick to just go FUBAR? that doesnt sound good.
19:46 semiosis sounds like you have some kind of connection problem, a new client mount should reveal this if it is the case
19:46 aixsyd alright. one sec
19:47 mkzero joined #gluster
19:48 aixsyd i can create a new file through the mountpoint and it appears on both bricks
19:49 semiosis did/can you do this on the server with the shd problem?
19:50 aixsyd not sure I follow - i created a file on a server that mounts to the cluster and the file shows up in both nodes bricks, including the one that has the SHD problem
19:51 aixsyd i'm wgetting a debian ISO image, and as its downloading to the mount point, iotop shows its transfering on both nodes of the cluster
19:53 aixsyd should i try to wget the iso directly to node #1, then issue a heal command and see if it shows up on #2?
19:57 mattapperson joined #gluster
20:01 semiosis aixsyd: no, try the test mount on the server with the shd problem
20:02 aixsyd mount the server to itself?
20:05 semiosis yes
20:12 aixsyd semiosis: got it.
20:12 semiosis ??
20:12 aixsyd you were right. it was an IB connection issue
20:12 semiosis woot!
20:12 aixsyd semiosis: 1 aixsyd: 0
20:12 semiosis haha
20:13 semiosis did the test mount log help you determine that?
20:13 aixsyd yeper
20:13 semiosis cool
20:14 aixsyd lets put it this way, both nodes are reading like a mofo now
20:14 aixsyd so i assume a crawl. no vms are live
20:15 aixsyd JoeJulian: whatever happened to your proposed gluster volume crawlinfo command?
20:23 chirino joined #gluster
20:23 larsks Is there reference documentation out there for libgfapi?
20:25 cjanbanan joined #gluster
20:28 semiosis larsks: not afaik.  there is the API, which you can find here: https://github.com/gluster/glusterfs/blob/master/api/src/glfs.h
20:28 glusterbot Title: glusterfs/api/src/glfs.h at master · gluster/glusterfs · GitHub (at github.com)
20:29 semiosis larsks: as for the functions themselves, you might want to look up the corresponding syscalls in linux
20:30 larsks semiosis: Thanks.  I'll go back to reading through the header file :)
20:30 semiosis larsks: for example, to find out about glfs_stat functions, see 'man 2 stat'
20:31 larsks semiosis: Yup.  I was just noticing that the gfapi stuff parallels the standard syscalls.  Thanks again.
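(Aside: to make that parallel concrete, here is a minimal libgfapi sketch that stats a file on a volume without a FUSE mount; the volume name, server and path are placeholders and error handling is trimmed. Build with something like gcc gfapi-stat.c $(pkg-config --cflags --libs glusterfs-api).)

    #include <stdio.h>
    #include <sys/stat.h>
    #include <glusterfs/api/glfs.h>

    int main(void)
    {
        struct stat st;
        glfs_t *fs = glfs_new("myvol");               /* volume name */
        if (!fs)
            return 1;
        glfs_set_volfile_server(fs, "tcp", "server1", 24007);
        if (glfs_init(fs) != 0) {                     /* fetch volfile, connect to bricks */
            glfs_fini(fs);
            return 1;
        }
        if (glfs_stat(fs, "/some/file", &st) == 0)    /* same semantics as stat(2) */
            printf("size: %lld\n", (long long) st.st_size);
        glfs_fini(fs);
        return 0;
    }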
20:31 semiosis yw
20:32 semiosis larsks: curious, what are you going to use libgfapi for?
20:33 larsks Mostly curiousity right now.  I've started looking at things like the libvirt/gfapi integration and just wanted to understand how things worked.
20:56 andreask joined #gluster
21:12 JoeJulian samppah: Just got in my office and started reading scrollback. That's awesome! :D
21:14 JoeJulian aixsyd: After you did the pkill and restart of gluster-server you never posted a good log that showed it connecting (or not) to each brick.
21:14 aixsyd JoeJulian: I figured it out - it was a bad IB connection between the nodes. that, or opensm failed
21:14 JoeJulian Nice!
21:15 aixsyd rebooting the offending node showed that no ping connection to the other node was possible. checked the cables, and then restarted opensm, and it all works. both nodes are reading like a mofo
21:15 JoeJulian I love it when stuff gets figured out.
21:15 aixsyd which brought me to this question
21:15 aixsyd JoeJulian: whatever happened to your proposed gluster volume crawlinfo command?
21:16 JoeJulian I've made so many suggestions, I have no idea where they all end up.
21:16 aixsyd LOL
21:17 aixsyd that seems like a good one, though. cause i have NO idea what its doing, just that both nodes are reading their own disks at break-neck speeds
21:17 JoeJulian That's why I did my recent splitmount tool. I've asked for a way of managing split-brain from the client since 2.0. When I finally thought up a way with existing tools, I had to make it happen.
21:18 aixsyd noice
21:22 Matthaeus joined #gluster
21:25 chirino_m joined #gluster
21:41 mattappe_ joined #gluster
21:44 tdasilva left #gluster
21:46 mattappe_ joined #gluster
21:50 Philambdo joined #gluster
21:50 pvh_sa joined #gluster
21:54 fooooooooo123 joined #gluster
21:54 fooooooooo123 purpleidea: wow puppet-gluster is sooo cool!
21:59 purpleidea fooooooooo123: thanks!
21:59 fooooooooo123 fooooooooo123: i gotta go! later
21:59 purpleidea bye
22:04 chirino joined #gluster
22:09 JoeJulian +1
22:11 fidevo joined #gluster
22:14 seapasulli joined #gluster
22:20 delhage joined #gluster
22:24 kris joined #gluster
22:25 cjanbanan joined #gluster
22:43 bennyturns joined #gluster
22:46 elyograg joined #gluster
22:47 elyograg I started a rebalance on my recently upgraded cluster today (3.4.2).  A short time later, we started getting errors in the logs, and now I find that there are TONS of files showing up in the 'heal info' output.
22:48 elyograg well, dozens.  I guess TONS is a bit of an overstatement.
22:48 elyograg anywhere from 13 to 78 entries on each brick.  all 32 bricks in the volume have entries.
22:49 elyograg It's going to take forever to manually fix these.
22:53 elyograg I did not check the heal info before i started the rebalance, but as of a couple of days ago, it was empty.
22:55 elyograg oh, crap.  a lot of these are gfids, not paths.  what do I do for those?
22:56 chirino_m joined #gluster
22:58 JoeJulian Are they just in "heal info" or "heal info heal-failed"/"heal info split-brain"?
23:00 elyograg looks like they are in heal-failed (or at least some are, hard to compare the lists)
23:00 mattappe_ joined #gluster
23:00 elyograg split-brain too.
23:01 elyograg looks like there are more in split-brain than info.
23:01 Matthaeus1 joined #gluster
23:01 elyograg this thing is just giving me fits.
23:01 elyograg i feel like I must have done something wrong, but I don't know what it could have been.
23:01 JoeJulian Are the split-brains old?
23:02 elyograg As of a couple of days ago, all these lists were clear.  I didn't think to check them before I started the re4balance.
23:02 JoeJulian Maybe you've already fixed them and they're just still in the log is what I'm hoping for.
23:02 JoeJulian I don't blame you. I might have done the same thing.
23:03 elyograg I don't know whether this stuff is newly copied data or stuff that was already on there.
23:04 elyograg The one file that I have fixed so far is dated Jan 7 2012, but that probably had its date set from the original file before it was put on gluster.
23:05 JoeJulian Mmm... maybe see if it's in a log file somewhere?
23:05 mattappe_ joined #gluster
23:05 * JoeJulian plugs logstash again for log aggregation and search.
23:07 elyograg I was first made aware of the problem because of email alerts generated by xymon from monitoring the logs.
23:07 elyograg looks like all of those messages are just gfid entries.
23:08 elyograg glustershd.log
23:08 elyograg [2014-03-03 22:47:45.317563] E [afr-self-heal-common.c:197:afr_sh_print_split_brain_log] 0-mdfs-replicate-3: Unable to self-heal contents of '<gfid:5e7d6dd3-532d-4a7c-8272-4d9aedbfb87e>' (possible split-brain). Please delete the file from all but the preferred subvolume.- Pending matrix:  [ [ 0 1 ] [ 1 0 ] ]
23:09 johnbot1_ joined #gluster
23:18 elyograg some of the data that is showing these problems was put on the system in october last year.  it wasn't on the heal reports after the gluster upgrade.  that data does not get changed.
23:21 elyograg the files I've checked that are on the info report seem to be fine, available through the mount.
23:21 elyograg W
23:23 johnbot11 joined #gluster
23:23 elyograg some of the entries on the report are directories with thousands of files.  how does one go about fixing that?
23:24 elyograg today feels like two steps forward, six steps back.
23:24 JoeJulian I hear you.
23:25 JoeJulian Use splitmount. Check the stats and sha1sum between the split-brain files. I'm hoping they're identical.
23:25 elyograg I used splitmount to fix the first file on the first brick.  both copies were identical using diff to compare them.
23:26 elyograg the next entry is a directory, and I don't know what to do about that.
23:28 zerick joined #gluster
23:28 FrodeS splitmount? nifty... :)
23:36 elyograg hmm.  the second copy of that first file hasn't come back.  if I remove one and then stat the file through the mount, shouldn't it once again show up on both of the splitmounts?
23:37 JoeJulian Yes, but maybe not right away.
23:37 JoeJulian ... I should probably mount with different timeout options.
23:38 elyograg ok.  well, the stat should have healed it on the actual brick back end.
23:39 JoeJulian right
23:39 elyograg I'm afraid to do anything, and overwhelmed by the amount of @!$@! that I have to look into and fix now.
23:39 velladecin joined #gluster
23:39 JoeJulian The client log where you ran stat is another resource you can check that at.
23:40 velladecin Hi all, hope you're having a good day/night :)
23:41 JoeJulian I would take the heal info outputs, massage that into a list that you can run things against, such as diff. If everything checks out, I would probably just zero out the xattrs on one side. Unfortunately, that's not something that can be done in splitmount, but has to be done on the bricks.
23:43 elyograg what do I do about the entries that are directories, and the ones that are gfid?
23:44 JoeJulian directories, I'd definitely zero the trusted.afr attributes (or remove them entirely).
23:45 mattapperson joined #gluster
23:47 diegows joined #gluster
23:49 elyograg sitting in the splitmount mount location, I am doing this: for i in $(cat ~/heal-doclist) ; do diff -u */$i ; done
23:50 elyograg no output.
23:50 MacWinner joined #gluster
23:52 JoeJulian At least that's good.
23:53 elyograg entries are still there, but I guess I can delete the copy on r2 for all those files without worrying too much.
23:54 JoeJulian yep
23:54 elyograg directories and gfid entries will still remain even after they clear out.
23:55 elyograg the info split-brain output has even more entries than that info report does.
23:55 JoeJulian Right. directories I'd "setfattr -z trusted.afr.{whatever}" where whatever is gained from "getfattr -m trusted.afr $file" (all this is done on the brick).
23:56 JoeJulian I might even go so far as to script that, and use a "find -type d" to implement it.
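(Aside: a sketch of what that script could look like, run directly on one brick of the replica; the brick path is a placeholder, getfattr/setfattr come from the attr package, and it takes the "remove them entirely" route with setfattr -x. Removing trusted.afr.* tells AFR the directory has nothing pending, so only run it after confirming the copies really match.)

    BRICK=/bricks/brick1              # placeholder brick path
    find "$BRICK" -type d -not -path '*/.glusterfs/*' | while read -r d; do
        getfattr -m trusted.afr --absolute-names "$d" 2>/dev/null \
            | grep '^trusted.afr' \
            | while read -r attr; do
                  setfattr -x "$attr" "$d"    # clear the pending-heal marker
              done
    done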
23:57 elyograg grabbing a similar file list from the split-brain report, those also diff OK.
23:58 JoeJulian I wish I could tell you what happened. Rebalance tests are done as part of the build tests and (obviously) this hasn't come up.
