
IRC log for #gluster, 2014-02-20


All times shown according to UTC.

Time Nick Message
00:03 JoeJulian james__: And those were working before the upgrade?
00:03 james__ the OL 6.5 servers are new, they weren't upgraded from anything
00:04 james__ ive also tried downgrading the glusterfs package versions
00:04 james__ same warnings
00:05 JoeJulian Format and install CentOS! ;)
00:05 JoeJulian j/k
00:05 james__ actually scratch that i dont get the warnings with downgraded packages
00:05 cjanbanan joined #gluster
00:05 james__ but it still fails
00:07 JoeJulian I'm not sure why something is sending a SIGTERM to the fuse client, but that's what "received signum (15), shutting down" means.
00:08 james__ that's the final msg in the log after every mount fail
00:08 james__ i dunno maybe it has something to do with oracle linux's unbreakable enterprise kernel
00:08 JoeJulian Right. And that signal comes from somewhere. sigwaiter is just a signal hook.
00:09 JoeJulian Maybe up the dmesg logging to debug and see if there's anything there.
00:11 JoeJulian Seems like it would be worth booting into the Red Hat Compatible Kernel to find out...
00:12 james__ i dont want to do anything too crazy on a production server hah
00:13 james__ more like i cant reboot it right now, it's running our oracle rac
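A minimal sketch of what "up the dmesg logging to debug" could look like on an EL6-style box (exact flags depend on the util-linux version; this only raises the console log level, the ring buffer records everything regardless):

    # raise the kernel console log level, then look for anything around the failing mount
    dmesg -n 8                      # or: sysctl -w kernel.printk="8 4 1 8"
    dmesg | tail -n 50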
00:18 mattapperson joined #gluster
00:22 mattapperson joined #gluster
00:25 nightwalk joined #gluster
00:27 aquagreen joined #gluster
00:27 B21956 joined #gluster
00:36 mattapperson joined #gluster
00:38 nightwalk joined #gluster
00:39 cp0k joined #gluster
00:44 mattapperson joined #gluster
00:49 mattapperson joined #gluster
00:51 plarsen joined #gluster
00:52 vpshastry joined #gluster
01:01 cjanbanan joined #gluster
01:06 mattappe_ joined #gluster
01:10 sarkis joined #gluster
01:11 bala joined #gluster
01:13 cp0k joined #gluster
01:15 cjanbanan joined #gluster
01:18 mattappe_ joined #gluster
01:18 sroy joined #gluster
01:19 mattappe_ joined #gluster
01:30 shyam joined #gluster
01:34 mattappe_ joined #gluster
01:38 mattap___ joined #gluster
01:44 cp0k joined #gluster
01:52 kdhananjay joined #gluster
02:01 tokik joined #gluster
02:02 mattappe_ joined #gluster
02:03 nightwalk joined #gluster
02:03 haomaiwa_ joined #gluster
02:19 delhage joined #gluster
02:21 mattappe_ joined #gluster
02:22 mattappe_ joined #gluster
02:30 cp0k joined #gluster
02:36 mattapperson joined #gluster
02:38 aquagreen joined #gluster
02:38 mattappe_ joined #gluster
02:40 Frankl joined #gluster
02:44 Frankl All, we have a node that had some kind of hardware issue and was offline for a few days. When the node came back online, its disk I/O became very intensive (iostat -x shows util at 100%) and at the same time clients report that all the volumes hang; it seems the self-heal DoSes the cluster. How could I bring the node back online more gracefully?
02:47 gdubreui joined #gluster
02:51 bharata-rao joined #gluster
02:59 45PAAHQ9Z joined #gluster
03:15 glusterbot New news from newglusterbugs: [Bug 1067256] ls on some directories takes minutes to complete <https://bugzilla.redhat.com/show_bug.cgi?id=1067256>
03:17 inodb joined #gluster
03:17 cp0k joined #gluster
03:20 shubhendu joined #gluster
03:30 nightwalk joined #gluster
03:31 dusmant joined #gluster
03:33 mattappe_ joined #gluster
03:39 itisravi joined #gluster
03:40 mattapperson joined #gluster
03:41 mattappe_ joined #gluster
03:42 kdhananjay joined #gluster
03:46 semiosis joined #gluster
03:55 neurodrone__ joined #gluster
03:56 cp0k joined #gluster
03:56 haomaiwang joined #gluster
04:02 askb joined #gluster
04:04 KaZeR_ joined #gluster
04:15 davinder joined #gluster
04:20 hagarth joined #gluster
04:22 saurabh joined #gluster
04:24 shylesh joined #gluster
04:24 semiosis hagarth: attribute-timeout=0 solved my problem (bug 1065705).  thanks!!
04:24 glusterbot Bug https://bugzilla.redhat.com:443/show_bug.cgi?id=1065705 unspecified, unspecified, ---, vraman, CLOSED NOTABUG, ls & stat give inconsistent results after file deletion by another client
04:26 hagarth semiosis: good to know!
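A minimal sketch of the mount option semiosis is referring to (server and volume names are placeholders; attribute-timeout=0 disables attribute caching in the FUSE client, trading some performance for consistency):

    mount -t glusterfs -o attribute-timeout=0 server1:/myvol /mnt/myvol
    # /etc/fstab flavour
    server1:/myvol  /mnt/myvol  glusterfs  defaults,_netdev,attribute-timeout=0  0 0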
04:27 KaZeR_ joined #gluster
04:27 hagarth Frankl: you can possibly enable least-prio-threads with io-threads for better behavior with self-healing
04:28 hagarth Frankl: another related article http://www.andrewklau.com/controlling-glusterfsd-cpu-outbreaks-with-cgroups/
04:28 glusterbot Title: Controlling glusterfsd CPU outbreaks with cgroups | (at www.andrewklau.com)
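A sketch of the tuning hagarth mentions (VOLNAME and the value are placeholders; performance.least-prio-threads caps the io-threads reserved for least-priority work such as self-heal, so lowering it from the default of 16 throttles healing in favour of client traffic — verify the option name against your version with gluster volume set help):

    gluster volume set VOLNAME performance.least-prio-threads 4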
04:29 glusterbot New news from resolvedglusterbugs: [Bug 1065705] ls & stat give inconsistent results after file deletion by another client <https://bugzilla.redhat.com/show_bug.cgi?id=1065705>
04:32 cp0k joined #gluster
04:32 semiosis joined #gluster
04:34 semiosis_ joined #gluster
04:36 semiosis joined #gluster
04:42 satheesh1 joined #gluster
04:50 hagarth joined #gluster
04:51 dusmant joined #gluster
04:52 prasanth joined #gluster
04:53 ndarshan joined #gluster
04:56 shyam joined #gluster
04:57 bala joined #gluster
04:57 ajha joined #gluster
05:02 ppai joined #gluster
05:05 ndarshan joined #gluster
05:08 kanagaraj_ joined #gluster
05:10 harish joined #gluster
05:14 rjoseph joined #gluster
05:17 KaZeR joined #gluster
05:19 JMWbot joined #gluster
05:19 JMWbot I am JMWbot, I try to help remind johnmark about his todo list.
05:19 JMWbot Use: JMWbot: @remind <msg> and I will remind johnmark when I see him.
05:19 JMWbot /msg JMWbot @remind <msg> and I will remind johnmark _privately_ when I see him.
05:19 JMWbot The @list command will list all queued reminders for johnmark.
05:19 JMWbot The @about command will tell you about JMWbot.
05:20 neurodrone__ joined #gluster
05:23 cjanbanan joined #gluster
05:26 _NiC joined #gluster
05:31 aravindavk joined #gluster
05:35 Frankl hagarth: thanks for your nice reply. least-prio-threads is used for self-healing?
05:40 rastar joined #gluster
05:41 lalatenduM joined #gluster
05:44 vpshastry1 joined #gluster
05:47 Frankl hagarth: I remember least-prio-threads is used for rebalance?
05:47 semiosis joined #gluster
05:47 _NiC joined #gluster
05:49 hagarth Frankl: yes, least-prio-threads is used for rchecksum used in self-healing
05:49 hagarth and other self-heal fops too
05:50 semiosis_ joined #gluster
05:51 semiosis joined #gluster
05:54 askb joined #gluster
05:57 eshy joined #gluster
05:59 inahandizha joined #gluster
05:59 mohankumar joined #gluster
05:59 cjanbanan joined #gluster
06:02 inahandizha left #gluster
06:05 raghu joined #gluster
06:10 Frankl hagarth: thanks, should I set it to a very small value? by default least-prio-threads is 16, could I set it to a small value such as 2?
06:11 dusmant joined #gluster
06:12 hagarth joined #gluster
06:27 JoeJulian Frankl: My guess is that you're hitting a different problem than the one hagarth was thinking. Depending on how many unique files your clients are trying to access, they may be exceeding the background self-heal queue, at which point they'll hang until the heal is finished.
06:28 JoeJulian The default background self-heal queue is 16.
06:29 Frankl JoeJulian: What I saw was that disk utilization is very high on the buggy node but not on the other, normal nodes
06:30 Frankl JoeJulian: could I tune the background self-heal queue?
06:30 JoeJulian On a vm host where I have 30 vm images open, this could cause 24 of them to hang until the queue is available again.
06:30 JoeJulian Yes, there's a volume setting for that. cluster.background-self-heal-count
06:31 JoeJulian er, 14... not 24...
06:31 JoeJulian I don't think I should be doing maths tonight. :D
06:32 benjamin_____ joined #gluster
06:34 Frankl JoeJulian: my glusterfs is 3.3, I don't think there is such an option named cluster.background-self-heal-count, I could not get it from gluster v set help
06:34 JoeJulian Try setting it anyway. I think it might have been undocumented in 3.3
06:35 Frankl good I will try to enlarge the value. thanks Joejulian
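A sketch of the setting JoeJulian is describing (VOLNAME and the value are placeholders; as noted, the option may be undocumented on 3.3 and so may not show up in gluster volume set help):

    gluster volume set VOLNAME cluster.background-self-heal-count 64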
06:41 vimal joined #gluster
06:57 spandit joined #gluster
07:15 glusterbot New news from newglusterbugs: [Bug 1067291] Description of the port numbers are incorrect in Troubleshooting page <https://bugzilla.redhat.com/show_bug.cgi?id=1067291>
07:20 jtux joined #gluster
07:24 jtux joined #gluster
07:28 rossi_ joined #gluster
07:29 RameshN joined #gluster
07:41 ctria joined #gluster
07:43 ekuric joined #gluster
07:45 mgebbe_ joined #gluster
07:46 mgebbe_ joined #gluster
07:46 16WAARPDM joined #gluster
07:46 mgebbe_ joined #gluster
07:54 vpshastry joined #gluster
07:57 rgustafs joined #gluster
07:58 ccha2 when volume heal info takes a long time and returns nothing, what should you do ?
07:59 abhi_ joined #gluster
08:05 abhi_ I am setting up geo sync. This is the error I get on the master. http://paste.ubuntu.com/6964192/
08:05 glusterbot Title: Ubuntu Pastebin (at paste.ubuntu.com)
08:05 tziOm joined #gluster
08:05 eseyman joined #gluster
08:06 cjanbanan joined #gluster
08:10 kdhananjay joined #gluster
08:10 franc joined #gluster
08:10 franc joined #gluster
08:16 tziOm joined #gluster
08:16 tokik joined #gluster
08:17 cjanbanan joined #gluster
08:18 ngoswami joined #gluster
08:20 abhi_ aravindavk: any ideas
08:22 aravindavk abhi_, checking
08:23 keytab joined #gluster
08:23 abhi_ aravindavk: my master is behind a firewall, only port 22 is open for the world on both slave and master.
08:25 meghanam joined #gluster
08:25 aravindavk abhi_, that is fine, it should work.. checking the traceback
08:26 tziOm joined #gluster
08:27 abhi_ aravindavk: hey, just checked, master cluster is running 3.4.1 and slave cluster is running 3.4.2.
08:30 aravindavk abhi_, ok, going out now, will drop mail later
08:32 andreask joined #gluster
08:41 tziOm joined #gluster
08:49 fsimonce joined #gluster
08:50 aquagreen joined #gluster
08:50 meghanam joined #gluster
08:52 vpshastry joined #gluster
08:55 tziOm joined #gluster
08:55 ppai joined #gluster
09:06 tziOm joined #gluster
09:09 hybrid512 joined #gluster
09:11 liquidat joined #gluster
09:15 X3NQ joined #gluster
09:19 aravindavk joined #gluster
09:20 tziOm joined #gluster
09:22 ajha joined #gluster
09:27 abhi_ aravindavk: back ?
09:28 bharata_ joined #gluster
09:30 abhi_ aravindavk: on my client http://paste.ubuntu.com/6964494/
09:30 glusterbot Title: Ubuntu Pastebin (at paste.ubuntu.com)
09:30 tziOm joined #gluster
09:32 shubhendu joined #gluster
09:43 hagarth joined #gluster
09:45 tziOm joined #gluster
09:46 ndarshan joined #gluster
09:49 meghanam joined #gluster
09:55 tziOm joined #gluster
09:57 dusmant joined #gluster
10:00 bharata__ joined #gluster
10:01 mohankumar__ joined #gluster
10:15 tziOm joined #gluster
10:26 ccha2 how can you getfattr on a symlink ?
10:29 Frankl_ joined #gluster
10:30 abhi_ ccha2: are you referring to my traceback
10:32 ccha2 abhi_: nope
10:33 ccha2 ok same thing with -h
10:36 ccha2 hum I create a file on my replicated volume and getfattr -d -e hex -m- test seems look missing attr
10:37 ccha2 # file: test
10:37 ccha2 trusted.gfid=0x46a25d6ddc7744b09d01c5ac74c735f1
10:37 ccha2 trusted.glusterfs.76053997-bcb1-4bee-8e8d-96a7aa74cf48.xtime=0x5305da31000b256e
10:38 ccha2 mising trusted.afr.MyVOL-client-0 right ?
10:38 ccha2 but the file is well replicated
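For reference, the commands being used here (brick paths are placeholders; -h makes getfattr inspect the symlink itself instead of following it):

    getfattr -d -m . -e hex /data/brick/test          # regular file
    getfattr -h -d -m . -e hex /data/brick/somelink   # symlink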
10:40 dusmant joined #gluster
10:42 tziOm joined #gluster
10:43 ndarshan joined #gluster
10:47 ajha joined #gluster
10:48 rastar joined #gluster
10:50 aravindavk joined #gluster
10:51 aravindavk abhi_, did you get chance to try new geo-rep in 3.5?
10:52 abhi_ 3.4 is in production for me. How difficult is it to migrate
10:52 tziOm joined #gluster
10:53 abhi_ aravindavk:
11:04 social kkeithley: I'm looking into 1063832 and I don't think it'll have any easy fix, add-brick happens at the server level and the path is created in glusterd-utils.c in glusterd_validate_and_create_brickpath and the permissions are being set in reconfigure in the posix.c xlator
11:19 hagarth joined #gluster
11:21 mgebbe_ joined #gluster
11:23 tziOm joined #gluster
11:34 aquagreen joined #gluster
11:36 tziOm joined #gluster
11:38 burn420 joined #gluster
11:46 marcoceppi joined #gluster
11:46 marcoceppi joined #gluster
11:49 RicardoSSP joined #gluster
11:49 ira joined #gluster
11:50 marcoceppi joined #gluster
11:50 marcoceppi joined #gluster
11:50 tdasilva joined #gluster
11:58 NuxRo ndevos: any ideas re yesterday's errors?
12:00 jag3773 joined #gluster
12:01 marcoceppi joined #gluster
12:01 itisravi_ joined #gluster
12:06 tziOm joined #gluster
12:09 edward2 joined #gluster
12:12 pk1 joined #gluster
12:13 hagarth joined #gluster
12:16 tziOm joined #gluster
12:35 tziOm joined #gluster
12:36 kdhananjay joined #gluster
12:37 pk1 joined #gluster
12:38 rgustafs joined #gluster
12:46 bazzles joined #gluster
12:58 Slash joined #gluster
12:58 kkeithley1 joined #gluster
13:03 sputnik13 joined #gluster
13:18 glusterbot New news from newglusterbugs: [Bug 1041109] structure needs cleaning <https://bugzilla.redhat.com/show_bug.cgi?id=1041109>
13:20 B21956 joined #gluster
13:23 portante joined #gluster
13:32 psyl0n_ joined #gluster
13:39 ^rcaskey is there ever a reason to use a RAID with gluster (other than just needing to put a brick on a pre-existing volume as a concession to practicality)
13:40 pk1 left #gluster
13:45 benjamin_____ joined #gluster
13:46 plarsen joined #gluster
13:46 neurodrone__ joined #gluster
13:50 mattappe_ joined #gluster
13:52 Slash__ joined #gluster
13:52 shyam1 joined #gluster
13:53 _VerboEse joined #gluster
13:53 davinder joined #gluster
13:53 cyberbootje1 joined #gluster
13:53 dusmantkp_ joined #gluster
13:53 spandit_ joined #gluster
13:54 X3NQ joined #gluster
13:54 jtux1 joined #gluster
13:54 JonathanS joined #gluster
13:55 tru_tru_ joined #gluster
13:55 hagarth1 joined #gluster
13:56 juhaj_ joined #gluster
13:56 sarkis_ joined #gluster
13:56 jag3773 joined #gluster
13:58 mattapperson joined #gluster
13:58 JoeJulian_ joined #gluster
14:00 sroy_ joined #gluster
14:01 georgeh|workstat joined #gluster
14:02 [o__o] joined #gluster
14:05 quique joined #gluster
14:11 aquagreen1 joined #gluster
14:11 semiosis joined #gluster
14:11 mohankumar__ joined #gluster
14:12 mtanner_ joined #gluster
14:12 borreman_123 joined #gluster
14:12 benjamin_____ joined #gluster
14:12 harish_ joined #gluster
14:13 mgebbe__ joined #gluster
14:13 JonathanD joined #gluster
14:13 ctria joined #gluster
14:15 tjikkun_work joined #gluster
14:15 jiffe98 joined #gluster
14:16 tziOm joined #gluster
14:16 junaid joined #gluster
14:16 james__ joined #gluster
14:16 semiosis joined #gluster
14:16 X3NQ_ joined #gluster
14:17 badone joined #gluster
14:18 rwheeler joined #gluster
14:22 prasanth joined #gluster
14:22 burnalot joined #gluster
14:23 dusmantkp_ joined #gluster
14:39 ilbot3 joined #gluster
14:39 Topic for #gluster is now Gluster Community - http://gluster.org | Patches - http://review.gluster.org/ | Developers go to #gluster-dev | Channel Logs - https://botbot.me/freenode/gluster/ & http://irclog.perlgeek.de/gluster/
14:40 andreask joined #gluster
14:44 benjamin_ joined #gluster
14:49 dbruhn joined #gluster
14:58 johnmilton joined #gluster
15:03 rpowell joined #gluster
15:06 lalatenduM joined #gluster
15:09 purpleidea ^rcaskey: yes, use RAID (eg RAID6) under each brick so that simple drive failures don't require a whole brick rebuild...
15:10 bugs_ joined #gluster
15:11 ndevos \o/ getting there slowly: http://thread.gmane.org/gmane.comp.apache.cloudstack.devel/34076/focus=38448
15:11 glusterbot Title: Gmane Loom (at thread.gmane.org)
15:11 ndevos (support for Gluster in CloudStack, store your VMs on a gluster volume)
15:13 radez \home
15:15 ctria joined #gluster
15:18 plarsen joined #gluster
15:19 dbruhn Why does the rebalance have to be so painful
15:19 dbruhn no answer needed, just crabby
15:22 * ndevos doesnt like rebalancing either
15:28 dbruhn Seriously, first pass took my cluster down... restart, second pass, 4 of 12 servers failed because they dropped over night, third pass, the volume is unusable.
15:37 abhi_ joined #gluster
15:41 NuxRo ndevos: should i send smth to the mailing list re my current problems with ACS on gluster?
15:42 ndevos NuxRo: is that the problem you showed me yesterday, or is there something else?
15:42 ndevos the patch has been merged for 4.4, so I hope it works there...
15:43 KaZeR joined #gluster
15:43 NuxRo ndevos: it's the problems from yesterday
15:43 NuxRo mainly http://fpaste.org/78698/92834955/ i think
15:43 glusterbot Title: #78698 Fedora Project Pastebin (at fpaste.org)
15:43 NuxRo maybe current libvirt doesnt support gluster via gfapi?
15:44 NuxRo current libvirt in EL6 that is
15:45 ndevos NuxRo: that is correct, but libvirt should use fuse to mount the volume, qemu has the libgfapi support, and uses the file directly (no fuse)
15:45 NuxRo then do you have any idea why i get that error? seems more like kvm specific rather than cloudstack
15:46 ndevos NuxRo: hmm, that "Unable to read from monitor" sounds like a qemu error, do you have the /var/log/libvirt/qemu/<vm>.log?
15:47 ndevos NuxRo: the http://www.ovirt.org/Features/GlusterFS_Storage_Domain#Important_Pre-requisites apply for CloudStack too, make sure a non-root user (like kvm) can access the volume and files
15:47 glusterbot Title: Features/GlusterFS Storage Domain (at www.ovirt.org)
15:47 NuxRo let's see
15:47 jbrooks left #gluster
15:51 NuxRo ndevos: right, this is the log bit http://paste.fedoraproject.org/78937/92911366/
15:51 glusterbot Title: #78937 Fedora Project Pastebin (at paste.fedoraproject.org)
15:51 NuxRo going to check the ovirt stuff shortly
15:52 ndevos NuxRo: yeah, do that, the logs says "Gluster connection failed for server=cluster1...."
15:52 sarkis_ joined #gluster
15:52 NuxRo seems like permission problem indeed
15:53 ndevos NuxRo: likely the insecure-port options, or whatever they are called
15:53 * ndevos hits that every time again, and again, and again, and again, and ...
15:55 NuxRo the uid is not an issue i think, all kvm procs run as root
15:55 NuxRo I'll apply the insecure thingie shortly
15:57 REdOG why would heal info show 0 files when I see the file in the brick directory?
15:58 * REdOG cannot seem to follow the heal process
16:00 lmickh joined #gluster
16:00 NuxRo rpc-auth-allow-insecure is no longer an option for 3.4
16:01 ndevos NuxRo: one option should be set in the glusterd.vol file, the other is a volume option
16:04 jobewan joined #gluster
16:05 sputnik13 joined #gluster
16:13 NuxRo ndevos: if i edit one node, will rpc-auth-allow-insecure propagate to all?
16:13 NuxRo i have a production setup, dont want customers ruining my day :)
16:14 ndevos NuxRo: that option allows unprivileged users (source ports above 1023) to connect to glusterd, glusterd passes the .vol file to the clients
16:15 hagarth NuxRo: gluster volume set <volname> server.allow-insecure on  is the other option
16:15 ndevos NuxRo: I'm not entirely sure what other things an unprivileged user could do, maybe you should firewall port 24007 and only allow access from systems that need to mount volumes (hopefully clients dont mount volumes directly?)
16:16 ndevos hagarth: that is required as well, how should a client otherwise get the .vol file?
16:17 ndevos well, you could pass that as option to mount.glusterfs, but I have no idea how that can be done by CloudStack
16:19 REdOG is there some way to watch the heal progress?
16:20 NuxRo hagarth: gluster volume set cloudtest server.allow-insecure on <- this worked ok, it's the rpc-auth-allow-insecure that can't be "volume set"
16:20 NuxRo gluster volume set cloudtest  rpc-auth-allow-insecure on
16:21 NuxRo volume set: failed: option : rpc-auth-allow-insecure does not exist
16:21 NuxRo Did you mean allow-insecure or rpc-auth-allow?
16:21 NuxRo ndevos: this insecure thing confuses me, since both libvirt and kvm processes run as root, why do we need it?
16:21 hagarth NuxRo: unfortunately that requires manual editing of glusterd's volfile everywhere and needs restart of glusterd
16:21 ndevos NuxRo: thats the options that should be set in /etc/glusterfs/glusterd.vol
16:22 NuxRo hagarth: is it applied globally? i have clients using nfs, wouldnt want to compromise their volumes
16:23 ndevos NuxRo: you could verify in /var/log/glusterfs/etc-glusterfs-glusterd.log and see if there is a clearer error message, but the error from libgfapi definitely suggests that you need it
16:24 NuxRo ndevos: /etc/glusterfs/glusterd.vol only contains stuff for the management volume, is it there that i must apply rpc-auth-allow-insecure? i'm confused
16:24 NuxRo ndevos: going through the logs now
16:24 ndevos NuxRo: yes, the management volume is the functionality that glusterd provides
16:27 NuxRo ndevos: so you were right, logs show: Request received from non-privileged port. Failing request
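Putting the two pieces together, roughly (cloudtest is the volume from above; keep whatever options are already in glusterd.vol and just add the one line, then restart glusterd on every server):

    # /etc/glusterfs/glusterd.vol, on every server
    volume management
        type mgmt/glusterd
        option working-directory /var/lib/glusterd
        # ... existing transport options stay as they are ...
        option rpc-auth-allow-insecure on
    end-volume

    service glusterd restart

    # and the per-volume half:
    gluster volume set cloudtest server.allow-insecure on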
16:28 DannyFS joined #gluster
16:28 DannyFS so I have a little performance problem here
16:28 DannyFS I have a 2 node distribute running 3.3.2
16:28 DannyFS each has 1 brick
16:29 DannyFS and that brick, running some local benchmarks can do read/writes in the 2 GB/s
16:29 DannyFS I then mount the gluster volume, and do t he same tests and end up with performance in the couple hundred MBs
16:30 DannyFS I'm sure this can be a thousand things... but are there any really common causes to this problem?
16:30 REdOG ok ive been waiting 30 minutes for a heal...I see the file in the brick. heal info still says 0. How can I tell if its working or broken? I don't see anything about it in the logs
16:30 daMaestro joined #gluster
16:30 REdOG I tried stating it in fuse mount
16:30 NuxRo ndevos: can you confirm the vol file looks ok? http://fpaste.org/78953/13718139/
16:30 glusterbot Title: #78953 Fedora Project Pastebin (at fpaste.org)
16:31 zerick joined #gluster
16:32 NuxRo hagarth: are there any security concerns for my clients if i enable rpc-auth-allow-insecure in /etc/glusterfs/glusterd.vol ?
16:33 ndevos NuxRo: I guess the difference in security is very small if you now already have VMs where clients are root and that can connect to the storage servers
16:33 ndevos NuxRo: yes, that looks good
16:34 jag3773 joined #gluster
16:34 NuxRo ndevos: thanks. i have clients mounting volumes via nfs, does this mean non-root users on their servers will be able to mount the volumes?
16:35 hagarth NuxRo: hmm, in an untrusted network somebody could make use of gluster --remote CLI to perform disruptive things (till 3.4.x). But that can be done even if somebody has root on a node outside the cluster.
16:36 NuxRo I'm on 3.4.0 now
16:36 hagarth NuxRo: unprivileged clients can mount a volume with these options - so it is better to block untrusted clients
16:36 ndevos NuxRo: nfs is unrelated, nothing will change with that
16:36 NuxRo this change is making me nervous, i wouldnt like to introduce new security issues in this environment but also wont/cant build a new cluster for cloudstack
16:37 NuxRo right, so NFS is not affected
16:37 zerick joined #gluster
16:37 NuxRo do I still need server.allow-insecure or the cloudtest volume?
16:37 NuxRo s/or/on
16:38 ndevos NuxRo: you can firewall port 24007 (glusterd) and only allow storage servers and glusterfs-clients (fuse, gfapi) to access that
16:38 kmai007 joined #gluster
16:38 NuxRo ok, this sounds reasonable
16:38 NuxRo need to liaise with the network engineering team
16:38 ndevos NuxRo: yes, you also need server.allow-insecure on the volume, volumes(well, bricks) without that will not allow unprivileged ports
16:39 NuxRo ndevos: for NFS access only, what ports should I allow?
16:39 ndevos ~ports | NuxRo
16:39 glusterbot NuxRo: glusterd's management port is 24007/tcp and 24008/tcp if you use rdma. Bricks (glusterfsd) use 24009 & up for <3.4 and 49152 & up for 3.4. (Deleted volumes do not reset this counter.) Additionally it will listen on 38465-38467/tcp for nfs, also 38468 for NLM since 3.3.0. NFS also depends on rpcbind/portmap on port 111 and 2049 since 3.4.
16:40 NuxRo right, so NFS is 111, 2049 and 38465-38467
16:41 semiosis ,,(ports)
16:41 glusterbot glusterd's management port is 24007/tcp and 24008/tcp if you use rdma. Bricks (glusterfsd) use 24009 & up for <3.4 and 49152 & up for 3.4. (Deleted volumes do not reset this counter.) Additionally it will listen on 38465-38467/tcp for nfs, also 38468 for NLM since 3.3.0. NFS also depends on rpcbind/portmap on port 111 and 2049 since 3.4.
16:41 ndevos NuxRo: yes, those are the ports a nfs-client needs
16:41 semiosis oh ndevos just said that
16:41 NuxRo roger
16:42 NuxRo and thanks
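A rough iptables sketch of the port rules glusterbot lists, for a 3.4 setup (the source network 10.0.0.0/24 and the 49200 upper bound are placeholders to adjust; restrict sources to the trusted storage/client networks):

    # glusterd management (and 24008 if rdma is used)
    iptables -A INPUT -p tcp -s 10.0.0.0/24 --dport 24007:24008 -j ACCEPT
    # bricks (glusterfsd): 49152 and up on 3.4, one port per brick
    iptables -A INPUT -p tcp -s 10.0.0.0/24 --dport 49152:49200 -j ACCEPT
    # NFS clients: portmapper, NFS, MOUNT/NLM
    iptables -A INPUT -p tcp --dport 111 -j ACCEPT
    iptables -A INPUT -p udp --dport 111 -j ACCEPT
    iptables -A INPUT -p tcp --dport 2049 -j ACCEPT
    iptables -A INPUT -p tcp --dport 38465:38468 -j ACCEPT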
16:42 DannyFS hey guys any ideas about the crazy low performance I mentioned earlier?
16:43 semiosis DannyFS: you really can't compare local fs performance to a cluster fs.  it's comparing apples to orchards
16:43 NuxRo DannyFS: sounds about right, there are complex operations going on over tcp, it's bound to introduce penalty
16:44 semiosis DannyFS: also what tool are you using for benchmarking?  it is multithreaded/parallel?  if so, are you running it on several client machines at the same time?
16:44 DannyFS but going down from 2 GB/s to 200 MB/s?
16:44 DannyFS iozone
16:44 DannyFS tried anything from 1 to 16 threads
16:45 DannyFS running on one client
16:45 semiosis is 200/s single thread or aggregate?
16:45 DannyFS aggregate :)
16:45 semiosis what kind of network connects these machines?
16:45 DannyFS 10 gb/s
16:45 semiosis hmm
16:46 DannyFS you know what
16:46 DannyFS hold on
16:46 dbruhn joined #gluster
16:46 * semiosis holds
16:46 DannyFS let me go test a simple copy frm one brick to the other
16:46 DannyFS brb
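For context, a throughput-mode iozone run of the kind being discussed might look like this (thread count, sizes and file paths are placeholders; per semiosis's point, running it from several clients at once gives a fairer picture of aggregate throughput):

    # 4 parallel threads, 1 GiB per file, 128 KiB records, sequential write (-i 0) then read (-i 1)
    iozone -t 4 -s 1g -r 128k -i 0 -i 1 \
        -F /mnt/gluster/f1 /mnt/gluster/f2 /mnt/gluster/f3 /mnt/gluster/f4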
16:48 ndk joined #gluster
16:49 kmai007 should this fuse client log file concern me? [fuse-bridge.c:3526:fuse_setlk_cbk] 0-glusterfs-fuse: 164355073: ERR => -1 (No such file or directory)
16:50 kmai007 it doesn't tell me what file or directory
16:50 kmai007 and it keeps logging it every 4 mins.
16:52 * REdOG thinks he's hosed his data
16:52 REdOG now nothing starts
16:54 Slash__ hello, I'm looking into performance and caching with glusterfs, it seems that my glusterfs client (using fuse or nfs) doesn't cache the files in RAM, is there any way to make this happen? the files are accessed several times a minute and there is plenty of RAM
16:57 mattappe_ joined #gluster
16:58 daMaestro joined #gluster
16:58 dusmantkp_ joined #gluster
16:59 XpineX_ joined #gluster
16:59 rwheeler joined #gluster
17:02 mattappe_ joined #gluster
17:04 DannyFS ok ok
17:04 psyl0n joined #gluster
17:04 DannyFS so mounted the brick from node 1 on node 2
17:04 DannyFS and did a straight up copy
17:05 DannyFS and the performance was actually only 400 MB/s
17:05 ndevos NuxRo: oh, and after changing the volume option, stopping and starting the volume may be required too...
17:05 DannyFS so better than the Gluster Volume
17:05 DannyFS but not 1 GB/s :)
17:05 DannyFS I think what I'll do first is get the 40 Gb/s NICs working
17:06 DannyFS so the network between the nodes is as quick as possible
17:06 DannyFS then we'll see how performance goes
17:06 mattapperson joined #gluster
17:07 kmai007 that is 1 big straw
17:07 NuxRo ndevos: so glusterd AND volume restart
17:07 NuxRo roger
17:08 seapasulli joined #gluster
17:08 mattappe_ joined #gluster
17:11 andreask joined #gluster
17:12 mattapperson joined #gluster
17:16 mohankumar__ joined #gluster
17:16 cp0k joined #gluster
17:18 Mo__ joined #gluster
17:21 ndevos NuxRo: volume restart may not be required, if it is required, I'd say it is a bug, but I'm not sure if one is filed for it yet (or if its fixed in newer version)
17:21 kmai007 if i restart the glusterd service on the bricks will that have an affect on my mounted fuse clients?
17:22 KyleG joined #gluster
17:22 KyleG joined #gluster
17:26 mattappe_ joined #gluster
17:28 nage joined #gluster
17:28 nage joined #gluster
17:31 REdOG joined #gluster
17:34 zaitcev joined #gluster
17:37 REdOG is there a method to manually heal?
17:37 REdOG something is definitely wrong here
17:38 REdOG what does number of entries mean?
17:38 REdOG some bricks show 1 others show 0
17:40 cp0k I have a pesky old .glusterfs.broken/ backup dir in the root of my gluster volume which appears to confuse Gluster and report a ton of split-brain, removing this dir from the FUSE mount point is taking a long long time. At the moment I am running low on available disk space. I know it is not a good idea to write to the storage bricks directly, but can I still safely do this if I do not care for this data?
17:44 mattappe_ joined #gluster
17:46 REdOG everything was working so well yesterday
17:47 REdOG now its all fubar
17:47 REdOG [2014-02-20 17:47:30.385690] W [socket.c:514:__socket_rwv] 0-management: readv failed (No data available)
17:52 jobewan joined #gluster
17:53 jobewan joined #gluster
17:53 ctria joined #gluster
17:56 jbrooks joined #gluster
17:59 rpowell kmai007:  I have never noticed any issue with currently running file accesses on the fuse client and restarting glusterd
18:00 vpshastry joined #gluster
18:00 vpshastry left #gluster
18:02 qubit joined #gluster
18:02 qubit is a fsync needed on a directory to ensure that all nodes will be able to see a new file written to a glusterfs volume?
18:03 qubit I know it is for NFS, but can't find anything concrete about glusterfs (using normal glusterfs mount, not nfs)
18:07 mattappe_ joined #gluster
18:09 kaptk2 joined #gluster
18:10 jobewan joined #gluster
18:16 semiosis REdOG: stat $file
18:16 semiosis REdOG: gluster checks if a heal is needed (and starts one if it is) whenver a file is opened/stated
18:17 semiosis kmai007: restarting glusterd should not affect clients that are already mounted
18:22 mattappe_ joined #gluster
18:23 JoeJulian qubit: Mounting with attribute-timeout=0 may be what you're looking for if you're not getting the results you expect.
18:24 cjanbanan joined #gluster
18:25 qubit I don't know what to expect. If it behaves like NFS I expect to need fsync. But it might not. just wondering :-)
18:26 JoeJulian I would expect to require fsync. That would even be true on a local filesystem.
18:26 qubit because if fsync is needed on directories, then fsync will likely also be needed on files. and I certainly don't want to set synchronous IO on the whole mount :-)
18:26 Matthaeus joined #gluster
18:27 qubit JoeJulian: it would be necessary on a local filesystem to commit to disk yes. but since you're accessing the file from the same host, it doesn't matter if it picks it up from disk, or the write cache
18:29 JoeJulian I thought I remembered otherwise, but I got up way too early and am only just now starting my coffee so I may just be off... :D
18:29 mattappe_ joined #gluster
18:29 partner huh, i've got my first beer already ;)
18:32 partner but while on the topic, please feel free to point me to a functional nagios plugin, spent my time today evaluating the existing ones, maybe i missed something but the offering wasn't exactly broad
18:33 JoeJulian Isn't there one on the forge? I thought I remembered seeing one...
18:33 REdOG semiosis: i tried that
18:33 REdOG many times
18:33 REdOG now I fear I need to rebuild half my replica
18:33 REdOG one peer won't connect now
18:34 REdOG and the one that seems to be working cant start the vms
18:34 REdOG i don't even know where to begin atm
18:34 partner there is one from kkeithley
18:35 REdOG 2014-02-20 17:59:32.475585] E [glusterd-rpc-ops.c:762:__glusterd_stage_op_cbk] 0-management: Received stage RJT from uuid: 269658e3-fed4-4899-b77c-7af9cfb37036
18:36 mattapperson joined #gluster
18:36 JoeJulian REdOG: check the log on 269658e3-fed4-4899-b77c-7af9cfb37036 to see why it rejected.
18:37 partner but probably (?) was just pushed there by him, the script says other things and mainly is focused on checking the LSI/SuperMicro stuff, and is hardcoded to a one single volume
18:37 partner just asking, there are petabytes of stuff on top of gluster, one would have thought some decent plugins would exist. but if not no worries, not complaining, just makes sense to ask before spending time writing my own from scratch :)
18:38 JoeJulian I know what you mean.
18:38 REdOG JoeJulian: Mostly I see 0-management: readv failed (No data available)
18:38 partner happened to notice today one of my earliest, soon-to-die, test installations had one side of replica all processes down, was wondering why the disk usage does not go down.. no wonder..
18:39 partner i tend not to trust monitoring processes, on this case that would have pointed out the problem. but seen too many times procs up but nothing working
18:41 REdOG and if i try to turn that host's gluster off then I get those messages on the other host
18:41 ktosiek joined #gluster
18:42 cp0k 12:40 < cp0k> I have a pesky old .glusterfs.broken/ backup dir in the root of my gluster volume which appears to confuse Gluster and report a ton of split-brain, removing this dir from the FUSE mount point is taking a long long time. At the moment I am running low on available disk space. I know it is not a good idea to write to the storage bricks directly, but can I still safely do this if I do not care for this data?
18:45 JoeJulian cp0k: Is your brick directory a subdirectory of the mount point?
18:45 rwheeler joined #gluster
18:45 cp0k JoeJulian: /gluster/{4/5/6/7/8}/
18:45 cp0k JoeJulian:  is what my gluster brick stucture looks like
18:46 JoeJulian I'm assuming, then, that /gluster/4 is a mounted filesystem.
18:46 cp0k yes, ext4
18:47 JoeJulian bummer... If it was structured more like /gluster/4/brick where /gluster/4 was the mounted filesystem and brick/ was the brick directory it gives you the ability to mv things outside of your brick on an experimental basis.
18:47 JoeJulian So yeah, if there's nothing in the .broken directory tree that you want, I would delete it from all bricks.
18:49 cp0k JoeJulian: good to know that for the future :)
18:50 cp0k JoeJulian: so I can wipe data which has zero value to me directly from Gluster storage bricks, but what happens to all the hashes in the healthy /gluster/{4,5,6,7,8}/.glusterfs for those .glusterfs.broken files? will gluster 3.4.2 heal / delete them on its own?
18:50 JoeJulian No. :(
18:51 JoeJulian hmm...
18:51 cp0k JoeJulian: this should not be a big deal though right? unless in the future I actually create a valuable .glusterfs.broken dir and it will see existing hashes, in which case it will start to get confused?
18:52 cp0k JoeJulian: otherwise the hashes of the directly deleted data will just remain, the lookup willl never happen on them as that dir will never be accessed, and all is good?
18:52 JoeJulian I think what I would do, if I were doing it, would be to write a python script that would build a list of all the gfids for that tree. Then I would walk the tree and unlink the files, then walk the list and unlink the gfid files.
18:52 cp0k JoeJulian: just thinking logically here based on my overall understanding of gluster thus far
18:53 JoeJulian on the other hand...
18:53 JoeJulian no, that works....
18:54 JoeJulian After doing that, be sure and do a heal...full.
18:54 cjanbanan joined #gluster
18:55 JoeJulian For that matter, you could even wipe the .glusterfs.broken and the .glusterfs tree and do a heal...full. Would be simpler.
18:55 JoeJulian The full heal will rebuild the .glusterfs tree.
18:56 JoeJulian Ok, I'm waffling again... I should be a politician...
18:56 cp0k haha
18:56 JoeJulian The difference between the crap I'm saying and your reality is that I only have gigabytes of stuff to heal.
18:57 cp0k JoeJulian: so wait, you can wipe out the .glusterfs tree entirely and gluster will still be able to find all the files? how is that possible? isnt that wiping its metadata / mapping to everything?
18:57 cp0k JoeJulian: or did you mean doing it on one brick at a time?
18:58 JoeJulian The files themselves have all the metadata you'll need. The .glusterfs tree allows the tracking of deletions/renames when a brick is offline, and hardlinks.
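JoeJulian suggests scripting this in python; a rough shell sketch of the same idea is below (run directly on each brick, so the usual warnings about touching bricks apply; the brick path, the directory name and VOLNAME are placeholders, and it only handles regular files — the final heal full is what repairs everything else):

    BRICK=/gluster/4                       # repeat on every brick
    DIR="$BRICK/.glusterfs.broken"
    find "$DIR" -type f | while read -r f; do
        # read the file's gfid, derive its .glusterfs hardlink, unlink both
        g=$(getfattr -n trusted.gfid -e hex "$f" 2>/dev/null | awk -F= '/trusted.gfid/{print $2}')
        [ -n "$g" ] || continue
        h=${g#0x}
        uuid="${h:0:8}-${h:8:4}-${h:12:4}-${h:16:4}-${h:20:12}"
        rm -f "$BRICK/.glusterfs/${h:0:2}/${h:2:2}/$uuid"
        rm -f "$f"
    done
    rm -rf "$DIR"
    # then, from any server:
    gluster volume heal VOLNAME full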
19:00 REdOG at least one server is hosed
19:02 REdOG im fucked
19:03 JoeJulian Just a sec, REdOG, let me finish this phone call and we'll get you un-copulated.
19:03 cp0k JoeJulian: understood, as always thank you for your help!
19:05 JoeJulian REdOG: check your /etc/glusterfs/glusterd.vol files and ensure they're all identical. Have you made any customizations to that file?
19:06 REdOG just     option rpc-auth-allow-insecure on
19:06 REdOG they are identical
19:07 JoeJulian rsync -a --delete /var/lib/glusterd/vols from one known good server to all the others.
19:07 JoeJulian Then stop all glusterd and start them all again.
19:07 REdOG I don't know for sure which is the problem
19:08 REdOG i have 2
19:08 REdOG rhel0 and kvm0
19:08 REdOG rhel0 shows kvm0 peer as connected
19:08 REdOG kvm0 shows rhel0 as not connected
19:09 JoeJulian Just do the one that's working then. Shouldn't be any differences anyway. My concern is that a volume change occurred on one server and didn't propagate to the other.
19:10 REdOG so the one from kvm0?
19:12 REdOG ok..done
19:12 JoeJulian stopped both glusterd and started them?
19:13 REdOG about to kill the second .. it may kick my irc connction
19:13 REdOG Starting glusterd:                                         [FAILED]
19:13 REdOG crap
19:13 JoeJulian Just glusterd. Shouldn't affect anything.
19:13 REdOG k
19:14 JoeJulian If it won't start, run "glusterd --debug"
19:14 JoeJulian Then fpaste that
19:16 cp0k JoeJulian: I have some weird stuff going on with my peers...the peer im probing had its /var/lib/glusterd dir corrupted by a human. I detached the bad peer from the rest of gluster, wiped /var/lib/glusterd and fired up gluster again. For some reason, after re-probing the corrupted peer it is not showing me the full peer list of all the members...only the single peer from which I probed it. I hope that makes sense
19:16 REdOG it came up this time
19:16 REdOG vols had gone vols/vols
19:16 JoeJulian oops
19:16 RobertLaptop joined #gluster
19:17 cp0k JoeJulian: on the peer from which I probed it, I am seeing: State: Peer Rejected (Connected)
19:17 REdOG statuse look the same
19:18 JoeJulian btw... that's one of my favorite things about rsync vs scp (or even cp) if you end your directory name with a / it's smart enough to know to merge the directory instead of doing that.
19:18 REdOG ya
18:18 REdOG I always forget which to do tho
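Spelled out, the sync JoeJulian describes looks roughly like this (hostnames are placeholders; the trailing slashes are what keep you from ending up with vols/vols as above):

    # on the known-good server
    rsync -a --delete /var/lib/glusterd/vols/ root@otherserver:/var/lib/glusterd/vols/
    # then on every server
    service glusterd stop
    service glusterd start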
19:18 JoeJulian cp0k: stop all glusterd and start them again.
19:19 ira joined #gluster
19:20 JoeJulian REdOG: so peer status is ok now?
19:20 REdOG no
19:20 REdOG same as before
19:20 REdOG one connected other not
19:21 cp0k JoeJulian: restart glusterd on storage nodes only? or everywhere? including all clients?
19:21 JoeJulian cp0k: The clients don't run glusterd.
19:22 JoeJulian Unless you've done something different than normal.
19:22 cp0k JoeJulian: ok
19:23 JoeJulian REdOG: Check the contents of /var/lib/glusterd/peers. It should only have files named the uuid of the other server.
19:23 REdOG looks correct
19:24 JoeJulian double check glusterd -V
19:24 REdOG 3.4.2?
19:24 JoeJulian On both?
19:24 REdOG ya
19:25 REdOG woah new error
19:25 REdOG [2014-02-20 19:24:58.688592] E [afr-self-heal-common.c:197:afr_sh_print_split_brain_log] 0-aspartameD-replicate-0: Unable to self-heal contents of '<gfid:55ecbbcf-8f30-4958-a2a9-4d4f7e44c5fd>' (possible split-brain). Please delete the file from all but the preferred subvolume.- Pending matrix:  [ [ 0 835 ] [ 34 0 ] ]
19:25 JoeJulian nifty...
19:25 REdOG it gave me one for each of that servers drives
19:26 JoeJulian Curious.. they're still peer rejected?
19:26 REdOG disconnected
19:27 JoeJulian and you can telnet to port 24007 from each to the other...
19:27 REdOG 0-management: readv failed (No data available)
19:27 rossi_ joined #gluster
19:28 REdOG hmm
19:30 REdOG no
19:30 REdOG can now
19:30 REdOG soemthing reset in the fw
19:30 REdOG now im getting a bunch of different readv failed no data avail
19:31 REdOG but they're connected
19:31 JoeJulian excellent.
19:31 cp0k I am in the same boat when probing the corrupt peer
19:31 JoeJulian cp0k: same telnet question...
19:32 JoeJulian REdOG: you upgraded recently, didn't you?
19:32 REdOG :/ yea
19:33 JoeJulian REdOG: I suspect one or more of the bricks haven't restarted since then. They would still be running the old version.
19:34 REdOG on status it shows rhel0 bricks as ofline
19:34 JoeJulian Since you already have some ,,(split-brain) we can't really safely restart one brick, heal, then restart the other brick. I would either fix the split-brain first, or schedule a short downtime.
19:34 glusterbot To heal split-brain in 3.3+, see http://joejulian.name/blog/fixing-split-brain-with-glusterfs-33/ .
19:35 JoeJulian And it seems like the downtime option is the safest, imho.
19:35 REdOG im good with down time
19:35 JoeJulian gluster volume stop $vol ; gluster volume start $vol
19:35 REdOG ok brb
19:36 dork left #gluster
19:39 partner Joe shot me with the spamalot, i will have to put back with: Aaron Neville - Stand by me ;)
19:41 cjanbanan joined #gluster
19:44 REdOG ok I see all Y's in status now
19:44 kmai007 JoeJulian:what is the best method to address the split-brain output if the are 1,000 files
19:45 kmai007 when most of the files are listed as <gfid:62ef4c5f-9bd1-4611-883b-21dae4a6c8ed>
19:45 JoeJulian kmai007: I generally choose backups, scripting and punting. I choose one that I feel is the most likely to be the good one and just delete from the other.
19:46 JoeJulian The gfid files are a bit trickier. That file would be in .glusterfs/62/ef/62ef4c5f-9bd1-4611-883b-21dae4a6c8ed
19:46 NuxRo ndevos: i installed another server for gluster and enabled the insecure thingy, no more error, however the KVM machine will not boot "Booting from hard disk. Boot failed: could not read the boot disk"
19:47 ccha2 is there any gfid files cleanup ? like cleanup orphaned gfid files ?
19:47 REdOG JoeJulian: with regards to split brain how do I know which one is the good one?
19:47 JoeJulian it's a hardlink to the actual file. Check the link count on it through stat. If it's 1, then you can delete it.
19:48 JoeJulian REdOG: I usually check stat and see which one is more recent. Then I mv the file out of the brick path, delete the gfid file and pray that I was right.
19:48 ccha2 aren't any glusterfs process which do that ?
19:48 klaas joined #gluster
19:48 ccha2 +there
19:48 REdOG k
19:49 JoeJulian ccha2: not that I'm aware of.
19:50 KaZeR__ joined #gluster
19:50 JoeJulian kmai007: if the gfid file being reported is actually a symlink, you can just delete it. That's supposed to be a symlink to the parent directory.
19:51 REdOG yikes I don't know which
19:51 ccha2 what happens if 2 different gfid files link to the same file, and the gfid attr is on 1 ? I mean what will happen to the other gfid file ? any auto cleanup ?
19:51 JoeJulian REdOG: sha1sum them. If they're identical then you're golden. You can safely delete either.
19:52 JoeJulian ccha2: Read the ,,(extended attributes) of the file. They will show which gfid is the correct one. Delete the other.
19:52 glusterbot ccha2: (#1) To read the extended attributes on the server: getfattr -m .  -d -e hex {filename}, or (#2) For more information on how GlusterFS uses extended attributes, see this article: http://hekafs.org/index.php/2011/04/glusterfs-extended-attributes/
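Pulling JoeJulian's steps together into one rough sketch (the gfid is the one reported above, the brick path is a placeholder, and "discarding" means whichever copy you decide is the bad one):

    GFID=55ecbbcf-8f30-4958-a2a9-4d4f7e44c5fd
    BRICK=/path/to/brick                    # run this on each replica brick
    G="$BRICK/.glusterfs/${GFID:0:2}/${GFID:2:2}/$GFID"
    FILE=$(ls -i "$G" | cut -d' ' -f1 | xargs -I{} find "$BRICK" -inum {} -not -path '*/.glusterfs/*')
    stat "$FILE"                            # compare mtime across bricks
    sha1sum "$FILE"                         # identical sums on both bricks => either copy is fine
    getfattr -d -m . -e hex "$FILE"         # trusted.afr.* hold the pending matrix
    # on the brick whose copy you are discarding:
    mv "$FILE" /root/splitbrain-backup/
    rm -f "$G"
    # then read the file through a client mount so self-heal copies the good one back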
19:54 ccha2 I got another problem, when I added my brick to a replicated volume, heal volume info takes too long
19:54 ccha2 and display nothing
19:54 JoeJulian I'm pretty sure there's no auto-cleanup of the .glusterfs tree. No normal circumstances can cause it to be in need of cleaning. These are all abnormal things and abnormality generally involves some sort of human intervention to ensure nothing is getting lost.
19:55 JoeJulian ccha2: Which version?
19:55 ccha2 3.3.2
19:56 partner if you interrupt and run it again does it say "operation failed"
19:56 JoeJulian hmm, I haven't seen that with 3.3. Obviously the connection timed out, but I don't know why. With 3.4.1 that would cause glusterd to stop responding to the cli entirely and you would have to restart glusterd to get it to answer again.
19:57 ccha2 partner: yes for volume status
19:57 partner just today i was googling for a similar/identical bug, i can't get out that info..
19:57 ccha2 but after waiting ferw minutes volume status is ok
19:58 partner my interest on the command was from a monitoring point of view, i have the 10 second time window..
19:58 ccha2 isn't it because of self-heal on the new brick that the heal volume info list is too long and takes time ?
19:59 ccha2 but now I think self-heal is done, since the new brick has the same number of files
19:59 ccha2 but heal vol info has the same problem
20:01 daMaestro joined #gluster
20:02 partner not sure if this applies? https://bugzilla.redhat.com/show_bug.cgi?id=829170
20:02 glusterbot Bug 829170: medium, medium, ---, pkarampu, ON_QA , gluster cli returning "operation failed" output for  command "gluster v heal vol  info"
20:04 partner just sounds quite similar to what i bumped into today with the monitoring tool stuff, can't get info out, at least not in a timely fashion
20:04 partner anyways, time for sauna now
20:04 ccha2 yes same thing
20:04 ccha2 partner: which version ?
20:04 kmai007 manual cleaning of split-brain on a volume, can i still keep the volume running?
20:05 ccha2 yes you can
20:05 kmai007 ccha2: was that answer to me?
20:06 ccha2 yes
20:07 partner ccha2: today i got it for the old 3.3.1 setup which was quite badly out of sync with the replicas (all procs dead)
20:07 kmai007 coincidence, all 4 bricks show 1023 files from the split-brain report
20:07 B21956 joined #gluster
20:09 kmai007 omg, so just trying to pick a gfid, i find that <ecfe5234-e2ba-45bd-a188-fa39c6502f84>  is listed multiple times in a brick...i hope its not doomsday
20:09 ccha2 partner: how do I manually apply a temporary fix for the heal volume info ?
20:10 ccha2 stop and start the vol,... that should be the last resort
20:12 ccha2 restart glusterfs on 1 replica then the second
20:15 kmai007 JoeJulian: when I run find /static/content/ -name ecfe5234-e2ba-45bd-a188-fa39c6502f84 on all 4 bricks i don't get any data files, only /static/content/.glusterfs/ec/fe/ecfe5234-e2ba-45bd-a188-fa39c6502f84
20:15 kmai007 is it safe to delete on all bricks?
20:15 kmai007 it returns the hardlink i mean
20:17 REdOG JoeJulian: im not sure that worked
20:17 REdOG I removed one and heal  info split-brain shows even more listed now
20:17 cjanbanan joined #gluster
20:17 REdOG 4 on one and 5 on the other
20:18 REdOG theres only 1 file
20:19 JoeJulian kmai007: ls -i /static/content/.glusterfs/ec/fe/ecfe5234-e2ba-45bd-a188-fa39c6502f84| cut -d' ' -f1 | xargs find /static/content -inum
20:19 JoeJulian REdOG: note that the info split-brain is a log. There are timestamps.
20:19 NuxRo ndevos: btw if i edit the vm to use the fuse mount point file instead of gluster+tcp path, then the VM works great
20:20 JoeJulian REdOG: If you can read the file through the client, then it's healed.
20:20 kmai007 ls: cannot access /static/content/.glusterfs/ec/fe/ecfe5234-e2ba-45bd-a188-fa39c6502f84: Too many levels of symbolic links
20:20 kmai007 that is a new output for me
20:20 kmai007 all 4 bricks report that
20:21 JoeJulian make that ls -id
20:21 REdOG the mount fails
20:21 kmai007 oh its a directory
20:21 kmai007 gosh darnit, i had no clue
20:21 kmai007 JoeJulian:thx
20:21 JoeJulian kmai007: it shouldn't be.
20:22 JoeJulian kmai007: if it is, delete it.
20:22 JoeJulian It's supposed to be a symlink or a normal file. That's all that should be in .glusterfs.
20:22 REdOG well the split-brain log seems to just add more entries over time
20:22 kmai007 JoeJulian:ls -id returned a value of: /static/content/.glusterfs/ec/fe/ecfe5234-e2ba-45bd-a188-fa39c6502f84
20:22 kmai007 no inum
20:22 kmai007 i guess its safe to delete
20:23 JoeJulian sounds like it.
20:23 kmai007 JoeJulian:thanks a million
20:23 REdOG and the brick is empty on the other host
20:24 JoeJulian REdOG: Well that's one way to fix it... So you have a wiped brick and a good brick and it's still producing split-brain logs? That doesn't make sense...
20:26 REdOG do I need to wipe the .glusterfs directory also?
20:26 REdOG I didn't touch that
20:26 REdOG just the files in the brick
20:26 JoeJulian yes
20:26 REdOG pf
20:26 REdOG k
20:28 REdOG 0-aspartameD-replicate-0: Could not create the task for 1 ret -1
20:29 JoeJulian Never seen that one. What's the rest of it?
20:29 REdOG E [afr-self-heald.c:1243:afr_start_crawl] 0-aspartameD-replicate-0: Could not create the task for 1 ret -1
20:29 REdOG all but timestamp
20:31 JoeJulian REdOG: was that from a heal...full?
20:31 REdOG info
20:31 JoeJulian Do a heal full
20:31 JoeJulian Oh, right... that makes sense for info...
20:31 REdOG ok
20:31 REdOG full gave no such error
20:31 JoeJulian We deleted the directory it uses to create the heal table.
20:31 REdOG ah
20:32 kmai007 JoeJulian:shall i delete the associate file also from my find cmd:  /static/content/.glusterfs/indices/xattrop/ecfe5234-e2ba-45bd-a188-fa39c6502f84
20:32 kmai007 since the hash matches
20:32 JoeJulian kmai007: Yes
20:33 JoeJulian Seems like the day for split-brain...
20:33 JoeJulian is there a full moon/
20:33 JoeJulian ?
20:33 kmai007 damn that is 1 out of 1023, no but we got snow/rain out of no where in the midwest
20:33 kmai007 split-brain activity on a cold day, what a combo
20:34 JoeJulian kmai007: ugh... 1023 means there's probably more. That's where it stops counting.
20:35 KaZeR joined #gluster
20:35 kmai007 (x_x) i'm crying
20:36 JoeJulian You're either going to need to do some scripting, or do like REdOG did and wipe a brick and let afr repair it.
20:36 kmai007 it produced a report on all 4 bricks as split-pea
20:37 JoeJulian Which is really nice to have on a cold day.
20:37 kmai007 ok so if i run heal volume info, is that real-time updated?
20:37 kmai007 i'd like to eventually see i'm making progress
20:37 REdOG JoeJulian: ok one of those drives looks ok... in info it shows 1 entry on one brick and 0 on the other
20:37 kmai007 chizzle down mt. rushmore
20:37 cjanbanan joined #gluster
20:38 JoeJulian no, it crawls like every 5 minutes.
20:38 REdOG what's that mean?
20:39 REdOG still gives the could not create task if I run it
20:39 JoeJulian kmai007: but you can force it by reading that file through the client. I use head to just grab the first 1k and dump that to /dev/null.
20:39 REdOG but the file looks like its back in the brick and the split-brain msg is gone
20:39 kmai007 ok
20:39 JoeJulian REdOG: I think you're good. It may take a while since it has to walk the whole disk and heal everything.
20:40 REdOG k
20:40 rpowell Does the gluster.fuse client mount a volume under /tmp under certain situations ?   I have a /tmp/mntkMVJIT on a node I do not remember mounting.
20:40 REdOG ill give it time
20:40 REdOG im hitting it pretty hard on io moving the other disk
20:40 daMaestro joined #gluster
20:41 JoeJulian rpowell: yes. Looks like it might have something to do with quota and/or the cli.
20:41 rpowell ah
20:41 rpowell thanks
20:42 JoeJulian fyi, I came to that conclusion by "git grep 'tmp/mnt'" on the source.
20:45 ccha2 is there any documenation about the ouput of "gluster system:: fsm log" ?
20:47 JoeJulian None that I've seen...
20:48 JoeJulian That was added as part of oldbug 1966
20:48 glusterbot Bug https://bugzilla.redhat.com:443/show_bug.cgi?id=763698 medium, medium, 3.1.1, pkarampu, CLOSED CURRENTRELEASE, Unnecessarily verbose logs at the default log level
20:49 ccha2 ok thanks
20:49 ccha2 git grep again ?
20:51 nightwalk joined #gluster
20:52 sadler joined #gluster
20:53 rotbeard joined #gluster
20:53 sadler anybody know what causes "0-mem-pool: invalid argument"?
20:54 rwheeler joined #gluster
20:54 sadler glusterd fails to start
20:54 cjanbanan joined #gluster
21:00 Matthaeus joined #gluster
21:01 aquagreen1 joined #gluster
21:04 atoponce joined #gluster
21:06 glusterbot joined #gluster
21:06 VeggieMeat joined #gluster
21:06 ccha2 joined #gluster
21:06 ultrabizweb joined #gluster
21:06 smellis joined #gluster
21:06 eryc joined #gluster
21:06 PacketCollision joined #gluster
21:06 hflai joined #gluster
21:06 m0zes joined #gluster
21:06 divbell joined #gluster
21:06 Cenbe joined #gluster
21:06 stigchristian joined #gluster
21:06 wica joined #gluster
21:06 wcchandler joined #gluster
21:06 JordanHackworth joined #gluster
21:06 primusinterpares joined #gluster
21:06 atrius` joined #gluster
21:06 Kins joined #gluster
21:06 crazifyngers_ joined #gluster
21:06 klerg joined #gluster
21:06 mattapperson joined #gluster
21:06 REdOG joined #gluster
21:06 plarsen joined #gluster
21:06 JoeJulian ccha2: git grep, git log... :D
21:06 yosafbridge joined #gluster
21:06 JoeJulian And the fact that I programmed glusterbot to be able to look up the old bug numbers.
21:07 kmai007 joined #gluster
21:07 JoeJulian Wow, sadler didn't stay long. Must need Ritalin.
21:07 kmai007 would the best approach to tackling split-brain be to focus on the files reported by the brick?
21:08 semiosis joined #gluster
21:08 wgao joined #gluster
21:08 partner joined #gluster
21:08 radez joined #gluster
21:08 zerick joined #gluster
21:10 Matthaeus joined #gluster
21:10 JoeJulian kmai007: There's an old post on my blog for finding files with dirty xattrs. You could use that against the good brick to produce a list of files to check, then script checking that list against the xattrs on the other brick and delete (or however you're handling that) the files necessary to heal the split-brain.
21:11 ccha2 in split-brain(which self-heal can't heal), aren't those files from clients not enabled ?
21:11 REdOG had to unload and reload fuse module
21:11 JoeJulian http://joejulian.name/blog/quick-and-dirty-python-script-to-check-the-dirty-status-of-files-in-a-glusterfs-brick/
21:11 glusterbot Title: Quick and dirty python script to check the dirty status of files in a GlusterFS brick (at joejulian.name)
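The blog post has the python version; a rough shell equivalent of the same idea, assuming the usual trusted.afr.<volume>-client-N attribute naming (the brick path is a placeholder):

    BRICK=/path/to/brick
    find "$BRICK" -path "$BRICK/.glusterfs" -prune -o -type f -print | while read -r f; do
        # a non-zero trusted.afr.* value means pending (dirty) operations against the other brick
        getfattr -d -m 'trusted.afr.' -e hex "$f" 2>/dev/null | grep -q '=0x0*[1-9a-f]' && echo "dirty: $f"
    done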
21:11 * REdOG is a little more confident now
21:12 JoeJulian ccha2: right. A client will not be able to access a file that is split-brain.
21:12 JoeJulian s/access/read or modify/
21:12 glusterbot What JoeJulian meant to say was: ccha2: right. A client will not be able to read or modify a file that is split-brain.
21:14 james__ joined #gluster
21:14 Slash__ joined #gluster
21:14 georgeh|workstat joined #gluster
21:14 ccha2 good to know when heal volume info not working
21:14 tjikkun_work joined #gluster
21:14 jiffe98 joined #gluster
21:15 RobertLaptop joined #gluster
21:15 ccha2 about heal volume info healed, does the list autopurge ?
21:15 Matthaeus joined #gluster
21:16 ccha2 expiration purge entries ?
21:16 cp0k joined #gluster
21:18 ccha2 because that list could be huge, when adding a brick for replication
21:18 JoeJulian No, it doesn't. One of my pet peeves. You can purge it by restarting all glusterd.
21:18 sputnik13 joined #gluster
21:22 kmai007 JoeJulian: excuse the noob, but when i run the qwikndirty i get ImportError: No module named xattr
21:23 JoeJulian yum install pyxattr
21:23 REdOG_ joined #gluster
21:23 Kins_ joined #gluster
21:23 glusterbot` joined #gluster
21:23 kmai007 JoeJulian:thxxxxtr
21:24 gmcwhistler joined #gluster
21:25 smellis_ joined #gluster
21:29 crazifyngers_ joined #gluster
21:31 klerg_ joined #gluster
21:31 mattapp__ joined #gluster
21:31 divbell_ joined #gluster
21:31 ccha3 joined #gluster
21:31 calum_ joined #gluster
21:31 stigchri1tian joined #gluster
21:32 VeggieMeat_ joined #gluster
21:32 bit4man joined #gluster
21:32 B21956 joined #gluster
21:32 JMWbot joined #gluster
21:32 JMWbot I am JMWbot, I try to help remind johnmark about his todo list.
21:32 JMWbot Use: JMWbot: @remind <msg> and I will remind johnmark when I see him.
21:32 JMWbot /msg JMWbot @remind <msg> and I will remind johnmark _privately_ when I see him.
21:32 JMWbot The @list command will list all queued reminders for johnmark.
21:32 JMWbot The @about command will tell you about JMWbot.
21:32 PacketC0llision joined #gluster
21:33 P0w3r3d joined #gluster
21:33 lman4821 joined #gluster
21:33 REdOG_ joined #gluster
21:34 cjanbanan joined #gluster
21:34 lkoranda joined #gluster
21:38 lman4821 Apologies to the group if this is already answered, but I've had a hard time getting definitive answers via google and other docs.  I am looking to set up Gluster in EC2 for shared home directories…  I have tested using 3.4.2, which was easy to get working.  I can easily mount (and use automount) with a client via NFS.. which is great. However I am not sure, if the node that the client mount goes away (dies), if the NFS m
21:40 JoeJulian lman4821: Too long for irc, but I get the gist... Unfortunately the only way to achieve that with nfs would be to use a floating virtual ip. I'm not personally aware of any way of doing that in EC2.
21:40 kmai007 JoeJulian:i ran your dirty script is there output somewhere? i just get my prompt back
21:40 JoeJulian lman4821: Is there a reason not to use a fuse mount?
21:40 JoeJulian kmai007: That would mean that there are no dirty xattrs.
21:41 kmai007 ok good i ran it in my R&D environment
21:42 JoeJulian The lines that start with "print" will print output to stdout.
21:42 lman4821 JoeJulian: thanks.  I am trying to understand the pieces.  If I need to run the fuse mount (you mean glusterfs mount, right?) then I can do that (I think..)
21:43 hflai joined #gluster
21:43 JoeJulian lman4821: yes. ,,(mount server)
21:43 glusterbot lman4821: The server specified is only used to retrieve the client volume definition. Once connected, the client connects to all the servers in the volume. See also @rrdns
21:44 Cenbe joined #gluster
21:45 lman4821 glusterbot: "server specified is only used to retrieve the client" I know that is true for the glusterfs fuse mount.  Is that also true for the NFS mount, i.e. the non-fuse mount? (I am assuming not.)  Just trying to get the full picture (thanks for the patience)
21:47 JoeJulian lman4821: Right. nfs has no clustering ability. It just connects to the one ip address once the hostname is resolved and has to maintain that tcp connection.
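A sketch of the fuse-mount alternative being suggested, with made-up hostnames and volume name; the backup-server option name varies between glusterfs releases, so check the mount.glusterfs man page for your version.

    # fuse mount: server1 only supplies the volfile, after which the client
    # talks to every brick directly; backupvolfile-server covers the case
    # where server1 happens to be down at mount time
    mount -t glusterfs -o backupvolfile-server=server2 server1:/homevol /home

    # the rough /etc/fstab equivalent
    # server1:/homevol  /home  glusterfs  defaults,_netdev,backupvolfile-server=server2  0 0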
21:48 REdOG JoeJulian: I don't think the heal is working
21:49 rwheeler joined #gluster
21:49 JoeJulian It does it as a low-priority action and in chunks every 10 minutes. You can force a heal by doing a find on your client mount if you want.
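The "find on your client mount" he mentions is usually just a full stat of the tree, something like the line below (the mount point is an example):

    # touching every file's metadata through the fuse mount makes the
    # client notice and heal out-of-sync copies
    find /mnt/glustervol -noleaf -print0 | xargs -0 stat > /dev/null 2>&1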
21:50 REdOG the .glusterfs directories get recreated automatically?
21:50 japuzzo_ joined #gluster
21:51 REdOG its been an hour and split-brain info looks the same
21:51 KaZeR__ joined #gluster
21:51 sarkis joined #gluster
21:52 hybrid5121 joined #gluster
21:53 Matthaeus1 joined #gluster
21:53 gmcwhist_ joined #gluster
21:53 rpowell1 joined #gluster
21:53 _VerboEse joined #gluster
21:53 PacketCollision joined #gluster
21:54 PacketCollision left #gluster
21:54 khushildep_ joined #gluster
21:54 cjanbanan joined #gluster
21:55 purpleid1a joined #gluster
21:55 james_ joined #gluster
21:55 __NiC joined #gluster
21:55 kmai007 so much pain
21:55 social_ joined #gluster
21:55 kmai007 the dirty script didn't print anything on the reported brick
21:56 kmai007 but, i am making progress cleaning up the brick manually, and see it getting knocked out from the split-brain info output
21:56 Dave2_ joined #gluster
21:57 lman4821 Thanks for the help all.
21:58 mattapperson joined #gluster
21:58 lman4821 Second question.  Does anyone use automount for glusterfs fuse ?
21:59 rossi_ joined #gluster
22:00 hflai joined #gluster
22:01 cp0k joined #gluster
22:01 JoeJulian joined #gluster
22:01 mojorison joined #gluster
22:01 P0w3r3d joined #gluster
22:01 quique_ joined #gluster
22:02 jobewan joined #gluster
22:02 Oneiroi joined #gluster
22:02 hchiramm__ joined #gluster
22:03 harish_ joined #gluster
22:03 johnmilton joined #gluster
22:03 zaitcev joined #gluster
22:03 uebera|| joined #gluster
22:06 fyxim joined #gluster
22:06 Slasheri_ joined #gluster
22:06 eshy joined #gluster
22:06 ultrabizweb joined #gluster
22:06 eryc joined #gluster
22:06 m0zes joined #gluster
22:06 semiosis joined #gluster
22:06 wgao joined #gluster
22:06 partner joined #gluster
22:06 radez_g0n3 joined #gluster
22:06 REdOG JoeJulian: should I see the number of entries go down?
22:17 calum_ joined #gluster
22:20 Guest33286 left #gluster
22:22 refrainblue joined #gluster
22:23 tg2 joined #gluster
22:24 Matthaeus joined #gluster
22:24 cjanbanan joined #gluster
22:26 rossi__ joined #gluster
22:28 REdOG http://pastie.org/8753491
22:28 glusterbot Title: #8753491 - Pastie (at pastie.org)
22:28 REdOG still looks like that after an hour
22:28 REdOG I should have just stayed in bed this morning
22:29 JoeJulian Excellent. So no new entries in the last two hours.
22:30 REdOG do they ever go away?
22:30 sputnik13 joined #gluster
22:30 JoeJulian Maybe
22:30 * REdOG is lost
22:30 JoeJulian You can make them go away by restarting glusterd. Otherwise, no.
22:31 JoeJulian It's confusing, I know. I filed a bug report about that.
22:31 kmai007 REdOG: what # should be going down?
22:32 REdOG kmai007: I don't know I was asking...
22:32 REdOG for all my other drives its 0
22:32 REdOG im assuming a lot because this is new to me
22:32 kmai007 oh, the 'volume heal <vol> info' output ?
22:33 Oneiroi joined #gluster
22:33 REdOG info split-brain
22:34 kmai007 oh, yeah mine is still 1023
22:34 REdOG ok
22:36 Guest26138 joined #gluster
22:36 radez_g0n3 joined #gluster
22:36 partner joined #gluster
22:36 wgao joined #gluster
22:36 semiosis joined #gluster
22:36 Slasheri_ joined #gluster
22:36 eshy joined #gluster
22:36 ultrabizweb joined #gluster
22:36 eryc joined #gluster
22:36 m0zes joined #gluster
22:37 kmai007 makes me sad
22:37 REdOG reading, https://access.redhat.com/site/sites/default/files/attachments/rhstorage_split-brain_20131120_0.pdf and it makes me think the number of entries matters for what needs to be done
22:38 REdOG It would be comforting if there was somewhere for me to verify that it's finished or in progress or failed, instead of just stale entries
22:38 kmai007 no kidding, if there was some % counter or meter
22:39 harish_ joined #gluster
22:40 P0w3r3d joined #gluster
22:40 refrainblue joined #gluster
22:40 hybrid512 joined #gluster
22:42 sputnik13 joined #gluster
22:46 JoeJulian Of course it would have to know how many files there were first...
22:47 radez_g0n3 joined #gluster
22:47 partner joined #gluster
22:47 wgao joined #gluster
22:47 semiosis joined #gluster
22:47 eshy joined #gluster
22:47 Slasheri_ joined #gluster
22:47 ktosiek joined #gluster
22:48 sputnik13 joined #gluster
22:48 kmai007 any1 going to summit?
22:49 refrainblue anyone here using glusterfs on oracle linux 6.5?
22:49 kmai007 unbreakable?
22:50 refrainblue yea
22:50 refrainblue uek
22:50 refrainblue kernel
22:50 refrainblue or at least mounting a glusterfs volume with it
22:51 kmai007 i'd imagine it wouldn't be any different
22:51 kmai007 are you mounting FUSE or NFS ?
22:51 refrainblue been having trouble with it, wondering if anyone else has done it successfully with uek
22:51 gdubreui joined #gluster
22:51 refrainblue can mount the volume as nfs, but not as glusterfs with fuse
22:52 kmai007 whats the error you get?
22:52 sputnik13 joined #gluster
22:52 refrainblue but have it mounted using centos 6.4/6.5
22:52 refrainblue no errors or criticals, just warnings
22:52 Rydekull joined #gluster
22:52 kmai007 is FUSE module running?
22:52 kmai007 lsmod|grep fuse
22:52 refrainblue yes its running
22:52 kmai007 interesting
22:53 kmai007 you've probably checked everything... i've not had any exp. with uek
22:53 DannyFS joined #gluster
22:53 kmai007 unless the ports are privileged in uek
22:53 kmai007 for the volume you're trying to mount
22:53 refrainblue i've tested it with 3.4.0, 3.4.1, and 3.4.2 packages
22:54 refrainblue the only thing i havent tried is rebooting into rhck
22:54 refrainblue but its a production machine so ive been holding back on reboot
22:54 kmai007 mount -vvv -t glusterfs brick:/path /mnt/point doesn't show you anything useful?
22:54 kmai007 yikes, goodluck
22:55 kmai007 have you tried an strace, and see where it might die?
22:55 kmai007 or get denied?
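One way to take kmai007's strace suggestion, with example paths; -ff writes one trace file per child process, which helps because the fuse mount helper forks.

    strace -f -ff -o /tmp/glusterfs-mount.trace \
        mount -t glusterfs server1:/vol /mnt/vol
    # then grep the trace files for the failing syscall, which may also
    # show where the SIGTERM (signum 15) seen in the client log comes from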
22:55 refrainblue have not
22:56 refrainblue this is the final warning i get before it unmounts and fails:
22:56 refrainblue W [glusterfsd.c:1099:cleanup_and_exit] (-->/lib64/libc.so.6(clone+0x6d) [0x39076e8b6d] (-->/lib64/libpthread.so.0() [0x3d13e079d1] (-->/usr/sbin/glusterfs(glusterfs_sigwaiter+0xcd) [0x4052ad]))) 0-: received signum (15), shutting down
22:57 JoeJulian did you try changing the dmesg level to debug and see if the kernel will tell you why it's killing that?
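For the record, raising the kernel console log level looks roughly like this; the exact syntax accepted by dmesg depends on your util-linux version.

    dmesg -n 8                        # 8 = debug, the most verbose level
    # or equivalently
    echo 8 > /proc/sys/kernel/printk  # first field is the console loglevel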
22:58 kmai007 JoeJulian: on 1 brick, i have 0 split-brain files, i'm making progress! <i think>
22:58 kmai007 3 more to go
22:58 JoeJulian refrainblue: You could also try glusterfs --debug --volfile-id=$volname --volfile-server=$mount_server $mount_path
23:00 REdOG woah glusterfsd is hammering my system now
23:00 REdOG load average: 63.05, 39.15, 19.36
23:00 JoeJulian yee-haw
23:00 REdOG its not even kind of slow :D
23:00 cjanbanan joined #gluster
23:01 refrainblue i changed the dmesg log level just now, ill test again
23:02 refrainblue hmm nothing showed up
23:02 sputnik13 joined #gluster
23:04 JoeJulian well that's not very helpful.
23:04 JoeJulian Does oracle offer support?
23:05 refrainblue i did ask someone from oracle, but their advice was to see if it works with rhck first as well.  trying to see if i can do anything else first because i havent received permission to reboot the machine yet
23:05 KyleG joined #gluster
23:05 KyleG joined #gluster
23:05 JoeJulian Not trying to pass the buck or anything, but since you've isolated oracle linux, it seems like a logical step.
23:06 JoeJulian Do you have any other unused oracle boxes?
23:06 refrainblue not with the same update pack & kernel version
23:06 JoeJulian or can you spin up a vm with that?
23:07 primechuck joined #gluster
23:07 refrainblue spose i could try that...
23:09 jobewan joined #gluster
23:10 refrainblue our vm server is already running like 16 vms though lol
23:14 daMaestro joined #gluster
23:15 kmai007 oh sh*t
23:15 kmai007 i'm down to my last server for my split-pea soup
23:20 zerick joined #gluster
23:26 JoeJulian @split-brain
23:26 glusterbot JoeJulian: To heal split-brain in 3.3+, see http://joejulian.name/blog/fixing-split-brain-with-glusterfs-33/ .
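The manual fix in that post boils down to, roughly, removing the copy you do not want from one brick, including its .glusterfs hardlink, and then letting self-heal recreate it. A rough sketch with made-up paths:

    # on the brick holding the bad copy only
    BRICK=/export/brick1
    BAD="$BRICK/path/to/broken-file"
    INUM=$(stat -c %i "$BAD")                       # note the inode first
    find "$BRICK/.glusterfs" -inum "$INUM" -delete  # drop the gfid hardlink
    rm -f "$BAD"                                    # drop the bad copy
    # then stat the file through a client mount so self-heal recreates it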
23:28 sputnik13 joined #gluster
23:28 diegows joined #gluster
23:31 cjanbanan joined #gluster
23:53 primechuck joined #gluster
23:54 cjanbanan joined #gluster
23:57 sprachgenerator joined #gluster
23:57 mattappe_ joined #gluster
