IRC log for #gluster, 2013-11-08


All times shown according to UTC.

Time Nick Message
00:08 kodiakFiresmith joined #gluster
00:08 g4rlic left #gluster
00:23 vpshastry joined #gluster
00:38 kPb_in joined #gluster
00:47 bennyturns joined #gluster
00:51 johnsonetti joined #gluster
00:55 neofob_home joined #gluster
00:55 dbruhn joined #gluster
01:08 yinyin joined #gluster
01:25 jake[work] i put the final mount command in /etc/rc.local.  is there any obvious issue with that?
01:25 jake[work] seems to work
01:26 dbruhn I do the same thing, because my gluster servers are also my clients
01:26 jake[work] right!  ok.  that sounds good
01:26 jake[work] because fstab didn't seem to work
01:27 jake[work] i guess because some of the modules were not loaded
01:29 elyograg localhost:mdfs /mnt/mdfs glusterfs defaults,_netdev 1 3
01:30 elyograg the _netdev causes redhat OSes to delay the mount until after other things like the network have loaded.
01:30 jake[work] really?  i'm going to try it
01:30 elyograg not sure what you'd need to do for a debian-derived OS.
01:30 jake[work] oh
01:34 jake[work] ok.  i'll just leave it where it is.  i just ran across another article saying that's where to put it for debian
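[Editor's note: a minimal sketch of the rc.local approach discussed above, for Debian-derived systems where the fstab _netdev handling may not apply; the volume name "gv0" and mount point are hypothetical examples.]
    # appended to /etc/rc.local, which runs late in boot, after networking is up
    mount -t glusterfs localhost:gv0 /mnt/gv0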
01:44 failshell joined #gluster
01:50 failshell joined #gluster
01:54 harish_ joined #gluster
01:58 davidbierce joined #gluster
01:58 yinyin joined #gluster
02:08 bharata-rao joined #gluster
02:14 davinder joined #gluster
02:17 tomased joined #gluster
02:18 [o__o] joined #gluster
02:20 [o__o] left #gluster
02:22 [o__o] joined #gluster
02:28 DV joined #gluster
02:36 harish_ joined #gluster
02:40 diegows_ joined #gluster
02:54 bala joined #gluster
02:59 kshlm joined #gluster
03:07 shubhendu joined #gluster
03:20 sgowda joined #gluster
03:46 vpshastry joined #gluster
03:48 vpshastry left #gluster
03:49 mattappe_ joined #gluster
03:52 itisravi joined #gluster
04:03 sprachgenerator joined #gluster
04:05 kanagaraj joined #gluster
04:06 rjoseph joined #gluster
04:08 _NiC joined #gluster
04:11 shylesh joined #gluster
04:26 vpshastry joined #gluster
04:28 rastar joined #gluster
04:34 psharma joined #gluster
04:39 mattappe_ joined #gluster
04:56 dusmant joined #gluster
05:01 ppai joined #gluster
05:14 lalatenduM joined #gluster
05:19 aravindavk joined #gluster
05:20 raghu joined #gluster
05:20 ndarshan joined #gluster
05:23 ababu joined #gluster
05:24 bala1 joined #gluster
05:29 mohankumar joined #gluster
05:31 bala1 joined #gluster
05:44 CheRi joined #gluster
05:45 glusterbot New news from newglusterbugs: [Bug 1028281] Create Volume: No systems root partition error when script mode <http://goo.gl/iZbUIP>
05:59 sgowda joined #gluster
06:02 nshaikh joined #gluster
06:03 shireesh joined #gluster
06:06 bulde joined #gluster
06:17 vimal joined #gluster
06:18 mohankumar joined #gluster
06:22 vshankar joined #gluster
06:25 satheesh joined #gluster
06:38 RameshN joined #gluster
06:40 ngoswami joined #gluster
06:47 ngoswami joined #gluster
06:53 satheesh joined #gluster
06:53 ndarshan joined #gluster
06:54 ricky-ticky joined #gluster
07:01 pkoro joined #gluster
07:08 yinyin joined #gluster
07:10 davinder joined #gluster
07:17 warci joined #gluster
07:26 jtux joined #gluster
07:28 danci1973 I have a 10Gbps IB and with iperf and the like I'm getting IPoIB speeds of about 3.6-3.7 Gbps ...  But when I'm doing something on glusterfs (like 'dd if=/dev/zero of=test ....'), the glusterfs process consumes quite a lot of CPU (110-120% in 'top') but only pushes about 1.1 - 1.2 Gbps over the IPoIB link (monitored with 'iftop')... What gives?
07:29 mohankumar joined #gluster
07:30 samppah danci1973: have you tried running dd from several nodes at the same time?
07:31 danci1973 samppah: Not yet...
07:39 samppah can you do that?
07:39 danci1973 samppah: With two nodes I'm getting 1.1 Gbps on the first and 1.7 Gbps on the second (they should be identical HW and SW wise).
07:41 danci1973 samppah: Glusterfs on the second node is also using less CPU - about 75% in 'top'...
07:41 samppah hmm, interesting
07:41 samppah danci1973: anything in log files?
07:42 danci1973 samppah: The only difference is that I have limited RAM to 512MB on Dom0 on node 1 for 'bonnie++' testing (less RAM = smaller files = faster), while the second node has 2048MB in Dom0...
07:42 samppah ahh
07:42 samppah danci1973: can you try fio for testing?
07:43 danci1973 samppah: fio ?
07:44 samppah http://freecode.com/projects/fio
07:44 glusterbot Title: fio – Freecode (at freecode.com)
07:44 danci1973 Thanks - just 'fio' gave too many hits in Google... :D
07:45 hagarth joined #gluster
07:45 danci1973 samppah: I'll try 'fio'.
07:45 samppah hehe
07:45 samppah what distribution are you using?
07:46 danci1973 samppah: OpenSuSE 12.3
07:47 samppah okay, i haven't used opensuse for a few years but i think there is a prebuilt package of fio available for it :)
07:47 danci1973 I generally use stuff already included in the kernel (3.7.10), and have tools and user space stuff compiled from OFED 3.5 packages.
07:47 danci1973 samppah: Yes, fio is there. :D
07:48 danci1973 samppah: Any specific fio command line you'd like me to use?
07:51 samppah danci1973: on EPEL packages there are some example tests in /usr/share/doc/fio-2.0.13/examples
07:51 danci1973 samppah: Yup, got some here too.
07:52 samppah iometer-file-access-server is good for quick testing, but it shows IOPS
07:52 danci1973 Where do I start the 'fio --server' ? On my 'client' node or on the server?
07:53 samppah you can just run it on client with: fio testfile
07:54 samppah i think it's possible to test network performance with it too, but i haven't tried that out
07:54 danci1973 Aaa... no need for a server, then. Good.
07:55 danci1973 ETA is 9 hours and rising fast ??
07:56 samppah huh
07:56 danci1973 On the second node it's 46m and falling...
07:56 samppah hmm
07:56 danci1973 Weird... Maybe low memory is limiting it...
07:57 samppah are you running it on same mount point?
07:57 danci1973 Yes, same Gluster volume if that's what you mean?
07:57 samppah yes.. fio writes a testfile named something like iometer.1.0; is it possible that nodes 1 and 2 are accessing the same file?
07:58 samppah would be better to do /mnt/glusterVol/node1 and /mnt/glusterVol/node2
07:58 danci1973 samppah: Sure, if 'fio' doesn't take care of unique naming. :D
07:58 danci1973 I'll do it in separate dirs then...
07:59 samppah yeah, i don't think they are aware of each other :)
08:00 danci1973 I guess I'll reboot the 'node1' to bring memory to the same level as 'node2' (2GB)...
08:02 samppah another quick test for sequential io with larger bytesize: fio --bs=64k --direct=1 --size=1g --ioengine=libaio --name=foo --rw=write --iodepth=32
08:03 samppah danci1973: yeah, it should use disk directly but it's possible that it's doing some caching before committing writes
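[Editor's note: the same sequential-write test expressed as a fio job file, combined with samppah's earlier suggestion of one directory per node so concurrent runs don't touch the same test file; the directory path is a hypothetical example. Run it with "fio seq-write.fio".]
    [seq-write]
    bs=64k
    direct=1
    size=1g
    ioengine=libaio
    rw=write
    iodepth=32
    directory=/mnt/glusterVol/node1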
08:11 keytab joined #gluster
08:14 danci1973 Weird... Using that command line (increased size to 10G) I see TCP traffic on 'ib0' jumping from 100Mbps to 1.1Gbps...  But it's at least the same on both nodes. :)
08:15 eseyman joined #gluster
08:17 danci1973 Interesting - 'fio' apparently can use rdma libs... But I'll have to compile my own version...
08:17 glusterbot New news from resolvedglusterbugs: [Bug 953694] Requirements of Samba VFS plugin for glusterfs <http://goo.gl/v7g29>
08:19 samppah danci1973: yeh, that iometer-file-access-server is mostly testing small bytesize mixed random writes/reads, so it's kind of emulating a real world situation
08:24 danci1973 samppah: Well... I'm going to use Gluster as a VM image store... So basically - I have two servers as 'storage nodes', with HW RAID, 7 drives and a dual port IB HCA. Then I'll have 2 or more Xen hosts, which will mount a Gluster volume and use VM images on that... Is this a viable approach?
08:25 blook joined #gluster
08:29 samppah danci1973: there are some benefits to using kvm with glusterfs: qemu has support for libgfapi, so it's able to bypass fuse and give better performance
08:29 samppah if you want to test performance it would be better to install domU's and test performance inside those
08:29 danci1973 samppah: I did and it was 'not good', so I thought I'd first test in Dom0...
08:29 samppah ach
08:30 danci1973 samppah: But I'm quite open to using KVM if that's beneficial...
08:30 samppah great!
08:30 samppah what test did you run inside domU btw?
08:30 danci1973 samppah: I was using bonnie++ ...
08:31 samppah danci1973: was there some specific test that was slow or was it overall?
08:31 danci1973 To be honest, I'd like to get some tools to actually test 'raw' IB performance...
08:31 samppah ah, okay
08:32 samppah rdma?
08:32 danci1973 samppah: It was slow overall. I also tried various image formats / Xen backends (file:, tap:aio, tap:qcow2) and although I thought the 'tap' driver should be better, it wasn't - file: did best.
08:32 danci1973 samppah: I'd love to try rdma, but not really sure how.
08:33 danci1973 I do use 'mount -t glusterfs -o transport=rdma ....' and my 'mount' shows it mounted 'gluster1:vm_store.rdma', but would I see TCP traffic if it was using RDMA? I guess not.
08:34 danci1973 Besides - somebody on here told me RDMA doesn't work (yet) in Gluster 3.4 ...
08:35 samppah i have heard that it has some problems
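[Editor's note: one hedged way to answer the "would I see TCP traffic?" question above: the HCA port counters in sysfs count all InfiniBand traffic, RDMA included, while iftop on ib0 sees only the IPoIB/TCP side, so traffic visible in the counters but not in iftop is going over RDMA. The device name "mlx4_0" is a hypothetical example.]
    # port_xmit_data is in units of 4 bytes; sample it twice and take the difference
    cat /sys/class/infiniband/mlx4_0/ports/1/counters/port_xmit_data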
08:36 hngkr joined #gluster
08:37 vpshastry1 joined #gluster
08:43 mgebbe_ joined #gluster
08:44 DV joined #gluster
08:45 sgowda joined #gluster
08:50 zwu joined #gluster
08:52 bulde1 joined #gluster
09:01 danci1973 samppah: Do you know what one has to do to get 'rdma' working? As in - what kernel modules to load, user space tools to use to initialize and stuff? Cause I compiled 'fio' with '-libverbs -lrdmacm' and it still says they're not there...
09:04 hybrid5121 joined #gluster
09:08 satheesh3 joined #gluster
09:21 rastar joined #gluster
09:28 psharma joined #gluster
09:44 satheesh1 joined #gluster
09:45 saurabh joined #gluster
10:04 satheesh1 joined #gluster
10:10 dusmant joined #gluster
10:36 samppah danci1973: sorry, i don't have any experience about IB/RDMA..
10:37 hagarth joined #gluster
10:38 Norky joined #gluster
10:47 rjoseph joined #gluster
10:52 bulde joined #gluster
10:55 rotbeard joined #gluster
11:14 sgowda joined #gluster
11:16 glusterbot New news from newglusterbugs: [Bug 808073] numerous entries of "OPEN (null) (--) ==> -1 (No such file or directory)" in brick logs when an add-brick operation is performed <http://goo.gl/zQN2F>
11:19 vpshastry1 joined #gluster
11:29 rastar joined #gluster
11:48 davinder joined #gluster
11:57 bulde joined #gluster
12:02 diegows_ joined #gluster
12:04 rastar joined #gluster
12:05 Alpinist joined #gluster
12:10 vpshastry joined #gluster
12:15 itisravi_ joined #gluster
12:16 vpshastry left #gluster
12:21 ctria joined #gluster
12:29 bulde joined #gluster
12:29 itisravi joined #gluster
12:56 vpshastry joined #gluster
12:57 bulde joined #gluster
13:01 calum_ joined #gluster
13:07 ninkotech joined #gluster
13:10 ninkotech_ joined #gluster
13:12 pojir joined #gluster
13:13 pojir left #gluster
13:21 mattappe_ joined #gluster
13:30 hagarth joined #gluster
13:32 ndarshan joined #gluster
13:33 edward1 joined #gluster
13:33 davidbierce joined #gluster
13:34 rcheleguini joined #gluster
13:42 mattappe_ joined #gluster
13:47 ctria joined #gluster
13:48 mattapperson joined #gluster
13:54 mattappe_ joined #gluster
13:57 shireesh joined #gluster
13:59 mattappe_ joined #gluster
13:59 B21956 joined #gluster
14:00 mattappe_ joined #gluster
14:01 davidbierce joined #gluster
14:02 mohankumar joined #gluster
14:04 rwheeler joined #gluster
14:12 dbruhn joined #gluster
14:12 ctria joined #gluster
14:21 VerboEse joined #gluster
14:46 LoudNoises joined #gluster
14:48 lpabon joined #gluster
14:50 bennyturns joined #gluster
14:56 harish joined #gluster
14:57 RameshN joined #gluster
14:59 itisravi joined #gluster
15:01 [o__o] left #gluster
15:03 [o__o] joined #gluster
15:04 wushudoin joined #gluster
15:11 dbruhn Is there anyone here who wants to help me work through this weird issue I have been having with data being duplicated in my volume?
15:11 dbruhn it seems to persist after an unmount and remount
15:15 bugs_ joined #gluster
15:20 ira joined #gluster
15:20 ira joined #gluster
15:21 zerick joined #gluster
15:31 plarsen joined #gluster
15:35 DV joined #gluster
15:38 mattappe_ joined #gluster
15:48 T0aD joined #gluster
15:49 shireesh joined #gluster
15:49 mattappe_ joined #gluster
15:50 chirino joined #gluster
15:50 bulde joined #gluster
15:54 bulde joined #gluster
16:00 itisravi joined #gluster
16:08 hateya joined #gluster
16:15 homer5439 joined #gluster
16:15 homer5439 is there some documentation of the network protocols used by gluster?
16:27 VerboEse joined #gluster
16:31 kmai007 @homer5439
16:31 kmai007 http://www.jamescoyle.net/how-to/457-glusterfs-firewall-rules
16:31 glusterbot <http://goo.gl/bIuGaF> (at www.jamescoyle.net)
16:32 skered- joined #gluster
16:41 homer5439 that doesn't really say much about how the protocols work
16:41 kmai007 protocols
16:41 kmai007 TCP/IP
16:41 homer5439 as in "capture traffic with tcpdump and make sense of it"
16:41 kmai007 does not work with UDP
16:41 kmai007 it lists the ports
16:42 homer5439 I mean message types, who sends which message to whom, message format and contents
16:42 homer5439 the used ports are easily found out
16:43 homer5439 and I know wireshark has a dissector, but that doesn't help much without knowing how the protocol is supposed to work
16:44 homer5439 I've already seen http://www.gluster.org/community/documentation/images/a/af/Gluster_Wireshark_Niels_de_Vos.pdf, it's quite basic
16:44 glusterbot <http://goo.gl/6a3DMt> (at www.gluster.org)
16:45 kmai007 has anybody used hotcopy for snapshots?
16:45 shylesh joined #gluster
17:04 aliguori joined #gluster
17:05 cyberbootje joined #gluster
17:05 sprachgenerator joined #gluster
17:06 sprachgenerator joined #gluster
17:14 PatNarciso reboot time.
17:16 skered- left #gluster
17:26 elyograg does anyone know whether redhat consulting can provide help with gluster, and how long I should wait for a reply?
17:28 lalatenduM joined #gluster
17:28 NuxRo elyograg: no idea, might be tricky to get help if you are not a proper RedHat Storage customer, but these are just assumptions, give it a try
17:29 elyograg I sent them a message via their website late wednesday night (late in usa, anyway).
17:30 RameshN joined #gluster
17:32 Mo_ joined #gluster
17:32 SpeeR joined #gluster
17:36 bennyturns joined #gluster
17:36 XpineX joined #gluster
17:36 eXeC64 joined #gluster
17:53 bulde joined #gluster
17:59 rwheeler joined #gluster
18:03 davinder joined #gluster
18:05 cfeller can someone explain to me what is going on here, and how to mitigate this (if possible):
18:05 cfeller I've mirrored a few Linux repos here locally, and I want to move these from local disk, to Gluster.  The first of which is Debian Wheezy.
18:05 cfeller I mirrored Debian Wheezy overnight, using the debmirror tool, and then went to update one of my machines this morning using apt, and got errors.
18:05 cfeller the errors I got were 404 not found errors.
18:05 cfeller I ran apt-get update on another machine and saw the same thing.
18:05 cfeller so I went into the directory structure and saw the file was there.  huh?  So I ran apt-get update a 3rd time, and this time there were no issues.  Looking in the glusterfs logs I see this:
18:06 cfeller [2013-11-08 17:56:00.886263] I [afr-self-heal-entry.c:2253:afr_sh_entry_fix] 0-gv0-replicate-0: /debian/dists/wheezy-updates/non-free/i18n: Performing conservative merge
18:06 cfeller So what happened, and why did it take a miss to fix it?
18:06 cfeller How do I prevent this from happening in the future?
18:06 cfeller (Also, as an aside, why are my glusterfs logs in UTC?)
18:07 calum_ joined #gluster
18:07 cfeller oh, I'm using 3.4.1
18:08 cfeller distribute-replicate, replica 2, 4x2
18:08 semiosis cfeller: did you write data into gluster through a client mount?
18:08 cfeller servers are running RHEL6, glusterfs fuse client and webserver is Fedora 18
18:08 semiosis or did you write directly to bricks
18:08 semiosis (bad)
18:09 cfeller no, through the mount point, using the fuse client.
18:09 cfeller its setup like so:
18:10 cfeller <gluster-server>:gv0 on /mnt/gluster/gv0 type fuse.glusterfs (rw,relatime,user_id=0,group_id=0,default_permissions,allow_other,max_read=131072)
18:10 cfeller <gluster-server>:gv0 on /var/www/mirror/debian type fuse.glusterfs (rw,relatime,user_id=0,group_id=0,default_permissions,allow_other,max_read=131072)
18:10 cfeller the second mount using bind.
18:10 cfeller as I want to only make a subdirectory structure appear there.
18:11 cfeller my fstab:
18:11 cfeller <gluster-server>:gv0    /mnt/gluster/gv0 glusterfs defaults,_netdev 0 0
18:11 cfeller /mnt/gluster/gv0/debian /var/www/mirror/debian  bind    defaults,bind   0 0
18:11 cfeller and that is on Fedora 18.
18:13 nage joined #gluster
18:15 sprachgenerator joined #gluster
18:22 diegows_ joined #gluster
18:28 kPb_in_ joined #gluster
18:30 rotbeard joined #gluster
18:33 cfeller Also, I ran into this same issue yesterday on my "test" Gluster system.  I had been testing the performance of having a Linux repo on Gluster for a while.
18:33 cfeller I had 3.3.2 on the "test" system up until two days ago, when I also updated that system to 3.4.1.  The next debmirror run (cron job at 3:00am) caused that same miss to appear on that system.
18:34 andreask joined #gluster
18:34 cfeller so it is reproducible.
18:34 cfeller but I never bumped into that bug on 3.3.2, and I was testing the Debian mirror on that system for a month.
18:35 cfeller so if this appears to be a bug, and not something else, let me know and I can file a bug.
18:35 glusterbot http://goo.gl/UUuCq
18:36 dbruhn I have never seen this on 3.3.2, which I am running in my environment. I would assume that would be a bug; go ahead and file one, worst case scenario is someone deems it not a bug
18:43 cfeller OK.
18:46 cfeller What would you even title that bug report as?  It is kind of bizarre behavior to succinctly put in a summary.
18:46 cfeller suggestions?
18:47 glusterbot New news from newglusterbugs: [Bug 1009134] sequential read performance not optimized for libgfapi <http://goo.gl/K8j2w2>
18:51 hateya joined #gluster
18:53 mrfsl joined #gluster
19:01 dbruhn I would say something like "Files missing from mount point, appearing only after multiple attempts to access them"
19:02 dbruhn Did you say you were setup with DHT and Replication on this volume?
19:02 cfeller yes
19:02 cfeller # gluster volume info
19:02 cfeller
19:02 cfeller Volume Name: gv0
19:02 cfeller Type: Distributed-Replicate
19:02 cfeller Volume ID: a86fbffd-408d-41f9-b2ed-a3816f09d924
19:03 cfeller Status: Started
19:03 cfeller Number of Bricks: 2 x 2 = 4
19:03 cfeller Transport-type: tcp
19:03 cfeller Bricks:
19:03 cfeller Brick1: gluster0:/export/brick0
19:03 cfeller Brick2: gluster1:/export/brick0
19:03 cfeller Brick3: gluster2:/export/brick0
19:03 cfeller Brick4: gluster3:/export/brick0
19:03 dbruhn are you sure all of your bricks/servers are connected at the times these failures are happening
19:03 dbruhn DHT volumes will simply be missing files if they are not connected properly
19:03 dbruhn Sorry a little late to the game on the questions
19:03 cfeller yes. the servers are in the same rack. They are Dell R515 servers, and are connected with an enterprise grade switch.
19:04 dbruhn run gluster volume status
19:04 cfeller # gluster volume status
19:04 cfeller Status of volume: gv0
19:04 cfeller Gluster process                                         Port    Online  Pid
19:04 cfeller ------------------------------------------------------------------------------
19:04 cfeller Brick gluster0:/export/brick0                           49152   Y       1922
19:04 cfeller Brick gluster1:/export/brick0                           49152   Y       1797
19:04 dbruhn I've had things take bricks and servers offline in that same configuration
19:04 cfeller Brick gluster2:/export/brick0                           49152   Y       1787
19:04 cfeller Brick gluster3:/export/brick0                           49152   Y       1776
19:04 cfeller NFS Server on localhost                                 2049    Y       2306
19:04 cfeller Self-heal Daemon on localhost                           N/A     Y       2311
19:04 cfeller NFS Server on gluster3                                  2049    Y       1783
19:04 cfeller Self-heal Daemon on gluster3                            N/A     Y       1787
19:04 cfeller NFS Server on gluster1                                  2049    Y       1803
19:04 cfeller Self-heal Daemon on gluster1                            N/A     Y       1810
19:04 cfeller NFS Server on gluster2                                  2049    Y       1794
19:05 cfeller Self-heal Daemon on gluster2                            N/A     Y       1798
19:05 cfeller
19:05 cfeller There are no active volume tasks
19:05 cfeller sorry for filling the screen guys - I'll use fpaste next time.
19:05 cfeller (coffee malfunction.)
19:05 dbruhn Next time you notice stuff missing try running that command and make sure everything is connected.
19:06 mrfsl Hate to break in here. Looking for some help.
19:06 mrfsl I have three gluster nodes setup as distributed-replicated. After a self-heal I see that I have heal-failed items. Can anyone help me understand what I need to do to remediate these?
19:06 dbruhn mrfsl, are you getting errors when trying to access the files from the mount point?
19:07 mrfsl I see entries such as:
19:07 mrfsl 2013-11-07 03:08:05 <gfid:9953499b-e207-422d-a51a-b8f284518c39>
19:07 dbruhn cfeller, are you getting any errors in the brick logs, or the etc logs, or the mount logs around those times, other than what you listed?
19:13 bulde joined #gluster
19:14 mrfsl Does anyone know... is there a "You have items which failed self-heal... now what?" style paper somewhere?
19:16 mrfsl or what these gfid items actually are and how they reference, etc.
19:17 cfeller dbruhn: Yes.  I ran a couple of manual syncs yesterday to try to debug what was going on.  I noticed this during one of the syncs: http://ur1.ca/g048y
19:17 glusterbot Title: #52754 Fedora Project Pastebin (at ur1.ca)
19:17 semiosis mrfsl: maybe the ,,(gfid resolver) will help
19:17 cfeller which is one of the files I was having problems with yesterday.
19:17 glusterbot mrfsl: https://gist.github.com/4392640
19:18 mrfsl thank you. I will start there.
19:19 semiosis also see ,,(self heal)
19:19 glusterbot I do not know about 'self heal', but I do know about these similar topics: 'targeted self heal'
19:19 semiosis hmm
19:20 semiosis http://joejulian.name/blog/what-is-this-new-glusterfs-directory-in-33/
19:20 glusterbot <http://goo.gl/j981n> (at joejulian.name)
19:20 semiosis http://joejulian.name/blog/fixing-split-brain-with-glusterfs-33/
19:20 glusterbot <http://goo.gl/FPFUX> (at joejulian.name)
19:20 dbruhn something's not allowing you to set the attributes, judging from the logs
19:20 dbruhn [dht-linkfile.c:213:dht_linkfile_setattr_cbk] 0-gv0-dht: setattr of uid/gid on /debian/dists/wheezy-updates/main/i18n/Translation-en.bz2 :<gfid:00000000-0000-0000-0000-000000000000> failed (Invalid argument)
19:21 dbruhn what file system are you running on your bricks?
19:21 cfeller xfs
19:21 semiosis selinux?
19:21 dbruhn semiosis, I think he is on debian from what I got earlier
19:21 semiosis ah
19:21 cfeller yes, enforcing. on the bricks and the client.
19:21 dbruhn not sure if selinux is on debian
19:22 cfeller server nodes on RHEL6, client is Fedora 18.
19:22 dbruhn Ahh, yeah selinux doesn't play well out of the box
19:22 cfeller I have a couple of legacy Debian servers that I am using this repo for.
19:22 cfeller I had to set this:
19:22 cfeller setsebool -P httpd_use_fusefs 1
19:22 cfeller for the server to play nice with gluster, and that worked fine while I was on 3.3.2
19:23 cfeller (for the *web* server to play nice with the gluster mount that is.)
19:23 dbruhn most people disable selinux on the gluster servers
19:24 dbruhn i'll be honest I am not familiar with it enough to tell you how to configure it for the servers
19:26 dbruhn but I do know it will block things from writing to the extended attributes
19:26 cfeller hmm...
19:29 cfeller wonder if there was a change between 3.3.2 and 3.4.1 that selinux doesn't like, or something else.
19:29 cfeller grepping for the files in question in /var/log/audit.log on the servers doesn't turn anything up.
19:29 cfeller but that does give me something else to look at.
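[Editor's note: a more targeted way to look for SELinux denials than grepping the raw audit log; ausearch ships with the audit package on RHEL and Fedora.]
    ausearch -m avc -ts recent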
19:38 rotbeard joined #gluster
19:38 mrfsl @semiosis - thanks, I feel more educated on the problem
19:38 mrfsl but I don't know where to begin with the resolution
19:38 mrfsl I am facing this: http://ur1.ca/g04bi
19:38 glusterbot Title: #52756 Fedora Project Pastebin (at ur1.ca)
19:39 dbruhn mrfsl, find the file in question, and then use the split-brain script to fix it
19:39 mrfsl these don't show up with a "info split-brain" --- but with a "info heal-failed"
19:39 mrfsl does that matter?
19:42 dbruhn have you tried accessing the files in question?
19:42 mrfsl and split-brain script?
19:42 dbruhn http://joejulian.name/blog/fixing-split-brain-with-glusterfs-33/
19:42 glusterbot <http://goo.gl/FPFUX> (at joejulian.name)
19:42 dbruhn if the file really is messed up, typically you will get an input output error when trying to open it or modify it
19:42 hateya joined #gluster
19:43 mrfsl given the nature of the files it would be almost impossible for me to tell
19:43 dbruhn you can't open then file?
19:44 mrfsl Is there anyway to get additional information as to why these gfid files failed self-heal?
19:44 mrfsl no I would not be able to open them
19:44 mrfsl they are binary segments of files
19:44 mrfsl think large files split and encrypted
19:44 dbruhn just try and copy one of them
19:44 dbruhn it will throw the error
19:44 dbruhn what are you running on top of this?
19:45 dbruhn if there is really an issue
19:45 mrfsl I ran the gfid script and it hangs at 784155d5-e148-4730-b6f9-1f63ab20becc==File:
19:47 davidbierce In an environment where there are 100s of VMs running on a cluster, is it safe to configure the node timeout low, like 2 seconds, so writes don't stall for 45 seconds waiting for a crashed node to timeout?
19:47 dbruhn mrfsl, does it do that for all the files?
19:48 mrfsl Going through them now
19:48 mrfsl so far yes it does
19:48 dbruhn also, those screens that show the self-heal stuff come from the logs, so there can be old stuff there
19:48 mrfsl just says file: ---- and hangs
19:48 dbruhn are you seeing any other errors in your etc or mnt logs?
19:49 mrfsl assuming I am running this script correctly:
19:49 mrfsl ./gfid-resolver.sh /gluster/vol0/C1 5e291b0f-68dc-4018-b969-6ba8df434c68
19:50 mrfsl @dbruhn - errors in the logs on the client or on the server?
19:50 mrfsl Pardon my newbie-ness
19:50 dbruhn mnt is the client log I believe and etc is the server log
19:50 dbruhn you'll have to forgive me, my only clients are my servers so I get them mixed up
19:50 dbruhn mrfsl, are you checking each brick?
19:51 dbruhn assuming C1 is your brick?
19:51 dbruhn sorry, I mean checking the correct brick
19:52 mrfsl yes. If I pick the wrong path I get:
19:52 mrfsl ls: cannot access /gluster/vol0/C1/.glusterfs/20/ec/20ecfe......
19:53 cfeller dbruhn, semiosis: https://bugzilla.redhat.com/show_bug.cgi?id=1028582
19:53 glusterbot <http://goo.gl/jz4P4t> (at bugzilla.redhat.com)
19:53 glusterbot Bug 1028582: unspecified, unspecified, ---, amarts, NEW , GlusterFS files missing randomly - the miss triggers a self heal, then files appear.
19:54 cfeller (I'll keep looking into that and report any updates there - thanks for your help today.)
19:55 mrfsl This doesn't return anything useful:
19:55 mrfsl grep -iE 'error|warn' etc-glusterfs-glusterd.vol.log
19:56 dbruhn mrfsl, this is the output that script should produce: gfid-resolver.sh /var/brick20/ 3ee33657-db2a-41f6-bf00-1437b1b8781f
19:56 dbruhn 3ee33657-db2a-41f6-bf00-1437b1b8781f==Directory:/var/brick20/root/cluster
19:57 mrfsl Mine just says File: (then a tab) then a blinking cursor
19:57 elyograg mrfsl: If you were to do this instead, you might get something more useful:  egrep " E | W " etc-glusterfs-glusterd.vol.log
19:58 mrfsl I see from the script that this is calling a find command
19:58 mrfsl I have millions of files... perhaps I am not letting it run long enough.
19:59 dbruhn I am running it on my system right now with 40 million files and it didn't take very long
20:00 dbruhn it's only searching the .glusterfs at the root of the brick
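[Editor's note: for regular files the .glusterfs/aa/bb/<gfid> entry on a brick is a hardlink to the real file, which is why the resolver's same-inode search works; the brick path and gfid below are taken from the discussion above.]
    find /gluster/vol0/C1 -samefile \
        /gluster/vol0/C1/.glusterfs/99/53/9953499b-e207-422d-a51a-b8f284518c39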
20:09 dbruhn mrfsl, what's in the glustershd.log
20:13 neofob_home joined #gluster
20:16 mrfsl I have a lot of 'inode link failed on the inode (00000000-0000-0000-0000-000000000000)' entries from a night ago
20:16 mrfsl when the self-heal ran
20:18 glusterbot New news from newglusterbugs: [Bug 1028582] GlusterFS files missing randomly - the miss triggers a self heal, then files appear. <http://goo.gl/jz4P4t>
20:18 mrfsl When I search the log for a specific gfid that I see in the self-heal I get two entries
20:19 mrfsl remote operation failed: No such file or directory. Path: <gfid:9953499b-e207-422d-a51a-b8f284518c39> (00000000-0000-0000-0000-000000000000)
20:19 mrfsl open of <gfid:9953499b-e207-422d-a51a-b8f284518c39> failed on child vol0-client-11 (No such file or directory)
20:30 mrfsl no idea huh?
20:37 sprachgenerator joined #gluster
20:43 dbruhn sorry, was afk, I have my own split brain nightmare I am working on as well
20:44 dbruhn are you getting any output from the gfid resolver
20:44 dbruhn or is your application on top of this giving you any feed back to files having issues?
20:59 mrfsl haven't heard any complaints yet
21:00 mrfsl perhaps the devs can shed some light. It seems like the heal-failed needs some elaboration. Like... "failed because..."
21:00 mrfsl I don't know if I have a problem or not
21:00 mrfsl I only know that I have heal-faileds
21:00 mrfsl and I don't know how to remediate it
21:07 davidbierce joined #gluster
21:12 dbruhn here is a question, will the gfid of a file be the same on a different brick?
21:13 mrfsl interesting
21:13 mrfsl let me do some investigation
21:13 dbruhn sorry, that was for my own purpose, I have a split brain nightmare going on with one of my systems
21:14 dbruhn mrfsl, remedy is easy once you figure out which file it is
21:14 dbruhn mrfsl I know in my application stack we scan the files, and the application provides feedback on the errors
21:16 kmai007 does anybody have doc on how to restore files from the .glusterfs directory in fail node scenario?
21:17 semiosis use replication if you care about surviving a fail node scenario
21:18 glusterbot New news from newglusterbugs: [Bug 1028582] GlusterFS files missing randomly - the miss triggers a self heal, then missing files appear. <http://goo.gl/jz4P4t>
21:19 dbruhn semiosis have you ever seen a split-brain issue on a directory?
21:21 semiosis sure
21:22 semiosis i've seen things man, and stuff
21:22 kmai007 dbruhn http://www.joejulian.name/blog/fixing-split-brain-with-glusterfs-33/
21:22 glusterbot <http://goo.gl/FzjC6> (at www.joejulian.name)
21:22 semiosis lots of people get into a split-brain on /, the brick root dir, having to do with a permissions, owner, timestamp, or mode mismatch
21:22 dbruhn kmai007, I've fixed plenty of split brain.... but directories can't really be split...
21:23 semiosis sure they can
21:23 dbruhn yeah I've been there, not seeing it on /
21:23 dbruhn ok true
21:23 semiosis entries, the content of that dir (not deep content, just immediate), can also be split-brain
21:24 semiosis though less likely, because there are more resolution strategies (merge) available to gluster
21:24 kmai007 i can't use replication because I need it to scale....so i figured distr. replica would be a better fit
21:25 semiosis dist-repl uses replication
21:25 semiosis it's replication, done N times
21:25 kmai007 ok cornfused
21:26 semiosis the .glusterfs is necessary to heal the filesystem, but not sufficient.
21:26 JoeJulian I should update that blog entry for directories. When I get a split-brain directory, I just zero out the trusted.afr entries for all but the one I've chosen as "clean".
21:26 dbruhn crap that's right, I went through this a couple weeks ago on that
21:27 dbruhn thanks for the reminder JoeJulian
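[Editor's note: a minimal sketch of the trusted.afr reset JoeJulian describes, run directly on the brick(s) NOT chosen as clean; the volume name "gv0", client indices, and brick path are hypothetical, and the all-zero value means "no pending operations". Read the blog post linked above before attempting this.]
    # one setfattr per trusted.afr.* attribute that getfattr shows on that brick
    setfattr -n trusted.afr.gv0-client-0 -v 0x000000000000000000000000 /export/brick0/path/to/dir
    setfattr -n trusted.afr.gv0-client-1 -v 0x000000000000000000000000 /export/brick0/path/to/dir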
21:28 cfeller question... I just stumbled across this: http://gluster.org/community/documentation/index.php/Gluster_3.1:_Configuring_Time_Settings
21:28 JoeJulian honestly, I'm doing the same for files now. Most of the time they heal faster that way.
21:28 glusterbot <http://goo.gl/TGZsMw> (at gluster.org)
21:28 cfeller I'm not a GUI dependent guy, but what is that screenshot of?
21:28 semiosis kmai007: distributed-replicated volumes mean take N replica sets and distribute files evenly between them.  if you lose a brick in one of the replica sets, then you can replace it and its content will be rebuilt by pulling from the other replicas in its set
21:28 semiosis kmai007: the files don't get restored from the .glusterfs directory, they get restored from another brick
21:29 semiosis cfeller: ,,(gluster sp)
21:29 glusterbot semiosis: Error: No factoid matches that key.
21:29 kmai007 thank you for clearing that up
21:29 semiosis yw, hth
21:29 m0zes joined #gluster
21:30 jbd1 joined #gluster
21:31 semiosis @discontinued
21:31 semiosis @search discontinued
21:31 glusterbot semiosis: There were no matching configuration variables.
21:31 semiosis @factoid search discontinued
21:31 semiosis meh
21:31 semiosis cfeller: never mind that GUI, it's long gone
21:32 cfeller OK.  I stumbled across that while trying to find out how to get gluster to log using the system timezone, not UTC.
21:32 kmai007 when should i use the 'volume sync <hostname> all' command?
21:32 dbruhn getfattr /path/to/directory should output the extended attributes from the brick, right?
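[Editor's note: plain getfattr only lists user-namespace attributes; to see the trusted.* attributes gluster uses, the usual form (run as root against the brick path, not the client mount) is something like:]
    getfattr -m . -d -e hex /path/to/brick/directory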
21:34 jbd1 joined #gluster
21:37 cfeller semiosis: is there a way to set the logging timezone from the CLI?  most of the docs on gluster.org seem to reference that GUI (at least using my search params).
21:38 cfeller it is recording all logs in UTC, and my system TZ is set to something else.
21:39 jbd1 cfeller: if you're in a place that has daylight savings time, you may wish to leave the logs on UTC, as it's pretty confusing to have 1am come after 1:59am (or a hole between 1:59am and 3am) twice a year
21:41 dbruhn it's less confusing than having to do the math on times when trying to correlate log events...
21:42 cfeller exactly.
21:42 skered- joined #gluster
21:42 kmai007 i tried, i got the same answer
21:42 kmai007 tough.....
21:43 dbruhn put in a feature request
21:43 kmai007 i was told the reason why it is set to UTC
21:43 skered- Anyone using ctdb and gluster?  If so do you know how ctdb detects a failing node? ping?
21:43 failshell joined #gluster
21:43 kmai007 is because maybe a gluster cluster could be half way around the world and it wouldn't work well trying to figure out which logs are in which time zone
21:44 kmai007 BUT
21:44 cfeller well I found this: https://bugzilla.redhat.com/show_bug.cgi?id=958062 for an older version of RHS.
21:44 cfeller ...but if the GUI could set the timezone it has to be exposed in the API somewhere?
21:44 glusterbot <http://goo.gl/kcQ8hp> (at bugzilla.redhat.com)
21:44 glusterbot Bug 958062: high, medium, ---, rhs-bugs, ASSIGNED , logging: date mismatch in glusterfs logging and system "date"
21:44 kmai007 if you find a solution please let me know, b/c i beat my head on the wall trying to make sense of it at 2AM.
21:44 mrfsl left #gluster
21:45 kmai007 JoeJulian, so i ran a 'volume heal info'
21:46 kmai007 and 1 of the bricks returned this
21:46 kmai007 Brick omdx1445:/sapdev/BI
21:46 kmai007 Number of entries: 141
21:46 kmai007 <gfid:aed9dce3-671f-4397-95f6-e0b718d58d09>
21:46 kmai007 with a list,
21:46 kmai007 what actions am I supposed to take?
21:48 kmai007 i suppose i'll read this again http://www.joejulian.name/blog/what-is-this-new-glusterfs-directory-in-33/
21:48 glusterbot <http://goo.gl/wyiQQO> (at www.joejulian.name)
21:48 diegows_ joined #gluster
21:51 cfeller kmai007: absolutely.
21:52 semiosis imho all server logs should be UTC
21:52 semiosis or as i like to think of it, "server time"
21:52 kmai007 cfeller and thats the guy that schooled me
21:52 kmai007 i just accepted the UTC
21:52 kmai007 i took the blue pill
21:53 JoeJulian Yes, all logs should be in UTC. Your log aggregation display tool (kibana) should be the thing that converts your logs to your local time.
21:54 pravka joined #gluster
21:54 pravka hey all, anyone in here have experience w/ OpenStack cinder + glusterfs?
21:55 * JoeJulian isn't using cinder yet.
21:55 pravka I'm seeing some weird behavior that I'm fairly certain is a bug in cinder, but I wanted to see if anyone had experienced it as well
21:57 pravka we have cinder's shares.conf configured with gluster-host-1:gluster-vol -o backupvolfile=gluster-host-2, but if we down host-1, cinder doesn't failover to host-2
21:57 pravka (other mounted gluster volumes do failover, so we know that's working)
21:58 JoeJulian So it fails to mount?
21:58 pravka yes
21:58 semiosis there should be a client log file written by the cinder client somewhere
21:59 dbruhn well that's weird, trusted.afr isn't set on two of the bricks for this directory
21:59 pravka there is. I'm getting it for you.
21:59 semiosis pastie.org please :)
21:59 failshell joined #gluster
22:00 JoeJulian dbruhn: It's not always set for directories.
22:01 dbruhn JoeJulian, maybe you can help me make some sense of why I am getting a split-brain on this directory then?
22:01 dbruhn http://fpaste.org/52823/39480471/
22:01 glusterbot Title: #52823 Fedora Project Pastebin (at fpaste.org)
22:01 dbruhn you'll see .5 and .6 are the only ones that don't match
22:02 pravka semiosis: http://pastie.org/8466503
22:02 glusterbot Title: #8466503 - Pastie (at pastie.org)
22:03 dbruhn but I am sure the permissions, owner, and all of that are the same
22:03 pravka semiosis: note, cinder does not seem to be respecting backupvolfile, and is instead using the primary gluster host when it attempts to mount
22:03 SpeeR I'm using an NFS mount to store elasticsearch data, after a few minutes, I'm not able to ls the directory, anyone else have this type of issue? running 3.4 on a 4brick 20*2TB w/Areca raid  array
22:04 semiosis pravka: is there more above the part of a58f9221f3896f9324f1011ea3ce2264.log that you pasted?  a dump of the config?  if there is, please include from that until the end, otherwise please ,,(pasteinfo)
22:04 glusterbot pravka: Please paste the output of "gluster volume info" to http://fpaste.org or http://dpaste.org then paste the link that's generated here.
22:04 semiosis pravka: we (community) usually recommend using ,,(rrdns) instead of the backupvolfile option
22:04 glusterbot pravka: You can use rrdns to allow failover for mounting your volume. See Joe's tutorial: http://goo.gl/ktI6p
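[Editor's note: a minimal sketch of the rrdns approach glusterbot links above, assuming a BIND-style zone file and hypothetical addresses; clients mount "gluster.example.com" and the resolver rotates through the A records, so any live server can hand out the volfile at mount time.]
    ; in the example.com zone file
    gluster  IN  A  192.0.2.11
    gluster  IN  A  192.0.2.12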
22:05 pravka semiosis: there is more, lemme get it for you. :)
22:06 pravka fwiw, this is the first I've heard of rrdns instead of backupvolfile -- is there a reason the latter's still being documented on glusterfs?
22:06 pravka *gluster.org?
22:06 semiosis idk
22:06 pravka :)
22:07 pravka 'gluster volume info': http://fpaste.org/52826/94845413/
22:07 glusterbot Title: #52826 Fedora Project Pastebin (at fpaste.org)
22:11 pravka I actually think I may've found it
22:11 pravka semiosis: [2013-11-08 21:17:39.586439] I [client-handshake.c:1468:client_setvolume_cbk] 0-gv-cinder-client-1: Server and Client lk-version numbers are not same, reopening the fds
22:11 glusterbot pravka: This is normal behavior and can safely be ignored.
22:11 pravka :)
22:12 pravka man, I love glusterbot
22:12 semiosis hahaha
22:13 pravka what's odd is that's when it seems to retry the mount using gluster-host-1
22:13 pravka lemme get a paste for you semiosis
22:14 pravka semiosis: http://fpaste.org/52828/94885413/
22:14 glusterbot Title: #52828 Fedora Project Pastebin (at fpaste.org)
22:23 kmai007 joined #gluster
22:38 SpeeR is this a normal error? http://pastie.org/8466585
22:38 glusterbot Title: #8466585 - Pastie (at pastie.org)
22:40 dbruhn joined #gluster
22:41 JoeJulian SpeeR: I've never seen that, but then again I don't use NFS so I never would. Nobody's reported that here that I've seen. Does it cause a problem?
22:41 SpeeR Unknown, I'm having issues with directory listings, on my elasticsearch shares
22:42 SpeeR So, I'm trying to find where the issue might be
22:42 JoeJulian And we're avoiding the obvious question about elasticsearch being a self-clustering system and hosting it on clustered storage, right?
22:43 SpeeR clusters in clusters isn't good?
22:43 JoeJulian It's fine if you know why you're doing it. Overkill otherwise.
22:43 JoeJulian But that's all opinion.
22:44 Rav_ joined #gluster
23:20 elyograg for me, elasticsearch is the enemy.  I do Solr. :)  I would never put a search index on gluster.
23:26 nage elyograg: could you elaborate on why that is a bad idea
23:26 chriswonders joined #gluster
23:28 nage ?
23:28 chriswonders joined #gluster
23:29 kPb_in joined #gluster
23:30 nage (putting search indexes on gluster)
23:31 davidbierce joined #gluster
23:32 davidbie_ joined #gluster
23:45 chriswonders Is there any roadmap that anyone is aware of for Gluster having a proper Mac client (FUSE or otherwise)?
23:46 JoeJulian Can't a mac connect via NFS?
23:46 chriswonders It can but the last 3 versions of OS X have been particularly troublesome with NFS
23:46 JoeJulian The entire philosophy behind the Mac OS is that you give up features and control to Apple, who chooses what's best for you to ensure that you'll have completely trouble free use of their computer.
23:47 chriswonders I'm still evaluating Gluster's NFS with Mac but since I'm already hitting a few walls I thought I'd see if anyone knew about a FUSE client returning at some point.
23:47 JoeJulian If it were me, I'd install fedora on it.
23:48 chriswonders :) I'd love to. The industrial software we're using in this case is Mac-specific
23:48 JoeJulian Does http://osxfuse.github.io/ work?
23:48 glusterbot Title: Home - FUSE for OS X (at osxfuse.github.io)
23:49 chriswonders It does indeed but I couldn't find a Gluster component for it
23:50 JoeJulian I guess we need a Mac fan who can build packages for it....
23:50 chriswonders And any Google searching I do leads me to reports of there not being a current compatible OS X client for FUSE anymore
23:51 chriswonders This is the most I've found on the Gluster site http://www.gluster.org/community/documentation/index.php/Using_Gluster_with_OSX
23:51 glusterbot <http://goo.gl/gT2aoE> (at www.gluster.org)
23:51 JoeJulian If you can retrieve the source and compile it, it should work (in theory).
23:56 chriswonders I'll give it a go
