IRC log for #gluster, 2013-05-07

All times shown according to UTC.

Time Nick Message
00:04 cfeller semiosis: as for the remaining latency over 'ls', it looks like Apache is trying to be too smart. For every filename, it is looking for multiple files of the same name with .php, .html, .htm, .xhtml, .cgi, .pl file extensions.  Why it is doing that, I have no idea, but I'm seeing lstat() and stat() calls for those files. So I need to find a way to tell apache to just list the darn...
00:04 cfeller ...directories. 7 times more reads doesn't seem like it would  _completely_ account for the extra latency I'm seeing over 'ls', but it is something.
00:09 JoeJulian misses are expensive, so I wouldn't be surprised.
00:12 robos joined #gluster
00:13 cfeller good point.
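The probing cfeller describes (every name tried again with .php, .html, .htm, ... extensions) is the classic signature of Apache's mod_negotiation MultiViews; a minimal sketch of turning it off for the directory being listed, assuming a stock Apache config and with the path purely illustrative:

    <Directory "/var/www/html/files">
        # keep plain directory indexes, but stop content negotiation from
        # stat()ing sibling files with other extensions on every request
        Options +Indexes -MultiViews
    </Directory>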
00:26 robos left #gluster
00:46 vpshastry joined #gluster
01:00 chirino_m joined #gluster
01:09 thomasle_ joined #gluster
01:12 nocterro joined #gluster
01:25 theron joined #gluster
01:40 harish joined #gluster
01:42 fidevo joined #gluster
01:44 kevein joined #gluster
02:03 theron joined #gluster
02:04 georgeh|workstat joined #gluster
02:22 glusterbot New news from newglusterbugs: [Bug 958781] KVM guest I/O errors with xfs backed gluster volumes <http://goo.gl/4Goa9>
02:24 nickw joined #gluster
02:38 vshankar joined #gluster
03:14 Supermathie JoeJulian: Does performance.client-io-threads affect NFS? I thought that'd only affect the fuse client. Think that might let me use >1 CPU?
03:17 Supermathie JoeJulian: I had tried setting "volume set gv0 performance.nfs.io-threads 8" but that just made the nfs server fail
03:33 JoeJulian Supermathie: I don't really think so, but it's easy to try.
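For reference, the option being discussed is set per volume; a sketch using the gv0 volume name from above:

    gluster volume set gv0 performance.client-io-threads on
    gluster volume info gv0    # confirm it appears under "Options Reconfigured"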
03:38 isomorphic joined #gluster
04:01 sgowda joined #gluster
04:02 bala1 joined #gluster
04:03 rastar joined #gluster
04:04 inevity joined #gluster
04:08 theron joined #gluster
04:11 sjoeboo_ joined #gluster
04:12 vpshastry joined #gluster
04:18 saurabh joined #gluster
04:18 shylesh joined #gluster
04:22 lalatenduM joined #gluster
04:26 rastar1 joined #gluster
04:27 bharata joined #gluster
04:31 Susant joined #gluster
04:39 deepakcs joined #gluster
04:40 hchiramm_ joined #gluster
04:44 bala1 joined #gluster
04:49 mohankumar__ joined #gluster
05:01 piotrektt joined #gluster
05:03 satheesh joined #gluster
05:11 vpshastry1 joined #gluster
05:30 rgustafs joined #gluster
05:39 y4m4 joined #gluster
05:43 pithagorians joined #gluster
05:44 pithagorians hi. where can i find more  details about how glusterfs failover is done?
05:45 pithagorians i had a situation - 2 nodes in replica mode, the second one had a bad hdd and the entire cluster didn't work
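A sketch of the usual first checks for a two-brick replica where one node's disk has failed — VOLNAME is a placeholder:

    gluster peer status                # is the surviving peer still connected?
    gluster volume status VOLNAME      # which brick / NFS / self-heal daemons are actually up
    gluster volume heal VOLNAME info   # entries waiting to self-heal once the bad brick returns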
05:51 sjoeboo_ joined #gluster
05:55 aravindavk joined #gluster
05:56 y4m4 joined #gluster
06:03 nicolasw joined #gluster
06:08 rastar joined #gluster
06:11 shireesh joined #gluster
06:18 GLHMarmot joined #gluster
06:20 jtux joined #gluster
06:31 satheesh joined #gluster
06:31 yinyin joined #gluster
06:37 kevein joined #gluster
06:44 ngoswami joined #gluster
06:45 samppah @latest
06:45 glusterbot samppah: The latest version is available at http://goo.gl/zO0Fa . There is a .repo file for yum or see @ppa for ubuntu.
07:06 ctria joined #gluster
07:09 ingard_ joined #gluster
07:11 guigui1 joined #gluster
07:13 ehg joined #gluster
07:19 pithagorians joined #gluster
07:31 andreask joined #gluster
08:04 dobber_ joined #gluster
08:05 50UACJJX5 joined #gluster
08:10 VSpike joined #gluster
08:11 VSpike Good morning. On Friday evening, I rebooted a VM host which contained 1 out of 3 gluster clients, and one out of two gluster servers. Clients 1 and 3 reference server 1 in /etc/fstab and Client 2 references Server 2. It was both of the #2 machines (server 2 and client 2) that were rebooted.
08:12 VSpike On reboot, client 2 failed to mount its gluster partition, and I tried for a while with no joy before dropping that client from the load balancer and resolving to look at it after the long weekend
08:13 VSpike This morning, client 2 mounts without a problem ... is there some reason why a gluster server would be unresponsive for some time after a reboot?
08:13 VSpike The server was up, connectable to, and running glusterd
08:13 a2 joined #gluster
08:13 VSpike ^connectable to with ssh
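A common mitigation for this single-volfile-server dependency is to name a fallback server in fstab; a sketch with hostnames and volume name illustrative, assuming a mount helper that supports the backupvolfile-server option:

    server2:/gv0  /mnt/gluster  glusterfs  defaults,_netdev,backupvolfile-server=server1  0 0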
08:14 Norky joined #gluster
08:31 spider_fingers joined #gluster
08:38 vpshastry joined #gluster
08:41 hagarth joined #gluster
08:56 vimal joined #gluster
08:57 duerF joined #gluster
09:02 gbrand_ joined #gluster
09:08 manik joined #gluster
09:31 pithagorians hi. where can i find more  details about how glusterfs failover is done?
09:31 pithagorians i had a situation - 2 nodes in replica mode, the second one had a bad hdd and the entire cluster didn't work
10:01 saurabh joined #gluster
10:20 lh joined #gluster
10:21 aravindavk joined #gluster
10:22 shireesh joined #gluster
10:30 hagarth joined #gluster
10:32 edward1 joined #gluster
10:48 bala joined #gluster
10:54 kkeithley1 joined #gluster
10:56 jtux joined #gluster
10:57 spider_fingers joined #gluster
11:00 hagarth1 joined #gluster
11:00 saurabh joined #gluster
11:11 chirino joined #gluster
11:19 bala joined #gluster
11:20 theron joined #gluster
11:21 shireesh joined #gluster
11:22 rotbeard joined #gluster
11:24 glusterbot New news from newglusterbugs: [Bug 952693] 3.4 Beta1 Tracker <http://goo.gl/DRzjx>
11:30 aliguori joined #gluster
11:41 inevity joined #gluster
11:44 inevity joined #gluster
11:45 dustint joined #gluster
11:47 inevity how to troubleshoot low write throughput? gluster writes used to be at normal speed, but now the throughput has become lower.
11:50 nicolasw joined #gluster
11:50 inevity using a distribute-replicate volume, writing many small files. at first 30Mb, now 10Mb
11:55 spider_fingers joined #gluster
12:00 bala joined #gluster
12:09 aravindavk joined #gluster
12:19 fleducquede joined #gluster
12:19 vpshastry1 joined #gluster
12:29 karoshi joined #gluster
12:30 Susant left #gluster
12:35 ninkotech__ joined #gluster
12:41 inevity joined #gluster
12:48 bala joined #gluster
12:50 inevity joined #gluster
12:54 glusterbot New news from newglusterbugs: [Bug 948086] concurrent ls on NFS results in inconsistent views <http://goo.gl/Z5tXN>
12:56 inevity2 joined #gluster
13:02 aliguori joined #gluster
13:09 ingard_ hi guys
13:09 ingard_ did anyone here upgrade from 3.0.5 to latest?
13:09 ingard_ does anyone know if i need to go to 3.1 or 3.2 first and then to 3.3 ?
13:24 baul joined #gluster
13:26 hchiramm_ joined #gluster
13:34 andrewjs1edge joined #gluster
13:43 bugs_ joined #gluster
13:48 nueces joined #gluster
13:52 bennyturns joined #gluster
14:13 deepakcs joined #gluster
14:20 Supermathie "[2013-05-07 10:19:07.482143] E [afr-self-heal-data.c:763:afr_sh_data_fxattrop_fstat_done] 0-gv0-replicate-5: Unable to self-heal contents of '/fleming2/db0/ALTUS_config/diagnostics/diag/rdbms/altus/ALTUS/trace/ALTUS_ora_15385.trm' (possible split-brain). Please delete the file from all but the preferred subvolume." There needs to be a more automatic way of doing so - delete the named file and gluster just remakes it from the .gluster/* hardli
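The manual cleanup Supermathie is describing has to remove both the brick copy and its .glusterfs hardlink, otherwise the file is simply recreated from the hardlink; a sketch, with brick path, file path and gfid all illustrative:

    # on the brick whose copy you want to discard:
    getfattr -n trusted.gfid -e hex /export/brick0/path/to/ALTUS_ora_15385.trm   # note the gfid
    rm /export/brick0/path/to/ALTUS_ora_15385.trm
    # the hardlink lives under .glusterfs, named by that gfid (0xaabbcc... -> aa/bb/aabbcc...)
    rm /export/brick0/.glusterfs/aa/bb/aabbcc...
    # then let self-heal copy the surviving replica back
    gluster volume heal gv0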
14:22 vpshastry joined #gluster
14:30 sander^work joined #gluster
14:37 smellis joined #gluster
14:41 MattRM joined #gluster
14:42 MattRM Hello.
14:42 glusterbot MattRM: Despite the fact that friendly greetings are nice, please ask your question. Carefully identify your problem in such a way that when a volunteer has a few minutes, they can offer you a potential solution. These are volunteers, so be patient. Answers may come in a few minutes, or may take hours. If you're still in the channel, someone will eventually offer an answer.
14:44 daMaestro joined #gluster
14:50 jbrooks joined #gluster
14:50 bala joined #gluster
14:57 vpshastry joined #gluster
15:01 jthorne joined #gluster
15:02 daMaestro joined #gluster
15:03 vpshastry left #gluster
15:09 spider_fingers joined #gluster
15:24 zaitcev joined #gluster
15:25 zaitcev_ joined #gluster
15:34 aliguori joined #gluster
15:34 aliguori joined #gluster
15:41 avati joined #gluster
15:46 bharata joined #gluster
15:48 tjstansell1 joined #gluster
15:48 tjstansell1 anyone have any idea why a gluster volume might show heal info with 1 entry of just "/" ?
15:49 a2_ joined #gluster
15:50 manik joined #gluster
15:51 tjstansell1 i see this in glustershd.log:
15:51 tjstansell1 [2013-05-07 15:47:52.703936] E [afr-self-heald.c:685:_link_inode_update_loc] 0-gvdata-replicate-0: inode link failed on the inode (00000000-0000-0000-0000-000000000000)
15:55 glusterbot New news from newglusterbugs: [Bug 917686] Self-healing does not physically replicate content of new file/dir <http://goo.gl/O8NMq>
16:00 bchilds i have this in my sudoers "mapred ALL= NOPASSWD: /usr/bin/getfattr", is that safe? i'm wanting to run getfattr on files with non root user mapred
16:02 Supermathie bchilds: "safe"... certain getfattr commands run on fuse-mounted gluster volumes will wedge the getfattr program, other than that, should be ok.
16:02 Supermathie I don't think it wedges the mount... can't remember.
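What bchilds' mapred user would typically run through that sudo rule, e.g. reading a gluster virtual xattr such as trusted.glusterfs.pathinfo (file-location info) — the path is illustrative:

    sudo getfattr -n trusted.glusterfs.pathinfo -e text /mnt/glusterfs/user/mapred/part-00000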
16:04 Supermathie JoeJulian: I turned on performance.client-io-threads but it didn't affect the nfs volume definition
16:07 Ramereth joined #gluster
16:08 bulde joined #gluster
16:10 piotrektt joined #gluster
16:32 pithagorians joined #gluster
16:33 Mo__ joined #gluster
16:38 JoeJulian toddstansell: Check .glusterfs/00/00/00000000-0000-0000-0000-000000000001 on your bricks. It should be a symlink. If it's a directory, I think that can cause that error. Delete the erroneous directory and it should self-heal.
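What that check looks like on a brick, with the brick path illustrative:

    ls -ld /export/gvdata/brick/.glusterfs/00/00/00000000-0000-0000-0000-000000000001
    # healthy: a symlink pointing to ../../.. (the brick root); a real directory here is the broken case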
16:41 thomaslee joined #gluster
17:04 chirino_m joined #gluster
17:29 bulde joined #gluster
17:38 madphoenix joined #gluster
17:39 madphoenix hi all, i'm hoping somebody can enlighten me on volume performance settings.  does performance.cache_size apply at a per-brick level, or at the volume level?
17:47 jclift joined #gluster
17:50 ingard_ 13:12 < ingard_> did anyone here upgrade from 3.0.5 to latest?
17:50 ingard_ 13:12 < ingard_> does anyone know if i need to go to 3.1 or 3.2 first and then to 3.3 ?
17:50 ingard_ anyone?
18:00 Supermathie ingard_: http://gluster.org/community/documentation/index.php/Gluster_3.0_to_3.2_Upgrade_Guide
18:00 glusterbot <http://goo.gl/uCUfu> (at gluster.org)
18:00 Supermathie ingard_: http://vbellur.wordpress.com/2012/05/31/upgrading-to-glusterfs-3-3/
18:00 glusterbot <http://goo.gl/qOiO7> (at vbellur.wordpress.com)
18:01 kkeithley| I'm doubtful that very many people have jumped straight from 3.0.x to 3.3.x. Volfiles have changed, I think between 3.0->3.1 and again from 3.1->3.2. From 3.2->3.3 the volfiles haven't changed but the on-disk layout has.
18:10 jurrien joined #gluster
18:18 ingard_ yes i know this much
18:18 ingard_ so the question is which approach would be better
18:19 ingard_ i've got to upgrade 80 storage boxes and 20-30 clients
18:19 ingard_ its a major pain in the butt
18:20 H__ i just went through that with less boxes
18:20 H__ the 3.2.5->3.3.1 hop i mean
18:21 Supermathie ingard_: maintenance window, backups, clusterssh, restore
18:21 montyz1 joined #gluster
18:21 ingard_ backups? for 30 storage boxes
18:21 ingard_ er 80*
18:21 H__ gluster config backups
18:21 H__ not the data, i hope
18:21 Supermathie Well, within the constraints of what you've given us to work with, yes ;)
18:22 ingard_ right, no because that would take the rest of my lifetime
18:22 H__ also to me, 3.3 is a lot slower than 3.2
18:22 H__ for setups with many files
18:23 ingard_ we've got a shit ton of files
18:23 ingard_ :>
18:23 ingard_ anyway
18:23 H__ how many ? 1M 10 M 100 M 1000 M ?
18:23 ingard_ Supermathie: i've been through both of those blog posts and then some
18:23 montyz1 I'm new to gluster and doing some profiling.  %-latency Avg-latency Min-Latency Max-Latency                calls        Fop
18:23 montyz1 14.44      513.62       25.00     1122.00                 1549   READDIRP
18:23 montyz1 79.65      131.79       40.00    13655.00                33305     LOOKUP
18:23 montyz1 is latency measured in milliseconds?
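For context, the table montyz1 pasted comes from the volume profiler; a sketch of producing it, with VOLNAME a placeholder:

    gluster volume profile VOLNAME start
    # ...run the workload...
    gluster volume profile VOLNAME info    # per-brick FOP table: %-latency, avg/min/max latency, calls
    gluster volume profile VOLNAME stop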
18:23 ingard_ H__: i dunno. we've got close to 5 petabyte
18:24 ingard_ of data
18:24 montyz1 what is a LOOKUP Fop?
18:24 H__ ingard_: that says nothing about the amount of directories and files to me
18:26 ingard_ H__: depends i guess
18:27 semiosis montyz1: my lay understanding is that a lookup is done before opening a file, to find the file location on disk or in this case in the cluster
18:28 Supermathie montyz1: Procedure LOOKUP searches a directory for a specific name and returns the file handle for the corresponding file system object.
18:29 ingard_ Make sure to mount the volume from only one client (not nfs mount) and also all the bricks are up and running. Then traverse the whole volume using the following command:
18:29 ingard_ #find /mount/glusterfs >/dev/null
18:29 ingard_ holy crap
18:29 ingard_ that is going to take a while
18:30 montyz1 I don't suppose the fact that most of my bricks are doing large amounts of LOOKUP would indicate misconfiguration, would it?
18:32 Supermathie montyz1: LOOKUP == open() (kind of. also, accessing /glustervol/foo/bar/quux/qax/quack/foo.tst -> 6 LOOKUPs (I think))
18:37 semiosis montyz1: how do you know thats a large amount?
18:37 semiosis how large is large? :)
18:39 montyz1 oh, just by percentage
18:39 montyz1 using the profiler
18:40 montyz1 80% of the latency is due to LOOKUP
18:41 semiosis well i havent used the profiler myself, for all i know that could be normal for your hardware/network
18:42 semiosis bbl
18:42 montyz1 we're running gluster on AWS and I'm trying to figure out if the performance is normal or if we have some configuration error.  We have two gluster servers with 8 bricks each.
18:43 Supermathie montyz1: AHAH DETAILS!
18:44 Supermathie montyz1: http://community.gluster.org/q/for-amazon-ec2-should-i-use-ebs-should-i-stripe-ebs-volumes-together/
18:44 glusterbot <http://goo.gl/GJzYu> (at community.gluster.org)
18:44 Supermathie "latency will occasionally spike 10-100x the normal latency"
18:48 semiosis i wrote that
18:48 montyz1 yeah, the max latency is ~100x of the avg
18:48 semiosis if that matters, use ebs optimized instances + provisioned iops ebs
18:49 montyz1 where could I read that?  I'm having trouble finding more information about profiling on the gluster site
18:49 semiosis however, for me at least, using glusterfs distribution (many bricks per server, like montyz1) i dont notice the latency spikes
18:49 semiosis my workload is highly parallel though, so if latency spikes it only affects a small pct of my workers
18:50 montyz1 I'll look into the ebs optimized instances & provisioned iops ebs too.
18:51 montyz1 Ultimately I'm trying to figure out why my java application is slow.  I can't tell if glusterfs is performing as expected or not (because I don't know what is reasonable to expect)
18:51 semiosis montyz1: i have to go afk for a bit but will be back in an hour or so. would like to help more later
18:51 montyz1 thanks for your help, if I can narrow it down to something more specific I'll ping back here
18:52 Supermathie montyz1: NFS?
18:53 montyz1 yes, NFS
18:55 montyz1 oh, sorry, I was wrong
18:55 montyz1 no NFS involved
18:56 Supermathie montyz1: I recently learned that a FUSE mount effectively has a queue depth of 1. i.e. SLOWWWWW
18:58 montyz1 I don't think we're doing FUSE
18:58 duerF joined #gluster
18:59 Supermathie montyz1: ? Either you're doing NFS or FUSE or gluster-lib for access....
19:01 montyz1 oh, my sys guy says we are using FUSE
19:06 montyz1 I think some more tests are in order.  I'll see how much faster my app goes if I bypass gluster, and if there is a significant difference I'll see about NFS instead of FUSE.
19:06 montyz1 thanks for your help!
19:06 Supermathie There's a bunch of undocumented optimizations for NFS as well.
19:18 VSpike I have two gluster servers and three clients. On Friday evening I rebooted a VM host which contained server #2 and client #2. Clients 1 and 3 reference server 1 in /etc/fstab and Client 2 references Server 2...
19:18 kushnir-nlm joined #gluster
19:18 VSpike On reboot, client 2 failed to mount its gluster partition, and I tried for a while with no joy before dropping that client from the load balancer and resolving to look at it after the long weekend
19:18 VSpike This morning, client 2 mounted without a problem ... is there some reason why a gluster server would be unresponsive for some time after a reboot?
19:19 VSpike The server was up, connectable to with ssh, and running glusterd
19:19 y4m4 joined #gluster
19:25 kushnir-nlm Hello everyone! I have a pretty straight forward question: I have two RHEL 6 Gluster 3.3.1 servers with a replicated volume. I have 6 web servers as Gluster clients mounting over Fuse. And, I have an haproxy load balancer in front of the web servers. Whenever I shutdown or reboot one of my Gluster servers (it doesn't matter which one), I take about 30-60 seconds of downtime on my web servers.
19:25 kushnir-nlm Is that normal behavior, or is there a proper graceful shutdown method that I am not following?
19:26 glusterbot New news from newglusterbugs: [Bug 960725] nfs-server propogates unnecessary truncate() calls down to storage <http://goo.gl/1RJBs>
19:30 kushnir-nlm Wow. Noone? :)
19:31 Supermathie kushnir-nlm: You could try lowing the brick comm timeout...
19:33 kkeithley| 42 seconds to be precise. Yes, that's normal.
19:34 Supermathie kushnir-nlm: Perhaps lower brick comm timeout to 1s, reboot, then restore it to orig value
19:35 kushnir-nlm Thanks for the replies guys. Are there any down sides to running long-term with a lower timeout?
19:47 Supermathie kushnir-nlm: http://gluster.org/community/documentation/index.php/Gluster_3.2:_Setting_Volume_Options#network.ping-timeout "This reconnect is a very expensive operation..."
19:48 glusterbot <http://goo.gl/NugxI> (at gluster.org)
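The knob under discussion is an ordinary volume option; a sketch of lowering it for a planned reboot and putting it back afterwards, with the volume name and value illustrative:

    gluster volume set gv0 network.ping-timeout 10
    # ...reboot the server...
    gluster volume reset gv0 network.ping-timeout    # back to the 42-second default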
19:50 toddstansell JoeJulian: thanks for the reply from 09:38 this morning.  I checked and those are symlinks on both bricks:
19:50 toddstansell lrwxrwxrwx  1 root root  8 May  3 08:54 00000000-0000-0000-0000-000000000001 -> ../../..
19:51 toddstansell pwd
19:51 kushnir-nlm Supermathie: Yep, I saw that. I'm not sure what "expensive" means. Expensive in terms of CPU? IO? My servers are quad cores with 16GB RAM. Backing storage is SSD. Network is 10GbE. Where would I see the expense?
19:52 montyz joined #gluster
19:52 Supermathie kushnir-nlm: No idea :)
19:53 toddstansell hm... interestingly, I wanted to rename some bricks, so i took my 2-node replica to a 4-node adding 2 new bricks, then taking it back down to 2 bricks removing the original 2.
19:53 kushnir-nlm Supermathie: Heh. Ok. Guess I'll let you know in a minute. :)
19:53 toddstansell I see that the root directory of my brick shows trusted.afr.gvdata-client-{0,1,2,3} even though I only have 2 bricks now.
19:53 * Supermathie lowers the network.ping-timeout to 5s on Discourse....
19:55 Supermathie I never considered that before either.
19:55 JoeJulian eww
19:55 Supermathie Running a similar config in one spot.
19:55 JoeJulian @ping-timeout
19:55 glusterbot JoeJulian: The reason for the long (42 second) ping-timeout is because re-establishing fd's and locks can be a very expensive operation. Allowing a longer time to reestablish connections is logical, unless you have servers that frequently die.
19:56 toddstansell one could argue that if servers don't frequently die, you never re-establish anything and therefore don't have any "expense".
19:56 kushnir-nlm ...or web clients that don't like 500 errors :)
19:56 toddstansell so lowering the timeout simply helps your availability.
19:57 toddstansell fyi, we've lowered ours to 5 seconds as well ... but we just have a 2-node replica setup ... and relatively low requirements (but higher availability requirements)
19:58 pithagorians hi. where can i find more  details about how glusterfs failover is done?
19:58 pithagorians i had a situation - 2 nodes in replica mode, the second one had a bad hdd and the entire cluster didn't work
19:58 JoeJulian lowering timeout doesn't do anything for your availability unless things are expected to die unexpectedly.
19:58 JoeJulian If you're shutting down services, the ping-timeout has no effect.
19:59 toddstansell apparently not the way the centos rpms work... a shutdown causes the other node to hang for 40ish seconds...
19:59 toddstansell so we lowered it to 5 seconds to minimize the impact
19:59 kushnir-nlm JoeJulian: How do I make things die expectedly? Stop glusterd service and then reboot?
20:00 JoeJulian Never does that on mine. The better solution would be to fix your init.
20:00 JoeJulian To die unexpectedly, pull the power plug(s) on your servers.
20:00 toddstansell fix it how? i'd think the stock init would work properly
20:00 JoeJulian It does
20:01 JoeJulian So something else must be broken... Check your K* in the rc.d stuff.
20:01 kushnir-nlm JoeJulian: What about to die -expectedly-? I.e. clean shutdown without hang? Should I stop glusterd service, or glusterd and glusterfsd service, and then reboot server?
20:02 JoeJulian If your network is stopped before glusterfsd or your firewall goes up before that, then it'll look like a hard shutdown.
20:02 JoeJulian kushnir-nlm: stop glusterd. Stop glusterfsd.
20:03 kushnir-nlm JoeJulian: Thanks. Tryin. Will update in a sec.
20:04 jikz joined #gluster
20:04 JoeJulian In centos, /etc/rc.d/rc0.d has K80glusterd, K80glusterfsd, K90network, K92iptables. That works correctly in my installations. Do you have something different?
20:06 andreask joined #gluster
20:06 JoeJulian btw... I do take servers down mid-day occasionally for kernel updates and the like. Not even a flutter in availability.
20:07 JoeJulian Now, otoh, when I was moving clients from one switch to another temporarily for a firmware upgrade on the switch, I have dropped cables and taken longer than 5 seconds to get them plugged back in.
20:08 kushnir-nlm On RHEL 6, having installed 3.3.1 RPMS from http://download.gluster.org/pub/gluster/glusterfs/3.3/3.3.1/RHEL/epel-6Server/x86_64/, I have no K*glusterd, I have K80glusterfsd, K84network
20:08 glusterbot <http://goo.gl/ARb7i> (at download.gluster.org)
20:09 JoeJulian I suspect, then, that you also have no glusterd in /etc/rc.d/rc3.d
20:09 kushnir-nlm Correct. So, those are probably missing from that RPM set...
20:10 JoeJulian Regardless, though, the glusterfsd ,,(processes) are the bricks that are relevant to the ping-timeout discussion.
20:10 glusterbot the GlusterFS core uses three process names: glusterd (management daemon, one per server); glusterfsd (brick export daemon, one per brick); glusterfs (FUSE client, one per client mount point; also NFS daemon, one per server). There are also two auxiliary processes: gsyncd (for geo-replication) and glustershd (for automatic self-heal). See http://goo.gl/hJBvL for more information.
20:10 JoeJulian kushnir-nlm: No, it's just that you haven't set glusterd to start on boot. chkconfig glusterd on
20:11 toddstansell i see K80glusterd, K90network, K92iptables
20:11 kushnir-nlm JoeJulian: I have S20glusterd, K80glusterfsd
20:12 petan joined #gluster
20:12 toddstansell I don't see a separate K80glusterfsd... but I see K80glusterd kills glusterfsd too
20:12 toddstansell this is 3.3.2qa1
20:12 kushnir-nlm JoeJulian: Having shut down glusterd and glusterfsd and then started only glusterd, I see that glusterd seems to start glusterfsd processes.
20:12 JoeJulian Ah, ok.
20:12 JoeJulian I wasn't aware that kkeithley changed that for 3.3.2
20:13 JoeJulian So it could be a bug.
20:13 JoeJulian I'm kind of in the middle of something here. Would you mind fpasting the init script so I can give it a quick look?
20:13 Supermathie JoeJulian: The init.d file provided in 3.3.1 only has the "# chkconfig: 35 20 80" line. It has no lines that tell chkconfig when to stop glusterd
20:13 kushnir-nlm JoeJulian: But, I can confirm that stopping -both- glusterd and glusterfsd, and -then- rebooting, results in a clean shutdown with no timeouts.
20:14 JoeJulian Supermathie: The third entry, 80, is that.
20:15 Supermathie JoeJulian: That's the priority. But 'chkconfig glusterd on' results in only the S* links being created. The init.d script in the master branch is much nicer
20:16 jag3773 joined #gluster
20:16 JoeJulian I don't doubt it. The 2 part script was legacy for the old hand-configured volumes to be able to continue to work.
20:17 toddstansell the init script for 3.3.2qa1 from my box is here: http://fpaste.org/10935/67957776/
20:17 glusterbot Title: #10935 Fedora Project Pastebin (at fpaste.org)
20:17 JoeJulian thanks
20:18 baskin joined #gluster
20:20 baskin hi guys
20:20 toddstansell which looks the same as https://github.com/gluster/glusterfs/blob/release-3.3/extras/init.d/glusterd-Redhat.in
20:20 baskin can i ask for a little help
20:20 glusterbot <http://goo.gl/yRhpL> (at github.com)
20:21 JoeJulian baskin: As yet, that is unclear.
20:21 Supermathie toddstansell: Yeah, that's the script that won't create the Kill links
20:21 kushnir-nlm .. /etc/init.d/glusterd from 3.3.1 RPMS at http://fpaste.org/10937/
20:21 glusterbot Title: #10937 Fedora Project Pastebin (at fpaste.org)
20:22 JoeJulian It should. 35 20 80 means at runlevels 3 and 5, start at priority 20. Kill at priority 80.
20:22 JoeJulian hello
20:22 glusterbot JoeJulian: Despite the fact that friendly greetings are nice, please ask your question. Carefully identify your problem in such a way that when a volunteer has a few minutes, they can offer you a potential solution. These are volunteers, so be patient. Answers may come in a few minutes, or may take hours. If you're still in the channel, someone will eventually offer an answer.
20:22 JoeJulian baskin: ^^
20:23 baskin yes sorry
20:23 toddstansell i see kill scripts in rc{0,1,6}.d only for glusterd
20:24 baskin is there any recomended way to mount a glusterfs volume on ubuntu (with the glusterfs client)? I mean other than the fstab...
20:24 kushnir-nlm JoeJulian: /etc/init.d/glusterfsd from 3.3.1 RPMS is at http://fpaste.org/10938/58161136/
20:24 glusterbot Title: #10938 Fedora Project Pastebin (at fpaste.org)
20:24 JoeJulian toddstansell: For the script that you pasted for me, that would be correct.
20:24 Supermathie Ahhhhhh
20:25 Supermathie With that script, if you 'chkconfig glusterd on' only the start links get created. If you 'chkconfig --add glusterd' then *both* the start and stop links are created.
20:25 JoeJulian Sorry, in case I'm being unclear. That's changed from 3.3.1 to 3.3.2. In 3.3.1 you had two kill scripts.
20:25 JoeJulian Ok, that's a bug. It should have been added on install.
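The distinction Supermathie ran into, in command form (behaviour as described above for this particular init script):

    chkconfig --add glusterd     # registers the script; created both the S* and K* links here
    chkconfig glusterd on        # only produced the S* (start) links with this script
    chkconfig --list glusterd    # verify the runlevels
    ls /etc/rc.d/rc0.d /etc/rc.d/rc6.d | grep gluster   # the kill links used at shutdown/reboot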
20:26 kushnir-nlm K. Cool. Now I know... and knowing is half the battle. :)
20:27 kushnir-nlm Is 3.3.1 still the recommended version for production environments?
20:27 Supermathie JoeJulian: the RPM scripts are correct. *I* am working from source and I goofed by doing 'chkconfig glusterd on'
20:27 JoeJulian https://github.com/gluster/glusterfs/blob/release-3.3/glusterfs.spec.in#L325
20:27 glusterbot <http://goo.gl/87AZu> (at github.com)
20:27 JoeJulian That line should have added it during the server install.
20:28 JoeJulian Supermathie: Ah, ok. Got it.
20:28 Supermathie So next time someone complains that the K* links aren't added, you'll know why :D
20:28 JoeJulian :D
20:29 toddstansell so then i'm confused why rebooting my box would have resulted in the other node hanging, if these init scripts are theoretically set up correctly...
20:29 toddstansell which kill scripts get fired when you reboot a box? (i'm used to a solaris init world, so i keep getting confused how the linux init stuff fires)
20:30 JoeJulian reboot is runlevel 6, so /etc/rc.d/rc6.d
20:30 Supermathie toddstansell: Output of "ls -al /etc/rc*.d/*gluster*" ?
20:30 JoeJulian shutdown is runlevel 0
20:30 toddstansell thanks.
20:30 toddstansell rc6.d is the important one for us then
20:30 JoeJulian In fact, I got in the habit of "init 6" from back when "reboot" did so. Now.
20:31 JoeJulian Or, if I was really lazy: sync;sync;reboot
20:31 Supermathie JoeJulian: Or psychotic: killall -9
20:31 wushudoin joined #gluster
20:31 JoeJulian kill -9 0
20:32 wushudoin left #gluster
20:33 JoeJulian I wish I had the hardware to do some of the tests I'd like to try for testing day.
20:33 dialt0ne joined #gluster
20:34 jthorne joined #gluster
20:34 wushudoin joined #gluster
20:34 kushnir-nlm So, follow-up question: In what sense is the connect/disconnect associated with the network.ping-timeout expensive?
20:35 dialt0ne so i've got snapshots from my production gluster volume setup in a test environment and when i try to re-create the volume i get "/brick/0/data or a prefix of it is already part of a volume"
20:35 glusterbot dialt0ne: To clear that error, follow the instructions at http://goo.gl/YUzrh or see this bug http://goo.gl/YZi8Y
20:35 JoeJulian long ping timeout: unplug a network cable, move it to another switch (dropping it on the way) total impact, 10 second pause.
20:35 JoeJulian ... for that one client. ^^
20:36 kushnir-nlm Sorry, I don't follow.
20:36 JoeJulian short ping timeout: same cable transaction. All FD's and locks have to be re-established. All servers increase load and slow all queries for 30 to 90 seconds affecting all clients.
20:37 ingard_ JoeJulian: do you know what the recommended route to upgrade from 3.0.5 would be?
20:37 ingard_ it seems i _can_ go from 3.0 to 3.2 and from there to 3.3
20:37 kushnir-nlm Ahh... Ok.
20:37 wushudoin left #gluster
20:38 JoeJulian ingard_: you're going to need to create a volume using the cli. You can go straight to 3.3. Keep your same brick configuration when you create the new volume and you won't have to move any data.
20:38 dblack joined #gluster
20:38 msvbhat joined #gluster
20:38 kshlm|AFK joined #gluster
20:38 kshlm|AFK joined #gluster
20:39 JoeJulian dialt0ne: one sec while I fire up my time machine... ;)
20:40 JoeJulian http://irclog.perlgeek.de/gluster/2013-05-02#i_7011125
20:40 glusterbot <http://goo.gl/7aROx> (at irclog.perlgeek.de)
20:40 JoeJulian dialt0ne: ^
20:41 ingard_ JoeJulian: so in brief, shut everything down, upgrade to 3.3.latest? and simply create a volume using the cli with the same bricks as my original volume?
20:41 JoeJulian Correct. Same number of replica subvolumes (replica N in your cli configuration).
20:41 JoeJulian @brick order
20:41 glusterbot JoeJulian: Replicas are defined in the order bricks are listed in the volume create command. So gluster volume create myvol replica 2 server1:/data/brick1 server2:/data/brick1 server3:/data/brick1 server4:/data/brick1 will replicate between server1 and server2 and replicate between server3 and server4.
20:42 dialt0ne yeah, i was looking for those logs, couldn't find them on www.gluster.org thanks ;-)
20:43 JoeJulian You're welcome. It's in the topic, btw.
20:43 dialt0ne oof. headshot
20:43 JoeJulian hehe
20:44 ingard_ JoeJulian: so what about the "find /mnt/gluster" cmd it says on the wiki i need to do when going from 3.0 to 3.2 ?
20:44 JoeJulian Do "gluster volume heal $vol full" instead.
20:50 toddstansell so if i kill the glusterfsd process for one of my bricks directly, there should be almost no hang on my client mount, correct?
20:51 toddstansell or does glusterd also need to be stopped?
20:51 JoeJulian toddstansell: kill -15, correct. kill -9 all bets are off.
20:51 toddstansell right
20:57 toddstansell hm... maybe this is contributing to some of the weird things I'm seeing: one node of my 2-node replica volume shows this: Number of Bricks: 0 x 4 = 2; the other node shows this: Number of Bricks: 1 x 2 = 2
20:58 JoeJulian yeah, that's weird...
21:00 toddstansell and interestingly, i had tried to do 'gluster volume reset gvhome network.ping-timeout' to reset it back to defaults, and it looks like it worked on my first host, but the second it didn't.  "Operation failed on admin02.mgmt"
21:00 toddstansell trying to set it again, resulted in "Error, Validation Failed; Set volume unsuccessful"
21:00 toddstansell i think my cluster is confused.
21:01 JoeJulian Pick one to be considered sane. stop glusterd on the other. rsync /var/lib/glusterd/vols from the good one to the bad one.
21:02 toddstansell should i stop the glusterfsd's too?
21:02 toddstansell on the "weird" side?
21:02 JoeJulian You can, but it shouldn't be necessary.
21:02 toddstansell ok
21:02 JoeJulian Once the vol configs are in sync, you can kill -HUP the glusterfsd on the "bad" and they'll check for configuration changes.
21:03 JoeJulian After you've started glusterd of course.
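JoeJulian's recovery steps spelled out as commands, run on the node chosen as "bad" (the good node's hostname is a placeholder):

    service glusterd stop
    rsync -av good-node:/var/lib/glusterd/vols/ /var/lib/glusterd/vols/
    service glusterd start
    pkill -HUP glusterfsd    # brick daemons re-check their configuration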
21:04 toddstansell and does it matter that these hosts are also mounting these via fuse? do those glusterfs processes interact with this data at all?
21:05 toddstansell /var/lib/glusterd/vols, that is.
21:05 kushnir-nlm Sorry guys, I think I missed the reply on this. Is 3.3.1 still the recommended version for production, or is 3.3.2 QA better?
21:05 Lui1 left #gluster
21:05 JoeJulian toddstansell: I'd probably kill -HUP the glusterfs processes as well. It doesn't hurt anything.
21:05 clutchk1 joined #gluster
21:06 JoeJulian kushnir-nlm: 3.3.2qa just hit qa testing today. If you need to launch tomorrow, I'd install 3.3.1 and I'd use this ,,(yum repo)
21:06 glusterbot kushnir-nlm: kkeithley's fedorapeople.org yum repository has 32- and 64-bit glusterfs 3.3 packages for RHEL/Fedora/Centos distributions: http://goo.gl/EyoCw
21:06 toddstansell for us, 3.3.2qa1 is better because it fixes a self-heal bug that causes timestamps to get corrupted if self-healing to the primary brick in a volume...  but we're accepting the risk of running qa code to get that fix.
21:07 toddstansell hoping a final 3.3.2 can be released soon that we can upgrade to :)
21:07 JoeJulian Otherwise, there's several bugs that are fixed in 3.3.2. If you have a week to test, I would try that.
21:11 elyograg left #gluster
21:26 portante|afk a2_: u there?
21:26 glusterbot New news from newglusterbugs: [Bug 960752] Update to 3.4-beta1 kills glusterd <http://goo.gl/69M5f>
21:27 portante avati_: ^^^
21:33 pithagorians hi. where can i find more  details about how glusterfs failover is done?
21:33 pithagorians i had a situation - 2 nodes in replica mode, the second one had a bad hdd and the entire cluster didn't work
21:34 kushnir-nlm .
21:36 kushnir-nlm JoeJulian: Can I just yum update from 3.3.1-1 to 3.3.1-14? Do I need break/reestablish my volume or anything?
21:37 toddstansell JoeJulian: hm... so far that hasn't really fixed anything.  it still says 0x4=2
21:38 toddstansell interestingly, in vols/gvhome/bricks/admin02.mgmt:-brick-home, it shows listen-port=0 on both nodes.
21:38 toddstansell whereas admin01.mgmt:-brick-home has listen-port=24011
21:43 avati_ portante: ping
21:44 portante avati_: does the gluster-swift.git repo have GLUSTER_REFSPEC variable set in the hook that notifies Jenkins
21:44 portante am I even asking the right question?
21:44 portante The jenkins job expects that variable to be filled in
21:44 avati_ you mean GERRIT_REFSPEC?
21:44 portante yes
21:44 avati_ it should be set automatically
21:45 portante hmmm
21:45 avati_ let me check
21:46 portante I just changed the default to not use master, but a branch that does not exist to see if it is taking the default value
21:46 portante but now that a regression job is running, it is going to be a few minutes before we can verify
21:47 dialt0ne JoeJulian:  when you say "uses the same translators" do you mean the translators have to have the same name? configured the same? http://irclog.perlgeek.de/gluster/2013-05-02#i_7011177
21:47 glusterbot <http://goo.gl/NtL7s> (at irclog.perlgeek.de)
21:48 avati_ portante: console log says it is fetching the patch
21:49 avati_ ooh wait
21:50 avati_ Fetching upstream changes from ssh://build@review.gluster.org/gluster-swift.git
21:50 avati_ Commencing build of Revision f534d66a4a29ebf7ca070f95cf33b57b4b7283a3 (origin/master)
21:50 avati_ Checking out Revision f534d66a4a29ebf7ca070f95cf33b57b4b7283a3 (origin/master)
21:50 avati_ [workspace] $ /bin/bash /tmp/hudson7894317356380681782.sh
21:50 avati_ that's the commit id of the parent (current head)
21:50 portante yes, that is not the patch revision
21:50 portante yes
21:51 portante so I am wondering where it gets that revision from, because the build parameters do show the right revision number
21:51 avati_ if you see smoke and rh-bugid jobs, in those very lines, you see the commit of the patch *to be tested*, not the parent (current head)
21:51 portante yes
21:52 portante http://www.mdisc.com/mdisc-technology/
21:52 glusterbot Title: M-Disc™ Technology » The M-DISC™ (at www.mdisc.com)
21:53 avati_ why don't you add "echo $GERRIT_REFSPEC" to see what value is coming in?
21:53 portante http://build.gluster.org/job/gluster-swift-unit-tests/6/parameters/?
21:53 glusterbot <http://goo.gl/AIqyJ> (at build.gluster.org)
21:54 portante Those parameters look right, but it is not checking out the GERRIT_PATCHSET_REVISION
21:54 portante GERRIT_REFSPEC also looks correct
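A sketch of a debugging shell step for that Jenkins job — echo the trigger variables and fetch the refspec explicitly so the workspace ends up on the patch rather than origin/master (the variable names are the ones shown on the parameters page above):

    echo "GERRIT_REFSPEC=$GERRIT_REFSPEC"
    echo "GERRIT_PATCHSET_REVISION=$GERRIT_PATCHSET_REVISION"
    git fetch origin "$GERRIT_REFSPEC"
    git checkout -f FETCH_HEAD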
21:58 avati_ right..
21:58 portante okay, gotta head home, I'll pick this up in a little bit from home ...
22:06 semiosis [15:53] * Supermathie lowers the network.ping-timeout to 5s on Discourse....
22:06 semiosis !!!
22:08 dialt0ne :-(
22:08 dialt0ne [2013-05-07 22:04:06.922947] E [posix.c:4119:init] 0-cad-fs1-posix: mismatching volume-id (fb3dd1fa-8978-4e9b-9ecf-137cf2344a21) received. already is a part of volume 5502df3d-bfb3-4e6b-8e31-94d66c96be74
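Both of dialt0ne's errors (the "prefix of it is already part of a volume" message above and this volume-id mismatch) are usually cleared by wiping the brick's old gluster metadata, per the instructions glusterbot linked earlier; a sketch against the brick path from above — note it discards the brick's .glusterfs directory, so only do this when deliberately reusing the brick in a new volume:

    setfattr -x trusted.glusterfs.volume-id /brick/0/data
    setfattr -x trusted.gfid /brick/0/data
    rm -rf /brick/0/data/.glusterfs
    service glusterd restart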
22:14 semiosis Supermathie: still around?
22:34 hagarth joined #gluster
23:13 Shdwdrgn joined #gluster
23:29 Tobarja left #gluster
23:31 koodough joined #gluster
23:34 ninkotech__ joined #gluster
23:41 dialt0ne left #gluster
23:42 jag3773 joined #gluster
