IRC log for #gluster, 2012-10-12

All times shown according to UTC.

Time Nick Message
01:06 Bullardo joined #gluster
01:23 andrewbogott joined #gluster
01:26 andrewbogott joined #gluster
01:33 oneiroi joined #gluster
01:41 glusterbot New news from newglusterbugs: [Bug 843792] Fix statedump code in write-behind xlator <https://bugzilla.redhat.com/show_bug.cgi?id=843792>
01:43 kevein joined #gluster
02:16 sunus joined #gluster
03:01 neofob joined #gluster
03:17 shylesh joined #gluster
03:22 kshlm joined #gluster
03:33 wushudoin joined #gluster
03:41 neofob my sketchy dependency graph to build glusterfs on ARM CuBox http://bit.ly/SUBmJZ
03:44 shylesh joined #gluster
04:11 glusterbot New news from newglusterbugs: [Bug 861335] GlusterFS client crashed due to Segmentation fault. <https://bugzilla.redhat.com/show_bug.cgi?id=861335>
04:16 Bullardo joined #gluster
04:30 vpshastry joined #gluster
04:39 jays joined #gluster
04:53 deepakcs joined #gluster
05:08 Bullardo joined #gluster
05:08 sgowda joined #gluster
05:26 faizan joined #gluster
05:27 adechiaro joined #gluster
05:27 Bullardo joined #gluster
05:44 hagarth joined #gluster
05:48 ankit9 joined #gluster
05:55 adechiaro joined #gluster
05:59 ramkrsna joined #gluster
06:02 faizan joined #gluster
06:19 pkoro joined #gluster
06:26 hagarth joined #gluster
06:27 ngoswami joined #gluster
06:33 rgustafs joined #gluster
06:41 mohankumar joined #gluster
06:45 saz_ joined #gluster
06:52 ekuric joined #gluster
06:53 dobber joined #gluster
06:53 mohankumar joined #gluster
06:53 vimal joined #gluster
06:53 mdarade1 joined #gluster
07:00 Azrael808 joined #gluster
07:02 sunus1 joined #gluster
07:03 ctria joined #gluster
07:18 mdarade1 joined #gluster
07:19 mdarade1 left #gluster
07:20 ankit9 joined #gluster
07:20 Nr18 joined #gluster
07:20 tjikkun_work joined #gluster
07:26 andreask joined #gluster
07:35 badone_home joined #gluster
07:41 xymox joined #gluster
07:42 glusterbot New news from newglusterbugs: [Bug 865696] Publish GlusterFS 3.3 Administration Guide HTML files on a server instead of HTML tarball <https://bugzilla.redhat.com/show_bug.cgi?id=865696>
07:46 badone_home joined #gluster
07:47 lkoranda joined #gluster
08:13 shylesh joined #gluster
08:22 hagarth joined #gluster
08:29 badone_home joined #gluster
08:34 stickyboy joined #gluster
08:35 vpshastry left #gluster
08:35 ankit9 joined #gluster
08:36 Triade joined #gluster
08:37 manik joined #gluster
08:37 vpshastry joined #gluster
08:42 faizan joined #gluster
08:44 kaney777 joined #gluster
08:44 kaney joined #gluster
08:44 tryggvil joined #gluster
08:45 kaney777 joined #gluster
08:46 kd joined #gluster
08:51 kaney777 left #gluster
09:05 lh joined #gluster
09:05 lh joined #gluster
09:18 red_solar joined #gluster
09:29 tryggvil joined #gluster
09:37 badone_home joined #gluster
09:49 ramkrsna joined #gluster
09:55 duerF joined #gluster
10:02 vpshastry1 joined #gluster
10:10 tryggvil joined #gluster
10:12 ramkrsna joined #gluster
10:12 ramkrsna joined #gluster
10:13 glusterbot New news from newglusterbugs: [Bug 848543] brick directory is automatically recreated, e.g. when disk not mounted <https://bugzilla.redhat.com/show_bug.cgi?id=848543>
10:18 vpshastry joined #gluster
10:29 xymox_ joined #gluster
10:41 hagarth well, that's some split
10:45 sripathi joined #gluster
10:45 vpshastry joined #gluster
10:45 duerF joined #gluster
10:45 manik joined #gluster
10:45 ankit9 joined #gluster
10:45 stickyboy joined #gluster
10:45 shylesh joined #gluster
10:45 Nr18 joined #gluster
10:45 vimal joined #gluster
10:45 mohankumar joined #gluster
10:45 dobber joined #gluster
10:45 rgustafs joined #gluster
10:45 adechiaro joined #gluster
10:45 deepakcs joined #gluster
10:45 jays joined #gluster
10:45 oneiroi joined #gluster
10:45 a2 joined #gluster
10:45 nightwalk joined #gluster
10:45 jbrooks joined #gluster
10:45 copec joined #gluster
10:45 Daxxial_ joined #gluster
10:45 rwheeler joined #gluster
10:45 chandank|work joined #gluster
10:45 jdarcy joined #gluster
10:45 bulde joined #gluster
10:45 RNZ joined #gluster
10:45 zwu joined #gluster
10:45 glusterbot joined #gluster
10:45 cattelan joined #gluster
10:45 tc00per|away joined #gluster
10:45 octi joined #gluster
10:45 crashmag joined #gluster
10:45 tripoux joined #gluster
10:45 RobertLaptop_ joined #gluster
10:45 NuxRo joined #gluster
10:45 flind_ joined #gluster
10:45 _Bryan_ joined #gluster
10:45 imcsk8 joined #gluster
10:45 FU5T joined #gluster
10:45 dec joined #gluster
10:45 lkthomas joined #gluster
10:45 stigchristian joined #gluster
10:45 Psi-Jack joined #gluster
10:45 frakt_ joined #gluster
10:45 mtanner joined #gluster
10:45 penglish joined #gluster
10:45 sr71 joined #gluster
10:45 dblack joined #gluster
10:45 semiosis joined #gluster
10:45 ndevos joined #gluster
10:45 johnmark joined #gluster
10:45 raghavendrabhat joined #gluster
10:45 gluslog joined #gluster
10:45 rz___ joined #gluster
10:45 chrizz- joined #gluster
10:45 hagarth_ joined #gluster
10:45 _br_ joined #gluster
10:45 xavih joined #gluster
10:45 Shdwdrgn joined #gluster
10:45 gm____ joined #gluster
10:45 morse joined #gluster
10:45 primusinterpares joined #gluster
10:45 ladd joined #gluster
10:45 eightyeight joined #gluster
10:45 Revo joined #gluster
10:45 jiffe98 joined #gluster
10:45 JoeJulian joined #gluster
10:45 plantain joined #gluster
10:45 er|c joined #gluster
10:45 tjikkun_work joined #gluster
10:45 ramkrsna joined #gluster
11:19 Humble joined #gluster
11:19 redsolar joined #gluster
11:23 kkeithley joined #gluster
11:23 stigchristian_ joined #gluster
11:25 stigchristian_ I have a disk chassis with 36x 3TB disk and will create 2 RAID6 volumes with 18 disks each. I`m storing only big-media files. What strip size is recommended?
11:25 ankit9 joined #gluster
11:36 Azrael808 joined #gluster
11:36 Daxxial_1 joined #gluster
11:40 edward1 joined #gluster
11:42 mgebbe_ joined #gluster
11:42 4JTAAHGQA joined #gluster
11:51 lh joined #gluster
11:51 lh joined #gluster
11:52 balunasj joined #gluster
12:03 vpshastry left #gluster
12:05 ramkrsna joined #gluster
12:08 oneiroi joined #gluster
12:24 tryggvil joined #gluster
12:33 lh joined #gluster
12:33 lh joined #gluster
12:40 ankit9 joined #gluster
12:44 zArmon joined #gluster
12:46 faizan joined #gluster
12:47 abyss^_ option in mount _netdev is no longer supported? So what option should I use instead of this?
12:48 linux-rocks joined #gluster
12:58 hagarth joined #gluster
13:00 nueces joined #gluster
13:14 rwheeler joined #gluster
13:15 Azrael808 joined #gluster
13:16 xymox joined #gluster
13:31 Nr18 joined #gluster
13:37 rgustafs joined #gluster
13:39 TheHaven joined #gluster
13:41 bennyturns joined #gluster
13:42 Nr18 joined #gluster
13:44 wnl_work joined #gluster
13:45 wnl_work i am annoyed at gluster. when one node in a replicated cluster fails (system hang), gluster on the other node also hangs
13:46 saz_ joined #gluster
13:46 ndevos well, depends, but likely only until the timeout (42 secs default) has passed
13:46 wnl_work this is well past 42 seconds
13:47 wnl_work close to 5 minutes by now
13:47 wnl_work the peer shows as disconnected
13:47 wnl_work im getting ready to reboot the second node just to get things working again
13:47 ndevos and you're doing the access on the local server, and it contains all the bricks needed?
13:47 wnl_work yes
13:48 wnl_work first node is now halted, and the second still hasnt recovered
13:49 ndevos hmm, sounds strange, that should not happen for all I know, maybe you have something in the logs mentioning call_bail (timeout for that was 30 minutes, reduced to 10 recently)
13:49 ndevos but that would affect gluster peer/volume operations, not filesystem ones iirc
13:49 wnl_work lots of "disconnected" messages
13:50 ndevos well, thats normal if one server is non-responsive
13:50 wnl_work nothing in the logs of significance prior to that
13:51 wnl_work All subvolumes are down. Going offline until atleast one of them comes back up.
13:52 wnl_work but one of the replicate bricks is on the local machine
13:52 wnl_work ah! looks like glusterfsd died
13:53 wnl_work no i take that back, maybe not
13:53 Triade joined #gluster
13:53 wnl_work had to reboot the box
13:54 wnl_work the whole reason i used gluster was to make this sort of situation automatically recoverable.
13:54 wnl_work this didnt happen with earlier versions
13:54 wnl_work very disappointed
13:57 ndevos I'm not sure what could have caused that, but it really should fail-over without issues...
13:57 wnl_work well, it didnt
13:58 wnl_work im saving the logs. no idea if that will help tell me anything
13:59 stopbit joined #gluster
13:59 sripathi joined #gluster
14:02 JZ_ joined #gluster
14:03 y4m4 joined #gluster
14:05 Daxxial_ joined #gluster
14:07 abyss^_ Why when I try to mount I get: unknown option _netdev (ignored), what new option should I use?
14:12 wnl_work if youre using _netdev then i believe you need to have the netfs service turned on to do the mounts
14:13 glusterbot New news from newglusterbugs: [Bug 865812] glusterd stops responding after ext4 errors <https://bugzilla.redhat.com/show_bug.cgi?id=865812>
14:16 xymox joined #gluster
14:16 ndevos abyss^_: you can ignore that message, it will be gone in an upcoming release, see Bug 827121
14:16 glusterbot Bug https://bugzilla.redhat.com:443/show_bug.cgi?id=827121 unspecified, unspecified, ---, csaba, MODIFIED , [3.3.0] Mount options "noauto" and "_netdev" should be silently ignored
14:17 ndevos and yeah, wnl_work is correct, the _netdev option gets interpreted by the netfs service (and rc.sysinit)
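For reference, a glusterfs entry of the kind being discussed here might look roughly like the following in /etc/fstab (volume name and mount point are placeholders), with _netdev left in place so the init scripts defer the mount until the network is up; on an EL5/EL6 box the netfs service also needs to be enabled:

    localhost:/myvol  /mnt/myvol  glusterfs  defaults,_netdev  0 0

    # EL-style init, e.g. the CentOS 5.8 mentioned later in this log
    chkconfig netfs on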
14:19 wN joined #gluster
14:21 abyss^_ ndevos: thank you
14:21 semiosis wnl_work: using replication?  please ,,(pasteinfo)
14:21 glusterbot wnl_work: Please paste the output of "gluster volume info" to http://fpaste.org or http://dpaste.org then paste the link that's generated here.
14:22 semiosis also afaik _netdev is not a mount option, but rather an option used by the initscripts to order mounts at boot time
14:22 manik joined #gluster
14:22 semiosis ah nice find on the bug ndevos
14:23 semiosis oops, that comment about _netdev was for abyss^_, not wnl_work :)
14:23 * semiosis late to the party
14:23 wnl_work semiosis: http://fpaste.org/37te/
14:23 glusterbot Title: Viewing output of gluster volume info by wnl_work (at fpaste.org)
14:24 semiosis and speaking of parties... https://plus.google.com/u/0/events/cnenefar7ta9neao8psrsd68i9g
14:24 semiosis glusterbot: where's my title?!?!
14:25 semiosis wnl_work: hmm using replication yet one server failing causes the client to hang indefinitely
14:25 semiosis that's not right
14:25 wnl_work thats what i experienced.  restarting gluster did not help. i had to reboot the server.
14:25 semiosis wnl_work: btw, are those elastic IPs?
14:26 wnl_work yes
14:26 semiosis ok
14:26 wnl_work and now, dammit, theyre telling me a bunch of images are missing
14:26 wnl_work ive never had gluster screw me over like this before
14:26 semiosis glusterbot: meh
14:26 glusterbot semiosis: I'm not happy about it either
14:27 semiosis wnl_work: pastie some logs, around the time of the hang, please
14:28 wnl_work i will try, but right now im trying to figure out why the site doesnt have half its images
14:28 semiosis ok
14:32 bulde1 joined #gluster
14:34 wushudoin joined #gluster
14:35 wnl_work semiosis: which logs you looking for?
14:35 wnl_work oh, and the scenario was: one aws instance hung (totally unresponsive to anything). the other instance showed the peer as disconnected but any attempts to access the gluster file system hung. i had a whole slew of httpd processes stuck in disk wait
14:36 semiosis client logs... /var/log/glusterfs/client-mount-point.log
14:38 wnl_work from etc-glusterfs-glusterd.log..... this cant be good: [2012-10-12 09:52:31.718635] W [glusterfsd.c:831:cleanup_and_exit] (-->/lib/i686/nos
14:38 wnl_work egneg/libc.so.6(clone+0x5e) [0x20c2be] (-->/lib/i686/nosegneg/libpthread.so.0 [0x6ab
14:38 wnl_work 889] (-->/usr/sbin/glusterd(glusterfs_sigwaiter+0x1ff) [0x804c33f]))) 0-: received signum (15), shutting down
14:39 wnl_work oh wait....that COULD be from when i restarted gluster
14:39 stigchristian_ I have a disk chassis with 36x 3TB disk and will create 2 RAID6 volumes with 18 disks each. I`m storing only big-media files. What strip size is recommended?
14:40 semiosis that's the glusterd log, not a client log
14:40 semiosis and yes looks like reboot
14:43 wnl_work semiosis: im looking at the client log and i see multiples of these, repeated several times a minute:  I [client.c:2090:client_rpc_notify] 0-wwwfiles-client-1: disconnected
14:43 wnl_work is that normal?
14:43 semiosis not normal
14:43 wnl_work thats been going on since august
14:43 glusterbot New news from newglusterbugs: [Bug 865825] Self-heal checks skip pending counts that they shouldn't <https://bugzilla.redhat.com/show_bug.cgi?id=865825>
14:44 semiosis wnl_work: what version of glusterfs is this?
14:45 wnl_work heres the client logs. i have removed most of the "disconnected" messages: http://web2.csedigital.com/logs/shared-www-files.log
14:45 wnl_work glusterfs 3.3.0 built on Jul 18 2012 19:54:01
14:47 wnl_work im still getting those disconnected messages on both peers
14:47 semiosis are those elastic IPs associated with your instances correctly?
14:47 semiosis i think if they were not this is how it would look... just a guess
14:48 JoeJulian Is "ec2-50-16-242-219.compute-1.amazonaws.com" elastic? Looks rather fixed to me.
14:48 semiosis also just as a basic sanity check, make sure DNS resolution *works* at all on these hosts.  i've seen weird failures with the EC2 resolvers
14:49 semiosis JoeJulian: elastic means it can be assigned to instances in ec2
14:49 wnl_work JoeJulian: its the dns name that refers to an elastic ip, but on the instance itself (using the amazon resolvers) it resolves to the current net 10 address
14:49 JoeJulian Ah
14:49 semiosis JoeJulian: elastic, as opposed to truly dynamic where you just get a random IP/hostname and have no control whatsoever
14:49 JoeJulian That "DNS resolution failed" is the obvious problem, of course.
14:50 semiosis wnl_work: check those elastic ip associations & let me know
14:50 JoeJulian How do the elastic ips get assigned? dhcp?
14:50 semiosis JoeJulian: EC2 API calls
14:51 semiosis JoeJulian: you make a call to get an elastic IP, one is randomly chosen and provisioned into your account.  then you can make "associate" and "dissociate" calls to put that IP on (or remove it from) any of your instances
14:51 wnl_work semiosis: not sure how an elastic ip could be improperly assigned.
14:52 JoeJulian From within the vm though...
14:52 semiosis and you own that IP for the lifetime of your account, unless you "release" it back to amazon
14:52 semiosis wnl_work: pebkac :)
14:52 wnl_work what do you want me to check? the ips are assigned to the correct servers. that i can confirm
14:52 semiosis JoeJulian: from within the vm?  what?
14:52 JoeJulian never mind. off to the dr.
14:52 wnl_work dns is working on both servers...i will confirm again. but if the lookup was returning the wrong ip then gluster wouldnt work at all i would think
14:54 daMaestro joined #gluster
14:54 wnl_work confirmed. dns properly looks up those names and returns the internal address
14:56 semiosis wnl_work: i agree with JoeJulian, dns resolution is the obvious issue.  gotta figure out why that is failing
14:56 wnl_work obvious issue or which? the disconnected messages or the lack of failover?
14:57 wnl_work and it sure seems like its working right now.
14:58 ankit9 joined #gluster
14:59 semiosis first things first, get gluster resolving names correctly, then we can tackle those other issues, if they still exist
14:59 semiosis wnl_work: as an experiment, try mounting a new glusterfs client of this volume to some other directory... just to get a nice clean log file, then pastie that log file please
15:00 semiosis also please provide the mount command you use for that here
15:00 semiosis mount it somewhere unimportant, like /tmp/glustest
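A throwaway test mount of the sort semiosis is asking for could look roughly like this (the volume name wwwfiles is inferred from the client log names above; the client log file is normally named after the mount point):

    mkdir -p /tmp/glustest
    mount -t glusterfs localhost:/wwwfiles /tmp/glustest
    tail -n 50 /var/log/glusterfs/tmp-glustest.log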
15:03 wnl_work semiosis: since i cant tell if it is resolving them incorrectly, im not sure how to get them to resolve correctly. afaict, they are resolving correctly.
15:04 semiosis i'll try to help you with that... can you please do that experiment?
15:04 wnl_work yes, one moment
15:04 semiosis thx
15:04 wnl_work still trying to track down missing files. right now i dont know which to hate on worse: gluster or drupal
15:05 semiosis i feel your pain
15:06 wnl_work or the site's editors that keep uploading the same images over and over again
15:07 wnl_work fyi: the mount uses the name "localhost". only the peer connections use the dnsname
15:07 semiosis got it
15:08 atrius joined #gluster
15:08 wnl_work http://fpaste.org/V8vh/
15:08 glusterbot Title: Viewing test mount by wnl_work (at fpaste.org)
15:09 semiosis ok this is really helpful!
15:10 wnl_work yay   \o/
15:10 semiosis could you do the same experiment & give the log from the other peer please?
15:12 wnl_work one moment
15:14 wnl_work same mount command, mount point is /tmp/glustest2:  http://fpaste.org/9g9x/
15:14 glusterbot Title: Viewing Paste #242849 (at fpaste.org)
15:16 ankit9 joined #gluster
15:16 semiosis ah, sorry... backing up a step... are your servers also clients?
15:16 wnl_work yes
15:17 wnl_work i always mount from localhost
15:17 semiosis ok, so did you run the experiment on both machines?  or twice on the same machine?
15:17 wnl_work i will double check but im pretty sure it was on both machines
15:17 semiosis ok just wondering because of the change of mount point...
15:17 wnl_work yes. once on each machine
15:18 semiosis ok good, thx
15:18 wnl_work i did that to help keep track of which was which
15:18 wnl_work can i unmount them now?  :)
15:18 semiosis sure
15:19 semiosis so on the machine ec2-50-16-242-219.compute-1.amazonaws.com (aka wwwfiles-client-0 internally) seems like there is a missing glusterfsd (brick export daemon) process
15:19 semiosis ,,(processes)
15:19 glusterbot the GlusterFS core uses three process names: glusterd (management daemon, one per server); glusterfsd (brick export daemon, one per brick); glusterfs (FUSE client, one per client mount point; also NFS daemon, one per server). There are also two auxiliary processes: gsyncd (for geo-replication) and glustershd (for automatic self-heal). See http://goo.gl/hJBvL for more information.
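A quick sanity check against that process list is simply to look at what is actually running on each server and what the volume reports; one glusterfsd should show up per brick:

    ps ax | grep [g]luster
    gluster volume status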
15:19 semiosis that would explain the disconnected messages
15:20 semiosis on that machine, go to /var/log/glusterfs/bricks and pastie the last 20 or so lines from the brick log there
15:20 semiosis lets see why it died
15:20 hagarth joined #gluster
15:20 wushudoin joined #gluster
15:23 wnl_work okay...
15:24 wnl_work theyre empty
15:24 wnl_work -rw------- 1 root root 0 Oct  7 04:02 gluster-brick1.log
15:25 wnl_work whoa!
15:25 semiosis heh, odd
15:25 semiosis ?
15:26 wnl_work on that machine (the one with the missing glusterfsd) there is at least one file in the brick that is not showing up in the file system
15:26 wnl_work which may have just saved my butt
15:27 semiosis nothing on that brick is showing up on the glusterfs volume!
15:27 semiosis the brick is dead
15:27 semiosis your volume is running on one brick right now
15:28 semiosis if you restart glusterd on this machine it will try to re-start the brick
15:28 wnl_work well thats curious
15:28 semiosis which should produce some logs
15:28 wnl_work how can i tell that from gluster volume info  ?
15:28 semiosis you can't :)
15:28 semiosis client logs
15:28 wnl_work well thats a shame.  :)
15:28 semiosis and process table listings
15:28 semiosis i think there's another command (new in 3.3.0) that does
15:28 wnl_work but to make this more interesting, thats the server that failed. if im not running on that brick, why did gluster still hang?
15:28 semiosis maybe gluster volume status or something like that
15:29 wnl_work youre right
15:29 wnl_work only one brick is showing up in gluster volume status
15:29 wnl_work i would never have seen that if i didnt know what to look for
15:29 wnl_work that explains A LOT
15:30 edward1 joined #gluster
15:30 wnl_work but i still dont know WHY
15:30 wnl_work glusterfsd isnt running
15:30 semiosis if you restart glusterd on that machine it will try to start the missing glusterfsd process
15:31 semiosis that will produce log messages in the brick log you found to be empty
15:31 semiosis if it fails to start, the log will say why
15:31 semiosis but if it starts ok, then just do a ,,(repair) and you should be in business
15:31 glusterbot http://www.gluster.org/community/documentation/index.php/Gluster_3.1:_Triggering_Self-Heal_on_Replicate
15:31 semiosis oh wait, you're using 3.3.0, so the self-heal daemon should automatically heal things
15:31 * semiosis is still on 3.1 :)
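Roughly, the two approaches look like this (volume name and mount point are placeholders); the find/stat walk is the pre-3.3 method from the page glusterbot linked, while 3.3 adds explicit heal commands on top of the automatic self-heal daemon:

    # pre-3.3: stat every file through a client mount to trigger self-heal
    find /mnt/myvol -noleaf -print0 | xargs --null stat >/dev/null
    # 3.3: ask the self-heal daemon directly and watch its progress
    gluster volume heal myvol full
    gluster volume heal myvol info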
15:32 wnl_work right
15:32 wnl_work but its been this way since augist?
15:32 wnl_work august?
15:32 wnl_work okay....hang on cuz i gotta drop this server out of the lb
15:33 wnl_work actually....priorities....i have to get the missing files back first... give me about 15-20 minutes
15:33 kkeithley wnl_work: what linux distribution?
15:34 wnl_work centos 5.8
15:35 semiosis wnl_work: ok good luck
15:35 semiosis and, i know this is no help now, but maybe for the future... nagios!
15:35 semiosis i use my ,,(puppet) module to set up nagios checks on all the various glusterfs-related processes, log files, and mount points, so i know if something dies or produces error logs
15:35 glusterbot (#1) https://github.com/semiosis/puppet-gluster, or (#2) https://github.com/purpleidea/puppet-gluster
15:36 semiosis (the first one, obvs.)
15:40 semiosis wnl_work: oh and about the name resolution issue... since a fresh client resolved OK, probably just unmounting/remounting your clients will shake things loose.  that experiment reconfirmed what you already knew, that dns was working ok
15:42 wnl_work semiosis: yeah im monitoring the health of the mountpoint, but not of all the individual processes
15:43 semiosis monitor all the things... twice.
15:44 semiosis brb
15:45 kkeithley wnl_work: you may want to consider using my yum repo. It has "epel" rpms for el-5 that you can use on CentOS.
15:45 kkeithley @yum3.3 repos
15:45 glusterbot kkeithley: I do not know about 'yum3.3 repos', but I do know about these similar topics: 'yum33 repo', 'yum3.3 repo'
15:45 kkeithley @yum3.3 repo
15:45 glusterbot kkeithley: kkeithley's fedorapeople.org yum repository has 32- and 64-bit glusterfs 3.3 packages for RHEL/Fedora/Centos distributions: http://goo.gl/EyoCw
15:48 wnl_work kkeithley: your repo is fedorapeople.org ?
15:49 tc00per kkeithley: Are there any howto's or best-practice documents to read that explain steps to upgrade GlusterFS across a cluster while maintaining availability of data?
15:49 wnl_work because im already using fedorapeople.org :)
15:50 tc00per kkeithley: environment is dist-repl
15:50 elyograg stigchristian_: your question about stripe size .. are you talking about the RAID stripe size or something for gluster's stripe mode?  It's my understanding that very very few people shiould be using gluster's striping.  If you're asking about a RAID stripe size for big media files, the answer is generally "as big as it'll go."
15:58 tc00per In case anybody cares or is curious. The rpm dependencies that exist in the rpms on bits.gluster.com are absent in the rpms on kkeithley's repo.
16:00 wnl_work does the glusterfsd service need to be enabled?
16:00 semiosis tc00per: depends on the upgrade.  if you're going to 3.3 from <3.3, then that's a definite no -- see ,,(3.3 upgrade notes)
16:00 glusterbot tc00per: http://vbellur.wordpress.com/2012/05/31/upgrading-to-glusterfs-3-3/
16:01 elyograg wnl_work: it's not enabled on any of my testbed machines.  I looked at it once and from what I could tell, only "stop" does anything.
16:01 wnl_work okay, good.
16:02 semiosis tc00per: other upgrades *may* work, but it's not widely tested.  generally in the 3.2 series it was recommended to upgrade all servers first, before any clients.  however clients *must* be unmounted/remounted to upgrade, so keeping things running would involve moving traffic from one to another
16:02 semiosis so even the most seamless upgrade from the user's POV would involve careful planning & execution by the admin
16:03 semiosis wnl_work: no, glusterd manages the glusterfsd processes
16:03 seanh-ansca joined #gluster
16:03 kkeithley tc00per: dunno, maybe someone like johnmark or semiosis can point you at something specific. I know there's supposed to be a growing set of faqs at gluster.org.
16:03 semiosis you shouldn't manage those processes with any initscript, or at all really, unless you're doing maintenance
16:03 semiosis in normal operation glusterd does all that for you
16:04 kkeithley tc00per: which dependencies are missing?
16:04 tc00per Also, it 'looks' like the 3.3.0 client mounts glusterfs volume from 3.3.1 server(s) with only a ' I ' class message in the client log. Is there (will there be) a mechanism to warn clients of incompatible client/server combination?
16:05 semiosis tc00per: in the past, servers needed to be upgraded first.  i discovered that "the hard way" though :)
16:06 johnmark semiosis: heya
16:06 semiosis buenas dias
16:06 wnl_work semiosis: before i do any more tinkering i am backing everything up. then i will try to restart gluster on the server with the failed brick
16:06 johnmark semiosis: so for our online Q&A today, we were thinking of having comments in #gluster
16:06 elyograg i'm very glad that I am starting with 3.3, I hadn't realized that upgrading from earlier versions would require taking everything down.
16:06 johnmark and taking questions that way
16:07 wnl_work semiosis: ive copied the missing files from the non-participating brick in to the glusterfs. hopefully when that brick gets rejoined this wont screw things up. self-heal should figure all that out, right?
16:07 kkeithley the RPMs on bits.gluster.org were built with the glusterfs.spec.in in the source. The RPMs in my repo are built from the Fedora glusterfs.spec. In fact most of them are built on the Fedora build system (koji). I'm pretty sure the Fedora glusterfs.spec has the same (or more) dependencies as the glusterfs.spec.in in the gluster source, but if they don't, that's a bug.
16:07 johnmark semiosis: so people would see the video feed via my youtube channel and comment here, and we'll monitor the discussion for Q's
16:08 tc00per kkeithley: the 'bits' version of glusterfs depended on a deprecated libcrypto.so.6, your rpm correctly depends on libcrypto.so.10, also glusterfs-server depended on a 'compat' version of libreadline5, yours depends on libreadline.so.6. Your RPMS don't depend on deprecated versions of these but on 'current' ones. Your rpms are good. The bit's ones are 'bad'... :)
16:08 semiosis wnl_work: it "should"
16:09 semiosis wnl_work: and hopefully it will :)
16:09 semiosis johnmark: or we could use #gluster-meeting for that
16:09 kkeithley then we should take down the ones on bits.
16:09 johnmark semiosis: that was my next question
16:09 johnmark semiosis: because it can get busy in here :)
16:09 semiosis yeah
16:10 johnmark semiosis: aight. I'll send you and JoeJulian a note with the game plan
16:10 tc00per The same is true for the 'ga' rpms on bits. I was looking at applying those with yum upgrade *.rpm when I found the symlink bug biting me BEFORE you published the 3.3.1 rpms.
16:10 semiosis @qa releases
16:10 glusterbot semiosis: The QA releases are available at http://bits.gluster.com/pub/gluster/glusterfs/ -- RPMs in the version folders and source archives for all versions under src/
16:10 neofob joined #gluster
16:14 tryggvil joined #gluster
16:14 tc00per semiosis: Is there a gluster disasters site anywhere? perhaps with 'anonymous' tales of things you should NEVER do with gluster? Strange thing... when I Google 'gluster disasters' I get a link to a blog article on www.gluster.org but I cannot find any link to get the ...more content. Is this a known issue?
16:15 tc00per For example... http://www.gluster.org/2011/11/asteroids-nuclear-war-and-data-center-outages-surviving-big-disasters-by-being-small/
16:15 johnmark kkeithley: agreed. spec filesshould be in sync
16:15 glusterbot Title: Asteroids, nuclear war and data center outages: Surviving big disasters by being small | Gluster Community Website (at www.gluster.org)
16:15 semiosis @lucky gluster disasters
16:15 glusterbot semiosis: http://community.gluster.org/a/glusterfs-geo-replication-restoring-data-from-the-slave/
16:15 semiosis oh
16:16 semiosis tc00per: not that i know of
16:16 semiosis well, actually, i did write one about how to cause ,,(split-brain)
16:16 glusterbot (#1) learn how to cause split-brain here: http://goo.gl/nywzC, or (#2) To heal split-brain in 3.3, see http://joejulian.name/blog/fixing-split-brain-with-glusterfs-33/ .
16:16 johnmark tc00per: ah, crap. when we migrated the old blog we lost the ability to link to "more content"
16:16 semiosis that's kinda like a "don't do that" but i framed it as "do this to see what happens"
16:16 johnmark tc00per: we'll have to make a point of ensuring the old data is migrated, too
16:16 JoeJulian When I cause a disaster, I'm a bit too busy fixing it to document it.
16:17 JoeJulian Once I've fixed it, usually so many things have happened I can't remember them all.
16:17 tc00per JoeJulian: :)
16:17 semiosis my advice... don't fail.  and if you must fail, don't fail epically.
16:17 mspo always takes notes during an outage
16:17 wushudoin joined #gluster
16:18 mspo you always need a timeline anyway
16:18 * m0zes would recommend *not* using fileservers of vastly different specs.
16:18 johnmark semiosis: "I don't always fail. But when I do, I fail epically."
16:19 johnmark heh
16:19 tc00per johnmark: Thanks... would be a shame to lose all that stuff. I'm sure we noobz could learn something from it.
16:19 johnmark tc00per: indeed. even if a lot of it is out of date, there are still some gems in there
16:19 semiosis haha
16:21 tc00per So anybody want to predict what will happen 'when' I upgrade glusterfs on my client machine running 3.3.0 and connected to a 3.3.1 server WHILE it has a glusterfs mount mounted?
16:21 JoeJulian Files modified during the outage will be self-healed.
16:21 tc00per More fun... 'during an active write operation'.
16:22 wnl_work semiosis: you want me to restart glusterd ?
16:22 semiosis wnl_work: yep
16:23 elyograg are there any caveats or instructions on upgrading from 3.3.0 to 3.3.1?  Is it enough just to yum upgrade and then reboot?  I have noticed that when I stop glusterd, not everything dies, so rebooting seems slightly safer.
16:23 wnl_work okay.  both bricks are showing up now
16:23 elyograg or maybe stop glusterd and then reboot.
16:23 wnl_work but why didnt it start when the system was booted?
16:25 semiosis wnl_work: that would've been recorded in the brick log, but you found that to be empty...
16:26 semiosis odd
16:26 wnl_work okay...now i have one server going south
16:27 wnl_work is a self-heal normally this disruptive?
16:28 wnl_work dammit
16:28 mohankumar joined #gluster
16:29 Mo__ joined #gluster
16:29 wnl_work load avg is now 35, and procs are stuck in D
16:29 wnl_work any ideas? this is on the server that WAS good until i restarted gluster on the other server
16:30 wnl_work lots of complaints about split brain
16:30 kkeithley elyograg: nope, no caveats. yup update/upgrade is all you should need.
16:31 wnl_work well that didnt go well
16:31 nhm joined #gluster
16:32 blendedbychris joined #gluster
16:32 blendedbychris joined #gluster
16:33 semiosis wnl_work: it's self-healing, that could explain the excessive load
16:33 wnl_work its screwed is what it is
16:34 semiosis you can reduce the number of background heals by setting cluster.background-self-heal-count to something low, like 2
16:34 JoeJulian wow... 3.3.1 does the self-heal much faster than  3.3.0 did. Something's definitely fixed.
16:34 wnl_work too late now
16:34 wnl_work the whole cluster is fucked
16:35 kkeithley s/yup/yum/
16:35 glusterbot What kkeithley meant to say was: elyograg: nope, no caveats. yum update/upgrade is all you should need.
16:36 wnl_work 12:36:36 up  2:36,  2 users,  load average: 113.10, 66.47, 28.69
16:36 wnl_work reminds me of my NFS days
16:37 johnmark ouch
16:37 semiosis wnl_work: did you reduce the bkg self heal count?
16:37 JoeJulian wnl_work: Now many clients?
16:37 JoeJulian s/Now/How/
16:37 glusterbot What JoeJulian meant to say was: wnl_work: How many clients?
16:37 wnl_work 2
16:37 semiosis JoeJulian: he has two servers-also-clients
16:37 wnl_work where do i set it?
16:37 JoeJulian So probably not very many open fd...
16:38 semiosis gluster volume set <volname> cluster.background-self-heal-count 2
16:38 wnl_work the load avg was from all the http processes stuck in disk wait
16:38 semiosis yep
16:38 semiosis been there
16:38 wnl_work okay, set
16:39 wnl_work this will get propagated to the other server when it comes back up?
16:39 tc00per OK... yum update of client WHILE WRITING to glusterfs mounted volume succeeded without anything hitting the client log. As far as I can tell right now (using iotop on master node and dstat on client) files are still moving back and forth in my write/read/delete test.
16:40 JoeJulian tc00per: I just upgraded 2 of my 3 servers (replica 3) from 3.3.0 to 3.3.1 with all my vms running. No problemo.
16:40 semiosis um, you have a server down?
16:40 JoeJulian (vm images hosted on a volume)
16:40 semiosis wnl_work: how did that happen?
16:41 elyograg tc00per: given how linux keeps using old files in running processes when you delete/replace the files, that's not very surprising.
16:41 semiosis wnl_work: sorry to say but when you make changes to a volume while a peer is down you get Peer Rejected when that peer comes back online
16:42 wnl_work well right now that might be a good thing
16:42 JoeJulian It should have failed the change.
16:42 semiosis wnl_work: you'll need to either run that same change on the other peer when it comes back up, or possibly have to reset volume options on both... not sure
16:42 semiosis JoeJulian: oh?
16:42 JoeJulian If there's a peer in the volume that's not responding, it should fail the change.
16:43 tryggvil joined #gluster
16:44 semiosis wnl_work: well, did it fail the change?
16:44 semiosis or did it let you set the option while the other peer was down?
16:44 glusterbot New news from newglusterbugs: [Bug 865858] Potentially unnecessary file opens/closes performed around xattr read/writes <https://bugzilla.redhat.com/show_bug.cgi?id=865858>
16:44 wnl_work cluster.background-self-heal-count: 2
16:44 wnl_work still getting soaring load averages
16:44 elyograg I got both my gluster test servers upgraded to 3.3.1 with no issues.
16:45 JoeJulian nice
16:45 JoeJulian I do like it when stuff just works.
16:45 Nr18 joined #gluster
16:46 elyograg I had an active client write going when I updated the second one.  It kept chugging along just fine, and when the server came back up, looks like it synced up nicely.
16:47 wnl_work does this all look like normal self-heal?  http://fpaste.org/qw4w/
16:47 glusterbot Title: Viewing volume log by wnl_work (at fpaste.org)
16:47 wnl_work THIS IS NOT WORKING
16:47 wnl_work processes are getting hung in disk wait
16:48 semiosis wnl_work: you still have a dead server?
16:48 wnl_work both are up
16:48 semiosis wnl_work: can you even ls your glusterfs client mount point?
16:48 wnl_work define "dead"
16:48 wnl_work NO
16:48 wnl_work it hangs
16:48 wnl_work which means my websites are dead
16:48 semiosis thought you said you had a down server
16:48 semiosis no shit
16:48 wnl_work not anymore
16:48 wnl_work it came back up
16:49 wnl_work and when it did, self heal started and things went to shit again
16:49 wnl_work can i stop glusterd on one of the servers?
16:49 semiosis sure you can but that's probably not going to help...
16:50 wnl_work at this point it cant hurt
16:50 JoeJulian Are you dealing with huge files, or are they all pretty small.
16:50 wnl_work mostly small, but theres probably a few large ones
16:50 semiosis large meaning a megabyte or a gigabyte?
16:50 JoeJulian Then this pain shouldn't last long. I would wait it out if it were me.
16:50 elyograg hmm.  working on the UFO servers now.  It wiped out changes to my proxy-server.conf file.  did create a .rpmsave, but it's got the same config -- my config is gone.  A few of the other files got rpmsaved too, but the rpmsave has today's date and time.
16:50 semiosis wnl_work: what size of instances are these? small?
16:51 wnl_work medium
16:51 wnl_work i would never run a prod site on a small
16:51 semiosis and your bricks... ebs?
16:51 wnl_work yes
16:51 semiosis medium may be enough for running normally, but having that extra CPU in a large helps for self-heals, when you have lots of threads doing heavy work
16:52 JoeJulian kkeithley: ^^ elyograg's comment
16:52 wnl_work self-heal shouldnt tank the file system
16:52 wnl_work no matter how little cpu power we have
16:52 semiosis file a bug
16:52 glusterbot https://bugzilla.redhat.com/enter_bug.cgi?product=GlusterFS
16:53 kkeithley about his upgrade? yup, saw it
16:53 semiosis sounds like a reasonable feature request to me
16:53 wnl_work i was better off with only one brick
16:53 semiosis wnl_work: another option you can set is "cluster.data-self-heal-algorithm full"
16:54 semiosis that reduces CPU usage by not comparing file data, just syncing over the whole file
16:54 tc00per elyograg: do you know if the yum update restarted the glusterd process or is it your old glusterd/glusterfsd still running?
16:54 semiosis wnl_work: maybe reduce the background heal count to 1 as well
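Taken together, the two knobs being suggested here would be set roughly like this (volume name is a placeholder); "gluster volume info" should then list them under Options Reconfigured:

    gluster volume set myvol cluster.background-self-heal-count 1
    gluster volume set myvol cluster.data-self-heal-algorithm full
    gluster volume info myvol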
16:54 JoeJulian tc00per: It killed all the glusterfsd and restarted glusterd (which restarted all the glusterfsd)
16:55 semiosis wnl_work: make sure peer status says "Peer In Cluster (Connected)" on both peers, not Peer Rejected
16:55 tc00per On my client update while writing the client kept using the 'old' FUSE mount and didn't 'change' anything until I umount/mount'd it again.
16:55 JoeJulian Correct
16:56 tc00per JoeJulian: nice... so, theoretically (and it sounds like... in practice), with a dist-repl system we can update the nodes in turn/sequence without affecting availability of the data.... nice. :)
16:56 wnl_work semiosis: it was
16:56 JoeJulian tc00per: I believe I saw something about Amar working on a way to handle live upgrades of clients, but that'll be a ways off.
16:56 hurdman left #gluster
16:58 wnl_work at this point i think id be better off building a whole new volume
16:58 JoeJulian semiosis, wnl_work: Are you sure that reducing the self-heal thread count will help? Won't the clients waiting on self-heals on the files that they're currently trying to open have to wait for their turn at that thread?
16:59 semiosis JoeJulian: i assumed once the bkg self heal was scheduled the client could continue working with the good copy
16:59 wnl_work okay, i am back to one server because that appears to work
17:00 semiosis JoeJulian: what's the point of having background self healing if that's not how it works?
17:00 wnl_work how can i bring the other server back in to the cluster without the brick for that volume?
17:02 Bullardo joined #gluster
17:02 kkeithley johnmark: what machine is that?
17:02 elyograg bug filed.
17:04 jiffe98 upgrading from 3.3.0 to 3.3.1 can be done by just restarting gluster* services after install?
17:07 elyograg if I decide to go with fedora/btrfs for my production install, I may have to implement it before Fedora 18 gets released.  How much grief would I cause myself by installing Fedora 18 alpha?  Would I be able to get away with just doing 'yum upgrade' after it gets released, or would I want to do an upgrade install?
17:07 manik joined #gluster
17:08 JoeJulian I'd probably do a yum distro-sync after release, but it should theoretically be okay.
17:09 elyograg haven't heard of distro-sync, will have to look into that.
17:10 wushudoin joined #gluster
17:14 glusterbot New news from newglusterbugs: [Bug 865867] gluster-swift upgrade to 3.3.1 wipes out proxy-server config <https://bugzilla.redhat.com/show_bug.cgi?id=865867>
17:17 samppah @latest
17:17 glusterbot samppah: The latest version is available at http://goo.gl/TI8hM and http://goo.gl/8OTin See also @yum repo or @yum3.3 repo or @ppa repo
17:18 andrewbogott joined #gluster
17:19 neofob so each exported brick has two open ports; would it be better in throughput to have many "bricks" on same server consolidated as one dir (via LVM, software/hardware raid) and exported?
17:21 Bullardo joined #gluster
17:22 wnl_work i have a volume that is replicated on two bricks. can i remove one of the bricks?  can i do that with only one peer running?
17:24 blendedbychris joined #gluster
17:24 blendedbychris joined #gluster
17:24 tcooper_brb left #gluster
17:31 kkeithley replicated on two bricks, meaning replica 3, or just replica 2?
17:32 wnl_work replica 2
17:32 wnl_work replicated across two bricks (does that make more sense?)
17:34 manik joined #gluster
17:35 tc00per joined #gluster
17:36 Technicool joined #gluster
17:36 dialt0ne joined #gluster
17:37 neofob JoeJulian: ping!
17:38 JoeJulian neofob: Yes?
17:38 andrewbogott joined #gluster
17:38 Daxxial_ joined #gluster
17:38 dialt0ne hi. is there a simple way to rename a peer? by this answer here, i'd guess not. http://community.gluster.org/q/how-to-change-ip-address-in-a-created-volume/
17:38 glusterbot Title: Question: how to change ip address in a created volume (at community.gluster.org)
17:39 neofob JoeJulian: i have a question about bricks management
17:39 dialt0ne i'm stuck because the tld for the company is changing, so i'll lose the previous dns name
17:39 neofob so each exported brick has two open ports; would it be better in throughput to have many "bricks" on same server consolidated as one dir (via LVM, software/hardware raid) and exported?
17:40 andrewbogott_afk left #gluster
17:42 JoeJulian neofob: I look at that question from the disaster recovery standpoint, myself.
17:44 semiosis dialt0ne: i think you can just probe it by the new ,,(hostname)
17:44 glusterbot dialt0ne: I do not know about 'hostname', but I do know about these similar topics: 'hostnames'
17:44 semiosis ,,(hostnames)
17:44 glusterbot Hostnames can be used instead of IPs for server (peer) addresses. To update an existing peer's address from IP to hostname, just probe it by name from any other peer. When creating a new pool, probe all other servers by name from the first, then probe the first by name from just one of the others.
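In command form, the factoid amounts to something like this (hostnames are placeholders): from any other peer, probe the peer whose address you want switched to a name, then check the result:

    gluster peer probe server2.example.com
    gluster peer status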
17:44 glusterbot New news from newglusterbugs: [Bug 864963] Heal-failed and Split-brain messages are not cleared after resolution of issue <https://bugzilla.redhat.com/show_bug.cgi?id=864963>
17:44 JoeJulian dialt0ne: Not "simple" but not impossible. You /can/ change it with a simple sed command but glusterd/glusterfsd will have to be stopped  and your volumes unmounted when you do it.
17:45 semiosis dialt0ne: *** my answer only applies if that peer doesnt have volumes
17:45 semiosis :)
17:45 semiosis JoeJulian is right
17:45 semiosis as usual
17:45 JoeJulian hehe
17:45 JoeJulian Could you tell my wife that?
17:48 neofob JoeJulian: the reason i ask because i have 3 bricks (3 sata drives) exported on each server; i have 2 servers in replica; so i have 6 daemons running on each server; a lot of cpu
17:48 rwheeler joined #gluster
17:51 dialt0ne yeah, i have a volume
17:52 dialt0ne it's a simple volume just two hosts
17:52 dialt0ne so i took it down to 1 replicate and removed one host. then detached the peer
17:53 dialt0ne and then i was able to re-probe the peer and it found the new hostname
17:53 dialt0ne but when i try to re-add the brick and replica with the new hostname, it says: /www/gfs or a prefix of it is already part of a volume
17:53 glusterbot dialt0ne: To clear that error, follow the instructions at http://joejulian.name/blog/glusterfs-path-or-a-prefix-of-it-is-already-part-of-a-volume/
17:54 dialt0ne faq'd!
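The fix on that page boils down to clearing the volume-related extended attributes from the old brick directory before reusing it; a rough sketch, with /www/gfs standing in for the brick path mentioned above:

    setfattr -x trusted.glusterfs.volume-id /www/gfs
    setfattr -x trusted.gfid /www/gfs
    rm -rf /www/gfs/.glusterfs
    service glusterd restart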
17:58 bulde1 joined #gluster
18:03 JoeJulian neofob: I have a lot more than that. 15 volumes, 4 bricks per server for 60 glusterfsd instances per server. :D Biggest problem for me was ram.
18:04 neofob JoeJulian: ok, it's good to know; i just wonder what the common practice out there
18:04 Venkat joined #gluster
18:05 * JoeJulian doesn't agree with "common practice" as necessarily "best for your given use case."
18:05 tc00per Is there a place (in logs perhaps) where I can pull information to create a 'logical' map of which bricks on the gluster nodes are 'connected' in the repl sets in a dist-repl configuration?
18:06 JoeJulian gluster volume status
18:08 JoeJulian And if you're planning on doing anything with that data, you might like to know about the --xml option.
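The --xml option mentioned here just wraps the normal CLI output in XML so scripts can parse it; for example (volume name is a placeholder, and xmllint is only one way to pretty-print the result):

    gluster volume status myvol --xml | xmllint --format -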
18:08 johnmark semiosis: had to cancel. will fill you in on details
18:09 Technicool joined #gluster
18:14 bjoernt joined #gluster
18:16 bjoernt Morning guys. I've a little problem, my geo replication stopped due to a network issue and I cant get it to resync again, even if I stop and start the geo replication. The state is always ok, the SSH tunnel is ok. All seems good but it doesn't resync the files back. I guess it has something to do with the fact that I didn't have the indexer on in the first place. What can I do ?
18:17 penglish JoeJulian: I don't see an RSS to "give me all the blog" on your site
18:18 penglish </featurerequest>
18:29 tc00per JoeJulian: xml output of gluster volume status looks good. Question... Is the 'connection' information, ie. Brick # in cli output determined ONLY by the port number(s) of the bricks? They clearly match in my small environment right now but I'm wondering what it 'should' look like AFTER I move/add bricks when adding a new node to the dist/repl-2 configuration.
18:31 wushudoin joined #gluster
18:31 dialt0ne so i think all my hostname renaming worked. this look horribly wrong? http://pastebin.com/i8WLtUMH
18:32 glusterbot Please use http://fpaste.org or http://dpaste.org . pb has too many ads. Say @paste in channel for info about paste utils.
18:32 dialt0ne faq'd! again!
18:32 dialt0ne ok, take two http://fpaste.org/1yf1/
18:33 glusterbot Title: Viewing gluster hostname rename by dialt0ne (at fpaste.org)
18:33 andreask joined #gluster
18:35 bennyturns joined #gluster
18:36 Technicool joined #gluster
18:36 JoeJulian penglish: http://joejulian.name/blog/feeds/rss/ (or atom if you prefer)
18:36 glusterbot Title: Joe Julian's BlogHow to expand GlusterFS replicated clusters by one serverSecurityMetrics PCI testing fails again (at joejulian.name)
18:36 penglish JoeJulian: Thanks!
18:38 balunasj joined #gluster
18:48 bjoernt JoeJulian: can you help in my case : I've a little problem, my geo replication stopped due to a network issue and I cant get it to resync again, even if I stop and start the geo replication. The state is always ok, the SSH tunnel is ok. All seems good but it doesn't resync the files back. I guess it has something to do with the fact that I didn't have the indexer on in the first place. What can I do ?
18:49 JoeJulian No clue... The "first place" being when it was working before your network issue?
18:50 bjoernt yes I did the first initial sync which is now incomplete. The index wasn't turned on
18:50 bjoernt during the initial sync
18:50 JoeJulian I didn't even know that was possible.
18:51 JoeJulian Let me ,,(rtfm)...
18:51 glusterbot Read the fairly-adequate manual at http://gluster.org/community/documentation//index.php/Main_Page
18:53 bjoernt Well I mean the option geo-replication.indexing: on
18:53 bjoernt was not set. now is there a index folder but it's empty
18:54 JoeJulian indexing should be on by default. If you didn't set it "off" then it was on even if it doesn't show in volume options.
18:54 bjoernt no i didnt set it off
18:56 bjoernt the index folder is .glusterfs/indices ?
18:56 JoeJulian So "gluster volume geo-replication $volume $slave_target" reports OK huh...
18:57 JoeJulian s/target/target status/
18:57 glusterbot What JoeJulian meant to say was: So "gluster volume geo-replication $volume $slave_target status" reports OK huh...
18:57 bjoernt yes
18:57 bjoernt MASTER               SLAVE                                              STATUS
18:57 bjoernt ----------------------------------------​----------------------------------------
18:57 bjoernt content1_hondatech   ssh://ec2-user@gluster1.uw1.aws.internetbrands.com::content1_hondatech OK
18:57 bjoernt [2012-10-12 10:01:21.133691] I [syncdutils:142:finalize] <top>: exiting.
18:57 bjoernt [2012-10-12 10:05:12.682849] I [monitor(monitor):21:set_state] Monitor: new state: starting...
18:57 bjoernt [2012-10-12 10:05:12.686101] I [monitor(monitor):80:monitor] Monitor: ------------------------------​------------------------------
18:57 bjoernt [2012-10-12 10:05:12.686346] I [monitor(monitor):81:monitor] Monitor: starting gsyncd worker
18:57 bjoernt [2012-10-12 10:05:12.809170] I [gsyncd:354:main_i] <top>: syncing: gluster://localhost:content1_hondatech -> ssh://ec2-user@gluster1.uw1.aws.internetbrands.com::content1_hondatech
18:57 bjoernt [2012-10-12 10:05:23.400630] I [master:284:crawl] GMaster: new master is 9176d98a-8865-48ba-9674-a06c7efd01a6
18:57 bjoernt [2012-10-12 10:05:23.401185] I [master:288:crawl] GMaster: primary master with volume id 9176d98a-8865-48ba-9674-a06c7efd01a6 ...
18:57 bjoernt [2012-10-12 10:06:13.376688] I [monitor(monitor):21:set_state] Monitor: new state: OK
18:57 bjoernt [2012-10-12 10:06:23.78577] I [master:272:crawl] GMaster: completed 56 crawls, 0 turns
18:57 bjoernt [2012-10-12 10:07:23.252258] I [master:272:crawl] GMaster: completed 57 crawls, 0 turns
18:57 bjoernt [2012-10-12 10:08:24.202939] I [master:272:crawl] GMaster: completed 58 crawls, 0 turns
18:57 JoeJulian use fpaste
18:58 bjoernt http://fpaste.org/O1rF/
18:58 glusterbot Title: Viewing Paste #242912 (at fpaste.org)
19:00 bjoernt When the network error happened : http://fpaste.org/biby/
19:00 glusterbot Title: Viewing Paste #242916 (at fpaste.org)
19:00 JoeJulian bjoernt: Is it likely that it could be crawling the entire master volume in 1 minute (exactly)?
19:00 tc00per True? I need a 'new' port for each repl-N brick-mirror regardless of whether each peer actually uses each port.
19:01 bjoernt hmm 1 minute is quite short even if it is on SSDs
19:02 bulde1 joined #gluster
19:02 JoeJulian tc00per: Could you please rephrase the question?
19:05 wN joined #gluster
19:06 JoeJulian bjoernt: I have two avenues of thought. It's rescanning the tree and hasn't come up with anything that's out of sync yet in which case it should resolve by just waiting, or there's some unknown breakage in which case I would stop glusterd, kill gsyncd and any associated glusterfs instances (and I'm only guessing that there is one) then restart glusterd and see if it fixes itself.
19:06 bjoernt ok let me try that.
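A rough sketch of that suggestion on the master, using the volume and slave from bjoernt's status output above (service commands assume an EL-style init; the pkill pattern is deliberately broad, so check what it matches first):

    service glusterd stop
    pkill -f gsyncd
    # also kill any auxiliary glusterfs mount that gsyncd left behind, if one exists
    service glusterd start
    gluster volume geo-replication content1_hondatech ssh://ec2-user@gluster1.uw1.aws.internetbrands.com::content1_hondatech status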
19:08 vizlabst joined #gluster
19:08 bjoernt left #gluster
19:08 bjoernt joined #gluster
19:09 bjoernt Thats strange : ssh -oPasswordAuthentication=no -oStrictHostKeyChecking=no -i /var/lib/glusterd/geo-replication/secret.pem -oControlMaster=auto -S /tmp/gsyncd-aux-ssh-P7Aj1U/gsycnd-ssh-%r@%h:%p ec2-user@xxxx /usr/local/libexec/glusterfs/gsyncd --session-owner 9176d98a-8865-48ba-9674-a06c7efd01a6 -N --listen --timeout 120 gluster://localhost:content1_hondatech
19:09 bjoernt I mean the  /usr/local/libexec/glusterfs/gsyncd in the path
19:09 bjoernt command line
19:10 JoeJulian Yeah, it's probably right. I filed a bug about that a long time ago.
19:11 vizlabst I'm sure someone has asked, but is download.gluster.org down?
19:11 tc00per See... http://fpaste.org/9eTv/ There are three mirrors in this 3 x 2 = 6 dist-repl configuration. Each peer has only two bricks and uses only two ports though three ports (24009-11) are in use. It appears a common port is assigned to each brick mirror pair. With respect to iptables config it looks like I should plan on new port(s) for each addition of peers/bricks in a slowly built out system. Is this essentially true?
19:11 glusterbot Title: Viewing gluster volume status by tc00per (at fpaste.org)
19:12 JoeJulian vizlabst: It is.
19:12 vizlabst Ok, thanks.
19:12 tc00per JoeJulian: See ^^^
19:12 JoeJulian My mirror's down as well since I'm temporarily on a rackspace vm while I replace my colo server.
19:13 JoeJulian tc00per: Yep, though 3.4's going to mix things up a bit wrt port assignment again.
19:13 JoeJulian @ports
19:13 glusterbot JoeJulian: glusterd's management port is 24007/tcp and 24008/tcp if you use rdma. Bricks (glusterfsd) use 24009 & up. (Deleted volumes do not reset this counter.) Additionally it will listen on 38465-38467/tcp for nfs, also 38468 for NLM since 3.3.0. NFS also depends on rpcbind/portmap on port 111.
19:16 tc00per JoeJulian: OK... good to know.
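Translated into firewall rules, the factoid comes out roughly like this for a 3.3 server (the brick port range is a placeholder; widen it to cover however many bricks the server will ever host):

    iptables -A INPUT -p tcp --dport 24007:24008 -j ACCEPT   # glusterd management (+ rdma)
    iptables -A INPUT -p tcp --dport 24009:24020 -j ACCEPT   # brick ports, one per brick
    iptables -A INPUT -p tcp --dport 38465:38468 -j ACCEPT   # gluster NFS and NLM
    iptables -A INPUT -p tcp --dport 111 -j ACCEPT           # portmapper/rpcbind
    iptables -A INPUT -p udp --dport 111 -j ACCEPT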
19:16 vizlabst left #gluster
19:16 dialt0ne left #gluster
19:17 tc00per Is there a way to root-squash with FUSE mount?
19:17 JoeJulian no
19:17 tc00per Planned?
19:18 JoeJulian I wonder if we've set a ,,(roadmap) factoid yet...
19:18 glusterbot I do not know about 'roadmap', but I do know about these similar topics: 'rdma'
19:18 JoeJulian @learn roadmap as See http://gluster.org/community/documentation/index.php/Planning34
19:18 glusterbot JoeJulian: The operation succeeded.
19:19 JoeJulian You should also search bugzilla to see if the enhancement request has already been filed, and file a bug if it hasn't.
19:19 glusterbot https://bugzilla.redhat.com/enter_bug.cgi?product=GlusterFS
19:20 bjoernt shouldn't I see a mounted directory on the gluster slave over the mount broker ? I only see the brick
19:21 xymox joined #gluster
19:21 JoeJulian No, the slave doesn't mount the directory.
19:21 bjoernt but how it gets access to the gluster volume ?
19:21 JoeJulian It's truly just a fancy rsync target.
19:23 bjoernt but how does rsync get access to a distributed gluster volume ? It can only see the brick on the local machine
19:23 H__ joined #gluster
19:24 chandank|work joined #gluster
19:28 bjoernt JoeJulian: oh wait. everything is fine.  just mounted the volume on the slave and the content looks ok. I just looking into the brick and not the distributed volume. Stupid. Well it's friday
19:31 samppah howdy
19:31 bjoernt BTW: I guess the /usr/local/libexec bug is not so critical because I start gsyncd through commands env in the .ssh/authorized for the sync keys
19:31 samppah is anyone here using glusterfs over nfs to store vm images for rhev or ovirt?
19:31 bjoernt No, not yet.
19:31 bjoernt we probably trying it to DR our netapp
19:32 bjoernt but I'm not a fan, even if it works in general. wouldn't expect to get it working
19:32 samppah disaster recovery?
19:32 bjoernt yes
19:32 samppah how come?
19:32 samppah if i may ask :)
19:33 bjoernt but it's probably really slow because network access times adding up. I trying it in the next few days
19:34 samppah yeh, native glusterfs is too slow for our use currently but probably setting up some tests with nfs
19:36 bjoernt ahh you mean the nfs client in gluster. Yeah i tried that but didn't find a performance advantage over NFS for file creation. But with NFS you'll get a FScache which caches attributes (good for my PHP stuff here) which speeds up file operations
19:36 JoeJulian nfs isn't going to be any faster for vm images.
19:37 samppah hmm? really? my initial tests seem bit different but i still have lots of testing to do
19:39 JoeJulian Well, I shouldn't have said it so emphatically. I mean that I wouldn't expect it to be faster.
19:40 Technicool joined #gluster
19:41 samppah hehe, ok :)
19:45 glusterbot New news from newglusterbugs: [Bug 865914] glusterfs client mount does not provide root_squash/no_root_squash export options <https://bugzilla.redhat.com/show_bug.cgi?id=865914>
19:46 RoyK left #gluster
19:58 JoeJulian Ah, you're kidding me... the symlink bug didn't make 3.3.1?
20:02 wN joined #gluster
20:02 tc00per Doh... just getting to that test now. :(
20:03 semiosis who uses symlinks anyway?
20:03 * semiosis runs
20:03 JoeJulian Maybe not... thought it may be broken symlinks that have a problem...
20:06 H__ i'm about to add a 2nd server with bricks to a production volume and start a rebalance. Any special things I need to look out for ?
20:06 JoeJulian IRC trolls.
20:06 H__ always on the look out for those ;-)
20:06 tc00per Symlinks... ;)
20:07 H__ there are some symlinks on it. what about those ?
20:07 H__ it's 3.2.5 btw
20:07 tc00per Umm, they might... "break".
20:08 JoeJulian Nah, 3.2 doesn't have the .glusterfs tree
20:08 JoeJulian 3.2 is up to 3.2.7, btw...
20:09 tc00per OK... then add your new peer before you upgrade to 3.3.
20:11 H__ i can't upgrade to 3.3 anytime soon due to the service down requirement
20:12 H__ I'll look into what 3.2.6 and 3.2.7 fix. any geo-replication fixes in there ? we cannot use that because of a bug (forgot which one, sorry)
20:22 johnmark kkeithley: ping
20:27 * nhm trolls
20:33 tc00per JoeJulian: I could not replicate the problem I saw before in 3.3.0 where my symlinks broke with a node/brick add with my dist-repl setup.
20:33 JoeJulian excellent
20:33 tc00per Perhaps my problem was not related to the symlinks bug.
20:34 blendedbychris joined #gluster
20:34 blendedbychris joined #gluster
20:34 tc00per If indeed it wasn't fixed in 3.3.1.
20:34 JoeJulian It does seem to be fixed, though I was seeing similar errors for one of the thunderbird .lock symlinks.
20:35 JoeJulian Those don't link to anything real, so I'll have to re-test that later.
20:45 glusterbot New news from newglusterbugs: [Bug 865926] Gluster/Swift plugin reads /etc/swift/swift.conf and /etc/swift/fs.conf for every Swift REST API call <https://bugzilla.redhat.com/show_bug.cgi?id=865926>
20:46 foo_ joined #gluster
20:50 blendedbychris joined #gluster
20:50 blendedbychris joined #gluster
20:53 johnmark jdarcy: ping
20:55 tc00per Consider 4 x 2 = 8 dist/repl configuration, I want to remove a peer (lets call it p-2) and go to 3 x 2 = 6. Is the sequence 'described/diagrammed' here... http://fpaste.org/B2eX/ essentially correct? If not, what is missing or would you change?
20:55 glusterbot Title: Viewing Remove peer in dist/repl gluster cluster by tc00per (at fpaste.org)
20:57 JoeJulian tcooper|lunch: Not sure I'd bother with the first two rebalances.
21:08 tryggvil joined #gluster
21:11 elyograg exchange certificate updated.  always coming down to the wire on this stuff.
21:16 JoeJulian When I ls a directory on an xfs partition, I don't think it's supposed to hang indefinitely... (not gluster related)
21:19 H__ indeed. You see that now ?
21:19 JoeJulian Yep, I have a directory that I cannot read the contents. If I try, the process hangs.
21:19 JoeJulian Kind-of not cool.
21:20 noob2 joined #gluster
21:20 noob2 i know some of you have tried ovirt before.  have any of you had fedora 17 lock up with a kernel panic after installing ovirt 3.1?
21:21 chandank|work I am also having strange problem with fedora 17 libvirt.
21:22 noob2 ok so it's not just me
21:22 noob2 i added the nodes, right after they reboot they kernel panic
21:22 noob2 it's great haha
21:22 noob2 thanks chandank, i'm heading out now
21:29 elyograg tcooper|lunch: i think joe is probably right.  if you use the 'remove-brick start' method, that effectively does a rebalance in order to migrate the data off.  by chance is your volume more than half full?
21:30 tc00per JoeJulian: What would happen to the data on the p-1-C and p-3-B bricks if/when I added them back into the cluster as E? Don't I need to move the data off of these bricks to keep copies somewhere (in repl. 2 mode) first?
21:30 badone_home joined #gluster
21:30 * elyograg inform's JoeJulian's wife of his serial correctness.
21:31 elyograg stray apostrophes.  yay.
21:32 JoeJulian tc00per: Like elyograg was saying, the remove-brick moves the data to other bricks. You'll want to make sure there's nothing on them when you re-add them.
21:32 elyograg tc00per: if your volume is more than half full, you'll likely run into bug 862347 ... you just have to repeat the 'remove-brick start' until it runs error free.
21:32 glusterbot Bug https://bugzilla.redhat.com:443/show_bug.cgi?id=862347 medium, unspecified, ---, sgowda, ASSIGNED , Migration with "remove-brick start" fails if bricks are more than half full
21:34 tc00per OK... thanks guys. I understand the extra rebalance ops are not really needed and that I need to 'clean' my bricks before I re-add them.
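For a 3.3 dist-repl volume the shrink itself is the remove-brick start/status/commit cycle on the replica pair being dropped (volume, host and brick paths are placeholders; the two bricks listed must be the two copies of the same replica set), after which the freed bricks should have their xattrs and .glusterfs directory cleaned before being reused anywhere:

    gluster volume remove-brick myvol p2:/bricks/a p2:/bricks/b start
    gluster volume remove-brick myvol p2:/bricks/a p2:/bricks/b status
    # once status reports the migration complete with no failures:
    gluster volume remove-brick myvol p2:/bricks/a p2:/bricks/b commit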
21:35 elyograg tc00per: sounds right to me.  note that i'm probably no more an expert than you are, it's just that I've been messing with this exact thing in the last few days on my testbed.
21:37 elyograg off to the store.  waiting for the word that the first grandkid has been born is hungry work.  Now that it's done...
21:38 tc00per elyograg: ...which is exactly what I'm doing as well... test, test, test... then decide where to go next.
21:50 JoeJulian @thp
21:50 glusterbot JoeJulian: There's an issue with khugepaged and it's interaction with userspace filesystems. Try echo never> /sys/kernel/mm/redhat_transparent_hugepage/enabled . See https://bugzilla.redhat.com/show_bug.cgi?id=GLUSTER-3232 for more information.
21:55 Daxxial_1 joined #gluster
22:01 tryggvil joined #gluster
22:09 tc00per My thought before on 'matching ports' for replicated bricks isn't the case. See this... http://fpaste.org/w4sO/
22:09 glusterbot Title: Viewing gluterfsd ports for bricks by tc00per (at fpaste.org)
22:10 tc00per 2x24009, 3x24010, 2x24011, 1x24012... who's paired with whom?
22:15 H__ About "rebalance step 1: layout fix in progress: fixed layout 382" is there a way to know 'how far it is' ? percentage wise ?
22:15 JoeJulian "gluster volume info" if it's a replica 2 volume, each 2 in sequence.
22:15 JoeJulian H__: Nope
22:15 badone_home joined #gluster
22:15 H__ pity , ok. what does the number mean actually ?
22:15 elyograg on that bug with volumes more than half full, I am wondering if I can perhaps abort the remove-brick before everything fills up, then start it again ... repeating that cycle until the used space on the brick is less than the available space on each of the other bricks.
22:16 elyograg more stuff to test on monday.
22:16 JoeJulian H__: I think it's fixed the layout on that many directories.
22:16 elyograg I *might* log in this weekend and just do it.
22:17 H__ so a directory count on the main server would give it some percentage meaning ?
22:19 JoeJulian Sounds reasonable. Let me know if that works out. I've not tested that theory myself.
22:20 elyograg In case anyone cares, don't build a testbed using Intel's DH67BL motherboard, at least not if you have four drives per server.  On reboot, it sometimes comes up without all four drives present.  When it happens, a power cycle is required to fix it.
22:23 badone_home joined #gluster
22:26 duerF joined #gluster
22:27 JoeJulian I had something similar happen on Monday. Replaced a mirrored drive on an old 3ware controller. At some point the older drive left the mirror. On reboot the controller didn't find the new drive and we booted from the stale one.
22:28 tc00per JoeJulian: Yup... you are correct sir.
22:41 quillo joined #gluster
23:08 Daxxial_ joined #gluster
23:33 badone joined #gluster
23:39 rcattelan joined #gluster
23:50 badone_home joined #gluster
23:51 y4m4 joined #gluster
23:58 badone_home joined #gluster
